After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 667651 - creating a thread = fatal error if under a scheduler other than SCHED_OTHER
creating a thread = fatal error if under a scheduler other than SCHED_OTHER
Status: RESOLVED OBSOLETE
Product: glib
Classification: Platform
Component: gthread
2.30.x
Other Linux
: Normal normal
: ---
Assigned To: Simon McVittie
gtkdev
Depends on:
Blocks: 667866
 
 
Reported: 2012-01-10 18:51 UTC by Simon McVittie
Modified: 2013-04-24 17:27 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
[1/2] Do not attempt to set new POSIX threads' priorities (3.05 KB, patch)
2012-01-13 13:10 UTC, Simon McVittie
none Details | Review
[2/2] Add regression test for GNOME#667651 (3.89 KB, patch)
2012-01-13 13:10 UTC, Simon McVittie
none Details | Review
[2/2 v2] Add regression test for GNOME#667651 (6.21 KB, patch)
2012-01-16 13:17 UTC, Simon McVittie
none Details | Review
[2/2 v3] Add regresssion test for GNOME#667651 (6.85 KB, patch)
2012-01-23 18:29 UTC, Simon McVittie
reviewed Details | Review

Description Simon McVittie 2012-01-10 18:51:01 UTC
(This is related to Bug #501237, but more serious, since it's a crash rather than a missing abstraction.)

Under at least GLib 2.12 to 2.28, and possibly other versions (probably 2.30 but not 2.31), if an application sets SCHED_RR or SCHED_FIFO (realtime scheduling) and then starts a thread, the call to pthread_attr_setschedparam() fails, and the application aborts via g_error(). (For those with access to the appropriate bug tracker, this is NB#47178.) The same problem was recently reported in Debian, <http://bugs.debian.org/637654>.

For instance, this was a problem for telepathy-stream-engine on Maemo, which wants realtime scheduling (for maximum quality of voice/video calls), and uses GStreamer (which starts many threads).

Maemo's GLib has had an undocumented patch since 2006 [1] which addressed this; I've only recently been able to dig out the details of why it was applied in the first place. However, that patch is incomplete: it still doesn't work when coming from a SCHED_IDLE or SCHED_BATCH (ultra-low-priority) thread. (For those with access to the appropriate bug tracker, this is NB#296256.)

The SCHED_IDLE/SCHED_BATCH thing is arguably a bug in glibc (<http://sourceware.org/bugzilla/show_bug.cgi?id=7013>) but good luck getting that fixed...

For comparison, I've been directed to how Qt does this [2], which is considerably more elaborate (hopefully more elaborate than we need...) 

I'll try to put together some test cases which illustrate the problem, and confirm whether 2.30 and/or 2.31 still have this bug.

[1] Maemo patch from Jussi Laako: <https://gitorious.org/gnome-essentials/glib/blobs/maemo/debian/patches/50-gthread.patch>
[2] <http://qt.gitorious.org/qt/qt/blobs/4.7/src/corelib/thread/qthread_unix.cpp>
[3] <https://qt.gitorious.org/qt/qt/commit/f3405a516ac30fc7dee1021cc6f34ca03dd08d97>
Comment 1 Simon McVittie 2012-01-13 13:08:50 UTC
This bug also exists in the 2.30 stable branch.

It turns out that the priority-setting in g_thread_create has never worked as designed on POSIX-compliant systems. Instead, new threads always inherit their parent thread's scheduling policy and sched_priority. That actually seems to me to be more desirable than how it was designed, to be honest - as designed, each new thread would have gone back from real-time or idle priority to normal priority, which would basically mean that no GStreamer app could usefully have real-time privileges.

Given that this is how GLib has always behaved, I think it's safe to say that every GLib application works fine with threads that inherit their parent's priority. So, I'd be inclined to carry over the removal of thread priorities from the 2.32 branch to 2.30, by deleting the broken code that pretends to make priority-setting happen.

----

Further explanation of why this happens, please check my reasoning:

Scheduler priority in POSIX consists of two parts, a *policy* and a *parameter*. The policy can be normal scheduling (SCHED_OTHER), real-time scheduling (SCHED_RR or SCHED_FIFO), or an OS-specific extension. Linux has two OS-specific extensions, SCHED_BATCH and SCHED_IDLE, both of which are lower priority than SCHED_OTHER.

The parameter is a struct whose contents can be interpreted differently for each policy. POSIX requires it to contain at least sched_priority, an integer; under Linux+glibc, it contains that integer and nothing else.

The contents of the struct, and in particular the allowed values for sched_priority, are allowed to vary from policy to policy. In Linux, SCHED_RR and SCHED_FIFO have priorities ranging from 1 to 99 (where larger numbers are more urgent), while SCHED_OTHER, SCHED_BATCH and SCHED_IDLE always have priority 0.

When unpatched GLib <= 2.30 creates a thread, it initializes a fresh pthread_attr_t and leaves its policy set to the default (which is SCHED_OTHER). It then sets the sched_priority according to a lookup table, g_thread_priority_map.

Now, g_thread_priority_map is a map from GLib constants like G_THREAD_PRIORITY_LOW to OS-specific sched_priority values. On startup (actually, in g_thread_init()), GLib fills it like this:

    {
      G_THREAD_PRIORITY_LOW => (the lowest SCHED_OTHER priority),
      G_THREAD_PRIORITY_NORMAL => (the priority of the thread
                                   that first called g_thread_init()),
      G_THREAD_PRIORITY_HIGH => (2/3 of the way between NORMAL and
                                 URGENT),
      G_THREAD_PRIORITY_URGENT => (the highest SCHED_OTHER priority),
    }

If the main thread had a real-time policy at the time it called g_thread_init(), this bug happens: the sched_priority corresponding to NORMAL will be more than zero. This isn't appropriate for the scheduling policy that GLib will choose (which was always SCHED_OTHER), while HIGH priority is a mixture, so it isn't necessarily appropriate either. This led to a crash. For instance, if telepathy-stream-engine started up with (say) real-time priority 42, we'd get this nonsensical lookup table:

    { LOW => 0, NORMAL => 42, HIGH => 14, URGENT => 0 }

In practice, everyone uses G_THREAD_PRIORITY_NORMAL, so GLib would usually try to use (SCHED_OTHER, sched_priority = 42) which isn't valid. GLib sees the EINVAL result, isn't prepared to deal with it, and aborts.

To fix that bug, and also get all of telepathy-stream-engine's threads running under the real-time policy (it uses GStreamer, so media handling is spread between threads), Jussi's patch changed the logic so the new thread would get the policy from the parent thread, set the sched_priority from the lookup table, *truncate out-of-range values to comply with that policy* (which is what prevented the EINVAL error), and use that (priority, policy) tuple. That patch appeared to work, so it was applied to the Maemo package and everyone got on with their lives.

However, there were still three issues:

1) realtime processes' lookup table still makes no sense, with priorities
   that aren't even in the right order

2) as mentioned in Comment #0, pthread_attr_setschedpolicy wrongly
   doesn't allow SCHED_IDLE (or SCHED_BATCH for that matter), failing with
   EINVAL

3) if you read the pthread_create source code in glibc, it turns out that
   using pthread_attr_setschedpolicy() and pthread_attr_setschedparam()
   has absolutely no effect (other than potentially failing with EINVAL)
   unless you also call pthread_attr_setinheritsched(PTHREAD_EXPLICIT_SCHED)
   to turn off scheduling inheritance. So in fact, we didn't need to
   go to the effort of implementing explicit policy inheritance - the threads
   would all have inherited the main thread's policy and sched_priority
   anyway!

----

(FYI, nice values as set by nice(1) and setpriority(3) are something
different - it appears they only apply to whole processes, not to
individual threads. GLib never changes the nice value.)
Comment 2 Simon McVittie 2012-01-13 13:10:21 UTC
Created attachment 205187 [details] [review]
[1/2] Do not attempt to set new POSIX threads' priorities

It turns out that GLib has never actually been able to set new threads'
priorities on POSIX systems, because POSIX specifies that new threads
inherit their parent's priority, ignoring the scheduling policy and
parameter stored in the pthread_attr_t, unless you also set
PTHREAD_EXPLICIT_SCHED in the pthread_attr_t (which we didn't).
The default behaviour is PTHREAD_INHERIT_SCHED.

In the absence of actually being able to set a priority, the only
thing we could possibly achieve by calling pthread_attr_setschedparam
was to crash when we set an invalid sched_priority and got EINVAL back.

Unfortunately, in the presence of scheduling policies with different
allowed ranges for sched_priority (i.e. at least modern Linux), the
g_thread_priority_map lookup table was populated inconsistently:
the LOW and URGENT sched_priority were set to something appropriate
for SCHED_OTHER (i.e. always 0 on Linux), while the NORMAL sched_priority
was set to something appropriate for the main thread's scheduling
priority at the time it called g_thread_init() (i.e. always nonzero for
real-time processes on Linux), and the HIGH sched_priority was a mixture
of the two. For instance, in a real-time process which had SCHED_RR
with sched_priority = 42 at the time it called g_thread_init(),
the lookup table would be:

    { LOW => 0, NORMAL => 42, HIGH => 14, URGENT => 0 }

which is clearly nonsense.

Since in practice GLib on POSIX has always allowed the default
policy/priority inheritance to take place, we can safely conclude
that every currently-working POSIX application is happy with this,
and drop the code that pretends to behave otherwise.

Bug: https://bugzilla.gnome.org/show_bug.cgi?id=667651
Bug-Debian: http://bugs.debian.org/637654
Bug-NB: NB#47178
Bug-NB: NB#296256
Signed-off-by: Simon McVittie <simon.mcvittie@collabora.co.uk>
Comment 3 Simon McVittie 2012-01-13 13:10:48 UTC
Created attachment 205188 [details] [review]
[2/2] Add regression test for GNOME#667651

On current Linux, this is only useful when run as root (strictly
speaking: with CAP_SYS_NICE), but running it with insufficient privilege
is harmless.

Bug: https://bugzilla.gnome.org/show_bug.cgi?id=667651
Signed-off-by: Simon McVittie <simon.mcvittie@collabora.co.uk>
Comment 4 Sami Rosendahl 2012-01-16 09:11:49 UTC
Review of attachment 205188 [details] [review]:

- The test should also cover SCHED_BATCH and SCHED_IDLE policies when available.
- The test verifies only policy inheritance, not priority inheritance.

::: gthread/tests/realtime.c
@@ +55,3 @@
+  GThread *thread;
+  guint i;
+  int min_prio;

Local variables i and min_prio are not used in the function.
Comment 5 Simon McVittie 2012-01-16 13:17:07 UTC
Created attachment 205359 [details] [review]
[2/2 v2] Add regression test for GNOME#667651

The actually problematic cases (SCHED_RR and SCHED_FIFO) usually need
CAP_SYS_NICE on normal Linux, but we can at least run the test for the
other cases. Run manually, e.g. with "sudo gthread/tests/sched",
to test the situation that would actually crash.

If RLIMIT_RTPRIO and RLIMIT_RTTIME are sufficiently high, it may be
possible to test SCHED_RR and SCHED_FIFO as non-root.

Bug: https://bugzilla.gnome.org/show_bug.cgi?id=667651
Signed-off-by: Simon McVittie <simon.mcvittie@collabora.co.uk>

---

(In reply to comment #4)
> Review of attachment 205188 [details] [review]:
> 
> - The test should also cover SCHED_BATCH and SCHED_IDLE policies when
> available.

A fair point - although that doesn't actually crash on Linux anyway, because SCHED_BATCH and SCHED_IDLE have the same range of valid priorities as SCHED_OTHER (0 to 0).

> - The test verifies only policy inheritance, not priority inheritance.

Not true: this:

    pthread_getschedparam (pthread_self (), &policy, &param)
    ...
    g_assert_cmpint (sched_get_priority_min (policy), ==, param.sched_priority)

is run in both main and child threads, and asserts that the priority is what the main thread said it should be.

In an attempt to make this clearer, I've swapped the left and right sides of the ==, and used the median priority (min+max)/2 instead of the minimum priority.

> ::: gthread/tests/realtime.c
> @@ +55,3 @@
> +  guint i;
> +  int min_prio;
> 
> Local variables i and min_prio are not used in the function.

Dropped.

I've also switched the test to GTester, which I'd previously avoided, thinking that g_test_init() called g_thread_init() - it doesn't, so the bug can be reproduced even under GTester.
Comment 6 Sami Rosendahl 2012-01-18 11:21:18 UTC
Review of attachment 205359 [details] [review]:

Looks good to me, my comments for [2/2] have been addressed. (NB: Visual review only haven't yet executed the test)
Comment 7 Sami Rosendahl 2012-01-18 11:31:09 UTC
(In reply to comment #5)
> > - The test verifies only policy inheritance, not priority inheritance.
> 
> Not true: this:
> 
>     pthread_getschedparam (pthread_self (), &policy, &param)
>     ...
>     g_assert_cmpint (sched_get_priority_min (policy), ==, param.sched_priority)
> 
> is run in both main and child threads, and asserts that the priority is what
> the main thread said it should be.
> 
> In an attempt to make this clearer, I've swapped the left and right sides of
> the ==, and used the median priority (min+max)/2 instead of the minimum
> priority.

I agree that the priority inheritance was tested for the minimum priority value. The median is definitively better than the previously used minimum value, but it is still a single value and a problem in inheriting e.g. the maximum value would go unnoticed by the test. What about testing all three min, median and max values for each policy?
Comment 8 Simon McVittie 2012-01-23 18:29:01 UTC
Created attachment 205914 [details] [review]
[2/2 v3] Add regresssion test for GNOME#667651

The actually problematic cases (SCHED_RR and SCHED_FIFO) usually need
CAP_SYS_NICE on normal Linux, but we can at least run the test for the
other cases. Run manually, e.g. with "sudo gthread/tests/sched",
to test the situation that would actually crash.

If RLIMIT_RTPRIO and RLIMIT_RTTIME are sufficiently high, it may be
possible to test SCHED_RR and SCHED_FIFO as non-root.

Bug: https://bugzilla.gnome.org/show_bug.cgi?id=667651
Signed-off-by: Simon McVittie <simon.mcvittie@collabora.co.uk>

---

This is becoming more of a test of the kernel than a test of GLib (we're not really testing that GLib applies inherited priority - because it doesn't - but only that it gets out of the way and allows it to be inherited), but it does demonstrate that we're definitely inheriting policy and priority.
Comment 9 Sami Rosendahl 2012-02-06 11:12:05 UTC
Review of attachment 205914 [details] [review]:

Looks good to me.
Comment 10 Simon McVittie 2013-04-24 17:27:08 UTC
2.30 is pretty old now, and I doubt anyone is going to apply these patches, so, resolving as obsolete. Fixed in 2.32.