Bug 739122 – glib not handling -1 return ("no limit") from sysconf (_SC_THREAD_STACK_MIN)

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 739122 - glib not handling -1 return ("no limit") from sysconf (_SC_THREAD_STACK_MIN)


Summary:	glib not handling -1 return ("no limit") from sysconf (_SC_THREAD_STACK_MIN)


Status:	RESOLVED FIXED

Product:	glib
Classification:	Platform
Component:	gthread
Version:	2.42.x
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	gtkdev
QA Contact:	gtkdev

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2014-10-24 12:29 UTC by Mattias Ellert
Modified:	2015-05-29 01:55 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Patch that checks that sysconf return value is positive before using it. (625 bytes, patch) 2014-10-24 12:29 UTC, Mattias Ellert	committed	Details \| Review

Description Mattias Ellert 2014-10-24 12:29:53 UTC

Created attachment 289269 [details] [review]
Patch that checks that sysconf return value is positive before using it.

The bug is due to that when the return value of sysconf (a long) is compared to the variable stack_size (an unsigned long) the values have to be converted to a common type in order to do the comparison. This common type is chosen by the compiler to be unsigned long - which is correct behaviour according to the C language standard. When sysconf returns -1 this is therefore converted to ULONG_MAX in order to do the comparison and this is of course then always considered to be the higher value. So the stack_size allocated for the thread becomes -1 bytes. When the thread starts and allocates memory in this non-existing stack it overwrites memory already in use by other threads.

The patch checks that the value returned by sysconf is positive before comparing it to the variable stack_size.

Bug forwarded from Debian BTS:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=731547

The bug affects the Debian GNU/Hurd port, since sysconf (_SC_THREAD_STACK_MIN) returns -1 on GNU/Hurd.

Comment 1 Simon McVittie 2015-05-04 12:57:19 UTC

(In reply to Mattias Ellert from comment #0)
> The bug

What is the bug in question? GLib maintainers shouldn't need to rummage in the Debian BTS to find out why this change was necessary.

From context, it seems the answer might be something like this:

"""
If a thread is allocated with a specified stack size by g_thread_create_full(), the actual stack size that should be used is sysconf (_SC_THREAD_STACK_MIN) or the specified size, whichever is larger.

However, GNU/Hurd returns -1 when asked for its minimum stack size; the sysconf API defines this to mean "there is no definite limit".

Because of C signed/unsigned arithmetic rules, this -1 is treated as the maximum value of type 'unsigned long', so the MAX() macro returns ULONG_MAX and the thread's required stack size is set to an impossibly large value which cannot be successfully allocated.
"""

Some questions that that raises:

Is it really true that GNU/Hurd has no minimum stack size for threads? You're not going to get very far with no stack: wouldn't it be more realistic to use some reasonable value that supports nice-to-have features like calling a trivial function?

pthread_attr_setstacksize() takes a size_t argument; ULONG_MAX == SIZE_MAX on all platforms currently supported by GNU/Hurd, AFAICS. Why does GNU/Hurd's libc respond to a requested stack size of ULONG_MAX by crashing, and not by (for instance) trying to allocate ULONG_MAX bytes of address-space, failing to do so, and either using a more sensible stack size or reporting ENOMEM instead?

Comment 2 Mattias Ellert 2015-05-05 10:57:50 UTC

(In reply to Simon McVittie from comment #1)
> (In reply to Mattias Ellert from comment #0)
> > The bug
> 
> What is the bug in question? GLib maintainers shouldn't need to rummage in
> the Debian BTS to find out why this change was necessary.

All the relevant parts of the Debian BTS bug report was forwarded to this bugreport. The parts complaining about "glib crashes for unknown reason here is a stack trace I don't understand" was no longer relevant once the bug was fully understood and the line of code responsible for the failure was identified. The rather vague original complaint in BTS was omitted because repeating it would only obscure the issue.

> From context, it seems the answer might be something like this:
> 
> """
> If a thread is allocated with a specified stack size by
> g_thread_create_full(), the actual stack size that should be used is sysconf
> (_SC_THREAD_STACK_MIN) or the specified size, whichever is larger.
> 
> However, GNU/Hurd returns -1 when asked for its minimum stack size; the
> sysconf API defines this to mean "there is no definite limit".
> 
> Because of C signed/unsigned arithmetic rules, this -1 is treated as the
> maximum value of type 'unsigned long', so the MAX() macro returns ULONG_MAX
> and the thread's required stack size is set to an impossibly large value
> which cannot be successfully allocated.
> """
> 
> Some questions that that raises:
> 
> Is it really true that GNU/Hurd has no minimum stack size for threads?
> You're not going to get very far with no stack: wouldn't it be more
> realistic to use some reasonable value that supports nice-to-have features
> like calling a trivial function?
> 
> pthread_attr_setstacksize() takes a size_t argument; ULONG_MAX == SIZE_MAX
> on all platforms currently supported by GNU/Hurd, AFAICS. Why does
> GNU/Hurd's libc respond to a requested stack size of ULONG_MAX by crashing,
> and not by (for instance) trying to allocate ULONG_MAX bytes of
> address-space, failing to do so, and either using a more sensible stack size
> or reporting ENOMEM instead?

GNU/Hurd has a default stack size that is used if you do not request a specific size. But there is no lower limit, you can set the stack size smaller than the default. The default is not the minimum. The default is not 0.

When the memory is allocated for a new thread, this memory is not only the memory used for the thread's stack, but it includes some additional memory. When the size of this memory is calculated, the size of the additional memory is added to the requested size of the stack. If the requested size of the stack is ULONG_MAX, adding the size of the additional memory needed for the thread will overflow, and the result will be the size of the additional memory minus one byte. This size is then allocated for the thread. This means that the size allocated for the stack is -1 byte. When the thread starts writing to its stack it will start writing to unallocated memory.

Comment 3 Simon McVittie 2015-05-05 17:54:26 UTC

The only caller of this function with a nonzero stack_size is the deprecated g_thread_create_full(), which has a stack_size argument and passes it on directly. If the GLib maintainers believe that it is not useful for applications to specify a stack size (which is presumably the case, since there doesn't seem to be a non-deprecated API to do so), perhaps it would be better for g_system_thread_new() to ignore the stack_size altogether, resulting in the OS default always being used? Then it wouldn't need to call sysconf() at all.

(In reply to Mattias Ellert from comment #2)
> When the memory is allocated for a new thread, this memory is not only the
> memory used for the thread's stack, but it includes some additional memory.
> When the size of this memory is calculated, the size of the additional
> memory is added to the requested size of the stack. If the requested size of
> the stack is ULONG_MAX, adding the size of the additional memory needed for
> the thread will overflow

This seems like a lower-layer (libc?) bug, independent of what is going on in GLib.

Comment 4 Mattias Ellert 2015-05-05 20:26:54 UTC

(In reply to Simon McVittie from comment #3)
> The only caller of this function with a nonzero stack_size is the deprecated
> g_thread_create_full(), which has a stack_size argument and passes it on
> directly. If the GLib maintainers believe that it is not useful for
> applications to specify a stack size (which is presumably the case, since
> there doesn't seem to be a non-deprecated API to do so), perhaps it would be
> better for g_system_thread_new() to ignore the stack_size altogether,
> resulting in the OS default always being used? Then it wouldn't need to call
> sysconf() at all.

The code that triggered this failure and was the reason for filing the bug report very deliberately uses the version of Glib::Thread::create() from glibmm that allows setting the stack size. The glibmm function in turn calls the glib function. There are comments in the code saying "It seems like MacOS has very small stack per thread by default. So it's safer to have bigger one defined."

So I do not think ignoring the stack size argument is the right thing to do.

> (In reply to Mattias Ellert from comment #2)
> > When the memory is allocated for a new thread, this memory is not only the
> > memory used for the thread's stack, but it includes some additional memory.
> > When the size of this memory is calculated, the size of the additional
> > memory is added to the requested size of the stack. If the requested size of
> > the stack is ULONG_MAX, adding the size of the additional memory needed for
> > the thread will overflow
> 
> This seems like a lower-layer (libc?) bug, independent of what is going on
> in GLib.

You are arguing that libc is failing in the wrong way when given unreasonable input arguments when creating a new thread. You expect it to fail with ENOMEM and instead it segfaults. Even if libc did have some protection for this kind of ridiculously large stack size requests and returned an error instead of segfaulting, you would still have to change the glib code not to send inputs to libc that triggers this error code. This change would be the same change that would be needed in order not to trigger the segfault without this protection in libc.

Comment 5 Mattias Ellert 2015-05-13 08:25:05 UTC

I am surprised by the strong opposition against fixing this bug. I really do not understand it. According to the POSIX standard a return value of -1 from sysconf is a valid result with a well defined meaning. Any code that wants to be POSIX compliant must do a reasonable thing when it encounters a -1 return value from sysconf. Requesting a thread stack size of 4 GiB on a 32 bit system like the current glib2 code does in not a reasonable thing.

Comment 6 Simon McVittie 2015-05-13 10:31:42 UTC

(In reply to Mattias Ellert from comment #5)
> I am surprised by the strong opposition against fixing this bug.

To be clear, I am not opposed to fixing this bug. However, I am not an upstream GLib maintainer and cannot approve your patch. If an actual GLib maintainer could opine on this, that would of course be very useful.

In the absence of upstream review for this corner-case, making Hurd claim to have a minimum stack size seems like a plausible way to avoid the issue. It seems like quite a common pattern that Hurd sets implementation-defined values in a way that is technically valid, but far from pragmatic - we've been here before with PATH_MAX, which Linux defines at an arbitrary semi-large value (4096) even though there is no hard limit, but Hurd pointedly doesn't define.

It seems disingenuous to say there is no minimum stack size at all, given that any non-trivial function being run in a thread is going to want a nested function call and/or a local variable.

With Debian hat on, I'm somewhat wary of applying patches that touch platform-independent code if they have little chance of getting applied upstream; and if we do, I want to be completely sure why they're necessary, which is one reason why I'm taking a "devil's advocate" approach.

> Any code that wants to be POSIX compliant must do a reasonable thing 
> when it encounters a -1 return value from sysconf.

I don't think "work on every possible POSIX-compliant platform" is actually one of GLib's goals. It aims to work on platforms that are widely-used in practice, whether they are broadly POSIX-compliant (Linux, *BSD, Mac OS X) or not at all (Windows).

Comment 7 Matthias Clasen 2015-05-14 03:02:22 UTC

Review of attachment 289269 [details] [review]:

Looks ok to me.

Comment 8 Matthias Clasen 2015-05-14 03:02:24 UTC

Review of attachment 289269 [details] [review]:

Looks ok to me.

Comment 9 Mattias Ellert 2015-05-28 05:04:48 UTC

What happens next, and when?

Two weeks ago the patch attached to this bug report was marked "accepted, commit now". I understood this to mean that this bug would finally be fixed, maybe not instantly but within a couple of days or a week. Now two weeks has passed.

Have I misunderstood the meaning of "accepted, commit now"?