Bug 657891 - spawn-multithreaded test hangs occasionally on recent Linux kernels/glibc
Status: RESOLVED NOTGNOME
Product: glib
Classification: Platform
Component: mainloop
Version: unspecified
Hardware: Other All
Importance: Normal normal
Target Milestone: ---
Assigned To: gtkdev
QA Contact: gtkdev
Depends on:
Blocks:
 
 
Reported: 2011-09-01 03:09 UTC by Allison Karlitskaya (desrt)
Modified: 2011-09-16 21:23 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
proof of libc/kernel bug (723 bytes, text/plain)
2011-09-01 16:16 UTC, Allison Karlitskaya (desrt)

Description Allison Karlitskaya (desrt) 2011-09-01 03:09:48 UTC
I'm on Fedora 15.  Nothing special here.

Occasionally spawn-multithreaded hangs in a rather fantastic way.  Try this:


~/code/glib/gthread/tests$ while true; do ./spawn-multithreaded ; done
/gthread/spawn-sync: OK
/gthread/spawn-async: OK
/gthread/spawn-sync: OK
/gthread/spawn-async: OK
/gthread/spawn-sync: OK

Eventually you'll get a

/gthread/spawn-sync:

with no OK after it, or sometimes the same thing at /gthread/spawn-async.

Looking at 'ps', you see:

desrt    10546  0.1  0.0 3905000 2188 pts/8    Sl+  23:05   0:00 /home/desrt/code/glib/gthread/tests/.libs/lt-spawn-multithreaded
desrt    10722 99.3  0.1 4437740 11760 pts/8   R+   23:05   2:03 /home/desrt/code/glib/gthread/tests/.libs/lt-spawn-multithreaded

with the running process (the one in state R) eating 100% of a CPU.  The roughly 4 GB of virtual memory on each process is pretty impressive too.

Attempting to attach gdb results in gdb growing to about a gig in size, and then starting to consume 100% CPU itself.  No help there.

I thought that maybe strace would help, but when I run the test under strace, the hang doesn't seem to happen.
Comment 1 Allison Karlitskaya (desrt) 2011-09-01 03:10:50 UTC
When it happens in the async case, you often also see a lot of this:

24189 pts/8    Sl+    0:00 /home/desrt/code/glib/gthread/tests/.libs/lt-spawn-multithreaded
24562 pts/8    R+     0:27 /home/desrt/code/glib/gthread/tests/.libs/lt-spawn-multithreaded
24565 pts/8    Z+     0:00 [test-spawn-echo] <defunct>
24566 pts/8    Z+     0:00 [test-spawn-echo] <defunct>
24567 pts/8    Z+     0:00 [test-spawn-echo] <defunct>
24568 pts/8    Z+     0:00 [test-spawn-echo] <defunct>
24572 pts/8    Z+     0:00 [test-spawn-echo] <defunct>
24573 pts/8    Z+     0:00 [test-spawn-echo] <defunct>
24575 pts/8    Z+     0:00 [test-spawn-echo] <defunct>
24576 pts/8    Z+     0:00 [test-spawn-echo] <defunct>
24578 pts/8    Z+     0:00 [test-spawn-echo] <defunct>
24579 pts/8    Z+     0:00 [test-spawn-echo] <defunct>
24582 pts/8    Z+     0:00 [test-spawn-echo] <defunct>
24589 pts/8    Z+     0:00 [test-spawn-echo] <defunct>
Comment 2 Colin Walters 2011-09-01 15:16:47 UTC
See also https://bugzilla.gnome.org/show_bug.cgi?id=652072#c17
Comment 3 Allison Karlitskaya (desrt) 2011-09-01 16:16:49 UTC
Created attachment 195399
proof of libc/kernel bug

Here's a program written against pure pthreads that demonstrates the problem.

It takes a lot longer to hang than the GLib version, so I'm guessing GLib makes some timing issues more favourable, but the bug is clearly here.


Compile with -pthread.
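
The attachment itself is not reproduced in this report. As a rough sketch of the kind of pure-pthreads stress test described above (several threads calling fork() at the same time and reaping their own children), something like the following could be used. The names fork_worker, run_fork_stress and MAX_THREADS are hypothetical, and the child simply calls _exit(0) instead of exec'ing a helper, so treat this as an illustration and not as the attached program.

```c
#include <pthread.h>
#include <stdint.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_THREADS 64   /* arbitrary cap chosen for this sketch */

/* Hypothetical worker: forks `iterations` children one after another and
 * reaps each child before forking the next.  Returns (as a pointer-sized
 * integer) how many children exited cleanly with status 0. */
static void *fork_worker(void *arg)
{
    long iterations = (long)(intptr_t)arg;
    long reaped = 0;

    for (long i = 0; i < iterations; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);                 /* child: leave immediately, no exec */
        if (pid < 0)
            break;                    /* fork failed: stop this worker */

        int status;
        if (waitpid(pid, &status, 0) == pid &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0)
            reaped++;
    }
    return (void *)(intptr_t)reaped;
}

/* Runs n_threads workers concurrently.  Returns the total number of
 * children reaped cleanly, or -1 if the arguments are out of range or a
 * thread could not be created. */
long run_fork_stress(int n_threads, long iterations)
{
    pthread_t threads[MAX_THREADS];
    long total = 0;
    int started = 0;

    if (n_threads < 1 || n_threads > MAX_THREADS || iterations < 0)
        return -1;

    for (int i = 0; i < n_threads; i++) {
        if (pthread_create(&threads[i], NULL, fork_worker,
                           (void *)(intptr_t)iterations) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        void *result;
        pthread_join(threads[i], &result);
        total += (long)(intptr_t)result;
    }
    return started == n_threads ? total : -1;
}
```

Calling run_fork_stress(8, 100000) from main() in a loop would be one way to drive it. On a working glibc the call returns n_threads * iterations; on an affected system the expectation is that it would eventually stop returning at all, with one process spinning as in the ps output above.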
Comment 4 Allison Karlitskaya (desrt) 2011-09-01 16:44:38 UTC
This bug is present on 32-bit F15 as well as 64-bit.

The bug is present on F16 as well with kernel 3.0.0-1.fc16.x86_64 and glibc-2.14.90-4.x86_64.

The bug is present on Ubuntu Oneiric alphas with kernel 3.0.0-9-generic and (e)glibc 2.13-17ubuntu2.
Comment 5 Allison Karlitskaya (desrt) 2011-09-01 17:33:48 UTC
More:

 - using -static appears to have the side effect of solving the problem

 - replacing fork() with syscall (SYS_fork) appears to solve the problem

 - replacing the execv() with a direct exit(0) does not solve the problem
   but seems to change the frequency of the occurrence
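
To make the second point concrete, here is a minimal sketch of what swapping the libc fork() wrapper for the raw system call might look like. The names raw_fork and spawn_and_reap are hypothetical, and the exit status 7 is just a marker value for the example. The raw call skips whatever work glibc's wrapper does around the system call (pthread_atfork handlers and internal bookkeeping), which is presumably why it sidesteps the hang. SYS_fork is not defined on every architecture, so the sketch falls back to fork() where it is missing.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE      /* for the syscall() declaration on glibc */
#endif
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork through the raw system call, bypassing the
 * libc wrapper.  Falls back to fork() where SYS_fork does not exist. */
static pid_t raw_fork(void)
{
#ifdef SYS_fork
    return (pid_t)syscall(SYS_fork);
#else
    return fork();
#endif
}

/* Forks one child (raw syscall if use_raw is non-zero, libc fork()
 * otherwise), has it exit with status 7, and reaps it.  Returns the
 * child's exit status, or -1 on any failure. */
int spawn_and_reap(int use_raw)
{
    pid_t pid = use_raw ? raw_fork() : fork();

    if (pid == 0)
        _exit(7);                     /* child: exit with the marker status */
    if (pid < 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Both spawn_and_reap(0) and spawn_and_reap(1) should return 7 on a healthy system. The raw variant is only useful as a diagnostic: a child created this way has not had libc's post-fork fixups applied, so it should do little more than _exit() or exec.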
Comment 6 Allison Karlitskaya (desrt) 2011-09-16 21:09:57 UTC
As per the glibc website, I filed bugs against the distributions:

 - https://bugs.launchpad.net/ubuntu/+source/eglibc/+bug/838975

After more than 2 weeks and a few IRC pokes, no love from the Ubuntu guys.  After getting bored of waiting, I filed a bug against Fedora too:

 - https://bugzilla.redhat.com/show_bug.cgi?id=737387

After a few days, it looks like there's a fixed package in F16.
Comment 7 Allison Karlitskaya (desrt) 2011-09-16 21:23:09 UTC
The test is 100% fine with the updated glibc installed.