GNOME Bugzilla – Bug 691168
version > glib-2.30.3 dead lock on uclibc systems since poll_waiting was punted
Last modified: 2013-01-06 02:43:53 UTC
Since poll_waiting was removed by the following commits:

1) GMain: simplify logic for g_wakeup_acknowledge()
   7eae486179e2799c369ed9ffcea663bf9161ce79
2) gmain: get rid of poll_waiting
   1c8c408c51c85230356cc95c06f2e1bd3f376624

most executables linking against glib on a uclibc system deadlock. Some debugging shows that these processes (or one of their threads) are deadlocking in g_wakeup_acknowledge(). E.g. gqview's backtrace has:
+ Trace 231336
Reverting the above commits "fixes" the problem but returns to the bad situation in bug #320888 and bug #583511. I'm not sure how to fix it properly, but at least as a diagnostic, one can comment out the loop that reads the wakeup fd until it is empty, and things "work" again:

    //while (read (wakeup->fds[0], buffer, sizeof buffer) == sizeof buffer);

---

Steps to reproduce:

1. Build a uclibc system with NPTL support. The arch doesn't matter, as this happens on at least x86_64, i686 and armv7a. (Or grab a prebuilt stage3 from <gentoo-mirror>/experimental/<arch>/uclibc.)

2. Build glib configured as follows (the above images have it already built):

    ./configure --prefix=/usr --build=x86_64-gentoo-linux-uclibc \
        --host=x86_64-gentoo-linux-uclibc --mandir=/usr/share/man \
        --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc \
        --localstatedir=/var/lib --disable-dependency-tracking \
        --enable-xattr --disable-fam --disable-selinux --enable-static \
        --disable-dtrace --disable-systemtap --enable-regex \
        --with-pcre=internal --with-threads=posix

3. Build some windowing system (e.g. XFCE4) and test any GUI program linking against glib (e.g. eog, gqview, gimp, etc.).

4. These programs start and a window might appear (for gqview you get a blank frame, for eog you get nothing), but then the program "freezes".
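To illustrate why the read in g_wakeup_acknowledge() matters here, below is a minimal sketch, not glib code. It assumes the wakeup fd is a Linux eventfd; the helper names try_acknowledge() and signal_wakeup() are made up for this example. On a non-blocking eventfd, a read with a zero counter fails with EAGAIN; on a blocking one the same read would simply never return, which matches the "freeze" described above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper (not part of glib): try to consume a pending wakeup.
 * Returns 1 if a wakeup was consumed, 0 if none was pending, -1 on error.
 * Only safe when fd really has O_NONBLOCK set; on a blocking eventfd the
 * read() below blocks for as long as the counter stays at zero. */
int
try_acknowledge (int fd)
{
  uint64_t value;
  ssize_t n;

  assert (fd >= 0);

  /* Retry if a signal interrupted the read. */
  do
    n = read (fd, &value, sizeof value);
  while (n == -1 && errno == EINTR);

  if (n == (ssize_t) sizeof value)
    return 1;                   /* counter was non-zero and is now reset */
  if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
    return 0;                   /* nothing pending, and we did not block */
  return -1;
}

/* Hypothetical helper: post one wakeup by adding 1 to the eventfd counter. */
int
signal_wakeup (int fd)
{
  uint64_t one = 1;

  return write (fd, &one, sizeof one) == (ssize_t) sizeof one ? 0 : -1;
}
```

With an fd from eventfd (0, EFD_NONBLOCK) on a libc that honours the flag, try_acknowledge() returns 0 before any signal_wakeup() call and 1 right after one.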
For the record, here is the full backtrace for gqview, which is the cleanest of the ones I've tested: single-threaded and not very deep.
+ Trace 231337
so it seems like the wakeup fd is not getting made non-blocking... poke around in g_wakeup_new() and see which codepath is running, and what happens in the g_unix_set_fd_nonblocking() calls? Also, does anything fail with "make check" in glib/glib/tests? (particularly the mainloop and gwakeup tests)
I wonder if c70072180557c0a897da0d96ef2cf4f5398ddd3b fixes it...before we were kind of tripping undefined behavior.
(In reply to comment #2)
> so it seems like the wakeup fd is not getting made non-blocking... poke around
> in g_wakeup_new() and see which codepath is running, and what happens in the
> g_unix_set_fd_nonblocking() calls?
>
> Also, does anything fail with "make check" in glib/glib/tests? (particularly
> the mainloop and gwakeup tests)

That's it, the wakeup fd is not getting O_NONBLOCK. It's a problem in uclibc, and I've reduced it to this:

    #include <stdio.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <sys/eventfd.h>

    int main()
    {
        /* Ask for a non-blocking eventfd, then check whether the flag stuck. */
        int f = eventfd(0, EFD_NONBLOCK);
        printf("Got O_NONBLOCK = %d\n", fcntl(f, F_GETFL) & O_NONBLOCK ? 1 : 0);

        /* Set the flag explicitly and check again. */
        fcntl(f, F_SETFL, O_NONBLOCK);
        printf("Got O_NONBLOCK = %d\n", fcntl(f, F_GETFL) & O_NONBLOCK ? 1 : 0);

        close(f);
        return 0;
    }

On a glibc system this prints out:

    Got O_NONBLOCK = 1
    Got O_NONBLOCK = 1

while on a uclibc system this prints out:

    Got O_NONBLOCK = 0
    Got O_NONBLOCK = 1

I'm not sure there's anything for glib to do here. I'll pursue it with uclibc.

As a diagnostic, something like this also "fixes" the problem in glib:

    GWakeup *
    g_wakeup_new (void)
    {
      GError *error = NULL;
      GWakeup *wakeup;

      wakeup = g_slice_new (GWakeup);

      /* try eventfd first, if we think we can */
    #if defined (HAVE_EVENTFD)
    #ifndef TEST_EVENTFD_FALLBACK
      wakeup->fds[0] = eventfd (0, EFD_CLOEXEC | EFD_NONBLOCK);
      fcntl( wakeup->fds[0], F_SETFL, O_NONBLOCK);
    #else
      wakeup->fds[0] = -1;
    #endif
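The diagnostic above calls fcntl() unconditionally and ignores its result. A slightly more careful variant, sketched below as a standalone function rather than as a patch to glib, checks whether EFD_NONBLOCK actually took effect and only sets O_NONBLOCK by hand when it did not. The name create_nonblocking_eventfd() is made up for this example, and it assumes a Linux system with eventfd.

```c
#include <fcntl.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper (not glib code): create an eventfd and make sure
 * O_NONBLOCK is really set, whatever the libc did with EFD_NONBLOCK.
 * Returns the fd, or -1 on failure. */
int
create_nonblocking_eventfd (void)
{
  int fd = eventfd (0, EFD_CLOEXEC | EFD_NONBLOCK);
  int flags;

  if (fd == -1)
    return -1;

  flags = fcntl (fd, F_GETFL);
  if (flags == -1)
    goto fail;

  /* If the flag did not stick (the uclibc behaviour shown by the reduced
   * test case), add it explicitly while preserving the other status flags. */
  if (!(flags & O_NONBLOCK) && fcntl (fd, F_SETFL, flags | O_NONBLOCK) == -1)
    goto fail;

  return fd;

fail:
  close (fd);
  return -1;
}
```

On a libc where EFD_NONBLOCK works, the second fcntl() is never reached, so the extra cost is a single F_GETFL call per wakeup object.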
Okay, this was already fixed in uclibc by the following commit: http://git.uclibc.org/uClibc/commit/?id=e118373cbb58ba5ffa5fb6670957678d5b87cdb9 Sorry for the noise.