Bug 691168 - version > glib-2.30.3 dead lock on uclibc systems since poll_waiting was punted
Status: RESOLVED INVALID
Product: glib
Classification: Platform
Component: gthread
Version: unspecified
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: gtkdev
QA Contact: gtkdev
Depends on:
Blocks:
 
 
Reported: 2013-01-05 02:26 UTC by basile@opensource.dyc.edu
Modified: 2013-01-06 02:43 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description basile@opensource.dyc.edu 2013-01-05 02:26:47 UTC
Since poll_waiting was removed with the following commits:

1) GMain: simplify logic for g_wakeup_acknowledge()
   7eae486179e2799c369ed9ffcea663bf9161ce79

2) gmain: get rid of poll_waiting
   1c8c408c51c85230356cc95c06f2e1bd3f376624

most executables linking against glib on a uclibc system deadlock.  Some debugging shows that these processes (or one of their threads) block forever in the read() call inside g_wakeup_acknowledge().

E.g., gqview's backtrace has:

  • #0 __read_nocancel
    from /lib/libc.so.0
  • #1 g_wakeup_acknowledge
    at gwakeup.c line 212
  • #0 read
    from /lib/libc.so.0
  • #1 g_wakeup_acknowledge
    at gwakeup.c line 212
  • #0 read
    from /lib/libc.so.0
  • #1 g_wakeup_acknowledge
    at gwakeup.c line 212

Reverting the above commits "fixes" the problem but returns to the bad situation in bug #320888 and bug #583511.

I'm not sure how to fix it properly, but at least as a diagnostic, one can comment out the loop that reads until the fd is empty, and things "work" again:

  //while (read (wakeup->fds[0], buffer, sizeof buffer) == sizeof buffer);
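To illustrate why the blocking mode of the wakeup fd matters for that read, here is a minimal sketch (not glib's code). If, as the backtraces suggest, the acknowledge path can run when nothing has been written to the wakeup fd, then a read() on a non-blocking eventfd fails immediately with EAGAIN, whereas the same read() on a blocking eventfd sleeps until someone writes. The helper names `try_drain` and `wakeup_demo` are hypothetical, and the sketch assumes a Linux system with eventfd support.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>

/* Hypothetical helper, not part of glib: try to consume a pending wakeup
 * from an eventfd.  Returns 1 if a wakeup was consumed, 0 if none was
 * pending (EAGAIN), and -1 on any other error.
 * Assumes fd is non-blocking; on a blocking fd with a zero counter the
 * read() below would not return. */
static int
try_drain (int fd)
{
  uint64_t value;
  ssize_t n;

  /* Retry only if a signal interrupted the read. */
  do
    n = read (fd, &value, sizeof value);
  while (n == -1 && errno == EINTR);

  if (n == (ssize_t) sizeof value)
    return 1;
  if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
    return 0;
  return -1;
}

/* Hypothetical demo: returns 0 if the observed sequence is
 * "nothing pending, one wakeup consumed, nothing pending". */
static int
wakeup_demo (void)
{
  uint64_t one = 1;
  int fd = eventfd (0, 0);
  int flags, ok;

  if (fd == -1)
    return -1;

  /* Set O_NONBLOCK explicitly through fcntl() so the demo cannot hang. */
  flags = fcntl (fd, F_GETFL);
  if (flags == -1 || fcntl (fd, F_SETFL, flags | O_NONBLOCK) == -1)
    {
      close (fd);
      return -1;
    }

  ok = try_drain (fd) == 0
       && write (fd, &one, sizeof one) == (ssize_t) sizeof one
       && try_drain (fd) == 1
       && try_drain (fd) == 0;

  close (fd);
  return ok ? 0 : -1;
}
```

Calling `wakeup_demo()` from a small main() and checking for a return value of 0 exercises all three cases. Removing the fcntl() call and relying on a blocking fd would make the first `try_drain()` call hang.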

---
Steps to reproduce:

1. Build a uclibc system with NPTL support.  The arch doesn't matter, as this happens on at least x86_64, i686 and armv7a.  (Or grab a prebuilt stage3 from <gentoo-mirror>/experimental/<arch>/uclibc.)

2. Build glib configured as follows (the images above already have it built):

./configure --prefix=/usr --build=x86_64-gentoo-linux-uclibc --host=x86_64-gentoo-linux-uclibc \
    --mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc \
    --localstatedir=/var/lib --disable-dependency-tracking --enable-xattr --disable-fam \
    --disable-selinux --enable-static --disable-dtrace --disable-systemtap --enable-regex \
    --with-pcre=internal --with-threads=posix

3. Build some windowing system (eg XFCE4) and test any GUI program linking against glib (eg eog, gqview, gimp, etc)

4. These programs start and a window might appear (for gqview you get a blank frame, for eog you get nothing), but then the program "freezes".
Comment 1 basile@opensource.dyc.edu 2013-01-05 02:40:56 UTC
For the record, here is the full backtrace for gqview, which is the cleanest of the ones I've tested: single-threaded and not very deep.

  • #0 read
    from /lib/libc.so.0
  • #1 g_wakeup_acknowledge
    at gwakeup.c line 212
  • #2 g_main_context_check
    at gmain.c line 2980
  • #3 g_main_context_iterate
    at gmain.c line 3143
  • #4 g_main_context_iterate
    at gmain.c line 3083
  • #5 g_main_loop_run
    at gmain.c line 3340
  • #6 IA__gtk_main
    at gtkmain.c line 1257
  • #7 main
    at main.c line 294

Comment 2 Dan Winship 2013-01-05 14:36:12 UTC
so it seems like the wakeup fd is not getting made non-blocking... poke around in g_wakeup_new() and see which codepath is running, and what happens in the g_unix_set_fd_nonblocking() calls?

Also, does anything fail with "make check" in glib/glib/tests? (particularly the mainloop and gwakeup tests)
Comment 3 Colin Walters 2013-01-05 15:27:18 UTC
I wonder if c70072180557c0a897da0d96ef2cf4f5398ddd3b fixes it...before we were kind of tripping undefined behavior.
Comment 4 basile@opensource.dyc.edu 2013-01-06 02:06:54 UTC
(In reply to comment #2)
> so it seems like the wakeup fd is not getting made non-blocking... poke around
> in g_wakeup_new() and see which codepath is running, and what happens in the
> g_unix_set_fd_nonblocking() calls?
> 
> Also, does anything fail with "make check" in glib/glib/tests? (particularly
> the mainloop and gwakeup tests)

That's it, the wakeup fd is not getting O_NONBLOCK.  It's a problem in uclibc, and I've reduced it to this:

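#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/eventfd.h>

int
main(void)
{
    /* Ask for a non-blocking eventfd at creation time. */
    int f = eventfd(0, EFD_NONBLOCK);
    if (f == -1) {
        perror("eventfd");
        return 1;
    }
    /* Did EFD_NONBLOCK actually set O_NONBLOCK on the fd? */
    printf("Got O_NONBLOCK = %d\n", (fcntl(f, F_GETFL) & O_NONBLOCK) ? 1 : 0);
    /* Set the flag explicitly with fcntl() and check again. */
    fcntl(f, F_SETFL, O_NONBLOCK);
    printf("Got O_NONBLOCK = %d\n", (fcntl(f, F_GETFL) & O_NONBLOCK) ? 1 : 0);
    close(f);
    return 0;
}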


On a glibc system this prints out:

    Got O_NONBLOCK = 1
    Got O_NONBLOCK = 1

while on a uclibc system this prints out:

    Got O_NONBLOCK = 0
    Got O_NONBLOCK = 1

I'm not sure there's anything for glib to do here. I'll pursue it with uclibc.  As a diagnostic, something like this also "fixes" the problem in glib:

GWakeup *
g_wakeup_new (void)
{
  GError *error = NULL;
  GWakeup *wakeup;

  wakeup = g_slice_new (GWakeup);

  /* try eventfd first, if we think we can */
#if defined (HAVE_EVENTFD)
#ifndef TEST_EVENTFD_FALLBACK
  wakeup->fds[0] = eventfd (0, EFD_CLOEXEC | EFD_NONBLOCK);
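  /* diagnostic only: set O_NONBLOCK again via fcntl(), since EFD_NONBLOCK did not take effect on the affected uclibc */
  fcntl (wakeup->fds[0], F_SETFL, O_NONBLOCK);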
#else
  wakeup->fds[0] = -1;
#endif
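
A somewhat more defensive variant of the same idea, as a sketch only and not a proposed glib patch: create the eventfd, check whether O_NONBLOCK actually took effect, and only then fall back to fcntl(). Unlike the diagnostic above, it ORs O_NONBLOCK into the existing status flags instead of overwriting them. The function name `eventfd_nonblock_checked` is hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>
#include <sys/eventfd.h>

/* Hypothetical helper, not part of glib: create a close-on-exec eventfd
 * and make sure it really is non-blocking.  Returns the fd, or -1 on
 * failure. */
static int
eventfd_nonblock_checked (unsigned int initval)
{
  int fd = eventfd (initval, EFD_CLOEXEC | EFD_NONBLOCK);
  int flags;

  if (fd == -1)
    return -1;

  /* Read back the file status flags to see what the libc really set. */
  flags = fcntl (fd, F_GETFL);
  if (flags == -1)
    goto fail;

  /* Fall back to fcntl() only if EFD_NONBLOCK did not take effect. */
  if (!(flags & O_NONBLOCK)
      && fcntl (fd, F_SETFL, flags | O_NONBLOCK) == -1)
    goto fail;

  return fd;

fail:
  close (fd);
  return -1;
}
```

On a libc where EFD_NONBLOCK works, the fallback branch is never taken and the helper costs one extra fcntl() call per eventfd.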
Comment 5 basile@opensource.dyc.edu 2013-01-06 02:43:53 UTC
Okay, this was already fixed in uclibc with the following commit:

http://git.uclibc.org/uClibc/commit/?id=e118373cbb58ba5ffa5fb6670957678d5b87cdb9

Sorry for the noise.