GNOME Bugzilla – Bug 11059
Linux poll issue
Last modified: 2014-11-29 04:54:20 UTC
Package: glib Version: 1.2.7 and earlier Linux 2.2.14 and earlier versions doesn't like it when you poll more file descriptors than you have [1]. Unfortunately, glib does this, if the programmer calls add_watch multiple times for the same channel (ie, file descriptor). (I'm not sure if glib does it internally too.) Solutions: - Fix Linux. This may have been fixed in Janauary. I can't tell from mailing list. - Patch g_main_poll to create only one GPollFD per FD. This may require extra bookkeeping and/or loops. - Detect in glib's configure and undefine HAVE_POLL to use g_poll. Workarounds: - Detect in your programs configure and define poll, which will override glibc's poll. This is hacky, but probably what I'll end up doing. - Call add_watch only once for each iochannel. I think I can do this, but I'd rather not. David [1] http://www.uwsg.iu.edu/hypermail/linux/kernel/0001.1/0181.html ------- Additional Comments From dhelder@umich.edu 2000-10-29 14:11:17 ---- Subject: poll issue workaround From: David Helder <dhelder@umich.edu> To: 11059@bugs.gnome.org Message-Id: <Pine.LNX.4.21.0010291359300.1073-100000@ask.eecs.umich.edu> Date: Sun, 29 Oct 2000 14:11:17 -0500 (EST) In my program, I'm testing for broken poll in configure. If it's broken, I compile and link in glibc's poll() that uses select, which GLib then uses because it's loaded before glibc's poll() that uses the broken poll syscall. The test can be overriden by a flag (Linux packages would need to assume its broken). This fixes the problem. Would you consider a patch that does this in GLib? It would be simpler to do in GLib since Glib already includes glibc's poll-that-uses-select. David -- __ _ __ David Helder - University of Michigan ___/ /__ __ __(_)__/ / dhelder@umich.edu / _ / _ `/ |/ / / _ / http://www.eecs.umich.edu/~dhelder |_,_/|_,_/|___/_/|_,_/ Jungle Monkey: http://www.junglemonkey.net ------- Additional Comments From otaylor@redhat.com 2000-10-29 15:01:03 ---- Subject: Re: Bug#11059: poll issue workaround From: Owen Taylor <otaylor@redhat.com> To: 11059@bugs.gnome.org Message-Id: <ybewverh0xs.fsf@fresnel.labs.redhat.com> Date: 29 Oct 2000 15:01:03 -0500 David Helder <dhelder@umich.edu> writes: > In my program, I'm testing for broken poll in configure. If it's broken, > I compile and link in glibc's poll() that uses select, which GLib then > uses because it's loaded before glibc's poll() that uses the broken poll > syscall. The test can be overriden by a flag (Linux packages would need > to assume its broken). This fixes the problem. > > Would you consider a patch that does this in GLib? It would be simpler to > do in GLib since Glib already includes glibc's poll-that-uses-select. See: void g_main_set_poll_func (GPollFunc func); Beyond this, I really don't think it is appropriate for GLib to try to work around system problems. The C library's function is to provide an isolation layer between syscalls and userland. Though a patch that added --enable-poll-emulation _might_ possibly, maybe, maybe be accepted if problems were common enough and hard enough to fix in the proper places. Owen ------- Bug moved to this database by debbugs-export@bugzilla.gnome.org 2001-01-27 12:28 ------- This bug was previously known as bug 11059 at http://bugs.gnome.org/ http://bugs.gnome.org/show_bug.cgi?id=11059 Originally filed under the glib product and general component. The original reporter (dhelder@umich.edu) of this bug does not have an account here. Reassigning to the exporter, debbugs-export@bugzilla.gnome.org. Reassigning to the default owner of the component, gtkdev@gtk.org.
Since the Linux kernel has been fixed for a long time, we may as well close this. Also setting severity to enhancement, in case it's reopened.
I didn't realize this got closed... The problem is with GLib, not Linux. According to Posix, poll() implementors may return an error if nfds is greater than the number of open file descriptors. But, if you add more than one watch to an fd in Glib, this is violated. There's no reason to believe from looking that the Glib API that you can't do this (it's add_watch, not set_watch). Solutions: - check in configure if poll is strict and if so, use the GNU version that wraps around select OR - write a loose version of poll that wraps around strict poll and use that if poll is strict I'd go with the later and I can submit a patch if you want.
Apparently this was reintroduced in some 2.4.x kernels, though removed in a 2.4.9pre4 patch...
There may still be non-Linux systems with a strict poll. I think the best thing to do is to test it in configure.in and assume that poll is not provided if it is strict. I already have my own test. I can send a GLib patch if you're interested.
In any case, even the "lax" Linux kernels cut off and the maximum number of file descriptors per process, so I think we need to just add code to coellesce duplicates to gtkmain.c. There is the beginning of a patch for this in: http://mail.gnome.org/archives/gtk-devel-list/2001-August/msg00388.html But it needs more testing/debugging. I'm not going to try to fix this for 1.2.x unless we get a lot of complaints, since I the 1.2.x code and 1.3.x code are fairly different and there is lots of potential here for introducing subtle bugs.
It should be considered whether the problem still exists in Glib 2.x, and whether the consensus is that it is in fact a Glib bug. To that end I hope dhelder@umich.edu can submit his testing code, and possible patch so others can verify it.
(The problem still exists in current code, and yes, it's really a GLib bug as described above "and even the "lax" Linux kernels cut off [at] the maximum number of file descriptors per process though it's not a high priority issue)
It's possible to test that with : struct rlimit rl; if (getrlimit (RLIMIT_NOFILE, &rl) == 0 && rl.rlim_cur != RLIM_INFINITY) nfiles = rl.rlim_cur; /* The current (soft) limit */
*** Bug 558678 has been marked as a duplicate of this bug. ***
Created attachment 209625 [details] [review] gmain: don't pass the same fd to g_poll() multiple times If a given fd is being polled by multiple sources, we used to pass it multiple times to g_poll(), which is technically illegal, and also made it more likely that we'd exceed the maximum number of pollfds. Fix it to merge together "duplicate" GPollFDs. This also requires some additional changes to g_main_context_check(), which was previously assuming that the passed-in fds array was in exactly the same order as context->poll_records.
Review of attachment 209625 [details] [review]: There's already a "cached_poll_array", maybe we could cache the mapping hash table too? It seems a bit lame to create a temporary hash table on each mainloop iteration (particularly one where key != value which takes extra space). Also the way that g_main_context_iterate() interacts with g_main_context_query() is bizarre. I guess since the latter is public API, we need to preserve it, but... We could reuse the current cached array and just sort it, eliminating duplicates while sorting? Not an expert in this code, so not a real review, just tossing out some thoughts on studying it briefly.
(In reply to comment #11) > There's already a "cached_poll_array", maybe we could cache the mapping hash > table too? It seems a bit lame to create a temporary hash table on each > mainloop iteration yeah... I was thinking maybe I should do some timing tests to make sure it's not noticeably slower. I didn't want to reorganize the code too much, because I knew if I started, I wouldn't stop until I'd fixed half the other open glib/mainloop bugs too :)
https://bugs.freedesktop.org/show_bug.cgi?id=23194 is the corresponding bug in libdbus, if it's any help.
Created attachment 232903 [details] [review] gmain: don't pass the same fd to g_poll() multiple times If a given fd is being polled by multiple sources, we used to pass it multiple times to g_poll(), which is technically illegal, and also made it more likely that we'd exceed the maximum number of pollfds. Fix it to merge together "duplicate" GPollFDs. The easiest way to do this involves re-sorting context->poll_records into fd order rather than priority order. This means we now have to walk the entire pollrec list for every g_main_context_query() and g_main_context_poll(), rather than only walking the list up to the current max_priority. However, this will only have a noticeable effect if you have tons of GPollFDs, and we're already too slow in that case anyway because of other O(n) operations that happen too often. So this shouldn't change much (and the new poll API will eventually let us be cleverer).
Created attachment 235150 [details] [review] gmain: don't pass the same fd to g_poll() multiple times rebase around Ryan's recent gmain changes, and also remove the win32-specific fd-de-duping code in gpoll.c
Some discussion on #gtk+ today led to the discovery that the select()-based implementation of g_poll() in gpoll.c (which gets used on OS X due to a bug in OS X's poll()) is only correct if you don't pass duplicate fds, and so any app that, eg, has separate G_IO_IN and G_IO_OUT sources on a socket will be broken on OS X. The patch here fixes it. So... ok to commit?
No sense in blocking this, then... I can rebase my stuff again :)
pushed. [And it only took 13.75 years!]
You deserve some price for fixing the oldest glib bug! :)
We're seeing one of the new tests failing on LE powerpc64 in Debian and Ubuntu now. (vivid-ppc64el)laney@kelsey01:~/glib2.0-2.43.1/debian/build/deb/glib/tests$ MALLOC_CHECK_=2 ./mainloop --tap # random seed: R02S9b1b47a4a738095eda51e5394a984dac # Start of mainloop tests ok 1 /mainloop/timeouts ok 2 /mainloop/wait ok 3 /mainloop/unix-file-poll ** GLib:ERROR:/home/laney/glib2.0-2.43.1/./glib/tests/mainloop.c:1665:test_nfds: assertion failed (nfds == 2): (3 == 2) # GLib:ERROR:/home/laney/glib2.0-2.43.1/./glib/tests/mainloop.c:1665:test_nfds: assertion failed (nfds == 2): (3 == 2) Aborted The above invocation is the minimal needed to reproduce it. If you run the test directly, or remove the MALLOC_CHECK_ or --tap then the test passes. The test runner passes them both. Any idea?
(In reply to comment #20) > GLib:ERROR:/home/laney/glib2.0-2.43.1/./glib/tests/mainloop.c:1665:test_nfds: > assertion failed (nfds == 2): (3 == 2) That assertion ought to be at line 1662, not 1665... is this just because you changed things while trying to debug it, or is there some debian patch being applied to the file? Anyway, I guess the next step would be to print out the out_fds and figure out if the problem is that some other source is sneaking in to the context, or that the duplicates are not being merged for some reason.
Created attachment 291741 [details] [review] gmain: fix poll record comparison We intend to keep the list of poll records sorted by (integer) file descriptor, but due to a typo we are actually keeping it sorted by pointer address of the GPollFD. Fix that.
Comment on attachment 291741 [details] [review] gmain: fix poll record comparison doh, yeah
Comment on attachment 291741 [details] [review] gmain: fix poll record comparison Attachment 291741 [details] pushed as 5aba9ca - gmain: fix poll record comparison