After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 11059 - Linux poll issue
Linux poll issue
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: mainloop
unspecified
Other other
: Normal enhancement
: ---
Assigned To: gtkdev
gtkdev
: 558678 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2000-05-14 18:10 UTC by dhelder
Modified: 2014-11-29 04:54 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
gmain: don't pass the same fd to g_poll() multiple times (5.96 KB, patch)
2012-03-13 16:41 UTC, Dan Winship
none Details | Review
gmain: don't pass the same fd to g_poll() multiple times (10.89 KB, patch)
2013-01-07 14:12 UTC, Dan Winship
none Details | Review
gmain: don't pass the same fd to g_poll() multiple times (12.00 KB, patch)
2013-02-04 15:24 UTC, Dan Winship
none Details | Review
gmain: fix poll record comparison (896 bytes, patch)
2014-11-28 17:44 UTC, Allison Karlitskaya (desrt)
committed Details | Review

Description dhelder 2001-01-27 17:28:30 UTC
Package: glib
Version: 1.2.7 and earlier

Linux 2.2.14 and earlier versions doesn't like it when you poll more file
descriptors than you have [1].  Unfortunately, glib does this, if the
programmer calls add_watch multiple times for the same channel (ie, file
descriptor).  (I'm not sure if glib does it internally too.)

Solutions:

- Fix Linux.  This may have been fixed in Janauary.  I can't tell from
   mailing list.
- Patch g_main_poll to create only one GPollFD per FD.  This may require
   extra bookkeeping and/or loops.
- Detect in glib's configure and undefine HAVE_POLL to use g_poll.

Workarounds:

- Detect in your programs configure and define poll, which will override
   glibc's poll.  This is hacky, but probably what I'll end up doing.
- Call add_watch only once for each iochannel.  I think I can do this, but
   I'd rather not.

David

[1] http://www.uwsg.iu.edu/hypermail/linux/kernel/0001.1/0181.html




------- Additional Comments From dhelder@umich.edu 2000-10-29 14:11:17 ----

Subject: poll issue workaround
From: David Helder <dhelder@umich.edu>
To: 11059@bugs.gnome.org
Message-Id: <Pine.LNX.4.21.0010291359300.1073-100000@ask.eecs.umich.edu>
Date: Sun, 29 Oct 2000 14:11:17 -0500 (EST)


In my program, I'm testing for broken poll in configure.  If it's broken,
I compile and link in glibc's poll() that uses select, which GLib then
uses because it's loaded before glibc's poll() that uses the broken poll
syscall.  The test can be overriden by a flag (Linux packages would need
to assume its broken).  This fixes the problem.

Would you consider a patch that does this in GLib?  It would be simpler to
do in GLib since Glib already includes glibc's poll-that-uses-select.

David


-- 
      __          _    __  David Helder - University of Michigan
  ___/ /__ __  __(_)__/ /  dhelder@umich.edu
 / _  / _ `/ |/ / / _  /   http://www.eecs.umich.edu/~dhelder
 |_,_/|_,_/|___/_/|_,_/    Jungle Monkey: http://www.junglemonkey.net




------- Additional Comments From otaylor@redhat.com 2000-10-29 15:01:03 ----

Subject: Re: Bug#11059: poll issue workaround
From: Owen Taylor <otaylor@redhat.com>
To: 11059@bugs.gnome.org
Message-Id: <ybewverh0xs.fsf@fresnel.labs.redhat.com>
Date: 29 Oct 2000 15:01:03 -0500


David Helder <dhelder@umich.edu> writes:

> In my program, I'm testing for broken poll in configure.  If it's broken,
> I compile and link in glibc's poll() that uses select, which GLib then
> uses because it's loaded before glibc's poll() that uses the broken poll
> syscall.  The test can be overriden by a flag (Linux packages would need
> to assume its broken).  This fixes the problem.
> 
> Would you consider a patch that does this in GLib?  It would be simpler to
> do in GLib since Glib already includes glibc's poll-that-uses-select.

See:

 void        g_main_set_poll_func     (GPollFunc   func);

Beyond this, I really don't think it is appropriate for GLib to try to
work around system problems. The C library's function is to provide
an isolation layer between syscalls and userland.

Though a patch that added --enable-poll-emulation _might_ possibly,
maybe, maybe be accepted if problems were common enough and hard
enough to fix in the proper places.

                                        Owen




------- Bug moved to this database by debbugs-export@bugzilla.gnome.org 2001-01-27 12:28 -------
This bug was previously known as bug 11059 at http://bugs.gnome.org/
http://bugs.gnome.org/show_bug.cgi?id=11059
Originally filed under the glib product and general component.

The original reporter (dhelder@umich.edu) of this bug does not have an account here.
Reassigning to the exporter, debbugs-export@bugzilla.gnome.org.
Reassigning to the default owner of the component, gtkdev@gtk.org.

Comment 1 Havoc Pennington 2001-01-27 19:33:24 UTC
Since the Linux kernel has been fixed for a long time, we may as well
close this. Also setting severity to enhancement, in case it's reopened.
Comment 2 dhelder 2001-04-04 01:04:02 UTC
I didn't realize this got closed...

The problem is with GLib, not Linux.  According to Posix, poll()
implementors may return an error if nfds is greater than the number of
open file descriptors.  But, if you add more than one watch to an fd
in
Glib, this is violated.  There's no reason to believe from looking
that
the Glib API that you can't do this (it's add_watch, not set_watch).

Solutions:

- check in configure if poll is strict and if so, use the GNU version
that
wraps around select

OR

- write a loose version of poll that wraps around strict poll and use
that
if poll is strict

I'd go with the later and I can submit a patch if you want.


Comment 3 Owen Taylor 2001-08-16 18:52:41 UTC
Apparently this was reintroduced in some 2.4.x kernels, though
removed in a 2.4.9pre4 patch...
Comment 4 dhelder 2001-08-20 16:03:17 UTC
There may still be non-Linux systems with a strict poll.  I think the
best thing to do is to test it in configure.in and assume that poll
is not provided if it is strict.  I already have my own test.  I can
send a GLib patch if you're interested.
Comment 5 Owen Taylor 2001-10-18 15:40:37 UTC
In any case, even the "lax" Linux kernels cut off and 
the maximum number of file descriptors per process, so I think we
need to just add code to coellesce duplicates to
gtkmain.c.

There is the beginning of a patch for this in:
 
http://mail.gnome.org/archives/gtk-devel-list/2001-August/msg00388.html

But it needs more testing/debugging.

I'm not going to try to fix this for 1.2.x unless we
get a lot of complaints, since I the 1.2.x code and 
1.3.x code are fairly different and there is 
lots of potential here for introducing subtle bugs.
Comment 6 Ryan McDougall 2004-01-15 22:52:55 UTC
It should be considered whether the problem still exists in Glib 2.x,
and whether the consensus is that it is in fact a Glib bug. To that
end I hope dhelder@umich.edu can submit his testing code, and possible
patch so others can verify it.
Comment 7 Owen Taylor 2004-03-19 20:33:37 UTC
(The problem still exists in current code, and yes, it's really
a GLib bug as described above

 "and even the "lax" Linux kernels cut off [at] 
  the maximum number of file descriptors per process

though it's not a high priority issue)
Comment 8 Philippe Blain 2004-12-20 19:36:52 UTC
It's possible to test that with :

    struct rlimit rl;

    if (getrlimit (RLIMIT_NOFILE, &rl) == 0 && rl.rlim_cur != RLIM_INFINITY)
        nfiles = rl.rlim_cur;     /* The current (soft) limit */
Comment 9 Allison Karlitskaya (desrt) 2011-09-10 00:55:29 UTC
*** Bug 558678 has been marked as a duplicate of this bug. ***
Comment 10 Dan Winship 2012-03-13 16:41:06 UTC
Created attachment 209625 [details] [review]
gmain: don't pass the same fd to g_poll() multiple times

If a given fd is being polled by multiple sources, we used to pass it
multiple times to g_poll(), which is technically illegal, and also
made it more likely that we'd exceed the maximum number of pollfds.

Fix it to merge together "duplicate" GPollFDs. This also requires some
additional changes to g_main_context_check(), which was previously
assuming that the passed-in fds array was in exactly the same order as
context->poll_records.
Comment 11 Colin Walters 2012-03-13 21:54:22 UTC
Review of attachment 209625 [details] [review]:

There's already a "cached_poll_array", maybe we could cache the mapping hash table too?  It seems a bit lame to create a temporary hash table on each mainloop iteration (particularly one where key != value which takes extra space).

Also the way that g_main_context_iterate() interacts with g_main_context_query() is bizarre.  I guess since the latter is public API, we need to preserve it, but...  We could reuse the current cached array and just sort it, eliminating duplicates while sorting?

Not an expert in this code, so not a real review, just tossing out some thoughts on studying it briefly.
Comment 12 Dan Winship 2012-03-14 13:48:22 UTC
(In reply to comment #11)
> There's already a "cached_poll_array", maybe we could cache the mapping hash
> table too?  It seems a bit lame to create a temporary hash table on each
> mainloop iteration

yeah... I was thinking maybe I should do some timing tests to make sure it's not noticeably slower.

I didn't want to reorganize the code too much, because I knew if I started, I wouldn't stop until I'd fixed half the other open glib/mainloop bugs too :)
Comment 13 Simon McVittie 2012-10-31 13:07:52 UTC
https://bugs.freedesktop.org/show_bug.cgi?id=23194 is the corresponding bug in libdbus, if it's any help.
Comment 14 Dan Winship 2013-01-07 14:12:11 UTC
Created attachment 232903 [details] [review]
gmain: don't pass the same fd to g_poll() multiple times

If a given fd is being polled by multiple sources, we used to pass it
multiple times to g_poll(), which is technically illegal, and also
made it more likely that we'd exceed the maximum number of pollfds.

Fix it to merge together "duplicate" GPollFDs. The easiest way to do
this involves re-sorting context->poll_records into fd order rather
than priority order. This means we now have to walk the entire pollrec
list for every g_main_context_query() and g_main_context_poll(),
rather than only walking the list up to the current max_priority.
However, this will only have a noticeable effect if you have tons of
GPollFDs, and we're already too slow in that case anyway because of
other O(n) operations that happen too often. So this shouldn't change
much (and the new poll API will eventually let us be cleverer).
Comment 15 Dan Winship 2013-02-04 15:24:31 UTC
Created attachment 235150 [details] [review]
gmain: don't pass the same fd to g_poll() multiple times

rebase around Ryan's recent gmain changes, and also remove the
win32-specific fd-de-duping code in gpoll.c
Comment 16 Dan Winship 2014-10-29 20:13:51 UTC
Some discussion on #gtk+ today led to the discovery that the select()-based implementation of g_poll() in gpoll.c (which gets used on OS X due to a bug in OS X's poll()) is only correct if you don't pass duplicate fds, and so any app that, eg, has separate G_IO_IN and G_IO_OUT sources on a socket will be broken on OS X. The patch here fixes it. So... ok to commit?
Comment 17 Allison Karlitskaya (desrt) 2014-10-29 21:04:36 UTC
No sense in blocking this, then... I can rebase my stuff again :)
Comment 18 Dan Winship 2014-10-29 21:35:48 UTC
pushed.

[And it only took 13.75 years!]
Comment 19 Marc-Andre Lureau 2014-10-29 22:57:26 UTC
You deserve some price for fixing the oldest glib bug! :)
Comment 20 Iain Lane 2014-11-26 15:35:54 UTC
We're seeing one of the new tests failing on LE powerpc64 in Debian and Ubuntu now.

(vivid-ppc64el)laney@kelsey01:~/glib2.0-2.43.1/debian/build/deb/glib/tests$ MALLOC_CHECK_=2 ./mainloop --tap
# random seed: R02S9b1b47a4a738095eda51e5394a984dac
# Start of mainloop tests
ok 1 /mainloop/timeouts
ok 2 /mainloop/wait
ok 3 /mainloop/unix-file-poll
**
GLib:ERROR:/home/laney/glib2.0-2.43.1/./glib/tests/mainloop.c:1665:test_nfds: assertion failed (nfds == 2): (3 == 2)
# GLib:ERROR:/home/laney/glib2.0-2.43.1/./glib/tests/mainloop.c:1665:test_nfds: assertion failed (nfds == 2): (3 == 2)
Aborted

The above invocation is the minimal needed to reproduce it. If you run the test directly, or remove the MALLOC_CHECK_ or --tap then the test passes. The test runner passes them both.

Any idea?
Comment 21 Dan Winship 2014-11-26 17:37:58 UTC
(In reply to comment #20)
> GLib:ERROR:/home/laney/glib2.0-2.43.1/./glib/tests/mainloop.c:1665:test_nfds:
> assertion failed (nfds == 2): (3 == 2)

That assertion ought to be at line 1662, not 1665... is this just because you changed things while trying to debug it, or is there some debian patch being applied to the file?

Anyway, I guess the next step would be to print out the out_fds and figure out if the problem is that some other source is sneaking in to the context, or that the duplicates are not being merged for some reason.
Comment 22 Allison Karlitskaya (desrt) 2014-11-28 17:44:18 UTC
Created attachment 291741 [details] [review]
gmain: fix poll record comparison

We intend to keep the list of poll records sorted by (integer) file
descriptor, but due to a typo we are actually keeping it sorted by
pointer address of the GPollFD.

Fix that.
Comment 23 Dan Winship 2014-11-28 19:50:40 UTC
Comment on attachment 291741 [details] [review]
gmain: fix poll record comparison

doh, yeah
Comment 24 Allison Karlitskaya (desrt) 2014-11-29 04:54:20 UTC
Comment on attachment 291741 [details] [review]
gmain: fix poll record comparison

Attachment 291741 [details] pushed as 5aba9ca - gmain: fix poll record comparison