After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 321746 - main loop can spin when watching a socket without G_IO_HUP
main loop can spin when watching a socket without G_IO_HUP
Status: RESOLVED WONTFIX
Product: glib
Classification: Platform
Component: mainloop
unspecified
Other Linux
: Normal normal
: ---
Assigned To: gtkdev
gtkdev
Depends on:
Blocks:
 
 
Reported: 2005-11-17 19:24 UTC by Dan Winship
Modified: 2012-01-16 15:03 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Dan Winship 2005-11-17 19:24:34 UTC
POSIX/XOPEN/Single UNIX Specification/whatever the kids are calling it
these days says:

    In addition, poll() shall set the POLLHUP, POLLERR, and POLLNVAL flag in
    revents if the condition is true, even if the application did not set the
    corresponding bit in events.

However, g_io_unix_check() will only return TRUE if the pollfd's revents
overlaps the requested events on the watch. On Linux at least, this means
that if you have an iochannel wrapping a socket fd, and you create a watch
with g_io_add_watch(chan, G_IO_OUT, func, data), and then the socket is
closed by the other side, then poll() will return with that pollfd's revents
to set POLLHUP, g_io_unix_check() will return FALSE (because POLLOUT wasn't
set), and then every future call to poll() will return immediately (after
setting the socket pollfd's revents to POLLHUP again) putting the app into
a spin loop sucking up CPU until you kill it or some idle/timeout handler
causes the watch to be removed.


I think the right fix is to have g_io_unix_check() always return TRUE
if one of the error bits is set, and have g_io_unix_dispatch always pass
those bits on to the callback. (If the callback ignores the GIOCondition
and just tries to write anyway, it will just get back some sort of error,
and then it can deal with that.)

(Oh, and probably also some documentation in g_io_add_watch saying "don't
do that".)
Comment 1 Federico Mena Quintero 2005-11-17 20:28:48 UTC
This is the bug that has bit Evolution, rcd, and everything, right?
Comment 2 Dan Winship 2005-11-17 21:17:22 UTC
I don't know of it affecting Evolution. I just fixed it in soup (bug 319305),
where it was affecting rcd.
Comment 3 Federico Mena Quintero 2005-11-17 21:32:40 UTC
Yeah, that bug.
Comment 4 Tor Lillqvist 2005-11-17 23:44:57 UTC
BTW, while I have your attention, did you notice my recent musings about this
stuff on Win32 on gtk-devel-list...? Would you say that giowin32.c should set
G_IO_HUP only if none of the other bits are set (nothing more to read)? Is this
how it works out on Unix? Or are apps/libs in general prepared to handle both
G_IO_IN *and* G_IO_HUP being set in a callback
Comment 5 Dan Winship 2005-11-18 14:17:03 UTC
I did see that, which is what reminded me to file this. But I'm not an
expert on poll semantics in general, I just know this case. :-}

soup was previously not watching for G_IO_HUP on either reads or writes,
but it only ever caused a problem on writes. On reads, if the connection
is closed, it will return at least G_IO_IN, and reading will return 0
bytes, which means the connection was closed. I don't know whether there's
a G_IO_HUP at the same time as the G_IO_IN, and I don't know if there would
be G_IO_HUPs after that if you polled again, because I never did.
Comment 6 Dan Winship 2008-08-29 13:42:48 UTC
Every now and then I'm looking over old bugs and then I see this and remember I need to update it, and then think "no I should figure out the solution and write a patch", and then I do neither.

Anyway, I said:

(In reply to comment #0)
> I think the right fix is to have g_io_unix_check() always return TRUE
> if one of the error bits is set, and have g_io_unix_dispatch always pass
> those bits on to the callback. (If the callback ignores the GIOCondition
> and just tries to write anyway, it will just get back some sort of error,
> and then it can deal with that.)

But this wouldn't actually work. The application might have two separate watches, one for G_IO_OUT, and one for G_IO_HUP|G_IO_ERR, and in that case, calling the G_IO_OUT handler on HUP might break a previously-working program.

So you'd need to do something messier, like keep track of all of the watches that exist on a channel, and only call the OUT handler on HUP if there isn't a HUP handler. Or something. ISTR looking at the code at one point and determining that this would be difficult, because there are no pointers from channels to watches, only the other way around.
Comment 7 Tor Lillqvist 2008-08-29 14:06:22 UTC
It's comforting to read that also on Unix the giochannel code actually has some bugs and fixing it would be messy... (On Windows, of course the number of bugs/misfeatures in it is one order of magnitude larger, and fixing them equally even messier.)
Comment 8 Dan Winship 2012-01-16 15:03:49 UTC
ok, these days, if you're doing socket I/O with glib, you should be using GSocket, and if not, then you're responsible for understanding the dark corners of sockets API semantics yourself.