GNOME Bugzilla – Bug 321746
main loop can spin when watching a socket without G_IO_HUP
Last modified: 2012-01-16 15:03:49 UTC
POSIX/XOPEN/Single UNIX Specification/whatever the kids are calling it these days says: "In addition, poll() shall set the POLLHUP, POLLERR, and POLLNVAL flag in revents if the condition is true, even if the application did not set the corresponding bit in events." However, g_io_unix_check() will only return TRUE if the pollfd's revents overlaps the requested events on the watch.

On Linux at least, this means that if you have an iochannel wrapping a socket fd, and you create a watch with g_io_add_watch(chan, G_IO_OUT, func, data), and then the socket is closed by the other side, then poll() will return with that pollfd's revents set to POLLHUP, g_io_unix_check() will return FALSE (because POLLOUT wasn't set), and then every future call to poll() will return immediately (after setting the socket pollfd's revents to POLLHUP again), putting the app into a spin loop sucking up CPU until you kill it or some idle/timeout handler causes the watch to be removed.

I think the right fix is to have g_io_unix_check() always return TRUE if one of the error bits is set, and have g_io_unix_dispatch() always pass those bits on to the callback. (If the callback ignores the GIOCondition and just tries to write anyway, it will just get back some sort of error, and then it can deal with that.) (Oh, and probably also some documentation in g_io_add_watch() saying "don't do that".)
This is the bug that has bit Evolution, rcd, and everything, right?
I don't know of it affecting Evolution. I just fixed it in soup (bug 319305), where it was affecting rcd.
Yeah, that bug.
BTW, while I have your attention, did you notice my recent musings about this stuff on Win32 on gtk-devel-list...? Would you say that giowin32.c should set G_IO_HUP only if none of the other bits are set (nothing more to read)? Is this how it works out on Unix? Or are apps/libs in general prepared to handle both G_IO_IN *and* G_IO_HUP being set in a callback?
I did see that, which is what reminded me to file this. But I'm not an expert on poll semantics in general, I just know this case. :-} soup was previously not watching for G_IO_HUP on either reads or writes, but it only ever caused a problem on writes. On reads, if the connection is closed, poll will report at least G_IO_IN, and reading will return 0 bytes, which means the connection was closed. I don't know whether there's a G_IO_HUP at the same time as the G_IO_IN, and I don't know if there would be G_IO_HUPs after that if you polled again, because I never did.
Every now and then I look over old bugs, see this one, and remember I need to update it, and then think "no, I should figure out the solution and write a patch", and then I do neither. Anyway, I said:

(In reply to comment #0)
> I think the right fix is to have g_io_unix_check() always return TRUE
> if one of the error bits is set, and have g_io_unix_dispatch always pass
> those bits on to the callback. (If the callback ignores the GIOCondition
> and just tries to write anyway, it will just get back some sort of error,
> and then it can deal with that.)

But this wouldn't actually work. The application might have two separate watches, one for G_IO_OUT and one for G_IO_HUP|G_IO_ERR, and in that case calling the G_IO_OUT handler on HUP might break a previously-working program. So you'd need to do something messier, like keep track of all of the watches that exist on a channel, and only call the OUT handler on HUP if there isn't a HUP handler. Or something. ISTR looking at the code at one point and determining that this would be difficult, because there are no pointers from channels to watches, only the other way around.
It's comforting to read that the giochannel code has some bugs on Unix too, and that fixing it would be messy... (On Windows, of course, the number of bugs/misfeatures in it is an order of magnitude larger, and fixing them even messier.)
OK, these days, if you're doing socket I/O with GLib, you should be using GSocket, and if not, then you're responsible for understanding the dark corners of the sockets API semantics yourself.