GNOME Bugzilla – Bug 338943
Multiple watches on the same socket
Last modified: 2018-05-24 10:45:55 UTC
In the GLib API, it's legitimate to add multiple watches to the same IO channel, but that doesn't work currently on Windows. Each watch results in a separate GPollFD, and are passed individually. to MsgWaitForMultipleObjects(). That violates the docs for MsgWaitForMultipleObjects: The array can contain handles of objects of different types. It may not contain multiple copies of the same handle. But that isn't what is causing the problems; the problems are: - The logic in g_io_win32_prepare() will cause the different channels to overwrite each other unpredictably for WSAEventSelect() - As soon as MsgWaitForMultipleObjects() returns the fd as being ready for any event, *all* GPollFDs waiting on any condition will be marked as ready. (In GPoll) I'll attach a test case that I think will demonstrate the problem, though I haven't tried compiling it or running it on Win32, just on Linux, where it works. Depending on the order sources prepared, I'd expect it to either not print anything at all, or hit the HUP condition on Win32. A sketch of a possible fix: - Keep a single GPollFD for GIOChannel, not once for each watch. Make the events in that GPollFD be the union of all the active watches. You could implement this by keeping a linked list of all watches for the channel on the channel and recomputing the union of the events on change. An alternate approach would be to use a refcounted bitmask. There's an implementation I wrote *somewhere* in the GTK+ stack that's pretty nifty, that possibly could be moved into GLib. But I have no clue where :-( [ Basic technique was keep a set of masks in a linked list, when setting look for an existing mask with that bit clear and set it, otherwise add a new mask. On clearing, clear the first mask with that bit set. Remove masks that are 0. ] - In g_poll(), *before* waiting, call WSAEventSelect() to reset the record for the socket and clear any pending events. I believe that the current code slightly racy ... in a sequence like: - Input arrives, callback is called - More input arrives, reseting the event flag in the socket record - App reads all available input - MsgWaitForMultipleObjects is called, reports the socket readable - Callback is called, app expects data, gets none Now, it's probably good to make your app robust against getting no data in a read callback, but it doesn't match the guarantees that we present on Unix (*) There is no race condition with more data arriving before clearing pending events because the interesting events are level triggered - see the WSAEventSelect() docs. - Then after polling call WSAEnumNetworkEvents() to determine what events actually happened. I'm not sure there is enough information currently to let g_poll() know if the fd it is passed is a winsock event object ... if not, you could probably stuff some magic flag into the GPollFD->events field to mark that. An alternate approach would be to reset the events in prepare(), and then call WSAEnumNetworkEvents() in the first check() on any of the sources. (g_poll() would stick a value into the events field that represents "something is ready, not sure what") There would be duplicate work if there are multiple sources for the same channel... you'd have to reset the events in every call to prepare() ... but it should be survivable. What the above doesn't handle is multiple channels for the same underlying socket, but I don't think that's really that much worth worrying about.
The footnote is: (*) A related problem is that WSAEventSelect makes a socket non-blocking as a side effect. I don't think that we can fix it, so it is just a portability gotcha when moving code from Unix to Windows.
Created attachment 63829 [details] Test case Test case, run as: multiwatch mail.gnome.org or, if your port 25 is firewalled multiwatch smtp.myisp.net A SMTP server is used purely as a simple way of getting a port that sends us some data when we connect to it.
Created attachment 84938 [details] Fixed + improved testcase Updated Owen's test case to work on win32. Also changed it to a simpler test for the watch condition getting overwritten when there are multiple watches per socket - if both IN and OUT aren't flagged within 5 seconds, considers the test failed.
Anyone looking at this? This bugger has bitten me as well and it took some time before I figured it out. As a band-aid this bug should be documented and/or checked for at runtime. (I also suspect there might be some relation between this bug, bug 357674 and bug 425156)
I think I am going to have a go at this now... Anybody have any fresh ideas to add?
Created attachment 117063 [details] [review] very early test patch How close to real code is actually that test program? I mean, a real use of GIOChannels wouldn't want the G_IO_OUT and G_IO_IN handlers to be called repeatedly, would they? This preliminary patch adds handling of several watches per GIOChannel to giowin32.c. It did help the test program. The patch is quite simplistic, though, nothing as complicated as Owen describes, so I am not sure at all if it is good enough... lots of crack in there. And I am not sure at all that the existing event setting and resetting in giowin32.c (which I didn't touch now) makes much sense at all.... probably partly just works by accident. Need to read Owen's comment very carefully and ponder this more. I haven't yet checked if the patch breaks some of the very few test programs for giowin32 functionality.
I just realized a fundamental difference between how the poll() system call behaves compared to the WaitForMultipleObjects() used in g_poll() on Windows. I'll add this comment here so that my thoughts don't get lost;) poll() returns indicating *all* the fds that match their conditions. Well, at least if called with a zero timeout, or if called with non-zero timeout but no waiting was needed, doesn't it? I.e. for each of the fds for which its condition is true even before any waiting, this is indicated in the result. Not just for one of them. (When called with a non-zero timeout, and when a wait is needed, I guess what almost always happens is that just one fd "fires" and just that is then indicated in the result?) The WaitForMultipleObjects() on the other hand only tells that *one* of the waited for events is in the signalled state. Others might be too, but the return value just indicates one of them. I think this is one cause of some of the giochannel/gmain problems on Windows. (MsgWaitForMultipleObjects() complicates it a bit further by also waiting for messages, and apparently preferring to indicate messages.) So, probably quite some more complexity needs to be added to the Win32 version of g_poll(). When polling for messages and one or several events, or when polling for several events but no messages (perhaps only when polling with a zero timeout, though), after a non-timeout return value (i.e. either indicating that there are messages or that one of the events is in the signalled state), the code needs to check for each of the remaining events, too, if some of them also happens to be in the signalled state. This is needed for instance because the code in gmain.c resets the revent field of poll records based on what g_poll() returned. Even if one of the events being polled is in the signalled state, if there also are messages waiting, the MsgWaitForMultipleObjects() will only indicate that messages are waiting, so the gpollfds for the events will have their revents fields returned as unset. Does this sound reasonable?
By the way, the idea in the previous comment has been implemented in GLib since 2.18.1 (with an important bug fix in 2.18.2). See the Win32 implementation of g_poll() in gmain.c, and the helper function poll_rest(). The actual problem described in this bug has not been fixed though, as far as I see.
This may also be the same as I raised as #604568. The documentation should be updated to say something like: (near g_io_channel_win32_new_socket): Current limitation on WIN32 - only one watch per fd/handle, Suggestion - keep your own GIOConditions bitmask and de-re-subscribe when your needs change. It might not be too complex to add a runtime detector, that notices that a channel already has a watch, and prints a warning.
*** Bug 604568 has been marked as a duplicate of this bug. ***
Created attachment 150816 [details] G_IO_WIN32_DEBUG log for (possible) bug occurrence
I think this is affecting me through pygtk 2.14.2; I am trying to port a Python application to Windows which calls gobject.io_add_watch multiple times on the same (socket) fd, with different condition flags. After accepting one connection, the application spins at 100% CPU in the main loop. Just to confirm this is the bug I am hitting, attached (g_io_win32_debug_log.txt) is the output from running the application with G_IO_WIN32_DEBUG=1. The application creates a server socket and adds 1 watch (IO_IN|IO_ERR|IO_HUP) for incoming connections. After it accepts one connection, it adds 4 watches to the new socket, for IO_IN, IO_ERR, IO_HUP and IO_OUT, respectively. The main loop then spins checking for new socket events. Has anything changed in the code for this from 2.14.2 until now?
I can confirm that this bug still exists in 2.26. While getting telepathy-gabble to work on Windows it took me a serious amount of time until I realized that I was hit by this bug. It would be _very_ helpful and highly appreciated if there is a note in the documentation that currently only one watch per channel is supported. Just like the other notes on limitations on Windows. Is there any realistic chance to get this fairly old bug fixed? Is the patch from comment 6 still applicable? If so I would like to test it with telepathy-gabble.
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/49.