Bug 338943 – Multiple watches on the same socket

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 338943 - Multiple watches on the same socket


Summary:	Multiple watches on the same socket


Status:	RESOLVED OBSOLETE

Product:	glib
Classification:	Platform
Component:	win32
Version:	2.10.x
Hardware:	Other Windows

Importance:	Normal normal
Target Milestone:	---
Assigned To:	gtk-win32 maintainers
QA Contact:	gtk-win32 maintainers

URL:
Whiteboard:

Duplicates:	604568 (view as bug list)
Depends on:
Blocks:

Reported:	2006-04-18 21:06 UTC by Owen Taylor
Modified:	2018-05-24 10:45 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Test case (1.26 KB, text/plain) 2006-04-18 21:09 UTC, Owen Taylor		Details
Fixed + improved testcase (3.08 KB, application/octet-stream) 2007-03-20 04:51 UTC, Steven Brown		Details
very early test patch (4.47 KB, patch) 2008-08-20 15:19 UTC, Tor Lillqvist	none	Details \| Review
G_IO_WIN32_DEBUG log for (possible) bug occurrence (14.16 KB, text/plain) 2010-01-05 04:02 UTC, Catalin Patulea		Details

Description Owen Taylor 2006-04-18 21:06:40 UTC

In the GLib API, it's legitimate to add multiple watches to the 
same IO channel, but that doesn't work currently on Windows.
Each watch results in a separate GPollFD, and are passed individually.
to MsgWaitForMultipleObjects(). That violates the docs for 
MsgWaitForMultipleObjects:

 The array can contain handles of objects of different types. It may 
 not contain multiple copies of the same handle.

But that isn't what is causing the problems; the problems are:

 - The logic in g_io_win32_prepare() will cause the different
   channels to overwrite each other unpredictably for 
   WSAEventSelect()
 - As soon as MsgWaitForMultipleObjects() returns the fd
   as being ready for any event, *all* GPollFDs waiting on
   any condition will be marked as ready. (In GPoll)

I'll attach a test case that I think will demonstrate the problem,
though I haven't tried compiling it or running it on Win32, just
on  Linux, where it works. Depending on the order sources
prepared, I'd expect it to either not print anything at all,
or hit the HUP condition on Win32.

A sketch of a possible fix:

 - Keep a single GPollFD for GIOChannel, not once for each
   watch. Make the events in that GPollFD be the union
   of all the active watches.

   You could implement this by keeping a linked list of
   all watches for the channel on the channel and recomputing
   the union of the events on change.

   An alternate approach would be to use a refcounted bitmask.
   There's an implementation I wrote *somewhere* in the GTK+
   stack that's pretty nifty, that possibly could be moved
   into GLib. But I have no clue where :-(

   [ Basic technique was keep a set of masks in a linked list,
     when setting look for an existing mask with that bit 
     clear and set it, otherwise add a new mask. On clearing,
     clear the first mask with that bit set. Remove masks
     that are 0. ]

 - In g_poll(), *before* waiting, call WSAEventSelect()
   to reset the record for the socket and clear any pending events. 
   I believe that the current code slightly racy ... in a sequence like:

    - Input arrives, callback is called
    - More input arrives, reseting the event flag in the socket record
    - App reads all available input
    - MsgWaitForMultipleObjects is called, reports the socket readable
    - Callback is called, app expects data, gets none
   
   Now, it's probably good to make your app robust against getting
   no data in a read callback, but it doesn't match the guarantees
   that we present on Unix (*)

   There is no race condition with more data arriving before
   clearing pending events because the interesting events are
   level triggered - see the WSAEventSelect() docs.

 - Then after polling call WSAEnumNetworkEvents() to determine
   what events actually happened.

I'm not sure there is enough information currently to let g_poll()
know if the fd it is passed is a winsock event object ... if not,
you could probably stuff some magic flag into the GPollFD->events
field to mark that.

An alternate approach would be to reset the events in prepare(),
and then call WSAEnumNetworkEvents() in the first check() on
any of the sources. (g_poll() would stick a value into the
events field that represents "something is ready, not sure what")

There would be duplicate work if there are multiple sources for
the same channel... you'd have to reset the events in every call 
to prepare() ... but it should be survivable.

What the above doesn't handle is multiple channels for the same
underlying socket, but I don't think that's really that much
worth worrying about.

Comment 1 Owen Taylor 2006-04-18 21:08:01 UTC

The footnote is:

 (*) A related problem is that WSAEventSelect makes a socket non-blocking
     as a side effect. I don't think that we can fix it, so it is just
     a portability gotcha when moving code from Unix to Windows.

Comment 2 Owen Taylor 2006-04-18 21:09:50 UTC

Created attachment 63829 [details]
Test case

Test case, run as:

 multiwatch mail.gnome.org

or, if your port 25 is firewalled

 multiwatch smtp.myisp.net

A SMTP server is used purely as a simple way of getting a port
that sends us some data when we connect to it.

Comment 3 Steven Brown 2007-03-20 04:51:14 UTC

Created attachment 84938 [details]
Fixed + improved testcase

Updated Owen's test case to work on win32.  Also changed it to a simpler test for the watch condition getting overwritten when there are multiple watches per socket - if both IN and OUT aren't flagged within 5 seconds, considers the test failed.

Comment 4 Pierre Ossman 2007-09-25 12:50:51 UTC

Anyone looking at this? This bugger has bitten me as well and it took some time before I figured it out.

As a band-aid this bug should be documented and/or checked for at runtime.

(I also suspect there might be some relation between this bug, bug 357674 and bug 425156)

Comment 5 Tor Lillqvist 2008-08-20 01:45:49 UTC

I think I am going to have a go at this now... Anybody have any fresh ideas to add?

Comment 6 Tor Lillqvist 2008-08-20 15:19:16 UTC

Created attachment 117063 [details] [review]
very early test patch

How close to real code is actually that test program? I mean, a real use of GIOChannels wouldn't want the G_IO_OUT and G_IO_IN handlers to be called repeatedly, would they?

This preliminary patch adds handling of several watches per GIOChannel to giowin32.c. It did help the test program. The patch is quite simplistic, though, nothing as complicated as Owen describes, so I am not sure at all if it is good enough... lots of crack in there.

And I am not sure at all that the existing event setting and resetting in giowin32.c (which I didn't touch now) makes much sense at all.... probably partly just works by accident. Need to read Owen's comment very carefully and ponder this more.

I haven't yet checked if the patch breaks some of the very few test programs for giowin32 functionality.

Comment 7 Tor Lillqvist 2008-08-21 00:32:46 UTC

I just realized a fundamental difference between how the poll() system call behaves compared to the WaitForMultipleObjects() used in g_poll() on Windows. I'll add this comment here so that my thoughts don't get lost;)

poll() returns indicating *all* the fds that match their conditions. Well, at least if called with a zero timeout, or if called with non-zero timeout but no waiting was needed, doesn't it? I.e. for each of the fds for which its condition is true even before any waiting, this is indicated in the result. Not just for one of them.

(When called with a non-zero timeout, and when a wait is needed, I guess what almost always happens is that just one fd "fires" and just that is then indicated in the result?)

The WaitForMultipleObjects() on the other hand only tells that *one* of the waited for events is in the signalled state. Others might be too, but the return value just indicates one of them.

I think this is one cause of some of the giochannel/gmain problems on Windows.

(MsgWaitForMultipleObjects() complicates it a bit further by also waiting for messages, and apparently preferring to indicate messages.)

So, probably quite some more complexity needs to be added to the Win32 version of g_poll(). When polling for messages and one or several events, or when polling for several events but no messages (perhaps only when polling with a zero timeout, though), after a non-timeout return value (i.e. either indicating that there are messages or that one of the events is in the signalled state), the code needs to check for each of the remaining events, too, if some of them also happens to be in the signalled state.

This is needed for instance because the code in gmain.c resets the revent field of poll records based on what g_poll() returned. Even if one of the events being polled is in the signalled state, if there also are messages waiting, the MsgWaitForMultipleObjects() will only indicate that messages are waiting, so the gpollfds for the events will have their revents fields returned as unset.

Does this sound reasonable?

Comment 8 Tor Lillqvist 2009-01-12 13:20:06 UTC

By the way, the idea in the previous comment has been implemented in GLib since 2.18.1 (with an important bug fix in 2.18.2). See the Win32 implementation of g_poll() in gmain.c, and the helper function poll_rest(). The actual problem described in this bug has not been fixed though, as far as I see.

Comment 9 Graham 2009-12-15 14:31:55 UTC

This may also be the same as I raised as #604568.
The documentation should be updated to say something like:
(near g_io_channel_win32_new_socket):

Current limitation on WIN32 - only one watch per fd/handle,
Suggestion - keep your own GIOConditions bitmask and
de-re-subscribe when your needs change.


It might not be too complex to add a runtime detector,
that notices that a channel already has a watch,
and prints a warning.

Comment 10 Tor Lillqvist 2009-12-15 15:41:19 UTC

*** Bug 604568 has been marked as a duplicate of this bug. ***

Comment 11 Catalin Patulea 2010-01-05 04:02:48 UTC

Created attachment 150816 [details]
G_IO_WIN32_DEBUG log for (possible) bug occurrence

Comment 12 Catalin Patulea 2010-01-05 04:04:10 UTC

I think this is affecting me through pygtk 2.14.2; I am trying to port a Python application to Windows which calls gobject.io_add_watch multiple times on the same (socket) fd, with different condition flags. After accepting one connection, the application spins at 100% CPU in the main loop.

Just to confirm this is the bug I am hitting, attached (g_io_win32_debug_log.txt) is the output from running the application with G_IO_WIN32_DEBUG=1. The application creates a server socket and adds 1 watch (IO_IN|IO_ERR|IO_HUP) for incoming connections. After it accepts one connection, it adds 4 watches to the new socket, for IO_IN, IO_ERR, IO_HUP and IO_OUT, respectively. The main loop then spins checking for new socket events.

Has anything changed in the code for this from 2.14.2 until now?

Comment 13 Thomas Flüeli 2011-02-03 08:45:33 UTC

I can confirm that this bug still exists in 2.26.

While getting telepathy-gabble to work on Windows it took me a serious amount of time until I realized that I was hit by this bug. It would be _very_ helpful and highly appreciated if there is a note in the documentation that currently only one watch per channel is supported. Just like the other notes on limitations on Windows.

Is there any realistic chance to get this fairly old bug fixed? Is the patch from comment 6 still applicable? If so I would like to test it with telepathy-gabble.

Comment 14 GNOME Infrastructure Team 2018-05-24 10:45:55 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/49.