Bug 762283 - GSocket – Fix race conditions on Win32 if multiple threads are waiting on conditions for the same socket

Status: RESOLVED OBSOLETE
Product: glib
Classification: Platform
Component: gio
Version: unspecified
Hardware/OS: Other / All
Priority: Normal
Severity: normal
Target Milestone: ---
Assigned To: gtkdev
Duplicates: 750728 (view as bug list)
Depends on:
Blocks:
Reported: 2016-02-18 17:13 UTC by Sebastian Dröge (slomo)
Modified: 2018-05-24 18:35 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments

GSocket – Fix race conditions on Win32 if multiple threads are waiting on conditions for the same socket (9.61 KB, patch)
2016-02-18 17:13 UTC, Sebastian Dröge (slomo), review status: none

GSocket – Fix race conditions on Win32 if multiple threads are waiting on conditions for the same socket (9.62 KB, patch)
2016-02-21 16:15 UTC, Sebastian Dröge (slomo), review status: accepted-commit_now

GSocket: Fix race conditions on Win32 if multiple threads are waiting on conditions for the same socket (9.60 KB, patch)
2016-12-05 10:29 UTC, Sebastian Dröge (slomo), review status: none

GSocket: Fix race conditions on Win32 if multiple threads are waiting on conditions for the same socket (9.60 KB, patch)
2016-12-05 10:40 UTC, Sebastian Dröge (slomo), review status: committed

Description Sebastian Dröge (slomo) 2016-02-18 17:13:35 UTC
See commit message. We need this in GStreamer on Win32; otherwise applications
that work just fine on POSIX systems wait forever for a socket to become
readable or writable.
Comment 1 Sebastian Dröge (slomo) 2016-02-18 17:13:39 UTC
Created attachment 321605 [details] [review]
GSocket – Fix race conditions on Win32 if multiple threads are waiting on conditions for the same socket

WSAWaitForMultipleEvents() only returns for one of the waiting threads, and
that one might not even be the one waiting for the condition that changed. As
such, only let a single thread wait on the event and use a GCond for all other
threads.

With this it is possible to e.g. have a UDP socket that is written to from
one thread and read from in another thread on Win32 too. On POSIX systems
this already worked before.
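
For illustration, a minimal sketch of the single-waiter idea the commit message describes: only one thread sits in the OS-level wait, everyone else sleeps on a GCond and is woken by broadcast. All names here (SocketWaiter, os_wait_for_socket_events) are hypothetical stand-ins, not the actual gsocket.c code.

#include <glib.h>

/* Stand-in for WSAWaitForMultipleEvents() + WSAEnumNetworkEvents();
 * returns a bitmask of newly signalled conditions. Not a real API. */
int os_wait_for_socket_events (void);

typedef struct {
  GMutex   mutex;
  GCond    cond;
  gboolean waiting;        /* TRUE while one thread owns the OS-level wait */
  int      pending_events; /* bitmask of conditions seen so far */
} SocketWaiter;

static void
wait_for_condition (SocketWaiter *w, int wanted)
{
  g_mutex_lock (&w->mutex);
  while (!(w->pending_events & wanted))
    {
      if (w->waiting)
        {
          /* Another thread already owns the OS-level wait: sleep on the
           * GCond until it publishes new events. */
          g_cond_wait (&w->cond, &w->mutex);
        }
      else
        {
          int events;

          w->waiting = TRUE;
          g_mutex_unlock (&w->mutex);
          events = os_wait_for_socket_events ();
          g_mutex_lock (&w->mutex);
          w->pending_events |= events;
          w->waiting = FALSE;
          /* Wake every sleeping thread, not just one, since different
           * threads may be waiting for different conditions. */
          g_cond_broadcast (&w->cond);
        }
    }
  g_mutex_unlock (&w->mutex);
}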
Comment 2 Sebastian Dröge (slomo) 2016-02-19 15:59:03 UTC
Main question here is if the moving of the flag unsetting is correct. The problem with unsetting the flag *after* the operation returned WOULDBLOCK (as it was before) is that, between the operation returning and the flag being unset, another thread might already have been woken up from WSAWaitForMultipleEvents() and set the flag... which would then be unset again and never be set again.
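
A hedged illustration of that ordering question, building on the hypothetical SocketWaiter sketch in comment 1; os_try_send() and FD_WRITE_BIT are made-up stand-ins, not GLib or Winsock API, and this is not the actual patch.

#define FD_WRITE_BIT (1 << 1)   /* illustrative flag value */
gssize os_try_send (void);      /* stand-in for a non-blocking send */

/* Old order (racy): attempt the operation first and clear the flag only
 * after it reports WOULDBLOCK.  Between those two steps another thread
 * may return from the OS wait and set the flag; the late clear then
 * erases that wakeup and nobody ever sees it again. */
static gssize
send_old_order (SocketWaiter *w)
{
  gssize res = os_try_send ();             /* -1 means WOULDBLOCK here */
  if (res < 0)
    {
      g_mutex_lock (&w->mutex);
      w->pending_events &= ~FD_WRITE_BIT;  /* may wipe out a fresh event */
      g_mutex_unlock (&w->mutex);
    }
  return res;
}

/* New order: clear the flag before attempting the operation.  If the
 * operation succeeds, the cleared bit is irrelevant; if it returns
 * WOULDBLOCK, an event reported in the meantime is preserved instead of
 * being erased. */
static gssize
send_new_order (SocketWaiter *w)
{
  g_mutex_lock (&w->mutex);
  w->pending_events &= ~FD_WRITE_BIT;
  g_mutex_unlock (&w->mutex);
  return os_try_send ();
}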
Comment 3 Dan Winship 2016-02-21 14:10:31 UTC
(In reply to Sebastian Dröge (slomo) from comment #2)
> Main question here is if the moving of the flag unsetting is correct.

I'm not really an expert on the win32-specific socket code... getting review from a win32 hacker here would be good.

>+            g_cond_signal (&socket->priv->win32_source_cond);

This should be g_cond_broadcast() shouldn't it?
Comment 4 Sebastian Dröge (slomo) 2016-02-21 16:11:34 UTC
(In reply to Dan Winship from comment #3)
> (In reply to Sebastian Dröge (slomo) from comment #2)
> > Main question here is if the moving of the flag unsetting is correct.
> 
> I'm not really an expert on the win32-specific socket code... getting review
> from a win32 hacker here would be good.

Who would be a good person to put in CC for that? :)

> >+            g_cond_signal (&socket->priv->win32_source_cond);
> 
> This should be g_cond_broadcast() shouldn't it?

Indeed, I wonder why this didn't cause problems in my testcase. Thanks!
Comment 5 Sebastian Dröge (slomo) 2016-02-21 16:15:40 UTC
Created attachment 321780 [details] [review]
GSocket – Fix race conditions on Win32 if multiple threads are waiting on conditions for the same socket

WSAWaitForMultipleEvents() only returns for one of the waiting threads, and
that one might not even be the one waiting for the condition that changed. As
such, only let a single thread wait on the event and use a GCond for all other
threads.

With this it is possible to e.g. have a UDP socket that is written to from
one thread and read from in another thread on Win32 too. On POSIX systems
this already worked before.
Comment 6 Sebastian Dröge (slomo) 2016-03-21 08:49:31 UTC
Any win32 person interested in reviewing this? :) Anybody missing in CC?
Comment 7 LRN 2016-03-21 10:35:18 UTC
Am i getting this correctly: you have multiple threads waiting for one socket endpoint to become ready (for something)? Not two different endpoints, the same endpoint?
Comment 8 Sebastian Dröge (slomo) 2016-03-21 11:28:34 UTC
One UDP socket. One thread is waiting for and receiving data, another thread is sending data. Data might be sent to a different destination than the one data is received from, but that doesn't matter.
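
A minimal sketch of that usage pattern with plain GIO calls, just to make the scenario concrete; the addresses, port numbers and shutdown strategy are made up, and error handling is mostly omitted.

#include <gio/gio.h>

static GCancellable *cancellable;

static gpointer
receiver_thread (gpointer data)
{
  GSocket *socket = data;
  gchar buf[1500];

  while (!g_cancellable_is_cancelled (cancellable))
    {
      gssize n = g_socket_receive (socket, buf, sizeof buf, cancellable, NULL);
      if (n <= 0)
        break;
      g_print ("received %" G_GSSIZE_FORMAT " bytes\n", n);
    }
  return NULL;
}

int
main (void)
{
  GError *error = NULL;
  GSocket *socket;
  GSocketAddress *local, *remote;
  GThread *rx;

  cancellable = g_cancellable_new ();
  socket = g_socket_new (G_SOCKET_FAMILY_IPV4, G_SOCKET_TYPE_DATAGRAM,
                         G_SOCKET_PROTOCOL_UDP, &error);
  local = g_inet_socket_address_new_from_string ("0.0.0.0", 5004);
  remote = g_inet_socket_address_new_from_string ("127.0.0.1", 5005);
  g_socket_bind (socket, local, TRUE, &error);

  /* The receiver blocks in g_socket_receive() on the same socket that
   * the main thread sends from below. */
  rx = g_thread_new ("rx", receiver_thread, socket);

  for (int i = 0; i < 10; i++)
    g_socket_send_to (socket, remote, "ping", 4, NULL, &error);

  g_cancellable_cancel (cancellable);   /* unblock the receiver */
  g_thread_join (rx);

  g_socket_close (socket, NULL);
  g_object_unref (remote);
  g_object_unref (local);
  g_object_unref (socket);
  g_object_unref (cancellable);
  return 0;
}

On POSIX this works as expected; on Win32 without the patch the receiving thread can end up waiting forever, which is the bug reported here.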
Comment 9 LRN 2016-03-21 12:50:25 UTC
Just to be sure, i've googled it, and it seems that you should be able to do this for UDP sockets. I.e. just sending and receiving on the same udp endpoint from different threads is not illegal by itself.

I would advise to sync [sendto()+unset FD_WRITE], [recvfrom()+unset FD_READ] and [WSAEnumNetworkEvents()+set FD_READ and/or FD_WRITE]. All three are asynchronous (so you won't get locked if one of them blocks, because they don't block) and all three modify the FD flags. Can't say how much will affect the performance; hopefully, not much.
WSAWaitForMultipleEvents() itself does not modify any state and you don't need to sync it because of *that*, but, as you have found out, if called from multiple threads, it will return only to one thread (this does not seem to be documented anywhere, but if that is what you get, then that is how it works), so it must be synced independently too. Since you probably also want to only allow one thread to call WSAEnumNetworkEvents() after a wait is over, it means that you need two locks - one for read/write/enum and one for wait.

You first lock the wait lock, then wait, then lock the read/write/enum lock, then enum, then unlock read/write/enum lock, then unlock the wait lock. It's possible that other threads will call read/write in the gap between you returning from wait() and you locking the read/write/enum lock, but that should not affect anything (at worst, enum() will tell you that nothing is ready, despite the fact that wait() returned; so you'll just go on and do whatever else you have to do, then wait again).

That is the result of my googling and reading the docs.

FD_CONNECT, FD_ACCEPT and FD_CLOSE probably require separate consideration, but i don't think they will alter the picture.

Hope this is useful.
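
A minimal sketch of the two-lock scheme described in this comment; the lock names and the os_*() helpers are hypothetical stand-ins, not the actual GLib or Winsock code.

#include <glib.h>

/* Stand-ins for WSAWaitForMultipleEvents(), WSAEnumNetworkEvents() and a
 * non-blocking sendto(); none of these are real APIs. */
void   os_wait (void);
int    os_enum (void);
gssize os_sendto (void);

#define FD_WRITE_BIT (1 << 1)   /* illustrative flag value */

typedef struct {
  GMutex wait_lock;    /* only one thread at a time in the OS-level wait */
  GMutex flags_lock;   /* protects pending_events for send/recv/enum */
  int    pending_events;
} SocketState;

static void
wait_and_enum (SocketState *s)
{
  g_mutex_lock (&s->wait_lock);

  os_wait ();                        /* wait for the socket event */

  g_mutex_lock (&s->flags_lock);
  s->pending_events |= os_enum ();   /* record which conditions fired */
  g_mutex_unlock (&s->flags_lock);

  g_mutex_unlock (&s->wait_lock);
}

static gssize
send_and_update_flags (SocketState *s)
{
  gssize res;

  /* The send and the flag update happen under the same lock as the enum
   * above, so they can never interleave with it. */
  g_mutex_lock (&s->flags_lock);
  res = os_sendto ();
  if (res < 0)                       /* WOULDBLOCK in this sketch */
    s->pending_events &= ~FD_WRITE_BIT;
  g_mutex_unlock (&s->flags_lock);

  return res;
}

A send in the gap between os_wait() returning and flags_lock being taken is harmless: at worst os_enum() reports nothing new and the waiter simply waits again, as described above.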
Comment 10 Sebastian Dröge (slomo) 2016-03-21 12:57:01 UTC
(In reply to LRN from comment #9)
> Just to be sure, i've googled it, and it seems that you should be able to do
> this for UDP sockets. I.e. just sending and receiving on the same udp
> endpoint from different threads is not illegal by itself.

Yes, it's just that the Windows socket event machinery in GIO seems completely racy :) There are some obvious problems, some less obvious ones, and then my fix, which might be going in the right direction or might be completely wrong ;)

> I would advise to sync [sendto()+unset FD_WRITE], [recvfrom()+unset FD_READ]
> and [WSAEnumNetworkEvents()+set FD_READ and/or FD_WRITE]. All three are
> asynchronous (so you won't get locked if one of them blocks, because they
> don't block) and all three modify the FD flags. Can't say how much will
> affect the performance; hopefully, not much.
> WSAWaitForMultipleEvents() itself does not modify any state and you don't
> need to sync it because of *that*, but, as you have found out, if called
> from multiple threads, it will return only to one thread (this does not seem
> to be documented anywhere, but if that is what you get, then that is how it
> works), so it must be synced independently too. Since you probably also want
> to only allow one thread to call WSAEnumNetworkEvents() after a wait is
> over, it means that you need two locks - one for read/write/enum and one for
> wait.
> 
> You first lock the wait lock, then wait, then lock the read/write/enum lock,
> then enum, then unlock read/write/enum lock, then unlock the wait lock. It's
> possible that other threads will call read/write in the gap between you
> returning from wait() and you locking the read/write/enum lock, but that
> should not affect anything (at worst, enum() will tell you that nothing is
> ready, despite the fact that wait() returned; so you'll just go on and do
> whatever else you have to do, then wait again).

That would require a bigger rewrite of the code, but it seems sensible to me. I went with the solution that required the fewest changes, though.

It depends on what would be considered mergeable here. I can implement that if needed.


> That is the result of my googling and reading the docs.
> 
> FD_CONNECT, FD_ACCEPT and FD_CLOSE probably require separate consideration,
> but i don't think they will alter the picture.
> 
> Hope this is useful.

Thanks!
Comment 11 Sebastian Dröge (slomo) 2016-07-01 12:08:05 UTC
Any other comments? It would be good to get some feedback from a GLib developer so that something mergeable can be produced.
Comment 12 Ignacio Casal Quinteiro (nacho) 2016-07-04 14:21:36 UTC
One thing: did you run the unit tests, including the ones from glib-networking and libsoup?
Comment 13 Sebastian Dröge (slomo) 2016-07-04 15:19:34 UTC
I didn't. Are they expected to all succeed on Windows currently?
Comment 14 Ignacio Casal Quinteiro (nacho) 2016-07-04 15:21:42 UTC
I myself tried glib-networking, libsoup and some of the socket tests in glib. Note though that I tried the openssl backend of glib-networking, not gnutls.
Comment 15 Ignacio Casal Quinteiro (nacho) 2016-08-24 12:43:41 UTC
Hey any news on this?
Comment 16 Sebastian Dröge (slomo) 2016-08-30 15:55:25 UTC
The glib-networking tls/connection test is segfaulting for me even without my patch, and the glib tests can't be built because of an "undefined reference to __imp_pcre_version" error when linking tests/regex.c.
Comment 17 Ignacio Casal Quinteiro (nacho) 2016-08-31 06:59:58 UTC
Review of attachment 321780 [details] [review]:

Let's go for it. If you feel like implementing the suggestion from LRN, that would also be great.
Comment 18 Sebastian Dröge (slomo) 2016-12-05 10:29:25 UTC
Created attachment 341390 [details] [review]
GSocket: Fix race conditions on Win32 if multiple threads are waiting on conditions for the same socket

WSAWaitForMultipleEvents() only returns for one of the waiting threads, and
that one might not even be the one waiting for the condition that changed. As
such, only let a single thread wait on the event and use a GCond for all other
threads.

With this it is possible to e.g. have a UDP socket that is written to from
one thread and read from in another thread on Win32 too. On POSIX systems
this already worked before.
Comment 19 Sebastian Dröge (slomo) 2016-12-05 10:32:47 UTC
Rebased against latest git. Still good to go?
Comment 20 Sebastian Dröge (slomo) 2016-12-05 10:40:49 UTC
Created attachment 341391 [details] [review]
GSocket: Fix race conditions on Win32 if multiple threads are waiting on conditions for the same socket

WSAWaitForMultipleEvents() only returns for one of the waiting threads, and
that one might not even be the one waiting for the condition that changed. As
such, only let a single thread wait on the event and use a GCond for all other
threads.

With this it is possible to e.g. have a UDP socket that is written to from
one thread and read from in another thread on Win32 too. On POSIX systems
this already worked before.
Comment 21 Sebastian Dröge (slomo) 2016-12-05 16:18:32 UTC
<nacho> slomo, anyway let's get in your patch

Attachment 341391 [details] pushed as 799f8dc - GSocket: Fix race conditions on Win32 if multiple threads are waiting on conditions for the same socket
Comment 22 Sebastian Dröge (slomo) 2016-12-08 16:32:14 UTC
*** Bug 750728 has been marked as a duplicate of this bug. ***
Comment 23 Ignacio Casal Quinteiro (nacho) 2017-05-11 15:59:37 UTC
We have just tested this with a real application making extensive use of GSocket with async GTasks etc., and this patch seems to have broken it. I am reverting it on the 2.52 branch and leaving it in master for now. We should research a bit more why this breaks when using GTasks.
Comment 24 Sebastian Dröge (slomo) 2017-05-11 16:14:49 UTC
Do you have a testcase I could run to reproduce it? Our use case only uses synchronous APIs, but writing to and reading from the same socket happens in different threads (and that is where the state tracking gets messed up).
Comment 25 Ignacio Casal Quinteiro (nacho) 2017-05-11 16:17:34 UTC
When I get some time I'll try to come up with a test case
Comment 26 Ignacio Casal Quinteiro (nacho) 2017-06-02 09:44:20 UTC
Sorry for the delay, I've finally found some time to check this out. The following test case from libsoup reproduces the problem:

https://git.gnome.org/browse/libsoup/tree/tests/websocket-test.c

Just launch it and you will see that it blocks forever. Without your patch, all the tests pass without issues.
Comment 27 Ignacio Casal Quinteiro (nacho) 2017-07-12 08:43:59 UTC
Any news on this?
Comment 28 Sebastian Dröge (slomo) 2017-07-12 08:53:41 UTC
No, sorry. I'm very busy with other things. We'll continue to ship this patch together with the GStreamer binaries for Windows in any case, as without the patch gst-rtsp-server is unusable on Windows.

I hope to find some time to look into this.
Comment 29 Michael Catanzaro 2018-03-21 00:22:17 UTC
Adjusting component to gio in preparation for GitLab migration
Comment 30 GNOME Infrastructure Team 2018-05-24 18:35:12 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/1138.