GNOME Bugzilla – Bug 762283
GSocket – Fix race conditions on Win32 if multiple threads are waiting on conditions for the same socket
Last modified: 2018-05-24 18:35:12 UTC
See commit message. We need this in GStreamer on Win32, otherwise applications that work just fine on POSIX systems are just waiting forever for a socket to be readable or writable.
Created attachment 321605 [details] [review] GSocket – Fix race conditions on Win32 if multiple threads are waiting on conditions for the same socket WSAWaitForMultipleEvents() only returns for one of the waiting threads, and that one might not even be the one waiting for the condition that changed. As such, only let a single thread wait on the event and use a GCond for all other threads. With this it is possible to e.g. have an UDP socket that is written to from one thread and read from in another thread on Win32 too. On POSIX systems this was working before already.
Main question here is if the moving of the flag unsetting is correct. The problem with unsetting the flags *after* the operation returned WOULDBLOCK (as it was before) is that between the operation returning and unsetting the flag, another thread might have already been woken up from the WSAWaitForMultipleEvents and set the flag... which then would be unset again and never ever be set again.
(In reply to Sebastian Dröge (slomo) from comment #2) > Main question here is if the moving of the flag unsetting is correct. I'm not really an expert on the win32-specific socket code... getting review from a win32 hacker here would be good. >+ g_cond_signal (&socket->priv->win32_source_cond); This should be g_cond_broadcast() shouldn't it?
(In reply to Dan Winship from comment #3) > (In reply to Sebastian Dröge (slomo) from comment #2) > > Main question here is if the moving of the flag unsetting is correct. > > I'm not really an expert on the win32-specific socket code... getting review > from a win32 hacker here would be good. Who would be a good person to put in CC for that? :) > >+ g_cond_signal (&socket->priv->win32_source_cond); > > This should be g_cond_broadcast() shouldn't it? Indeed, I wonder why this didn't cause problems in my testcase. Thanks!
Created attachment 321780 [details] [review] GSocket – Fix race conditions on Win32 if multiple threads are waiting on conditions for the same socket WSAWaitForMultipleEvents() only returns for one of the waiting threads, and that one might not even be the one waiting for the condition that changed. As such, only let a single thread wait on the event and use a GCond for all other threads. With this it is possible to e.g. have an UDP socket that is written to from one thread and read from in another thread on Win32 too. On POSIX systems this was working before already.
Any win32 person interested in reviewing this? :) Anybody missing in CC?
Am i getting this correctly: you have multiple threads waiting for one socket endpoint to become ready (for something)? Not two different endpoints, the same endpoint?
One UDP socket. One thread is waiting for receiving and receiving data data, another thread is sending data. Data might be sent to a different destination than from where data is received but that doesn't matter.
Just to be sure, i've googled it, and it seems that you should be able to do this for UDP sockets. I.e. just sending and receiving on the same udp endpoint from different threads is not illegal by itself. I would advise to sync [sendto()+unset FD_WRITE], [recvfrom()+unset FD_READ] and [WSAEnumNetworkEvents()+set FD_READ and/or FD_WRITE]. All three are asynchronous (so you won't get locked if one of them blocks, because they don't block) and all three modify the FD flags. Can't say how much will affect the performance; hopefully, not much. WSAWaitForMultipleEvents() itself does not modify any state and you don't need to sync it because of *that*, but, as you have found out, if called from multiple threads, it will return only to one thread (this does not seem to be documented anywhere, but if that is what you get, then that is how it works), so it must be synced independently too. Since you probably also want to only allow one thread to call WSAEnumNetworkEvents() after a wait is over, it means that you need two locks - one for read/write/enum and one for wait. You first lock the wait lock, then wait, then lock the read/write/enum lock, then enum, then unlock read/write/enum lock, then unlock the wait lock. It's possible that other threads will call read/write in the gap between you returning from wait() and you locking the read/write/enum lock, but that should not affect anything (at worst, enum() will tell you that nothing is ready, despite the fact that wait() returned; so you'll just go on and do whatever else you have to do, then wait again). That is the result of my googling and reading the docs. FD_CONNECT, FD_ACCEPT and FD_CLOSE probably require separate consideration, but i don't think they will alter the picture. Hope this is useful.
(In reply to LRN from comment #9) > Just to be sure, i've googled it, and it seems that you should be able to do > this for UDP sockets. I.e. just sending and receiving on the same udp > endpoint from different threads is not illegal by itself. Yes, it's just that the Windows socket event machinery in GIO seems completely racy :) There are some obvious problems, some less obvious and my fix that might be going in the right direction or is completely wrong ;) > I would advise to sync [sendto()+unset FD_WRITE], [recvfrom()+unset FD_READ] > and [WSAEnumNetworkEvents()+set FD_READ and/or FD_WRITE]. All three are > asynchronous (so you won't get locked if one of them blocks, because they > don't block) and all three modify the FD flags. Can't say how much will > affect the performance; hopefully, not much. > WSAWaitForMultipleEvents() itself does not modify any state and you don't > need to sync it because of *that*, but, as you have found out, if called > from multiple threads, it will return only to one thread (this does not seem > to be documented anywhere, but if that is what you get, then that is how it > works), so it must be synced independently too. Since you probably also want > to only allow one thread to call WSAEnumNetworkEvents() after a wait is > over, it means that you need two locks - one for read/write/enum and one for > wait. > > You first lock the wait lock, then wait, then lock the read/write/enum lock, > then enum, then unlock read/write/enum lock, then unlock the wait lock. It's > possible that other threads will call read/write in the gap between you > returning from wait() and you locking the read/write/enum lock, but that > should not affect anything (at worst, enum() will tell you that nothing is > ready, despite the fact that wait() returned; so you'll just go on and do > whatever else you have to do, then wait again). That would require bigger rewriting of the code, but seems sensible to me. I went with the solution that required least changes though. Depends on what would be considered mergeable here. I can implement that then if needed. > That is the result of my googling and reading the docs. > > FD_CONNECT, FD_ACCEPT and FD_CLOSE probably require separate consideration, > but i don't think they will alter the picture. > > Hope this is useful. Thanks!
Any other comments? Would be good to get some feedback from a GLib developer so something mergeable can be produced
One thing, did you run the unit tests also the ones from glib-networking and libsoup?
I didn't. Are they expected to all succeed on Windows currently?
By myself I tried glib-networking, libsoup and some of the socket ones in glib. See though that I tried the openssl backend of glib-networking and not gnutls.
Hey any news on this?
glib-networking tls/connection test is segfaulting for me without my patch, and the glib tests can't be built because of "undefined reference to __imp_pcre_version" when linking tests/regex.c.
Review of attachment 321780 [details] [review]: Let's go for it. If you feel like implementing the suggestion from LRN it would be also great.
Created attachment 341390 [details] [review] GSocket: Fix race conditions on Win32 if multiple threads are waiting on conditions for the same socket WSAWaitForMultipleEvents() only returns for one of the waiting threads, and that one might not even be the one waiting for the condition that changed. As such, only let a single thread wait on the event and use a GCond for all other threads. With this it is possible to e.g. have an UDP socket that is written to from one thread and read from in another thread on Win32 too. On POSIX systems this was working before already.
Rebased against latest GIT. Good to go still?
Created attachment 341391 [details] [review] GSocket: Fix race conditions on Win32 if multiple threads are waiting on conditions for the same socket WSAWaitForMultipleEvents() only returns for one of the waiting threads, and that one might not even be the one waiting for the condition that changed. As such, only let a single thread wait on the event and use a GCond for all other threads. With this it is possible to e.g. have an UDP socket that is written to from one thread and read from in another thread on Win32 too. On POSIX systems this was working before already.
<nacho> slomo, anyway let's get in your patch Attachment 341391 [details] pushed as 799f8dc - GSocket: Fix race conditions on Win32 if multiple threads are waiting on conditions for the same socket
*** Bug 750728 has been marked as a duplicate of this bug. ***
We have just tested this with a real application making extensive work of GSocket with async GTaks etc and this patch seems to have broken it. I am reverting it on the 2.52 branch and leaving it for now in master. We should research a bit more why this breaks when using GTasks.
Do you have a testcase I could run to reproduce it? Our use case is only using synchronous APIs, but writing to and reading from the same socket happens in different threads (and there the state tracking gets messed up then).
When I get some time I'll try to come up with a test case
Sorry for the delay, I've finally found some time to check this out. The following test case from libsoup reproduces the problem: https://git.gnome.org/browse/libsoup/tree/tests/websocket-test.c Just launch it and you will see that it will block forever. Instead without your patch all the tests pass without issues.
Any news on this?
No, sorry. I'm very busy with other things. We'll continue to ship this patch together with the GStreamer binaries for Windows in any case, as without the patch gst-rtsp-server is unusable on Windows. Hope to find some time for looking into this
Adjusting component to gio in preparation for GitLab migration
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/1138.