GNOME Bugzilla – Bug 357674
2 serious bugs in giowin32.c making glib iochannels useless (g_io_add_watch_full()) + fix
Last modified: 2010-04-07 10:49:07 UTC
Please describe the problem: We are writing a modified client (http://warclient.byethost10.com/index.html) of Freeciv open source strategy game and wanted to use glib iochannels for asynchronous server list update. We had some code that worked perfect in linux but didn't work in windows. The glib version we used was 2.12.3. We used g_io_channel_win32_new_socket to create an iochannel from existing winsock socket which was set to nonblocking mode using ioctlsocket. Then we called connect to connect the socket. After that we used g_io_add_watch_full to set a callback for G_IO_ERR | G_IO_HUP | G_IO_OUT. We wanted to use glib iochannel for callbacks, data transfer was done by direct calls to winsock. The observed problems were: [1.] In windows, callback got called too early, before FC_CONNECT or FD_WRITE event was dispatched. The GCondition reported was G_IO_OUT. Subsequent calls to send failed with "Write error during send of metaserver request: A request to send or receive data was disallowed because the socket is not connected and (when sending on a datagram socket using a sendto call) no address was supplied.". The workaround was to try select in callback and return if socket was not writable. This way callback got called again until select reported socket writable. Download log from http://warclient.byethost10.com/phpBB2/files/finallog_151.zip . The important file is stdout. The socket to watch for is 1464. g_io_channel_win32_new_socket: sockfd=1464 g_io_win32_sock_create_watch: sock=1464 handle=0x5a4 condition={OUT|ERR|HUP} g_io_win32_prepare: WSAEventSelect(1464, 0x5a4, {WRITE|CONNECT|CLOSE} g_io_win32_check: WSAEnumNetworkEvents (1824, 0x71c) revents={} condition={IN|ERR|HUP} events={} g_io_win32_check: WSAEnumNetworkEvents (1464, 0x5a4) revents={} condition={OUT|ERR|HUP} events={} g_io_win32_dispatch: pollfd.revents=OUT condition=OUT|ERR|HUP result=OUT There we can see that we register for WRITE|CONNECT|CLOSE, but no events are received (in events={}) and g_io_win32_dispatch calls our callback with GCondition set to OUT (result=OUT). In this case, our callback should be called only: 1.) if socket is "valid", connect was called and if FD_CONNECT or FD_WRITE was received after we registered with WSAEventSelect (via glib) 2.) if iochannel was created on a socket that was already connected using winsock functions and last operation didn't result in WSAEWOULDBLOCK (those are send, sendto, WSASend, or WSASendTo). Then we should call the callback with OUT since the socket is writable. [2.] The 2nd observed problem was with receiving HUP GCondition in our callback. So our application wasn't notified connection was closed and couldnt respond appropriately. The HUP is not sent if events FD_READ and FD_CLOSE are dispatched at the same time. When glib g_io_win32_check discovers there are FD_READ and FD_CLOSE events, it dispatches only IN and not HUP. HUP works properly only if FD_CLOSE event is set after FD_READ was processed by g_io_win32_check. Then IN is dispatched first, and then HUP. The bad behavious is visible in the 1st stdout log. g_io_win32_check: WSAEnumNetworkEvents (1464, 0x59c) revents={IN} condition={IN|ERR|HUP} events={READ|CLOSE} g_io_win32_dispatch: pollfd.revents=IN condition=IN|ERR|HUP result=IN We see FD_READ and FD_CLOSE set in "events={READ|CLOSE}", but g_io_win32_dispatch returns only IN (look for the "result"). The good behaviour is visible in the 2nd stdout log from http://warclient.byethost10.com/phpBB2/files/logs2_111.zip g_io_win32_check: WSAEnumNetworkEvents (1464, 0x59c) revents={IN} condition={IN|ERR|HUP} events={READ} g_io_win32_dispatch: pollfd.revents=IN condition=IN|ERR|HUP result=IN g_io_win32_check: WSAEnumNetworkEvents (1816, 0x714) revents={} condition={IN|ERR|HUP} events={} g_io_win32_check: WSAEnumNetworkEvents (1464, 0x59c) revents={IN} condition={IN|ERR|HUP} events={CLOSE} g_io_win32_dispatch: pollfd.revents=HUP condition=IN|ERR|HUP result=HUP It should be noted that we use IOChannels for callbacks only, writing and reading is done directly via send, recv (winsock), write, read (linux). WSAGetLastError is used to detect errors when using winsock. FIX: These bugs were so serious, that we would have to abandon iochannels for windows completely and use WSA functions directly which would introduce mess in the code or use glib threads + blocking sockets. So I spent one afternoon studying the problem and giowin32.c looking for bugs and came with my own patch. The patch can be downloaded from here http://warclient.byethost10.com/phpBB2/files/glib_228.zip The patch is quite well documented, unclear lines can be discussed here, there is a reason for every line. It fixed both mentioned problems, and might fix the problems that gaim is having too. Some weird code that caused events to stop working totally was commented out. OUT will be dispatched properly even if FD_WRITE or FD_CONNECT isn't dispatched because it was dispatched before (this is safeguarded by using a select in g_io_channel_win32_new_socket, read MSDN, they are dispatched only once when u connect, and then FD_WRITE is dispatched only after reenabling call to send, sendto, WSASend, or WSASendTo) http://msdn.microsoft.com/library/default.asp?url=/library/en-us/winsock/winsock/wsaeventselect_2.asp I created 2nd bug report since I'm reporting 2 bugs in detail and also supply a fix. We didn't discover any side effects yet, but feel free to test the dlls directly by downloading them from http://warclient.byethost10.com/glib-2.12.3.zip.zip We use these patched dlls until there is an official fix in glib. Steps to reproduce: See above Actual results: See above Expected results: See above Does this happen every time? See above Other information: See above
FWIW, this patch seems to fix the problems that I reported in bug #324234.
Thanks for the excellect bug report. I will look into this soon.
That "warclient" machine doesn't seem to exist any longer, attempting to download the zip file with the patch goes to some site advertising web hosting. Would it be possible to attach the patch (as a diff file, no zips thanks) to this bug?
Created attachment 91260 [details] [review] same as above, just as normal patch > That "warclient" machine doesn't seem to exist any longer, attempting to > download the zip file with the patch goes to some site advertising web > hosting. Would it be possible to attach the patch (as a diff file, no > zips thanks) to this bug? I found the attached patch in my 'random patches' collection. I believe it might be the patch that was in the .zip file, but I'm not 100% sure (the topic might have come up on the mailing lists as well).
Created attachment 91287 [details] [review] Same patch, cleaned up and adapted to current giowin32.c This should be semantically the same as the original patch, but with some stylistic fixes, and adapted to the current giowin32.c after another fix I committed today. Does the original bug reporter have any minimal but complete test program to reproduce the problem he sees?
Will have to check to what extent the patch above overlaps with the one in bug #548278, it seems to at least partially be related to the same thing, and whether using the above patch also helps for that bug, and doesn't break the "Yu Kuan" test case.
A minimal but complete test case would really be useful. Then we could check if the error scenario this bug is about still occurs after the change today to giowin32.c in trunk. At least partially the change does the same thing as the patch above.
Hm, as we don't have the testcase, I am all for closing this bug. Tor, do you know which revision made the patch? Also, have you applied the patch from attachment 91287 [details] [review]? If so, could you mark it as committed, if not, mark as obsolete? Thanks :)
Tor, Ping.
The patch doesn't apply at all any more, so marking it as needs-work. I still don't know if corresponding logic actually is already present in the code. Too bad that the original bug reporter never provided any freestanding minimal test code.
Hello Unfortunately I don't have a testcase, I'm no longer involved in maintenance of warclient and I don't use glib. I would advise writing tests for each reported problem, even if they won't fail (maybe the problem was fixed). You can reach another developer of warclient at madeline.book@gmail.com . It would be best to ask current developers which version of glib they use with windows and if the problem still persists and if anything was done to workaround the problem. Originally I removed workarounds and delivered glib.dll with my patch.
FYI, contacted the email address provided in comment #11.
Hm. Given that there has been no activity, I'm closing this bug. If anybody still experiences this issue, please reopen.