GNOME Bugzilla – Bug 548278
Async GETs connections are always terminated unexpectedly on windows
Last modified: 2008-08-19 23:51:55 UTC
Hi, I have been experimenting with libsoup on Windows for my OpenStreetMap GPS widget [0]. Unfortunately it does not work. Nearly all (about 90%) of the GETs queued with an async session return abruptly with the following error:

Error downloading tile: 7 - Connection terminated unexpectedly

Requeuing the message does not seem efficient considering how frequently this error occurs. The error does not occur on Linux. I have tried both 2.4.x and 2.23.x from [1]. I am using Windows XP SP3 and the latest stable releases of msys, mingw, gtk+ and gobject.

The attached main.c file contains a minimal test case at the top. It was pulled from [0].

[0] svn co http://open.grcnz.com/svn/albatross/branches/groundstation/osm-gps-map osm-gps-map
[1] http://ftp.gnome.org/pub/GNOME/binaries/win32/libsoup/
Created attachment 116863
test case at top of file

/* RELEVANT TEST CASE CODE */
#include <glib.h>
#include <gtk/gtk.h>        /* needed for gtk_init() and gtk_main() */
#include <libsoup/soup.h>

/* Completion callback; user_data is the URI string allocated in main(). */
static void
osm_gps_map_tile_download_complete (SoupSession *session, SoupMessage *msg, gpointer user_data)
{
    char *uri = user_data;

    if (SOUP_STATUS_IS_SUCCESSFUL (msg->status_code))
        g_debug ("ok: %s", uri);
    else
        g_warning ("Error downloading tile: %d - %s", msg->status_code, msg->reason_phrase);

    g_free (uri);   /* the string is not needed after the callback */
}

int
main (int argc, char **argv)
{
    int i;
    SoupSession *soup_session;

    g_thread_init (NULL);
    g_type_init ();
    gtk_init (&argc, &argv);

    soup_session = soup_session_async_new ();

    /* Queue eight tile downloads; each one reports through the callback. */
    for (i = 1; i < 9; i++) {
        char *c = g_strdup_printf ("http://tile.openstreetmap.org/3/4/%d.png", i);
        SoupMessage *msg = soup_message_new ("GET", c);
        soup_session_queue_message (soup_session, msg, osm_gps_map_tile_download_complete, c);
    }

    gtk_main ();
    return 0;
}
Further investigation reveals that even when messages do finish successfully, they do not contain the correct response size, i.e. the message body has been truncated, and writing them to pixmaps results in corrupt, incomplete images.
I don't currently have a Windows hacking environment set up, so I can't debug this easily myself. The bug is presumably caused by some difference in behavior between WinSock and the Unix sockets API in how you check the success of an async connect(), in which case it would be something in soup_socket_connect_async(), socket_connect_internal(), or connect_watch() (all in soup-socket.c).

Tor: IIRC, evolution mostly only uses synchronous connections, so it's possible that async has always been broken on Windows and we just never noticed. Or I may have broken it when I rearranged the code for 2.4.
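For reference, here is a minimal sketch of how the success of a non-blocking connect() is usually checked on the Unix side: wait for the socket to become writable, then read SO_ERROR to find out how the attempt ended. This is plain POSIX code, not the code in soup-socket.c; the names finish_async_connect() and demo_loopback_connect() are made up for illustration. WinSock behaves differently in at least two known ways (a pending connect() reports WSAEWOULDBLOCK rather than EINPROGRESS, and select() reports a failed connect through the exception set), so the same logic cannot be assumed to carry over unchanged.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: wait until a pending non-blocking connect() on fd
 * has finished. Returns 0 on success, otherwise an errno value. */
static int
finish_async_connect (int fd, int timeout_ms)
{
  struct pollfd pfd = { .fd = fd, .events = POLLOUT };
  int err = 0;
  socklen_t len = sizeof err;
  int n;

  do
    n = poll (&pfd, 1, timeout_ms);
  while (n < 0 && errno == EINTR);

  if (n < 0)
    return errno;
  if (n == 0)
    return ETIMEDOUT;

  /* Writable only means the attempt is over; SO_ERROR says how it ended. */
  if (getsockopt (fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
    return errno;
  return err;
}

/* Hypothetical demo: connect to a listener on the loopback interface and
 * return the result of the connection attempt (0 means connected). */
static int
demo_loopback_connect (void)
{
  struct sockaddr_in addr;
  socklen_t addr_len = sizeof addr;
  int listener = -1, client = -1, flags, result;

  memset (&addr, 0, sizeof addr);
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
  addr.sin_port = 0;            /* let the kernel pick a free port */

  listener = socket (AF_INET, SOCK_STREAM, 0);
  if (listener < 0
      || bind (listener, (struct sockaddr *) &addr, sizeof addr) < 0
      || listen (listener, 1) < 0
      || getsockname (listener, (struct sockaddr *) &addr, &addr_len) < 0)
    {
      result = errno;
      goto out;
    }

  client = socket (AF_INET, SOCK_STREAM, 0);
  if (client < 0
      || (flags = fcntl (client, F_GETFL, 0)) < 0
      || fcntl (client, F_SETFL, flags | O_NONBLOCK) < 0)
    {
      result = errno;
      goto out;
    }

  if (connect (client, (struct sockaddr *) &addr, sizeof addr) == 0)
    result = 0;                 /* connected immediately */
  else if (errno == EINPROGRESS)
    result = finish_async_connect (client, 2000);
  else
    result = errno;

out:
  if (client >= 0)
    close (client);
  if (listener >= 0)
    close (listener);
  return result;
}
```

The important point is that writability alone is not treated as success: the SO_ERROR lookup is what distinguishes a completed connection from a failed one.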
I will look into this soonish. (FWIW, in GLib 2.18 you will be able to use GLib's GIO functionality to download files over HTTP on Windows too, without using libsoup. It uses Microsoft's WinHttp API. That code is quite new, though, and no asynchronous stuff is implemented on Windows yet.)
Forgot to point out that, provided this bug gets fixed, libsoup might well still be a better choice for your needs than the WinHttp-based code in GLib, of course.
Created attachment 116915
Suggested patch

This patch to GLib seems to fix the issue. There aren't really that many simple test programs that could be used to verify that it doesn't break anything else. But at least it doesn't break the test case from Yu Kuan on the gtk-devel-list in May, which was what last caused some changes to the code in giowin32.c...

What the patch does is add one more state variable to the GIOChannel data for sockets, to tell whether Windows has ever signalled the socket to be writable. If it has not, the behaviour in the prepare method is different: the event for the channel is not set automatically.

Another way to fix this bug was to handle WSAENOTCONN errors like WSAEWOULDBLOCK in the socket write method. That had the side effect that the code effectively busy-polled for writing, though. The write method was called repeatedly until the connect() succeeded. Not good.

The root reason why a problem like this exists is that sockets are created and manipulated out of GLib's control, and the GLib code can't know all it really would need to know about their state. When creating a watch for a socket, the code doesn't even know whether the socket has been connected yet, or whether the calling code has itself set the socket in non-blocking mode and a connect() is pending when the watch is created, as in this bug's case.

It would be ideal if some new API like GNet or whatever was incorporated into GLib, so that GLib for such "internally created" sockets could know for sure *everything* that is done to them. I think somebody is working on that, even. Maybe in GLib 2.20 then.
So should this bug be moved from libsoup to glib?

> It would be ideal if some new API like GNet or whatever was
> incorporated into GLib, so that GLib for such "internally created" sockets
> could know for sure *everything* that is done to them. I think somebody is
> working on that, even. Maybe in GLib 2.20 then.

Not sure how actively it's being worked on, but the tracking bug is bug 515973.
Moving to glib and resolving, patch committed to trunk:

2008-08-20  Tor Lillqvist  <tml@novell.com>

        Bug 324234 - Using g_io_add_watch_full() to wait for connect() to
        return on a non-blocking socket returns prematurely

        Bug 548278 - Async GETs connections are always terminated
        unexpectedly on Windows

        * glib/giowin32.c: Add one more state variable to the
        GIOWin32Channel struct, ever_writable. Initialise it to FALSE,
        set to TRUE when the WSAEventSelect() indicates FD_WRITE, and
        never reset to FALSE.

        Don't do the WSASetEvent() in g_io_win32_prepare() unless
        ever_writable is TRUE. Don't automatically indicate G_IO_OUT in
        g_io_win32_check() unless ever_writable is TRUE.

        This fixes the behaviour of the test case program in bug #548278,
        and the "Testcase for the spurious OUT event bug" in bug #324234.
        It also doesn't seem to break anything. Not that there is any
        exhaustive test suite...

        Add a comment with a list of bugs that are related to the code in
        this file.
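The gating rule in the ChangeLog entry can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not the code in glib/giowin32.c: ModelChannel, MODEL_FD_WRITE, MODEL_IO_OUT and the model_* functions are hypothetical stand-ins for GIOWin32Channel, FD_WRITE, G_IO_OUT and the prepare/check methods, and the real functions take further conditions into account that are not modelled here.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for FD_WRITE and G_IO_OUT. */
enum { MODEL_FD_WRITE = 1 << 0 };
enum { MODEL_IO_OUT = 1 << 0 };

/* Hypothetical stand-in for the socket part of GIOWin32Channel. */
typedef struct {
  bool ever_writable;   /* starts false, never reset once it becomes true */
} ModelChannel;

/* Record the network events the OS reported for the socket. */
static void
model_note_events (ModelChannel *ch, unsigned os_events)
{
  if (os_events & MODEL_FD_WRITE)
    ch->ever_writable = true;
}

/* prepare step: may the event be set by hand so that a watch for
 * writability wakes up immediately? Only once the OS has signalled
 * writability at least once. */
static bool
model_prepare_sets_event (const ModelChannel *ch, unsigned watched)
{
  return (watched & MODEL_IO_OUT) != 0 && ch->ever_writable;
}

/* check step: may writability be reported automatically? Same gate. */
static unsigned
model_check (const ModelChannel *ch, unsigned watched)
{
  if ((watched & MODEL_IO_OUT) != 0 && ch->ever_writable)
    return MODEL_IO_OUT;
  return 0;
}
```

In this model, a watch created while a connect() is still pending reports nothing until the first write notification arrives, which is the premature wake-up that the fix is meant to prevent. After that first notification the flag stays set, so later watches behave as before.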