GNOME Bugzilla – Bug 616458
GSocketInputStream's read_async() method occasionally blocks
Last modified: 2010-08-15 19:36:10 UTC
This only happens if the socket is blocking mode. See stack trace in [1]. According to [2] it looks like the socket is actually closed when this happens but that just may be a gdb problem. I think the way GSocketInputStream should work is that read_sync() should always block and read_async() should never block. Ditto for GOutputStream too. [1] :
+ Trace 221505
Thread 2 (Thread 0x7fb84ec25710 (LWP 6234))
$1 = 1
Yes, it is currently assumed that you only use sync ops on blocking sockets and async ops on non-blocking sockets. Originally this would have been hard to fix, but I guess given that all sockets are non-blocking underneath now, there's no theoretical reason it couldn't be fixed now... we'd just need to add new variants of g_socket_send/g_socket_receive that let you override the blocking mode. I assume your problem is that you need to create a single socket, but then the caller needs to be able to implement both synchronous and asynchronous APIs with it?
(In reply to comment #1) > Yes, it is currently assumed that you only use sync ops on blocking sockets and > async ops on non-blocking sockets. Originally this would have been hard to fix, > but I guess given that all sockets are non-blocking underneath now, there's no > theoretical reason it couldn't be fixed now... we'd just need to add new > variants of g_socket_send/g_socket_receive that let you override the blocking > mode. Cool. Sounds great. > I assume your problem is that you need to create a single socket, but then the > caller needs to be able to implement both synchronous and asynchronous APIs > with it? Right. This is for GDBus. I used to just use g_socket_receive() and a GSource from GSocket but for abstraction reasons the transport interface is now on the GIOStream with the rules that sync IO must block and async IO must not block. This abstraction has been chosen so it is relatively easy to use GDBus with a GSocketConnection or a GTlsSocketConnection or a XMPP tube or whatever the user passes in. The way GDBus works is that sync IO is done for authentication and async IO is done from a dedicated worker thread shared by all GDBusConnection instances: http://cgit.freedesktop.org/~david/gdbus-standalone/tree/gdbus/gdbusprivate.c#n154 Basically the worker thread keeps calling read_async(). It works great with the work around mentioned in comment 0 - e.g. I now set the socket non-blocking right after authentication is done: http://cgit.freedesktop.org/~david/gdbus-standalone/tree/gdbus/gdbusconnection.c#n1465 I could remove all synchronous IO by rewriting the authentication stuff but I really don't want to do that (it normally just runs in a worker thread on connection initialization).
Created attachment 159609 [details] [review] GSocketInput/OutputStream: allow both sync and async methods on same socket This makes sync socket stream calls work on non-blocking sockets, but not vice versa (async stream calls on blocking sockets), since that's trickier given the existing APIs, and we really only *need* one of the two cases to work anyway.
Created attachment 159610 [details] [review] GSocketClient: when connecting async, leave the socket non-blocking Previously the code switched the socket back to blocking mode after connecting, which makes no sense; if you did an async connect, you most likely want to use async stream methods too, so the socket needs to be non-blocking. (This is not strictly necessary, but is definitely nicer. In IRC discussion, Benjamin felt the current behavior was definitely a bug, and anyone who was depending on it deserves to lose. I'm less sure, but I figure we can commit it to unstable and see if it breaks anyone...)
Review of attachment 159609 [details] [review]: - GInputStream should support non-blocking socket, but I think your implementation for non-blocking socket just works with blocking socket, so why check. - Since we use threads, blocking socket does not prevent async read/write, but may prevent cancellability of you call on certain OS. ::: gio/gsocketconnection.c @@ +54,3 @@ + * output streams if the underlying socket is #GSocket:blocking. + * However, if the socket is non-blocking, you can use either + * synchronous or asynchronous methods on its streams. Implementation of GSocketInputStream makes this comment inexact since it uses g_simple_async_result_run_in_thread() which is asynchronous. Maybe you wanted say that it prevents cancellable read and write ? Which is true for sync read too, since there is no posix way of exiting a blocking IO iirc. ::: gio/gsocketinputstream.c @@ -107,0 +107,22 @@ +g_socket_input_stream_block_and_read (GSocketInputStream *input_stream, + void *buffer, + gsize count, ... 19 more ... This is wrong. If g_socket_condition_wait() can return would block it means we can being active wait (and use 100% of one CPU!). Base on other usage of this function, you can just call it once and expect errors to be errors. @@ -113,3 +141,3 @@ GSocketInputStream *input_stream = G_SOCKET_INPUT_STREAM (stream); - return g_socket_receive (input_stream->priv->socket, buffer, count, + if (!g_socket_get_blocking (input_stream->priv->socket)) Why do we check this ? g_socket_input_stream_block_and_read() should works find with blocking socket too. ::: gio/gsocketoutputstream.c @@ -108,0 +108,24 @@ + +static gssize +g_socket_output_stream_block_and_write (GSocketOutputStream *output_stream, ... 21 more ... Same here, g_socket_condition_wait() cannot return G_IO_ERROR_WOULD_BLOCK and if it does, we may use 100% CPU here. @@ -117,1 +146,1 @@ - return g_socket_send (onput_stream->priv->socket, buffer, count, + if (!g_socket_get_blocking (output_stream->priv->socket)) Same as input, we can just call g_socket_output_stream_block_and_write() witch both blocking and non-blocking socket.
Review of attachment 159609 [details] [review]: - GInputStream should support non-blocking socket, but I think your implementation for non-blocking socket just works with blocking socket, so why check. - Since we use threads, blocking socket does not prevent async read/write, but may prevent cancellability of you call on certain OS. ::: gio/gsocketconnection.c @@ +54,3 @@ + * output streams if the underlying socket is #GSocket:blocking. + * However, if the socket is non-blocking, you can use either + * synchronous or asynchronous methods on its streams. Implementation of GSocketInputStream makes this comment inexact since it uses g_simple_async_result_run_in_thread() which is asynchronous. Maybe you wanted say that it prevents cancellable read and write ? Which is true for sync read too, since there is no posix way of exiting a blocking IO iirc. ::: gio/gsocketinputstream.c @@ -107,0 +107,22 @@ +g_socket_input_stream_block_and_read (GSocketInputStream *input_stream, + void *buffer, + gsize count, ... 19 more ... This is wrong. If g_socket_condition_wait() can return would block it means we can being active wait (and use 100% of one CPU!). Base on other usage of this function, you can just call it once and expect errors to be errors. @@ -113,3 +141,3 @@ GSocketInputStream *input_stream = G_SOCKET_INPUT_STREAM (stream); - return g_socket_receive (input_stream->priv->socket, buffer, count, + if (!g_socket_get_blocking (input_stream->priv->socket)) Why do we check this ? g_socket_input_stream_block_and_read() should works find with blocking socket too. ::: gio/gsocketoutputstream.c @@ -108,0 +108,24 @@ + +g_socket_output_stream_block_and_write (GSocketOutputStream *output_stream, + const void *buffer, ... 21 more ... Same here, g_socket_condition_wait() cannot return G_IO_ERROR_WOULD_BLOCK and if it does, we may use 100% CPU here. @@ -117,1 +146,1 @@ - return g_socket_send (onput_stream->priv->socket, buffer, count, + if (!g_socket_get_blocking (output_stream->priv->socket)) Same as input, we can just call g_socket_output_stream_block_and_write() witch both blocking and non-blocking socket.
Sorry for spam, I just don't know why I get 4 times the previous post. Again sorry.
(In reply to comment #5) > Review of attachment 159609 [details] [review]: > > - GInputStream should support non-blocking socket, but I think your > implementation for non-blocking socket just works with blocking socket, so why > check. OK, please note in the following that I'm carefully distinguishing between (a) a GSocket and its underlying file descriptor, (b) the GSocket "blocking" property and the file descriptor O_NONBLOCK flag, and (c) G_IO_ERROR_WOULD_BLOCK and EAGAIN. All GSockets, regardless of their "blocking" property, use file descriptors that are set to O_NONBLOCK. The difference in behavior between a socket with the "blocking" property set TRUE and one with the "blocking" property set FALSE is handled entirely within the GSocket code; if "blocking" is set TRUE, then any time an underlying operation returns EAGAIN, GSocket just calls g_socket_condition_wait(), which uses poll(2) on the file descriptor (and the GCancellable's file descriptor) to wait until it's readable/writable, and then tries again. Thus: if you have a GSocket with the "blocking" property set to FALSE, you will get back G_IO_ERROR_WOULD_BLOCK when it's not ready, but you can simulate the behavior of a "blocking" = TRUE GSocket by calling g_socket_condition_wait() at that point and trying again. OTOH, if you have a GSocket with the "blocking" property set to TRUE, you will never get back G_IO_ERROR_WOULD_BLOCK from any GSocket operation, and so you cannot do asynchronous ops on it (unless you do them synchronously from another thread). > + * output streams if the underlying socket is #GSocket:blocking. > + * However, if the socket is non-blocking, you can use either > + * synchronous or asynchronous methods on its streams. > > Implementation of GSocketInputStream makes this comment inexact since it uses > g_simple_async_result_run_in_thread() which is asynchronous. Where is g_simple_async_result_run_in_thread() being used? > Maybe you wanted say that it prevents cancellable read and write ? Which is > true for sync read too, since there is no posix way of exiting a blocking IO > iirc. As should be clear now from the above description, we never do any blocking POSIX I/O (other than poll, but in that case we're polling on both the socket fd and the cancellable fd together). > This is wrong. If g_socket_condition_wait() can return would block it means we > can being active wait (and use 100% of one CPU!). g_socket_condition_wait() cannot return G_IO_ERROR_WOULD_BLOCK, but g_socket_receive() *can*. See the g_socket_send() documentation: * Note though that you may still receive * %G_IO_ERROR_WOULD_BLOCK from g_socket_send() even if you were previously * notified of a %G_IO_OUT condition. (On Windows in particular, this is * very common due to the way the underlying APIs work.) So there's no chance of spinning here: sometimes, on Windows at least, g_socket_condition_wait() will immediately return TRUE, but then g_socket_send() will return G_IO_ERROR_WOULD_BLOCK. Calling g_socket_condition_wait() a second time will then block until the socket is actually ready, at which point g_socket_send() will succeed (or fail with an error other than G_IO_ERROR_WOULD_BLOCK). (It's not clear to me that the receiving case needs to do the same check, but it can't hurt.) > Base on other usage of this function, you can just call it once and expect > errors to be errors. The other usage is wrong. That's bug 603309. > + if (!g_socket_get_blocking (input_stream->priv->socket)) > > Why do we check this ? g_socket_input_stream_block_and_read() should works find > with blocking socket too. Yeah... it's just a bit redundant since both the stream and the socket will check its readability first. I guess it's fine though. As noted above, this patch makes it possible to do synchronous GSocketConnection ops on a non-"blocking" GSocket. I did it that way because that required no new API. But I think I agree that GSocketConnection should simply not care at all whether the underlying socket is marked "blocking" or not, so we should add GSocket API to let it just ignore the socket's "blocking" property.
(In reply to comment #11) > (In reply to comment #5) > property and the file descriptor O_NONBLOCK flag > All GSockets, regardless of their "blocking" property, use file descriptors > that are set to O_NONBLOCK. The difference in behavior between a socket with > the "blocking" property set TRUE and one with the "blocking" property set FALSE > is handled entirely within the GSocket code; if "blocking" is set TRUE, then > any time an underlying operation returns EAGAIN, GSocket just calls > g_socket_condition_wait(), which uses poll(2) on the file descriptor (and the > GCancellable's file descriptor) to wait until it's readable/writable, and then > tries again. Ok I missed that. I'm convinced now.
Created attachment 164541 [details] [review] Always do async vs sync correctly in GSocketConnection streams This fixes the problem for both sync and async by adding g_socket_receive_with_blocking() and g_socket_send_with_blocking(), which allow you to specify a blocking flag that overrides socket->priv->blocking. This seemed less lame than wrapping all the calls in a pair of g_socket_set_blocking() calls. As compared to the earlier patch, this also requires no changes to GSocketClient. We can just leave the sockets it creates with "blocking" always set to TRUE, and it just doesn't matter for GSocketConnection's purposes.
Side note, this bug occurs on OSX too.
Attachment 164541 [details] pushed as 547311b - Always do async vs sync correctly in GSocketConnection streams