GNOME Bugzilla – Bug 679527
SoupSessionAsync hung after receiving NULL headers
Last modified: 2012-07-19 13:08:21 UTC
Created attachment 218193 [details] full log of gdb session I'm not really sure how this happened. I was running 'ostree-pull' and after a while it got stuck: 10183 loose files to fetch: Requesting /repo/objects/5c/32ef5b5daf7d2cd97173abcc47e70f29b36d12c4a6f90a3f58b49f6098c18d.file [ 913726.2 KiB downloaded ] 10183 loose files to fetch: Requesting /repo/objects/5c/32ef5b5daf7d2cd97173abcc47e70f29b36d12c4a6f90a3f58b49f6098c18d.file [ 913726.2 KiB downloaded ] 10183 loose files to fetch: Requesting /repo/objects/5c/32ef5b5daf7d2cd97173abcc47e70f29b36d12c4a6f90a3f58b49f6098c18d.file [ 913726.2 KiB downloaded ] ... I poked around in gdb for a while (log attached, but it is fairly long) to see what was going on. This is the message at the head of the SoupSessionAsync message queue: $90 = {parent = {g_type_instance = {g_class = 0x1526ba0}, ref_count = 4, qdata = 0x0}, method = 0x7f9b06ae977a "GET", status_code = 8, reason_phrase = 0x76e6a10 "Message Corrupt", request_body = 0x76f7b40, request_headers = 0x770de00, response_body = 0x770f580, response_headers = 0x770f6c0} This is 'message->priv': $92 = {io_data = 0x1536ce0, chunk_allocator = 0, chunk_allocator_data = 0x0, chunk_allocator_dnotify = 0, msg_flags = 0, server_side = 0, sniffer = 0x0, bytes_for_sniffing = 0, http_version = SOUP_HTTP_1_1, orig_http_version = SOUP_HTTP_1_1, uri = 0x76fba00, addr = 0x0, auth = 0x0, proxy_auth = 0x0, disabled_features = 0x0, decoders = 0x0, first_party = 0x0, tls_certificate = 0x0, tls_errors = 0} This is 'message->priv->io_data': $93 = {item = 0x152a290, mode = SOUP_MESSAGE_IO_CLIENT, cancellable = 0x76db470, iostream = 0x6a8e800, istream = 0x7661ea0, body_istream = 0x0, ostream = 0x67a8030, body_ostream = 0x0, async_context = 0x151e940, blocking = 0, read_state = SOUP_MESSAGE_IO_STATE_HEADERS, read_encoding = SOUP_ENCODING_UNRECOGNIZED, read_header_buf = 0x770c4c0, read_body = 0x770f580, read_length = 0, write_state = SOUP_MESSAGE_IO_STATE_DONE, write_encoding = SOUP_ENCODING_NONE, write_buf = 0x1fa4640, write_body = 0x76f7b40, write_chunk = 0x0, write_body_offset = 0, write_length = 0, written = 0, io_source = 0x0, unpause_source = 0x0, paused = 0, get_headers_cb = 0x7f9b06acd220 <get_request_headers>, parse_headers_cb = 0x7f9b06acd040 <parse_response_headers>, header_data = 0x152a290, completion_cb = 0x7f9b06adccf0 <message_completed>, completion_data = 0x152a290} This is 'message->priv->io_data->read_header_buf': $95 = {data = 0x7725810 "p\bq\a", len = 0} So it looks like in soup-message-io.c:io_read() we somehow received an empty set of headers from the socket (or something else went wrong along the way) and crucially, we then failed to propagate the error and left the queue paused waiting for further data. I fixed the error by manually changing item->state from SOUP_MESSAGE_RUNNING to SOUP_MESSAGE_FINISHING and then calling run_queue(). I'm not sure if the message in question was retried correctly, but at least processing the queue restarted and ostree seemed happy enough. libsoup from git, head: 804281853d8a683bfdb61310827c16101fe8414b
After tracing from the beginning, it looks like this might be an ostree issue after all, because it seems to be ignoring an earlier error.
(In reply to comment #1) > After tracing from the beginning, it looks like this might be an ostree issue > after all, because it seems to be ignoring an earlier error. ah, how so? (In reply to comment #0) > This is 'message->priv->io_data->read_header_buf': > > $95 = {data = 0x7725810 "p\bq\a", len = 0} > > > So it looks like in soup-message-io.c:io_read() we somehow received an empty > set of headers from the socket no; read_header_buf gets truncated back to 0 after the headers are processed, so if msg->status_code is non-0, then it's expected that read_header_buf is empty.
Try git master... the code to retry when the server unexpectedly closes a persistent connection wasn't getting run in the SoupRequest codepaths, which might have caused hangs.
More fixes on master; the "wikipedia pages load without css" bug is fixed. Not sure about ostree, but try it.
I'm now getting "Connection terminated unexpectedly" coming up a lot, but haven't seen a single hang. I'm hoping that this is your fix in action.
that sounds related to the change on Friday. of course, it's *supposed* to retry, not just fail... i guess i need to start playing with ostree
It did retry, and I finally managed a successful pull :) Thanks for your help here! I'm happy to close the bug if you are.
I mean, it should retry without you ever seeing any error. Or is the "connection unexpectedly terminated" just something you saw in strace or something like that?
I see the error with a breakpoint on g_set_error() and g_set_error_literal() However, OSTree has very little in the way of error handling at this point, and previously it did actually report the error and fail. However, now although the error gets set at some point we seem to retry again, so I presume it's being handled internally.
ah, great. it's working then.