After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 679527 - SoupSessionAsync hung after receiving NULL headers
SoupSessionAsync hung after receiving NULL headers
Status: RESOLVED FIXED
Product: libsoup
Classification: Core
Component: Misc
unspecified
Other Linux
: Normal major
: ---
Assigned To: libsoup-maint@gnome.bugs
libsoup-maint@gnome.bugs
Depends on:
Blocks:
 
 
Reported: 2012-07-06 17:11 UTC by Sam Thursfield
Modified: 2012-07-19 13:08 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
full log of gdb session (50.64 KB, text/plain)
2012-07-06 17:11 UTC, Sam Thursfield
Details

Description Sam Thursfield 2012-07-06 17:11:06 UTC
Created attachment 218193 [details]
full log of gdb session

I'm not really sure how this happened. I was running 'ostree-pull' and after a while it got stuck:

10183 loose files to fetch: Requesting /repo/objects/5c/32ef5b5daf7d2cd97173abcc47e70f29b36d12c4a6f90a3f58b49f6098c18d.file  [ 913726.2 KiB downloaded ]
10183 loose files to fetch: Requesting /repo/objects/5c/32ef5b5daf7d2cd97173abcc47e70f29b36d12c4a6f90a3f58b49f6098c18d.file  [ 913726.2 KiB downloaded ]
10183 loose files to fetch: Requesting /repo/objects/5c/32ef5b5daf7d2cd97173abcc47e70f29b36d12c4a6f90a3f58b49f6098c18d.file  [ 913726.2 KiB downloaded ]
...

I poked around in gdb for a while (log attached, but it is fairly long) to see what was going on.

This is the message at the head of the SoupSessionAsync message queue:

$90 = {parent = {g_type_instance = {g_class = 0x1526ba0}, ref_count = 4, 
    qdata = 0x0}, method = 0x7f9b06ae977a "GET", status_code = 8, 
  reason_phrase = 0x76e6a10 "Message Corrupt", request_body = 0x76f7b40, 
  request_headers = 0x770de00, response_body = 0x770f580, 
  response_headers = 0x770f6c0}

This is 'message->priv':

$92 = {io_data = 0x1536ce0, chunk_allocator = 0, chunk_allocator_data = 0x0, 
  chunk_allocator_dnotify = 0, msg_flags = 0, server_side = 0, sniffer = 0x0, 
  bytes_for_sniffing = 0, http_version = SOUP_HTTP_1_1, 
  orig_http_version = SOUP_HTTP_1_1, uri = 0x76fba00, addr = 0x0, auth = 0x0, 
  proxy_auth = 0x0, disabled_features = 0x0, decoders = 0x0, 
  first_party = 0x0, tls_certificate = 0x0, tls_errors = 0}

This is 'message->priv->io_data':

$93 = {item = 0x152a290, mode = SOUP_MESSAGE_IO_CLIENT, 
  cancellable = 0x76db470, iostream = 0x6a8e800, istream = 0x7661ea0, 
  body_istream = 0x0, ostream = 0x67a8030, body_ostream = 0x0, 
  async_context = 0x151e940, blocking = 0, 
  read_state = SOUP_MESSAGE_IO_STATE_HEADERS, 
  read_encoding = SOUP_ENCODING_UNRECOGNIZED, read_header_buf = 0x770c4c0, 
  read_body = 0x770f580, read_length = 0, 
  write_state = SOUP_MESSAGE_IO_STATE_DONE, 
  write_encoding = SOUP_ENCODING_NONE, write_buf = 0x1fa4640, 
  write_body = 0x76f7b40, write_chunk = 0x0, write_body_offset = 0, 
  write_length = 0, written = 0, io_source = 0x0, unpause_source = 0x0, 
  paused = 0, get_headers_cb = 0x7f9b06acd220 <get_request_headers>, 
  parse_headers_cb = 0x7f9b06acd040 <parse_response_headers>, 
  header_data = 0x152a290, completion_cb = 0x7f9b06adccf0 <message_completed>, 
  completion_data = 0x152a290}

This is 'message->priv->io_data->read_header_buf':

$95 = {data = 0x7725810 "p\bq\a", len = 0}


So it looks like in soup-message-io.c:io_read() we somehow received an empty set of headers from the socket (or something else went wrong along the way) and crucially, we then failed to propagate the error and left the queue paused waiting for further data.

I fixed the error by manually changing item->state from SOUP_MESSAGE_RUNNING to SOUP_MESSAGE_FINISHING and then calling run_queue(). I'm not sure if the message in question was retried correctly, but at least processing the queue restarted and ostree seemed happy enough.

libsoup from git, head: 804281853d8a683bfdb61310827c16101fe8414b
Comment 1 Sam Thursfield 2012-07-11 09:18:23 UTC
After tracing from the beginning, it looks like this might be an ostree issue after all, because it seems to be ignoring an earlier error.
Comment 2 Dan Winship 2012-07-11 12:58:14 UTC
(In reply to comment #1)
> After tracing from the beginning, it looks like this might be an ostree issue
> after all, because it seems to be ignoring an earlier error.

ah, how so?

(In reply to comment #0)
> This is 'message->priv->io_data->read_header_buf':
> 
> $95 = {data = 0x7725810 "p\bq\a", len = 0}
> 
> 
> So it looks like in soup-message-io.c:io_read() we somehow received an empty
> set of headers from the socket

no; read_header_buf gets truncated back to 0 after the headers are processed, so if msg->status_code is non-0, then it's expected that read_header_buf is empty.
Comment 3 Dan Winship 2012-07-13 22:12:23 UTC
Try git master... the code to retry when the server unexpectedly
closes a persistent connection wasn't getting run in the SoupRequest
codepaths, which might have caused hangs.
Comment 4 Dan Winship 2012-07-17 21:17:51 UTC
More fixes on master; the "wikipedia pages load without css" bug is fixed. Not sure about ostree, but try it.
Comment 5 Sam Thursfield 2012-07-18 14:42:19 UTC
I'm now getting "Connection terminated unexpectedly" coming up a lot, but haven't seen a single hang. I'm hoping that this is your fix in action.
Comment 6 Dan Winship 2012-07-18 17:33:46 UTC
that sounds related to the change on Friday. of course, it's *supposed* to retry, not just fail...

i guess i need to start playing with ostree
Comment 7 Sam Thursfield 2012-07-19 11:00:22 UTC
It did retry, and I finally managed a successful pull :)

Thanks for your help here! I'm happy to close the bug if you are.
Comment 8 Dan Winship 2012-07-19 12:22:18 UTC
I mean, it should retry without you ever seeing any error. Or is the "connection unexpectedly terminated" just something you saw in strace or something like that?
Comment 9 Sam Thursfield 2012-07-19 13:04:21 UTC
I see the error with a breakpoint on g_set_error() and g_set_error_literal()

However, OSTree has very little in the way of error handling at this point, and previously it did actually report the error and fail. However, now although the error gets set at some point we seem to retry again, so I presume it's being handled internally.
Comment 10 Dan Winship 2012-07-19 13:08:21 UTC
ah, great. it's working then.