GNOME Bugzilla – Bug 771912
Segfault when request sent using soup_session_send_async is cancelled
Last modified: 2018-09-21 16:27:07 UTC
I'm getting frequent crashes when a request sent using soup_session_send_async() has the passed GCancellable actually cancelled, via the main loop. The session in use is an unremarkable SoupSession that has a small single-user cache. Representative backtrace: Thread 1 "geary" received signal SIGSEGV, Segmentation fault. write_ready_cb (source=<optimised out>, result=<optimised out>, istream=0x642e750 [SoupCacheInputStream]) at soup-cache-input-stream.c:175 175 if (pending) { (gdb) bt
+ Trace 236694
(gdb) info locals ostream = <optimised out> priv = 0x642e6f0 write_size = 1369 pending = <error reading variable pending (Cannot access memory at address 0x8)> error = 0x0
Another crash at the same point in the application gives a slightly different (better?) error and stack trace - libsoup 2.54.1: libsoup:ERROR:soup-cache-input-stream.c:195:soup_cache_input_stream_write_next_buffer: assertion failed: (priv->output_stream && !g_output_stream_is_closed (priv->output_stream)) Thread 1 "geary" received signal SIGABRT, Aborted. __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58 58 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. (gdb) bt
+ Trace 236724
Mike, I managed to get a crash of Geary while running gdb. I'm using libsoup 2.56.0 on Fedora 25. Thread 1 "geary" received signal SIGSEGV, Segmentation fault. istream_caching_finished (istream=0x324ab20 [SoupCacheInputStream], bytes_written=495, error=0x0, user_data=0x37098c0) at soup-cache.c:784 784 g_clear_object (&entry->cancellable); (gdb) bt
+ Trace 236935
Hey Mike/Federico I hit this same bug as well, and am working on fixing it (GStreamer is the application that is seeing this backtrace) Using https://github.com/Sean-Der/libsoup do you have fewer SIGSEGV ? I am currently trying to figure out the case where buffer is NULL, but I just noticed a check was missing in one place (and all other callers were performing it) I am not super familiar with libsoup, so just trying my best to fix this one case until I can get a maintainers attention. thanks
Also I added an additional assert for `buffer` being NULL, I am still trying to figure out that case and what the proper behavior is.
Hi Sean I'd be happy to test your patches if you point me to a tutorial on building libsoup and especially let Geary use this version instead of Fedora system packages installed on my computer. If it's just about running the usual build combo (./autogen.sh. configure, etc.) and install a new binary in /usr/local, then it's pretty simple. But if it's different, as I suspect, I need some help.
You should attach a patch here so that it appears in the unreviewed patch tracker, otherwise we have no way to track all the patches that are not getting reviews. (In reply to Federico Bruni from comment #5) > If it's just about running the usual build combo (./autogen.sh. configure, > etc.) and install a new binary in /usr/local, then it's pretty simple. But > if it's different, as I suspect, I need some help. No it's just the usual. configure, make, make install into /usr/local should work fine.
Created attachment 350162 [details] [review] Add NULL check to SoupCache writing that was missing on one code path
Created attachment 350163 [details] [review] g_queue_pop_head returns NULL even though g_queue_get_length returns 1 I am not fully confident about this patch, I am worried there is a deeper issue here I don't fully understand.
(In reply to Sean-Der from comment #3) > Hey Mike/Federico > > I hit this same bug as well, and am working on fixing it (GStreamer is the > application that is seeing this backtrace) > > Using https://github.com/Sean-Der/libsoup do you have fewer SIGSEGV ? I am > currently trying to figure out the case where buffer is NULL, but I just > noticed a check was missing in one place (and all other callers were > performing it) > > I am not super familiar with libsoup, so just trying my best to fix this one > case until I can get a maintainers attention. Hey, thanks for looking into this, I've had a few stabs but never made any progress. Am rebuilding soup with the two patches above, will let you know if it seems to help over the next few days.
@Federico Sorry I missed your comment! I really appreciate you testing out patches, if you need any help getting it working I am happy to jump on IRC and walk through stuff. @Michael Awesome! I still have other segfaults in the cache writing process, but it is getting better. I have coredumps enabled and just keep fixing as they come.
I cannot test the patch, as I cannot reproduce the bug anymore. IIRC I used a long thread (181 messages) as crash test. But now it works fine, I cannot trigger the crash.
Quick update: After running Geary against 2.56.0 with the two patches above applied, I haven't seen any crashes with stacks like the ones I or Federico reported above. I'd normally have expected to see about 1-2 per day.
Thank you so much for testing Federico/Michael! Federico I wouldn't be surprised if an environment change (or really anything!) changed the issue for you it is super inconsistent for me as well. That's awesome to hear about the segfaults, this fixed the issue for me as well. Hopefully a maintainer is able to merge/review these thanks again
Comment on attachment 350163 [details] [review] g_queue_pop_head returns NULL even though g_queue_get_length returns 1 So clearly this shouldn't be happening... is it possible there's a thread-safety issue? If you can still reproduce the bug reliably, adding something like: g_print ("soup_cache_input_stream_write_next_buffer %p in %p\n", istream, g_thread_get_self ()); to the top of each function could help figure that out (and if that doesn't turn up any threading problem maybe a few more printfs would)?
Comment on attachment 350162 [details] [review] Add NULL check to SoupCache writing that was missing on one code path SoupCacheInputStream is pretty complicated for such a small file, and it's hard to say whether this is supposed to be necessary or not... Do you know under what circumstances this gets hit? If this helps *without also needing the gqueue patch* (which suggests some larger problem exists) then I'd be OK committing it as-is I guess, but it feels like eventually a larger rewrite would be nice (eg, having an explicit "priv->state" field rather than testing so many different variables in so many places). Maybe if you were going to tackle the larger rewrites needed to support sync operation as well that would be something to consider.
*** Bug 781879 has been marked as a duplicate of this bug. ***
FWIW, this happens under Geary when displaying an email conversation, due to loading avatar icons from Gravatar for each sender. For each message in the conversation, from the main loop, a call to soup_session_send_async() is made on a common, long-lived SoupSession object that has a SoupCache instance added to it as a feature. After each async call completes, the resulting GInputStream is then passed to a pixbuf loader iff the HTTP request was successful. The crash has been worked around on Geary master to a greater extent by never cancelling any avatar loads, but it does still occur sometimes when loading large conversations, where messages with the same sender appear more than once and hence same Gravatar HTTP request is sent multiple times. My theory is that for these larger conversations, since the same avatar may be requested multiple times, perhaps there is a race when the initial network load has completed, or perhaps GInputStream instances are somehow being re-used, or something? I'll try to get around to putting the g_print() trace call in, but I'm massively time poor at the moment, so that might take a while.
Ah, spoke too soon. After deleting my cache and restarting geary, I just had this crash: (gdb) bt
+ Trace 237432
*** Bug 793553 has been marked as a duplicate of this bug. ***
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/libsoup/issues/97.