GNOME Bugzilla – Bug 665801
Crash in SoupSessionAsync::finalize
Last modified: 2012-03-14 20:00:07 UTC
I'm trying to set up ews (in the git/master evo and friends). At some point "Fetch URL" worked, at least for the host URL (but never for the OAB URL). Now when I try it I get a popup saying: Autodiscover failed: Failed to find <ASUrl> and <OABUrl> in autodiscover response When I press OK, evolution crashes. Here's the backtrace: Core was generated by `evolution'. Program terminated with signal 11, Segmentation fault.
+ Trace 229234
Thread 1 (Thread 0xb595f830 (LWP 4826))
I just tried this again with the same results. However, the backtrace looks different. Program terminated with signal 11, Segmentation fault.
+ Trace 229268
Thread 1 (Thread 0xb595a830 (LWP 32196))
Created attachment 203576 [details] [review] proposed ews patch for evolution-ews; Partially related, this fixes the issue with memory management. The 'simple' was leaked, and the first unref was incorrect, there is done an unref of cnc in the e_ews_autodiscover_ws_url, with a comment: g_object_unref (cnc); /* the GSimpleAsyncResult holds it now */ It doesn't fix the crash, itself though, see below.
I guess the main issue is with libsoup itself. To reproduce this I may cheat the code to repeat David's environment, which means to edit e-ews-connection.c, line 1166, change the if: if (!success) { g_free (urls->as_url); g_free (urls->oab_url); g_free (urls); g_set_error (&error, EWS_CONNECTION_ERROR, -1, _("Failed to find <ASUrl> and <OABUrl> in ... goto failed; } to if (1 || !success) { which will make sure that all 4 requests will be processed. After that, if I press the "Fetch URL" button multiple times (every time after I dismiss the error message about "Failed to find <ASUrl> and <OABUrl> in...", then it can sometimes crash for me too, in the same place. It's crashing in libsoup, and valgrind makes it feel like the issue is there: Invalid read of size 8 at 0x7E55DB5: g_source_destroy (gmain.c:1047) by 0x3E80E43DD0: finalize (soup-session-async.c:64) by 0x3E6B61145C: g_object_unref (gobject.c:2746) by 0x23AB2DB2: e_ews_connection_dispose (e-ews-connection.c:723) by 0x3E6B6113DA: g_object_unref (gobject.c:2709) by 0x3E6DA66EC1: g_simple_async_result_finalize (gsimpleasyncresult.c:269) by 0x3E6B61145C: g_object_unref (gobject.c:2746) by 0x7E555FD: g_source_callback_unref (gmain.c:1324) by 0x7E54F55: g_source_destroy_internal (gmain.c:993) by 0x7E58AF2: g_main_context_dispatch (gmain.c:2450) by 0x7E59277: g_main_context_iterate (gmain.c:3073) by 0x7E597C4: g_main_loop_run (gmain.c:3281) by 0x39D8B516AC: gtk_main (gtkmain.c:1362) by 0x403513: main (main.c:688) Address 0x107e7060 is 32 bytes inside a block of size 96 free'd at 0x4A0662E: free (vg_replace_malloc.c:366) by 0x7E5F792: g_free (gmem.c:263) by 0x7E545EC: g_source_unref_internal (gmain.c:1702) by 0x3E80E3A075: soup_add_completion (soup-misc.c:138) by 0x3E80E43E49: do_idle_run_queue (soup-session-async.c:475) by 0x3E6B60EA23: g_closure_invoke (gclosure.c:774) by 0x3E6B620D16: signal_emit_unlocked_R (gsignal.c:3272) by 0x3E6B62A140: g_signal_emit_valist (gsignal.c:3003) by 0x3E6B62A2E1: g_signal_emit (gsignal.c:3060) by 0x3E80E42687: soup_session_abort (soup-session.c:1844) by 0x3E80E42C94: dispose (soup-session.c:216) by 0x3E6B6113DA: g_object_unref (gobject.c:2709) by 0x23AB2DB2: e_ews_connection_dispose (e-ews-connection.c:723) by 0x3E6B6113DA: g_object_unref (gobject.c:2709) by 0x3E6DA66EC1: g_simple_async_result_finalize (gsimpleasyncresult.c:269) by 0x3E6B61145C: g_object_unref (gobject.c:2746) by 0x7E555FD: g_source_callback_unref (gmain.c:1324) by 0x7E54F55: g_source_destroy_internal (gmain.c:993) by 0x7E58AF2: g_main_context_dispatch (gmain.c:2450) by 0x7E59277: g_main_context_iterate (gmain.c:3073) by 0x7E597C4: g_main_loop_run (gmain.c:3281) by 0x39D8B516AC: gtk_main (gtkmain.c:1362) by 0x403513: main (main.c:688) This is with libsoup-2.36.1-2.fc16.x86_64. Dan, can you think of a reason for this, please? Any suggestion is welcomed.
(In reply to comment #3) > Address 0x107e7060 is 32 bytes inside a block of size 96 free'd > at 0x4A0662E: free (vg_replace_malloc.c:366) > by 0x7E5F792: g_free (gmem.c:263) > by 0x7E545EC: g_source_unref_internal (gmain.c:1702) > by 0x3E80E3A075: soup_add_completion (soup-misc.c:138) The only way the source created by soup_add_completion() could be freed from within soup_add_completion() would be if it was added from a thread other than the thread running the GMainContext that it was added to. Which it looks like you're doing; you're unreffing the session from the main thread/context rather than from the soup_context. It's a little bit weird that you can't even unref the session from another thread, and maybe that could be fixed, but it looks like you're also queuing and cancelling messages from the wrong thread in other places too, so there could still be problems even if that was changed. I didn't look too carefully, but it's not obvious to me why you need to be using another thread at all, given that you are running a GMainLoop in the main thread too...?
David Woodhouse would better answer this, I believe. They create an asynchronous operation, and let it run in a dedicated thread, probably to not block UI. Unreffing EEwsConnection in a dedicated thread has some issues too. I checked the code and e_ews_autodiscover_ws_url() is called on a response to button click, thus from a main thread. There is immediately created EEwsConnection, which has created its own soup_session, also in the main thread, but passes it that soup_context you mentioned above. What is that good for I do not know. The thing is that the connection is also unreffed in the main thread too (as the asynchronous operation ends on idle). Am I right that what you suggest here is not to unref EEwsConnection::soup_session in its e_ews_connection_dispose() function, but invoke something in EEwsConnection::soup_context context, say on its idle? What to unref there, the soup_session only, or all the soup_session, soup_loop, soup_thread and soup_context too?
Not all users of EEwsConnection have a mainloop that can be used for this, so EEwsConnection makes its own thread and mainloop. It is somewhat painful that we have these restrictions on where the soup functions can be called from. To my brain, they seem to be locking bugs in libsoup, although I appreciate that it is a design decision to be that way. Nevertheless, it does mean that I sometimes forget to invoke the soup functions by jumping through the correct (in_idle) hoops. Let's fix those cases. BUT: I wasn't aware that *even* g_object_unref() had to be called in the appropriate context. That is scary and even more "broken" to my mind than the situation that I have already come reluctantly to accept... fixing that as you suggest (Dan) would seem sensible to me.
(In reply to comment #6) > It is somewhat painful that we have these restrictions on where the soup > functions can be called from. To my brain, they seem to be locking bugs in > libsoup, although I appreciate that it is a design decision to be that way. Note that SoupSessionSync *is* thread-safe. > I wasn't aware that *even* g_object_unref() had to be called in the appropriate > context. That is scary and even more "broken" to my mind than the situation > that I have already come reluctantly to accept... Yeah, it's not intentional. The problem is that if there are persistent connections open when you destroy the SoupSession, they have to be closed, and there's code elsewhere that queues an idle handler any time a connection is closed. Just need to make it not do that in the finalizing case. Bug 667364
Created attachment 206729 [details] [review] unref soup session in its own context This untested patch attempts to move the unref of the soup_session into its own context, to work around the issue.
Patch works for me fine. I would be more conservative and pass only soup_session into do_kill_soup_session, but otherwise it works. There is one compiler warning too: e-ews-connection.c: In function 'e_ews_connection_dispose': e-ews-connection.c:740:12: warning: passing argument 2 of 'g_source_set_callback' from incompatible pointer type [enabled by default] /usr/include/glib-2.0/glib/gmain.h:358:6: note: expected 'GSourceFunc' but argument is of type 'void (*)(struct EEwsConnectionPrivate *)' The do_kill_soup_session() should also return FALSE.
(In reply to comment #9) > Patch works for me fine. I would be more conservative and pass only > soup_session into do_kill_soup_session, but otherwise it works. That was my original though. But if we leave the call to g_main_loop_quit(priv->soup_loop) in the e_ews_connection_dispose() function rather than in do_kill_soup_session(), won't that mean we can kill the event loop before it ever gets round to calling do_kill_soup_session() And we'll then leak the session? Hm.... perhaps there's a better way to do this. What if we add a g_object_unref(priv->soup_session) to the end of the e_ews_soup_thread() function? Then all that e_ews_connection_dispose() needs to do is call g_main_loop_quit() and wait for the thread. It'll take care of unreffing the session all by itself?
I tested with mine patch applied too and both work fine. I also realized (before your above comment) that you are using other parts of the priv struct in the do_kill_soup_session function, thus my above comment about passing only soup_session in is not correct. I would not complicate it much, this seems to work correctly. If this will work to David, and Dan doesn't have any opinion, then I would just use it (with a little modifications mentioned in comment #9).
Created attachment 207106 [details] [review] simpler fix This variant is *simpler*, and more ObviouslyCorrect™ — I don't have to worry about how sensible it is to pass the 'priv' variable out from e_ews_connection_dispose() to a helper which runs in another thread.
I've committed the patch from comment 12 to master and gnome-3-2 branches, along with a suitably grumpy commit message about libsoup :)
What about my cleanup patch? It fixes some memory leaks.
Committed the memory leak as: Created commit 13b2005 in ews master (3.3.90+)
Hrm, a little regression with the memory leak patch :( https://bugzilla.redhat.com/show_bug.cgi?id=802408 1352 1353 g_simple_async_result_set_op_res_gpointer (ad->simple, urls, NULL); 1354 g_simple_async_result_complete_in_idle (ad->simple); 1355 g_object_unref (ad->simple); 1356 return; 1357 1358 failed:
+ Trace 229863
Thread 1 (Thread 0x7f2a81ffb700 (LWP 1293))
(In reply to comment #16) > 1353 g_simple_async_result_set_op_res_gpointer (ad->simple, urls, NULL); > 1354 g_simple_async_result_complete_in_idle (ad->simple); > 1355 g_object_unref (ad->simple); Probably ad is getting freed by autodiscover_done_cb() in the other thread between 1354 and 1355, and so you end up passing garbage to g_object_unref(). Try pulling ad->simple out into a local variable before completing it.
I tried to reproduce this and it works just fine here. I see that the request are not interleaving its occupation in autodiscover_response_cb(), the only one is when the request succeeded, where is called soup_session_cancel_message(), which results in simple return from this function for such affected request. I also checked ref_count and such and all seems just fine here. i also tried to cheat a bit, playing with failed and more/less successful responses, but it still works as it should for me, during my testing. I thought that setting ad->msgs[idx] = NULL; just when it's recognized that the message can be processed is potentially dangerous, but I cannot prove the idea with my tests.
(In reply to comment #17) > (In reply to comment #16) > > 1353 g_simple_async_result_set_op_res_gpointer (ad->simple, urls, NULL); > > 1354 g_simple_async_result_complete_in_idle (ad->simple); > > 1355 g_object_unref (ad->simple); > > Probably ad is getting freed by autodiscover_done_cb() in the other thread > between 1354 and 1355, and so you end up passing garbage to g_object_unref(). > Try pulling ad->simple out into a local variable before completing it. Interesting, I would think of such case, but you've right, it can happen in general. Thanks for the suggestion. I made a fix for it based on it. Created commit 11c1acd in ews master (3.3.92+)
(In reply to comment #19) > Interesting, I would think of such case, but you've right, it can happen in Err, it was supposed to be: "...I would *not* think..."