After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 665801 - Crash in SoupSessionAsync::finalize
Crash in SoupSessionAsync::finalize
Status: RESOLVED FIXED
Product: evolution-ews
Classification: Other
Component: Miscellaneous / EWS Core
3.3.x
Other Linux
: Normal critical
: ---
Assigned To: Evolution EWS maintainer(s)
Evolution EWS maintainer(s)
Depends on:
Blocks:
 
 
Reported: 2011-12-08 16:03 UTC by David Ronis
Modified: 2012-03-14 20:00 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
proposed ews patch (966 bytes, patch)
2011-12-15 13:36 UTC, Milan Crha
committed Details | Review
unref soup session in its own context (1.36 KB, patch)
2012-02-03 21:24 UTC, David Woodhouse
none Details | Review
simpler fix (863 bytes, patch)
2012-02-08 14:14 UTC, David Woodhouse
committed Details | Review

Description David Ronis 2011-12-08 16:03:19 UTC
I'm trying to set up ews (in the git/master evo and friends).   At some point "Fetch URL" worked, at least for the host URL (but never for the OAB URL).  Now when I try it I get a popup saying:

Autodiscover failed: Failed to find <ASUrl> and <OABUrl> in autodiscover response

When I press OK, evolution crashes.   Here's the backtrace:


Core was generated by `evolution'.
Program terminated with signal 11, Segmentation fault.

Thread 1 (Thread 0xb595f830 (LWP 4826))

  • #0 thread_memory_magazine2_free
    at gslice.c line 848
  • #1 g_slice_free1
    at gslice.c line 1097
  • #2 g_source_unref_internal
    at gmain.c line 1657
  • #3 destroy_unref_source
    at soup-session-async.c line 59
  • #4 g_hash_table_remove_all_nodes
    at ghash.c line 520
  • #5 g_hash_table_remove_all
    at ghash.c line 1284
  • #6 g_hash_table_destroy
    at ghash.c line 1033
  • #7 finalize
    at soup-session-async.c line 76
  • #8 g_object_unref
    at gobject.c line 2883
  • #9 e_ews_connection_dispose
    at e-ews-connection.c line 707
  • #10 g_object_unref
    at gobject.c line 2846
  • #11 autodiscover_done_cb
    at e-ews-connection.c line 1046
  • #12 g_simple_async_result_complete
    at gsimpleasyncresult.c line 744
  • #13 complete_in_idle_cb
    at gsimpleasyncresult.c line 756
  • #14 g_idle_dispatch
    at gmain.c line 4632
  • #15 g_main_dispatch
    at gmain.c line 2513
  • #16 g_main_context_dispatch
    at gmain.c line 3050
  • #17 g_main_context_iterate
    at gmain.c line 3121
  • #18 g_main_context_iterate
    at gmain.c line 3058
  • #19 g_main_loop_run
    at gmain.c line 3315
  • #20 gtk_main
    at gtkmain.c line 1159
  • #21 main
    at main.c line 675

Comment 1 David Ronis 2011-12-13 18:45:11 UTC
I just tried this again with the same results.   However, the backtrace looks different.

Program terminated with signal 11, Segmentation fault.

Thread 1 (Thread 0xb595a830 (LWP 32196))

  • #0 g_mutex_get_impl
    at gthread-posix.c line 119
  • #1 g_mutex_lock
    at gthread-posix.c line 211
  • #2 g_source_destroy_internal
    at gmain.c line 933
  • #3 destroy_unref_source
    at soup-session-async.c line 58
  • #4 g_hash_table_remove_all_nodes
    at ghash.c line 520
  • #5 g_hash_table_remove_all
    at ghash.c line 1284
  • #6 g_hash_table_destroy
    at ghash.c line 1033
  • #7 finalize
    at soup-session-async.c line 76
  • #8 g_object_unref
    at gobject.c line 2882
  • #9 e_ews_connection_dispose
    at e-ews-connection.c line 723
  • #10 g_object_unref
    at gobject.c line 2845
  • #11 autodiscover_done_cb
    at e-ews-connection.c line 1062
  • #12 g_simple_async_result_complete
    at gsimpleasyncresult.c line 744
  • #13 complete_in_idle_cb
    at gsimpleasyncresult.c line 756
  • #14 g_idle_dispatch
    at gmain.c line 4632
  • #15 g_main_dispatch
    at gmain.c line 2513
  • #16 g_main_context_dispatch
    at gmain.c line 3050
  • #17 g_main_context_iterate
    at gmain.c line 3121
  • #18 g_main_context_iterate
    at gmain.c line 3058
  • #19 g_main_loop_run
    at gmain.c line 3315
  • #20 gtk_main
    at gtkmain.c line 1159
  • #21 main
    at main.c line 688

Comment 2 Milan Crha 2011-12-15 13:36:41 UTC
Created attachment 203576 [details] [review]
proposed ews patch

for evolution-ews;

Partially related, this fixes the issue with memory management. The 'simple' was leaked, and the first unref was incorrect, there is done an unref of cnc in the e_ews_autodiscover_ws_url, with a comment:
   g_object_unref (cnc); /* the GSimpleAsyncResult holds it now */
It doesn't fix the crash, itself though, see below.
Comment 3 Milan Crha 2011-12-15 13:41:46 UTC
I guess the main issue is with libsoup itself. To reproduce this I may cheat the code to repeat David's environment, which means to edit e-ews-connection.c, line 1166, change the if:
   	if (!success) {
		g_free (urls->as_url);
		g_free (urls->oab_url);
		g_free (urls);
		g_set_error	(&error, EWS_CONNECTION_ERROR,
				-1,
				_("Failed to find <ASUrl> and <OABUrl> in ...
		goto failed;
	}

to
   	if (1 || !success) {
which will make sure that all 4 requests will be processed. After that, if I press the "Fetch URL" button multiple times (every time after I dismiss the error message about "Failed to find <ASUrl> and <OABUrl> in...", then it can sometimes crash for me too, in the same place. It's crashing in libsoup, and valgrind makes it feel like the issue is there:

Invalid read of size 8
   at 0x7E55DB5: g_source_destroy (gmain.c:1047)
   by 0x3E80E43DD0: finalize (soup-session-async.c:64)
   by 0x3E6B61145C: g_object_unref (gobject.c:2746)
   by 0x23AB2DB2: e_ews_connection_dispose (e-ews-connection.c:723)
   by 0x3E6B6113DA: g_object_unref (gobject.c:2709)
   by 0x3E6DA66EC1: g_simple_async_result_finalize (gsimpleasyncresult.c:269)
   by 0x3E6B61145C: g_object_unref (gobject.c:2746)
   by 0x7E555FD: g_source_callback_unref (gmain.c:1324)
   by 0x7E54F55: g_source_destroy_internal (gmain.c:993)
   by 0x7E58AF2: g_main_context_dispatch (gmain.c:2450)
   by 0x7E59277: g_main_context_iterate (gmain.c:3073)
   by 0x7E597C4: g_main_loop_run (gmain.c:3281)
   by 0x39D8B516AC: gtk_main (gtkmain.c:1362)
   by 0x403513: main (main.c:688)
 Address 0x107e7060 is 32 bytes inside a block of size 96 free'd
   at 0x4A0662E: free (vg_replace_malloc.c:366)
   by 0x7E5F792: g_free (gmem.c:263)
   by 0x7E545EC: g_source_unref_internal (gmain.c:1702)
   by 0x3E80E3A075: soup_add_completion (soup-misc.c:138)
   by 0x3E80E43E49: do_idle_run_queue (soup-session-async.c:475)
   by 0x3E6B60EA23: g_closure_invoke (gclosure.c:774)
   by 0x3E6B620D16: signal_emit_unlocked_R (gsignal.c:3272)
   by 0x3E6B62A140: g_signal_emit_valist (gsignal.c:3003)
   by 0x3E6B62A2E1: g_signal_emit (gsignal.c:3060)
   by 0x3E80E42687: soup_session_abort (soup-session.c:1844)
   by 0x3E80E42C94: dispose (soup-session.c:216)
   by 0x3E6B6113DA: g_object_unref (gobject.c:2709)
   by 0x23AB2DB2: e_ews_connection_dispose (e-ews-connection.c:723)
   by 0x3E6B6113DA: g_object_unref (gobject.c:2709)
   by 0x3E6DA66EC1: g_simple_async_result_finalize (gsimpleasyncresult.c:269)
   by 0x3E6B61145C: g_object_unref (gobject.c:2746)
   by 0x7E555FD: g_source_callback_unref (gmain.c:1324)
   by 0x7E54F55: g_source_destroy_internal (gmain.c:993)
   by 0x7E58AF2: g_main_context_dispatch (gmain.c:2450)
   by 0x7E59277: g_main_context_iterate (gmain.c:3073)
   by 0x7E597C4: g_main_loop_run (gmain.c:3281)
   by 0x39D8B516AC: gtk_main (gtkmain.c:1362)
   by 0x403513: main (main.c:688)

This is with libsoup-2.36.1-2.fc16.x86_64.

Dan, can you think of a reason for this, please? Any suggestion is welcomed.
Comment 4 Dan Winship 2012-01-04 16:59:36 UTC
(In reply to comment #3)
>  Address 0x107e7060 is 32 bytes inside a block of size 96 free'd
>    at 0x4A0662E: free (vg_replace_malloc.c:366)
>    by 0x7E5F792: g_free (gmem.c:263)
>    by 0x7E545EC: g_source_unref_internal (gmain.c:1702)
>    by 0x3E80E3A075: soup_add_completion (soup-misc.c:138)

The only way the source created by soup_add_completion() could be freed from within soup_add_completion() would be if it was added from a thread other than the thread running the GMainContext that it was added to. Which it looks like you're doing; you're unreffing the session from the main thread/context rather than from the soup_context.

It's a little bit weird that you can't even unref the session from another thread, and maybe that could be fixed, but it looks like you're also queuing and cancelling messages from the wrong thread in other places too, so there could still be problems even if that was changed.

I didn't look too carefully, but it's not obvious to me why you need to be using another thread at all, given that you are running a GMainLoop in the main thread too...?
Comment 5 Milan Crha 2012-01-05 12:15:53 UTC
David Woodhouse would better answer this, I believe. They create an asynchronous operation, and let it run in a dedicated thread, probably to not block UI. Unreffing EEwsConnection in a dedicated thread has some issues too. I checked the code and e_ews_autodiscover_ws_url() is called on a response to button click, thus from a main thread. There is immediately created EEwsConnection, which has created its own soup_session, also in the main thread, but passes it that soup_context you mentioned above. What is that good for I do not know. The thing is that the connection is also unreffed in the main thread too (as the asynchronous operation ends on idle).

Am I right that what you suggest here is not to unref EEwsConnection::soup_session in its e_ews_connection_dispose() function, but invoke something in EEwsConnection::soup_context context, say on its idle? What to unref there, the soup_session only, or all the soup_session, soup_loop, soup_thread and soup_context too?
Comment 6 David Woodhouse 2012-01-05 14:38:41 UTC
Not all users of EEwsConnection have a mainloop that can be used for this, so EEwsConnection makes its own thread and mainloop.

It is somewhat painful that we have these restrictions on where the soup functions can be called from. To my brain, they seem to be locking bugs in libsoup, although I appreciate that it is a design decision to be that way.

Nevertheless, it does mean that I sometimes forget to invoke the soup functions by jumping through the correct (in_idle) hoops. Let's fix those cases.

BUT:

I wasn't aware that *even* g_object_unref() had to be called in the appropriate context. That is scary and even more "broken" to my mind than the situation that I have already come reluctantly to accept... fixing that as you suggest (Dan) would seem sensible to me.
Comment 7 Dan Winship 2012-01-05 15:00:36 UTC
(In reply to comment #6)
> It is somewhat painful that we have these restrictions on where the soup
> functions can be called from. To my brain, they seem to be locking bugs in
> libsoup, although I appreciate that it is a design decision to be that way.

Note that SoupSessionSync *is* thread-safe.

> I wasn't aware that *even* g_object_unref() had to be called in the appropriate
> context. That is scary and even more "broken" to my mind than the situation
> that I have already come reluctantly to accept...

Yeah, it's not intentional. The problem is that if there are persistent connections open when you destroy the SoupSession, they have to be closed, and there's code elsewhere that queues an idle handler any time a connection is closed. Just need to make it not do that in the finalizing case. Bug 667364
Comment 8 David Woodhouse 2012-02-03 21:24:26 UTC
Created attachment 206729 [details] [review]
unref soup session in its own context

This untested patch attempts to move the unref of the soup_session into its own context, to work around the issue.
Comment 9 Milan Crha 2012-02-06 12:38:54 UTC
Patch works for me fine. I would be more conservative and pass only soup_session into do_kill_soup_session, but otherwise it works. There is one compiler warning too:

e-ews-connection.c: In function 'e_ews_connection_dispose':
e-ews-connection.c:740:12: warning: passing argument 2 of 'g_source_set_callback' from incompatible pointer type [enabled by default]
/usr/include/glib-2.0/glib/gmain.h:358:6: note: expected 'GSourceFunc' but argument is of type 'void (*)(struct EEwsConnectionPrivate *)'

The do_kill_soup_session() should also return FALSE.
Comment 10 David Woodhouse 2012-02-06 12:45:09 UTC
(In reply to comment #9)
> Patch works for me fine. I would be more conservative and pass only
> soup_session into do_kill_soup_session, but otherwise it works.

That was my original though. But if we leave the call to g_main_loop_quit(priv->soup_loop) in the e_ews_connection_dispose() function rather than in do_kill_soup_session(), won't that mean we can kill the event loop before it ever gets round to calling do_kill_soup_session() And we'll then leak the session?

Hm.... perhaps there's a better way to do this. What if we add a g_object_unref(priv->soup_session) to the end of the e_ews_soup_thread() function?

Then all that e_ews_connection_dispose() needs to do is call g_main_loop_quit() and wait for the thread. It'll take care of unreffing the session all by itself?
Comment 11 Milan Crha 2012-02-06 13:46:46 UTC
I tested with mine patch applied too and both work fine. I also realized (before your above comment) that you are using other parts of the priv struct in the do_kill_soup_session function, thus my above comment about passing only soup_session in is not correct.

I would not complicate it much, this seems to work correctly. If this will work to David, and Dan doesn't have any opinion, then I would just use it (with a little modifications mentioned in comment #9).
Comment 12 David Woodhouse 2012-02-08 14:14:10 UTC
Created attachment 207106 [details] [review]
simpler fix

This variant is *simpler*, and more ObviouslyCorrect™ — I don't have to worry about how sensible it is to pass the 'priv' variable out from e_ews_connection_dispose() to a helper which runs in another thread.
Comment 13 David Woodhouse 2012-02-08 14:39:36 UTC
I've committed the patch from comment 12 to master and gnome-3-2 branches, along with a suitably grumpy commit message about libsoup :)
Comment 14 Milan Crha 2012-02-09 08:26:58 UTC
What about my cleanup patch? It fixes some memory leaks.
Comment 15 Milan Crha 2012-02-09 14:20:01 UTC
Committed the memory leak as:

Created commit 13b2005 in ews master (3.3.90+)
Comment 16 Milan Crha 2012-03-13 08:02:53 UTC
Hrm, a little regression with the memory leak patch :(
https://bugzilla.redhat.com/show_bug.cgi?id=802408

1352
1353	g_simple_async_result_set_op_res_gpointer (ad->simple, urls, NULL);
1354	g_simple_async_result_complete_in_idle (ad->simple);
1355	g_object_unref (ad->simple);
1356	return;
1357
1358 failed:

Thread 1 (Thread 0x7f2a81ffb700 (LWP 1293))

  • #0 g_object_unref
    at gobject.c line 2910
  • #1 autodiscover_response_cb
    at e-ews-connection.c line 1355
  • #2 process_queue_item
    at soup-session-async.c line 432
  • #3 run_queue
    at soup-session-async.c line 467
  • #4 idle_run_queue
    at soup-session-async.c line 494
  • #5 g_main_dispatch
    at gmain.c line 2510
  • #6 g_main_context_dispatch
    at gmain.c line 3047
  • #7 g_main_context_iterate
    at gmain.c line 3118
  • #8 g_main_loop_run
    at gmain.c line 3312
  • #9 e_ews_soup_thread
    at e-ews-connection.c line 974
  • #10 g_thread_proxy
    at gthread.c line 801
  • #11 start_thread
    at pthread_create.c line 309
  • #12 clone
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S line 115

Comment 17 Dan Winship 2012-03-13 12:54:07 UTC
(In reply to comment #16)
> 1353    g_simple_async_result_set_op_res_gpointer (ad->simple, urls, NULL);
> 1354    g_simple_async_result_complete_in_idle (ad->simple);
> 1355    g_object_unref (ad->simple);

Probably ad is getting freed by autodiscover_done_cb() in the other thread between 1354 and 1355, and so you end up passing garbage to g_object_unref(). Try pulling ad->simple out into a local variable before completing it.
Comment 18 Milan Crha 2012-03-13 18:49:22 UTC
I tried to reproduce this and it works just fine here. I see that the request are not interleaving its occupation in autodiscover_response_cb(), the only one is when the request succeeded, where is called soup_session_cancel_message(), which results in simple return from this function for such affected request. I also checked ref_count and such and all seems just fine here. i also tried to cheat a bit, playing with failed and more/less successful responses, but it still works as it should for me, during my testing. I thought that setting
   ad->msgs[idx] = NULL;
just when it's recognized that the message can be processed is potentially dangerous, but I cannot prove the idea with my tests.
Comment 19 Milan Crha 2012-03-14 19:57:29 UTC
(In reply to comment #17)
> (In reply to comment #16)
> > 1353    g_simple_async_result_set_op_res_gpointer (ad->simple, urls, NULL);
> > 1354    g_simple_async_result_complete_in_idle (ad->simple);
> > 1355    g_object_unref (ad->simple);
> 
> Probably ad is getting freed by autodiscover_done_cb() in the other thread
> between 1354 and 1355, and so you end up passing garbage to g_object_unref().
> Try pulling ad->simple out into a local variable before completing it.

Interesting, I would think of such case, but you've right, it can happen in general. Thanks for the suggestion. I made a fix for it based on it.

Created commit 11c1acd in ews master (3.3.92+)
Comment 20 Milan Crha 2012-03-14 20:00:07 UTC
(In reply to comment #19)
> Interesting, I would think of such case, but you've right, it can happen in

Err, it was supposed to be: "...I would *not* think..."