After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 664793 - Deadlock on EClient operation cancel
Deadlock on EClient operation cancel
Status: RESOLVED FIXED
Product: evolution-data-server
Classification: Platform
Component: general
3.2.x (obsolete)
Other Linux
: Normal critical
: ---
Assigned To: Evolution Shell Maintainers Team
Evolution QA team
Depends on:
Blocks:
 
 
Reported: 2011-11-25 10:19 UTC by Milan Crha
Modified: 2012-04-04 17:12 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
eds patch (4.45 KB, patch)
2012-04-04 17:09 UTC, Milan Crha
committed Details | Review

Description Milan Crha 2011-11-25 10:19:18 UTC
I'm playing in evolution with meeting invitations, which are cancelling GDBus requests when the message is closed. I faced a hang due to this with the below backtrace. Currently installed glib2 version is glib2-2.30.1-1.fc16. I do not expect I'm able to do anything with this within evolution and/or evolution-data-server, am I?

Thread 1 (Thread 0x7fe7f57ce9c0 (LWP 23942))

  • #0 __lll_lock_wait
    from /lib64/libpthread.so.0
  • #1 _L_lock_863
    from /lib64/libpthread.so.0
  • #2 pthread_mutex_lock
    from /lib64/libpthread.so.0
  • #3 send_message_with_reply_timeout_cb
    at gdbusconnection.c line 1686
  • #4 g_timeout_dispatch
    at gmain.c line 3891
  • #5 g_main_dispatch
    at gmain.c line 2425
  • #6 g_main_context_dispatch
    at gmain.c line 2995
  • #7 g_main_context_iterate
    at gmain.c line 3073
  • #8 g_main_loop_run
    at gmain.c line 3281
  • #9 g_dbus_connection_send_message_with_reply_sync
    at gdbusconnection.c line 2027
  • #10 g_dbus_connection_call_sync_internal
    at gdbusconnection.c line 5259
  • #11 g_dbus_proxy_call_sync_internal
    at gdbusproxy.c line 2775
  • #12 g_dbus_proxy_call_sync
    at gdbusproxy.c line 2964
  • #13 proxy_method_call_sync
    at e-gdbus-templates.c line 1945
  • #14 e_gdbus_proxy_method_call_sync_uint__void
    at e-gdbus-templates.c line 1994
  • #15 e_gdbus_cal_call_cancel_operation_sync
    at e-gdbus-cal.c line 1076
  • #16 gdbus_cal_call_cancel_operation_sync
    at e-gdbus-cal.c line 1646
  • #17 e_gdbus_async_op_keeper_cancel_op_sync
    at e-gdbus-templates.c line 760
  • #18 e_gdbus_op_cancelled_cb
    at e-gdbus-templates.c line 919
  • #19 g_closure_invoke
    at gclosure.c line 774
  • #20 signal_emit_unlocked_R
    at gsignal.c line 3272
  • #21 g_signal_emit_valist
    at gsignal.c line 3003
  • #22 g_signal_emit
    at gsignal.c line 3060
  • #23 g_cancellable_cancel
    at gcancellable.c line 507
  • #24 puri_free
    at itip-formatter.c line 2990
  • #25 emf_clear_puri_node
    at em-format.c line 121
  • #68 gdk_event_source_dispatch
    at gdkeventsource.c line 360
  • #69 g_main_dispatch
    at gmain.c line 2425
  • #70 g_main_context_dispatch
    at gmain.c line 2995
  • #71 g_main_context_iterate
    at gmain.c line 3073
  • #72 g_main_loop_run
    at gmain.c line 3281
  • #73 gtk_main
    at gtkmain.c line 1362
  • #74 main
    at main.c line 675

Comment 1 David Zeuthen (not reading bugmail) 2011-11-26 00:42:32 UTC
This looks like an application-level bug, reassigning.
Comment 2 David Zeuthen (not reading bugmail) 2011-11-26 00:53:33 UTC
(In reply to comment #1)
> This looks like an application-level bug, reassigning.

Specifically, see bug 651133 comment 5 and the bugs it links to. The problem is that Evo is doing sync calls from the ::cancelled handler which is a big no-no.
Comment 3 Milan Crha 2011-11-28 12:31:01 UTC
(In reply to comment #2)
> (In reply to comment #1)
> > This looks like an application-level bug, reassigning.
> 
> Specifically, see bug 651133 comment 5 and the bugs it links to. The problem is
> that Evo is doing sync calls from the ::cancelled handler which is a big no-no.

Hmm, I didn't get it from the bug. It (the comment you cited) links to 4 bugs, two closed, two still opened. From my point of view, a person which just uses a library, there is GCancellable and GDBus sub-libraries, which are pretty much unrelated from the outside. So the "big no-no" is not obvious to me at all. I want to let the server side know that user cancelled something, and I do not want to do any expectations about libraries I do not write, which I only use. So what is the supposed *workaround* for the bug in GCancellable/GDBus here, please? Calling my notification to the server from idle or timeout? It's not good either. Will there be any difference if the call will not be a GDBus synchronous call, i.e. if I just call the function asynchronously and will not wait for its result? How do I know that without looking into the code of the library I only use and do not write?

I like the beginning of the comment you quoted, that "GCancellable tries to be clever". That "tries" is the key word for me. :) If it forces me to do any kind of unexpected workaround, then it tries it wrong. From my point of view.

From the documentation on the GCancellable::cancelled signal I suppose only the last paragraph is relevant to this bug:

> Note that the cancelled signal is emitted in the thread that the user
> cancelled from, which may be the main thread. So, the cancellable signal
> should not do something that can block.

but I do not see the connection and why it should be a problem in this case. If you look in the above backtrace then you see that Thread 3 is calling g_cancellable_disconnect() while Thread 1 is inside g_cancellable_cancel(), but both on a different instance of GCancellable object.

Thus the question is, why cannot G_LOCK(cancellable) be a lock per GCancellable instance, rather than a global lock? That may *fix* this kind of race, rather than workaround it. Am I right?

By the way, is the cancellable lock locked while the signal is emitted? From sources of glib 2.30.1 I see that the lock is not locked while the signal is called, thus who is holding it for the Thread 3 that it is stuck? I tried to reproduce this, but it seems I'm not able to on my will, it didn't want to stuck when I wanted to make sure I didn't miss any relevant thread, which would be under GCancellable call where the global lock would be used.
Comment 4 Milan Crha 2012-04-04 17:09:50 UTC
Created attachment 211312 [details] [review]
eds patch

for evolution-data-server;

Never mind, I made this in the EClient code. The patch fixes more dependant things, namely:
a) the "cancelled" callback on a GCancellable schedules its "cancel" to idle
b) authentication request processing is not piled on each other
c) if client_utils_open_new_auth_cb() is called when the operation was
   already cancelled, then a user is not asked for a password

There all are connected, thus I made them in one patch.
Comment 5 Milan Crha 2012-04-04 17:12:36 UTC
Created commit 2d6f9dc in eds master (3.5.1+)
Created commit 57824e4 in eds gnome-3-4 (3.4.1+)