GNOME Bugzilla – Bug 664793
Deadlock on EClient operation cancel
Last modified: 2012-04-04 17:12:36 UTC
I'm playing with meeting invitations in Evolution, which cancel their GDBus requests when the message is closed. I hit a hang because of this, with the backtrace below. The currently installed glib2 version is glib2-2.30.1-1.fc16. I do not expect I can do anything about this within evolution and/or evolution-data-server, can I?
Trace 229128
Thread 1 (Thread 0x7fe7f57ce9c0 (LWP 23942))
This looks like an application-level bug, reassigning.
(In reply to comment #1)
> This looks like an application-level bug, reassigning.

Specifically, see bug 651133 comment 5 and the bugs it links to. The problem is that Evo is doing sync calls from the ::cancelled handler, which is a big no-no.
(In reply to comment #2)
> (In reply to comment #1)
> > This looks like an application-level bug, reassigning.
>
> Specifically, see bug 651133 comment 5 and the bugs it links to. The problem is
> that Evo is doing sync calls from the ::cancelled handler which is a big no-no.

Hmm, I didn't get that from the bug. The comment you cited links to four bugs, two closed and two still open. From my point of view, as a person who just uses a library, GCancellable and GDBus are sub-libraries that look pretty much unrelated from the outside, so the "big no-no" is not obvious to me at all. I want to let the server side know that the user cancelled something, and I do not want to make any assumptions about libraries I only use and do not write.

So what is the supposed *workaround* for the bug in GCancellable/GDBus here, please? Calling my notification to the server from an idle or timeout callback? That's not good either. Will it make any difference if the call is not a synchronous GDBus call, i.e. if I just call the function asynchronously and do not wait for its result? How do I know that without looking into the code of a library I only use and do not write?

I like the beginning of the comment you quoted, that "GCancellable tries to be clever". That "tries" is the key word for me. :) If it forces me into any kind of unexpected workaround, then it tries wrongly, from my point of view.

From the documentation of the GCancellable::cancelled signal, I suppose only the last paragraph is relevant to this bug:
> Note that the cancelled signal is emitted in the thread that the user
> cancelled from, which may be the main thread. So, the cancellable signal
> should not do something that can block.

but I do not see the connection, or why it should be a problem in this case. If you look at the above backtrace, you see that Thread 3 is calling g_cancellable_disconnect() while Thread 1 is inside g_cancellable_cancel(), but each on a different GCancellable instance.
Thus the question is: why can't G_LOCK(cancellable) be a lock per GCancellable instance, rather than a global lock? That might *fix* this kind of race, rather than work around it. Am I right? By the way, is the cancellable lock held while the signal is emitted? From the sources of glib 2.30.1 I see that the lock is not held while the signal is emitted, so who is holding it, such that Thread 3 is stuck? I tried to reproduce this, but it seems I'm not able to do so at will; it refused to get stuck when I wanted to make sure I hadn't missed any relevant thread that would be inside a GCancellable call where the global lock is used.
Created attachment 211312
eds patch

for evolution-data-server;

Never mind, I made this in the EClient code. The patch fixes several dependent things, namely:
a) the "cancelled" callback on a GCancellable schedules its "cancel" to idle
b) authentication requests are no longer piled on top of each other
c) if client_utils_open_new_auth_cb() is called when the operation was already cancelled, then the user is not asked for a password

These are all connected, so I made them in one patch.
Created commit 2d6f9dc in eds master (3.5.1+)
Created commit 57824e4 in eds gnome-3-4 (3.4.1+)