GNOME Bugzilla – Bug 606813
deadlock
Last modified: 2010-02-24 13:43:25 UTC
Created attachment 151312: stack trace of deadlocked threads. This might be related to bug 600322, but I thought I would open a new bug as I am not at all sure. In any case, Evolution is fully responsive, although the actions in the status bar are stacking up and not making progress.
Created attachment 151464: stack trace of deadlocked threads. Here's another. It could be that this happens when the load on my IMAP server gets high and its response latency grows. I want to add that Evolution is not entirely locked up when this happens: I can still select things from the menu and I can quit Evolution. When it's locked up like this, it's just not getting new messages from mailboxes, etc.
Here it is again. It certainly does seem to correlate with times of the day when there is load on the (imap+dns+everything) server. But surely, slow or missing responses from network services should not pooch Evolution to the point of being deadlocked and needing a restart.
+ Trace 220198
Created attachment 152002: another stack trace of deadlocked threads. Here's another.
Taking the liberty of CCing mcrha@redhat.com given comment #0.
Created attachment 152241: yet another stack trace of deadlocked threads. Here's another.
(In reply to comment #4) > Taking the liberty of CCing mcrha given comment #0. OK, just please don't put my email address in a comment next time. Thanks.
It seems to me that it's not a deadlock, it's "only" a very slow operation. The first few threads are sync_request_thread_cb for SQLite. Then there are many threads from IMAP and NNTP requesting addrinfo for their servers (camel_getaddrinfo), and another bunch of threads fulfilling these requests through NSS. imap_sync and nntp_connect_online are holding their connect locks, so any operation using these locks is "frozen" until the lock is released by the previous call. Cancelling the operation should be enough to recover Evolution, but since you mentioned bug #600322, I guess you have some test patch from there applied, don't you?
(In reply to comment #7) > It seems to me like it's not a deadlock, it's "only" a very slow operation. Well, slow would have to be glacially slow then. I have waited many, many hours (overnight even) and the situation does not resolve. Just the number of tasks pending on the status bar grows more and more. > There should be enough to cancel the operation to recover Evolution, but as you > mentioned bug #600322 then I guess you've some test patch applied from there, > do not you? Yes, I do. I currently have attachment 150926 from bug 600322 applied.
Created attachment 152339: hang on quit after deadlock. For a slightly different incarnation of this, here is a stack trace of an Evolution that hung (completely: no window refreshing, etc.) when I tried to quit after it got into the deadlocked state described in this bug.
(In reply to comment #8) > Yes, I do. I currently have attachment 150926 from bug 600322 applied. At first glance that shouldn't have any negative impact on this, except that it prints a message on the console instead of crashing. I looked more closely at the last backtrace, and most of the threads are waiting on a lock that seems to be held by some previous call to getaddrinfo or similar. So it seems to me that the underlying library is trying to be thread-safe, but doesn't account for a thread being cancelled by force while it holds the lock. The lock stays locked, so every subsequent call to the same function deadlocks, waiting for the release of something that is never unlocked. It would also explain why this was not an issue before: Camel's code hasn't changed in years, so it should be the library. I'll try to work around it by disabling cancellation of the thread while inside those functions, but if I'm right, the real fix belongs in the library, not in Camel.
Created attachment 152406: test patch for evolution-data-server. Please give this patch a try. It's merged with the one from bug #600322 mentioned above.
Created attachment 152813: test patch ][ for evolution-data-server. As discussed on IRC, I filed bug #608766 for removing pthread calls from Evolution modules, and it covers this part as well. The main change for camel-net-utils.c is that a GThread cannot be cancelled, which will affect this bug too. So instead of the test patch (how is it working for you, by the way?), the code will be changed as shown in this snippet of the patch from that bug. Could you please give it a try before the full patch is reviewed, approved and committed? It was generated against master without any other test patches, so remove any others that touch the same file and use only this one. Thanks in advance.
*** Bug 608427 has been marked as a duplicate of this bug. ***
(In reply to comment #12) > Created an attachment (id=152813) > test patch ][ for evolution-data-server; I had to massage this patch a bit to apply it here. I will attach the massaged patch here. Please review for correctness.
Created attachment 153340: respin of attachment 152813 for local application. Here is the version of attachment 152813 that I applied here. Please review for correctness.
(In reply to comment #12) > Created an attachment (id=152813) > test patch ][ for evolution-data-server; This patch (my reworking of it, anyway, but I don't think this is a result of my massaging) produces the following errors on build:
camel-net-utils.c:691: error: ‘PTHREAD_CANCEL_ENABLE’ undeclared (first use in this function)
camel-net-utils.c:691: error: (Each undeclared identifier is reported only once
camel-net-utils.c:691: error: for each function it appears in.)
camel-net-utils.c:693: warning: implicit declaration of function ‘pthread_setcancelstate’
camel-net-utils.c:693: error: ‘PTHREAD_CANCEL_DISABLE’ undeclared (first use in this function)
camel-net-utils.c: In function ‘cs_getnameinfo’:
camel-net-utils.c:838: error: ‘PTHREAD_CANCEL_ENABLE’ undeclared (first use in this function)
camel-net-utils.c:840: error: ‘PTHREAD_CANCEL_DISABLE’ undeclared (first use in this function)
Any ideas?
I see, a previous patch is left over, the one which adds > pthread_setcancelstate (PTHREAD_CANCEL_DISABLE, &old_cancelstate); and similar lines to the source file. Remove that patch and it should be OK. The patch should also apply cleanly after that; note that you shouldn't apply any other test patch to camel-net-utils.c together with this one.
(In reply to comment #17) > I see, there left a previous patch, the one which adds > > pthread_setcancelstate (PTHREAD_CANCEL_DISABLE, &old_cancelstate); > and similar lines to the source file. Yes. That would be attachment 150926 from bug 600322. > Remove that patch, and it should be OK. OK. Will do. > I guess the patch should also apply cleanly after that, notice you shouldn't > apply any other test patch to the camel-net-utils.c together with this one. OK.
This got fixed within bug #600322, thus I'm closing it as a duplicate. Thanks. *** This bug has been marked as a duplicate of bug 600322 ***