Bug 606813 - deadlock
Status: RESOLVED DUPLICATE of bug 600322
Product: evolution
Classification: Applications
Component: Mailer
Version: 2.28.x (obsolete)
Hardware: Other Linux
Importance: Normal critical
Target Milestone: ---
Assigned To: evolution-mail-maintainers
QA Contact: Evolution QA team
Duplicates: 608427
Depends on:
Blocks:
Reported: 2010-01-13 05:32 UTC by Brian J. Murrell
Modified: 2010-02-24 13:43 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
stack trace of deadlocked threads (242.19 KB, text/plain)
2010-01-13 05:32 UTC, Brian J. Murrell
stack trace of deadlocked threads (418.64 KB, text/plain)
2010-01-15 12:17 UTC, Brian J. Murrell
another stack trace of deadlocked threads (575.88 KB, text/plain)
2010-01-22 12:06 UTC, Brian J. Murrell
yet another stack trace of deadlocked threads (131.30 KB, text/plain)
2010-01-25 16:41 UTC, Brian J. Murrell
hang on quit after deadlock (167.44 KB, text/plain)
2010-01-26 20:07 UTC, Brian J. Murrell
test patch (9.64 KB, text/plain)
2010-01-27 14:50 UTC, Milan Crha
test patch ][ (4.21 KB, text/plain)
2010-02-02 08:50 UTC, Milan Crha
respin of attachment 152813 for local application (3.91 KB, text/plain)
2010-02-09 16:40 UTC, Brian J. Murrell

Description Brian J. Murrell 2010-01-13 05:32:06 UTC
Created attachment 151312
stack trace of deadlocked threads

This might be related to bug 600322, but I am not at all sure, so I thought I would open a new bug.  In any case, evolution's UI is fully responsive, although the actions in the status bar are stacking up and not making progress.
Comment 1 Brian J. Murrell 2010-01-15 12:17:27 UTC
Created attachment 151464
stack trace of deadlocked threads

Here's another.  It could be that this happens when the load on my IMAP server gets high and its response latency grows.

I want to add that evolution is not entirely locked up when this happens.  I can still select things from the menu and I can quit evolution.

When it's locked up like this it's just not getting new messages from mailboxes, etc.
Comment 2 Brian J. Murrell 2010-01-21 18:49:40 UTC
Here it is again.  It certainly does seem to correlate with times of the day when there is load on the (imap+dns+everything) server.  But surely, slow or missing responses from network services should not pooch evolution to the point of being deadlocked and needing a restart.


Comment 3 Brian J. Murrell 2010-01-22 12:06:40 UTC
Created attachment 152002
another stack trace of deadlocked threads

Here's another.
Comment 4 Brian J. Murrell 2010-01-25 16:40:24 UTC
Taking the liberty of CCing mcrha@redhat.com given comment #0.
Comment 5 Brian J. Murrell 2010-01-25 16:41:21 UTC
Created attachment 152241
yet another stack trace of deadlocked threads

Here's another.
Comment 6 Milan Crha 2010-01-25 17:51:17 UTC
(In reply to comment #4)
> Taking the liberty of CCing mcrha given comment #0.

OK, but please do not write my email address in a comment next time. Thanks.
Comment 7 Milan Crha 2010-01-25 18:14:57 UTC
It seems to me like it's not a deadlock, it's "only" a very slow operation. The first few threads are sync_request_thread_cb for SQLite. Then there are many threads from IMAP and NNTP requesting addrinfo for their servers (camel_getaddrinfo), and another bunch of threads is fulfilling these requests through NSS. The imap_sync and nntp_connect_online calls are holding their connect locks, thus any operation using these locks is "frozen" until the lock is released by the previous call.

Cancelling the operation should be enough to recover Evolution, but since you mentioned bug #600322, I guess you have some test patch from there applied, don't you?
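
The "frozen" effect described in this comment can be illustrated with a minimal sketch. It assumes a single mutex named `connect_lock` and hypothetical helpers (`begin_slow_connect`, `end_slow_connect`, `try_operation`); Camel's real locking code is not shown in this report and may well differ. While the slow connect holds the lock, any other operation that needs it cannot proceed:

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a per-service connect lock. */
static pthread_mutex_t connect_lock = PTHREAD_MUTEX_INITIALIZER;

/* A slow connect takes the lock and keeps it until it finishes. */
void begin_slow_connect(void) { pthread_mutex_lock(&connect_lock); }
void end_slow_connect(void) { pthread_mutex_unlock(&connect_lock); }

/* Any other operation needs the same lock. trylock is used here so the
 * sketch reports EBUSY instead of blocking; a plain pthread_mutex_lock
 * would simply wait until the slow connect releases the lock. */
int try_operation(void)
{
    int rc = pthread_mutex_trylock(&connect_lock);
    if (rc != 0)
        return rc;
    /* ... the real work would happen here ... */
    pthread_mutex_unlock(&connect_lock);
    return 0;
}
```

If the call holding the lock never returns, every later operation stays queued behind it, which would match the growing list of pending tasks in the status bar.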
Comment 8 Brian J. Murrell 2010-01-26 14:27:07 UTC
(In reply to comment #7)
> It seems to me like it's not a deadlock, it's "only" a very slow operation.

Well, slow would have to be glacially slow then.  I have waited many many hours (overnight even) and the situation does not resolve.  Just the number of tasks pending on the status bar grows more and more.

> Cancelling the operation should be enough to recover Evolution, but since you
> mentioned bug #600322, I guess you have some test patch from there applied,
> don't you?

Yes, I do.  I currently have attachment 150926 from bug 600322 applied.
Comment 9 Brian J. Murrell 2010-01-26 20:07:51 UTC
Created attachment 152339
hang on quit after deadlock

For a slightly different incarnation of this, this is a stacktrace of an evolution that hung (completely -- no window refreshing, etc.) when I tried to quit after it got into the deadlocked state described in this bug.
Comment 10 Milan Crha 2010-01-27 14:36:52 UTC
(In reply to comment #8)
> Yes, I do.  I currently have attachment 150926 from bug 600322 applied.

At first glance, that shouldn't have any negative impact on this, except that instead of crashing it prints a message on the console.

I looked more closely at the last backtrace and I see that most of the threads are waiting on a lock, which seems to be held by some previous call to getaddrinfo or similar. So it seems to me that the underlying library is trying to be thread safe, but doesn't account for situations where a thread is cancelled by force while it still holds the lock. The lock then stays locked, which makes any subsequent call to the same function deadlock, waiting for the release of something that will never be released.
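
The failure mode suspected here can be reproduced in isolation. The following is a minimal sketch, not code from Camel or from the resolver library: the `shared_state` struct, the `worker` function and `lock_left_held_after_cancel` are all hypothetical names. A thread takes a mutex, is cancelled at a cancellation point without any cleanup handler, and the mutex stays locked after the thread is gone:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

/* Hypothetical state shared between the caller and the worker thread. */
typedef struct {
    pthread_mutex_t lock;   /* stands in for a library-internal lock */
    sem_t lock_taken;       /* signals that the worker holds the lock */
} shared_state;

static void *worker(void *arg)
{
    shared_state *st = arg;

    pthread_mutex_lock(&st->lock);
    sem_post(&st->lock_taken);
    for (;;)
        pause();            /* cancellation point; no cleanup handler */
    return NULL;
}

/* Returns 1 if the lock is still held after the worker was cancelled,
 * 0 if it was free, and -1 on a setup error. */
int lock_left_held_after_cancel(void)
{
    shared_state st;
    pthread_t tid;
    int rc;

    pthread_mutex_init(&st.lock, NULL);
    sem_init(&st.lock_taken, 0, 0);
    if (pthread_create(&tid, NULL, worker, &st) != 0)
        return -1;

    sem_wait(&st.lock_taken);   /* wait until the worker owns the lock */
    pthread_cancel(tid);        /* cancel the thread "by force" */
    pthread_join(tid, NULL);

    /* The owner no longer exists, but nobody ever unlocked the mutex. */
    rc = pthread_mutex_trylock(&st.lock);
    sem_destroy(&st.lock_taken);
    return rc == EBUSY ? 1 : 0;
}
```

A blocking `pthread_mutex_lock` at the point of the `trylock` would wait forever, which is the kind of hang seen in the backtraces. The mutex is deliberately not destroyed, since it is still locked.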

It also explains why there was no issue with this before: Camel's code hasn't changed for years, so the change must be in the library. I'll try to work around it by disabling cancellation of the thread while it is inside those functions, but if I'm right, then the real fix should go into the library, not into Camel.
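
A workaround of this kind might look roughly like the sketch below. The test patch itself is an attachment and is not reproduced in this report; the only line of it quoted later in the thread is `pthread_setcancelstate (PTHREAD_CANCEL_DISABLE, &old_cancelstate);`. Everything else here, including the wrapper name `lookup_with_cancel_disabled`, is an assumption made for illustration:

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical wrapper: resolve host/service with thread cancellation
 * disabled, so the calling thread cannot be cancelled while the
 * resolver holds its internal locks. Returns getaddrinfo's result. */
int lookup_with_cancel_disabled(const char *host, const char *service,
                                struct addrinfo **res)
{
    struct addrinfo hints;
    int old_cancelstate;
    int ignored;
    int rc;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    /* A pending cancel request is held back until the state is restored. */
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_cancelstate);
    rc = getaddrinfo(host, service, &hints, res);
    pthread_setcancelstate(old_cancelstate, &ignored);

    return rc;
}
```

The trade-off is that a cancel request issued during a slow lookup only takes effect once the lookup returns, so the thread lives longer but no lock is abandoned. The caller frees the result with `freeaddrinfo`.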
Comment 11 Milan Crha 2010-01-27 14:50:52 UTC
Created attachment 152406
test patch

for evolution-data-server;

Please give this patch a try. It's merged with the one from bug #600322 mentioned above.
Comment 12 Milan Crha 2010-02-02 08:50:38 UTC
Created attachment 152813
test patch ][

for evolution-data-server;

As discussed on IRC, I filed bug #608766 for removing pthread calls from evolution modules, which includes this part as well. The main change for camel-net-utils.c is that a GThread cannot be cancelled, so that will have an impact on this bug as well. So instead of the test patch (how is it working for you, by the way?), the code will be changed as shown in this snippet of the patch from the mentioned bug.

Could you give it a try before the full patch is reviewed, approved and committed, please? This was generated for master, without any other test patches, so remove any others that touch the same file and use only this one. Thanks in advance.
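
Since a thread that cannot be cancelled has to stop on its own, one general alternative to forced cancellation is a cooperative stop flag. The sketch below uses plain pthreads and C11 atomics purely for illustration; it is not the patch from bug #608766, whose contents are not shown here, and all names in it (`worker_state`, `polling_worker`, `run_and_stop_worker`) are hypothetical:

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-worker state with a cooperative stop flag. */
typedef struct {
    atomic_bool stop_requested;
} worker_state;

static void *polling_worker(void *arg)
{
    worker_state *st = arg;

    /* The worker checks the flag between units of work and exits on
     * its own, so it is never torn down while holding a lock. */
    while (!atomic_load(&st->stop_requested))
        sched_yield();      /* placeholder for one unit of real work */
    return NULL;
}

/* Starts a worker, asks it to stop, and waits for it to finish.
 * Returns 0 on success, -1 if the thread could not be created or joined. */
int run_and_stop_worker(void)
{
    worker_state st;
    pthread_t tid;

    atomic_init(&st.stop_requested, false);
    if (pthread_create(&tid, NULL, polling_worker, &st) != 0)
        return -1;

    atomic_store(&st.stop_requested, true);   /* request, do not force */
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return 0;
}
```

A worker blocked inside a long library call only notices the flag after that call returns, so this approach trades prompt cancellation for the guarantee that locks are released normally.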
Comment 13 Milan Crha 2010-02-09 15:12:11 UTC
*** Bug 608427 has been marked as a duplicate of this bug. ***
Comment 14 Brian J. Murrell 2010-02-09 16:36:56 UTC
(In reply to comment #12)
> Created an attachment (id=152813)
> test patch ][
> 
> for evolution-data-server;

I had to massage this patch a bit to apply here.  I will attach the massaged patch here.  Please review for correctness.
Comment 15 Brian J. Murrell 2010-02-09 16:40:27 UTC
Created attachment 153340
respin of attachment 152813 for local application

Here is the version of attachment 152813 that I applied here.  Please review for correctness.
Comment 16 Brian J. Murrell 2010-02-09 17:06:56 UTC
(In reply to comment #12)
> Created an attachment (id=152813)
> test patch ][
> 
> for evolution-data-server;

This patch (my reworking of it, anyway, but I don't think this is a result of my massaging) produces the following errors on build:

camel-net-utils.c:691: error: ‘PTHREAD_CANCEL_ENABLE’ undeclared (first use in this function)
camel-net-utils.c:691: error: (Each undeclared identifier is reported only once
camel-net-utils.c:691: error: for each function it appears in.)
camel-net-utils.c:693: warning: implicit declaration of function ‘pthread_setcancelstate’
camel-net-utils.c:693: error: ‘PTHREAD_CANCEL_DISABLE’ undeclared (first use in this function)
camel-net-utils.c: In function ‘cs_getnameinfo’:
camel-net-utils.c:838: error: ‘PTHREAD_CANCEL_ENABLE’ undeclared (first use in this function)
camel-net-utils.c:840: error: ‘PTHREAD_CANCEL_DISABLE’ undeclared (first use in this function)

Any ideas?
Comment 17 Milan Crha 2010-02-09 18:50:52 UTC
I see, a previous patch is still applied, the one which adds
> pthread_setcancelstate (PTHREAD_CANCEL_DISABLE, &old_cancelstate);
and similar lines to the source file. Remove that patch, and it should be OK.
I guess the patch should also apply cleanly after that. Note that you shouldn't apply any other test patch to camel-net-utils.c together with this one.
Comment 18 Brian J. Murrell 2010-02-09 19:03:40 UTC
(In reply to comment #17)
> I see, a previous patch is still applied, the one which adds
> > pthread_setcancelstate (PTHREAD_CANCEL_DISABLE, &old_cancelstate);
> and similar lines to the source file.

Yes.  That would be attachment 150926 from bug 600322.

> Remove that patch, and it should be OK.

OK.  Will do.

> I guess the patch should also apply cleanly after that. Note that you shouldn't
> apply any other test patch to camel-net-utils.c together with this one.

OK.
Comment 19 Milan Crha 2010-02-24 13:43:25 UTC
This got fixed within bug #600322, thus I'm closing it as a duplicate. Thanks.

*** This bug has been marked as a duplicate of bug 600322 ***