GNOME Bugzilla – Bug 725720
Livelock with 3.11.91
Last modified: 2015-08-18 10:50:59 UTC
This started happening after the upgrade from 3.11.90 to 3.11.91 evolution-3.11.91-1.fc21.x86_64 evolution-data-server-3.11.91-1.fc21.x86_64 Here is the pstack of the evolution process and below it the CAMEL_DEBUG=all log when it happens.
+ Trace 233266
[imapx:B] Have token 'FETCH' id 25421 [imapx:B] got untagged response [imapx:B] Have token 'FETCH' id 25422 [imapx:B] got untagged response [imapx:B] Have token 'FETCH' id 25423 [imapx:B] got untagged response [imapx:B] Have token 'FETCH' id 25424 (evolution:10605): GLib-CRITICAL **: Source ID 215180 was not found when attempting to remove it [imapx:B] adding command, format = 'UID FETCH %t (BODY.PEEK[]' [imapx:B] adding command, format = '<%u.%u>' [imapx:B] got gint '0' (evolution:10605): GLib-GObject-CRITICAL **: g_closure_unref: assertion 'closure->ref_count > 0' failed [imapx:B] got gint '20480' [imapx:B] adding command, format = ')' [imapx:B] completing command buffer is [39] 'UID FETCH 639128 (BODY.PEEK[]<0.20480>)' [imapx:B] enqueue job 'UID FETCH 639128 (BODY.PEEK[]<0.20480>)' [imapx:B] ** Starting next command [imapx:B] - we're selected on 'INBOX.gnome.CVS', current jobs? [imapx:B] - 0 'FETCH' [imapx:B] -- Checking job queue [imapx:B] -- 100 'FETCH'? [imapx:B] Selecting mailbox 'INBOX' for command 'FETCH'(0x7fea01f6fde0) [imapx:B] adding command, format = 'UID FETCH %t (BODY.PEEK[]' [imapx:B] adding command, format = '<%u.%u>' [imapx:B] got gint '20480' [imapx:B] got gint '20480' [imapx:B] adding command, format = ')' [imapx:B] completing command buffer is [43] 'UID FETCH 639128 (BODY.PEEK[]<20480.20480>)' [imapx:B] enqueue job 'UID FETCH 639128 (BODY.PEEK[]<20480.20480>)' [imapx:B] ** Starting next command [imapx:B] - we're selected on 'INBOX.gnome.CVS', current jobs? [imapx:B] - 0 'FETCH' [imapx:B] -- Checking job queue [imapx:B] -- 100 'FETCH'? [imapx:B] -- 100 'FETCH'? [imapx:B] Selecting mailbox 'INBOX' for command 'FETCH'(0x47ceb30) [imapx:B] adding command, format = 'UID FETCH %t (BODY.PEEK[]' [imapx:B] adding command, format = '<%u.%u>' [imapx:B] got gint '40960' [imapx:B] got gint '20480' [imapx:B] adding command, format = ')' [imapx:B] completing command buffer is [43] 'UID FETCH 639128 (BODY.PEEK[]<40960.20480>)' [imapx:B] enqueue job 'UID FETCH 639128 (BODY.PEEK[]<40960.20480>)' [imapx:B] ** Starting next command [imapx:B] - we're selected on 'INBOX.gnome.CVS', current jobs? [imapx:B] - 0 'FETCH' [imapx:B] -- Checking job queue [imapx:B] -- 100 'FETCH'? [imapx:B] -- 100 'FETCH'? [imapx:B] -- 100 'FETCH'? [imapx:B] Selecting mailbox 'INBOX' for command 'FETCH'(0x7fe9fc004a00)
Forgot to add that this is not really reproducible, happens after some time of normal use. The GLib-*-CRITICAL stuff has been something ongoing for the last releases and unrealated to the lock I think.
Hard to tell a livelock from a static backtrace -- but in this case all the threads seem to be in a blocking poll() or waiting on a condition variable, except Thread 2 is using a timed wait.
+ Trace 233269
Thread 2 (Thread 0x7fea2e1f3700 (LWP 22678))
If that's the source of the livelock, frame #4 suggests perhaps this is a case of thread pool starvation -- like it's waiting for an available worker thread to handle some async task, but I can't tell what the task is. Not clear that this is IMAPX related. Did you upgrade other packages at the same time, like GLib or WebKit/GTK?
Its rawhide. Most of the gnome stack has been updated in the last couple days. When it happens the UI is responsive, eg you can scroll the message list or the preview pane, its just the background mail operations that are not working, the status stuck with one or two ongoing operations one of them "refreshing folder XX" and the camel debug log also stops.
Okay, I'll keep a lookout for it. I'm not on rawhide, in fact I intentionally build against older base libraries to help weed out these kinds of things.
You might be right about this not really locking. I've waited one such "lockup" for 5-10 minutes and eventually it finished with the refreshing of folder XX and everything was back to normal. The camel log was still stalled during all that waiting. I am not sure how to go about this further...
Could you retest with 3.12.2, when it'll be released the next week, please? I'll update Fedora rawhide shortly after the release. I made some fixes in IMAPx which should help here too. Thanks in advance.
Yanko pointed me here from bug #730788, when I mentioned a regression with changes from that bug, reported within bug #732366. Looking in the backtrace I'd guess that the eds with changes from the bug #732366 should address this. The 3.12.4 will be released the next week (what a coincidence with the comment #6).
Following the 3.17.x development and after the imapx rework in 3.17.90 I don't see any more stalls in the queue processing. Will close this one as incomplete for the old branch, but its definitely fixed in master. Thanks