GNOME Bugzilla – Bug 720197
[IMAPx] Indefinite waiting for message download
Last modified: 2017-07-19 20:46:45 UTC
04:45:02 <andre_> when Evo gets stuck in my GMail IMAP account with "Retrieving message 12345 in folder 'example'" and cancelling just appends the string "(cancelling)" and closing Evo does not work so I have to kill it, how would I debug that?
04:47:04 <mbarnes> know anything about GCancellable?
04:48:08 <andre_> my very naive expectation is that there should be some timeout. It was displayed for more than an hour before killing evolution.
04:48:29 <mbarnes> it's just a thing to signal operations to cancel
04:48:52 <mbarnes> it emits a "cancelled" signal, and the operation is supposed to acknowledge that and return with a G_IO_ERROR_CANCELLED error
04:49:27 <mbarnes> the (cancelling) message means the "cancelled" signal was fired
04:49:59 <mbarnes> if it's stuck on that, it usually means the operation is deadlocked
04:50:24 <mbarnes> gdb trace should be sufficient
04:50:53 <mbarnes> look for g_mutex_lock() or g_cond_wait() at the top of one of the threads
04:51:01 <andre_> so I'd Ctrl+C at some point after getting stuck?
04:51:01 <mbarnes> those are the usual culprits
04:52:30 <rishi> You could use "gstack <pid>" too.
05:05:04 <andre> as I got an Evolution being stuck with "Retrieving message" for a few hours again, thread 3 and thread 2 have g_cond_wait on top:
$:andre\> gstack 13664
+ Trace 232898
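The cancellation contract described in the IRC log above (a "cancelled" signal fires, and the operation has to notice it and return a cancelled error) can be sketched in a few lines. This is only an analogy in Python, not GIO or Camel code: `threading.Event` stands in for GCancellable, the `Cancelled` exception stands in for an error carrying G_IO_ERROR_CANCELLED, and `fetch_message` and `run_demo` are hypothetical names.

```python
import threading


class Cancelled(Exception):
    """Stand-in for an error carrying G_IO_ERROR_CANCELLED (assumption)."""


def fetch_message(cancel_event, chunks, poll_interval=0.01):
    """Pretend to download `chunks` pieces, checking for cancellation each time."""
    received = 0
    for _ in range(chunks):
        # Cooperative check: the operation itself has to notice the request.
        # wait() returns True as soon as the event is set, False on timeout.
        if cancel_event.wait(timeout=poll_interval):
            raise Cancelled(f"cancelled after {received} chunks")
        received += 1
    return received


def run_demo():
    cancel = threading.Event()
    done = fetch_message(cancel, chunks=3)   # nobody cancelled: completes
    cancel.set()                             # analogue of firing "cancelled"
    try:
        fetch_message(cancel, chunks=3)
        outcome = "finished"
    except Cancelled:
        outcome = "cancelled"
    return done, outcome
```

An operation that sits in a plain lock or condition wait never reaches such a check, which would match the status bar staying on "(cancelling)" forever.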
Try this test package: http://koji.fedoraproject.org/koji/taskinfo?taskID=6276999 This is somewhat of a guess, but shouldn't hurt anything if I'm wrong. The patch is against 3.10.3, which I built for Fedora yesterday, so you'll have to upgrade Evolution with this. I don't think it's even hit updates-testing yet so you may have to download evolution-3.10.3-1.fc20 from the website.
I've been running the test package and it seems to be slightly better; however, I ran into this again just now:
$:andre\> gstack 2504
+ Trace 232918
Thread 1 (Thread 0xb0f04900 (LWP 2504))
This might be fixed with evolution-data-server 3.12.2+. Please try with it and report back, once it is out.
I won't upgrade to 3.12 soon, seeing other major IMAP+ issues in 3.12 reported and having had lots of IMAP+ instability issues in 3.10 already... For the record, the problem described in this bug report still happens in:
evolution-data-server-3.10.4-3.fc20.i686
evolution-3.10.4-2.fc20.i686
It would be worth trying 3.12.2+ in a virtual machine to see whether anything has improved for you too.
Andre, have you upgraded/tested with 3.12.5+ ? I'm seeing such hangs still, but you seem to have a much better grasp on how to actually debug these issues...
I don't have the time for setting up and testing stuff in virtual machines currently, and I have my doubts anyway that things have really improved. See https://wiki.gnome.org/Apps/Evolution/Debugging for debugging information.
Problem still happens in:
evolution-3.12.9-1.fc21.i686
evolution-data-server-3.12.9-2.fc21.i686
Tried to close Evolution via the "X" button which disables the UI but never ever ever ends the application. Status bar shows the line:
Retrieving message '2464' in gerrit-notifications (cancelling)
$:andre\> gstack 21890
+ Trace 234478
See Thread 8 in that stacktrace.
Okay, so it's not fixed. I'd like to know how to reproduce this reliably, to be able to debug it locally.
> I'd like to know how to reproduce this reliably
Milan: Well, get a crappy internet provider with a flaky connection and run "ping" against imap.gmail.com so you realize that some packets get lost? :) I don't think you can easily reproduce this if you don't have a "flaky connection" emulator somewhere... Instead feel free to tell me which further information I can offer. Or take a bus/car/train and be my guest for a few days. :P
Stacktrace while running Evo with that stuck message in the status bar:
$:andre\> gstack 2456
+ Trace 234544
Garr. Stacktrace parser hides the fact that the previous comment includes a second stacktrace, after clicking the "X" icon to close Evolution so Evolution has become greyed out but never closes.
I think I finally found out what the problem is. A message can be downloaded in two ways: a) a true download of the message through the network connection; b) if there is an ongoing download job, wait for its completion and decide what to do only after it's done. The problem was that if there were two concurrent message download requests, the second waiting for the first to complete, and the first failed early, then the second waited for the completion indefinitely, because the now-done job wasn't unregistered properly. The relevant backtrace part is below.
#3 0xae8bd47c in camel_imapx_job_wait (job=job@entry=0xf227680, error=error@entry=0x0) at camel-imapx-job.c:194
#4 0xae8d491c in imapx_server_get_message
Created commit e3aa112 in eds master (3.15.92+)
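The failure mode described above can be reproduced in miniature. The sketch below is a simplified Python model, not the Camel implementation: `DownloadRegistry`, `begin`, `finish` and `second_request_wakes` are hypothetical names, and the real unregistering mechanism is not shown in this report. It assumes the bug amounts to a failure path that leaves the job registered and never wakes its waiters.

```python
import threading


class DownloadRegistry:
    """Tracks one in-flight download per message id (hypothetical model)."""

    def __init__(self, unregister_on_failure):
        self._lock = threading.Lock()
        self._in_flight = {}  # message id -> Event set when the job is done
        self._unregister_on_failure = unregister_on_failure

    def begin(self, uid):
        """Return (is_owner, done_event) for a download request."""
        with self._lock:
            if uid in self._in_flight:
                # Case b): a download is ongoing, so this caller only waits.
                return False, self._in_flight[uid]
            done = threading.Event()
            self._in_flight[uid] = done
            return True, done  # case a): this caller does the real download

    def finish(self, uid, ok):
        with self._lock:
            if not ok and not self._unregister_on_failure:
                return  # buggy path: job stays registered, waiters never wake
            done = self._in_flight.pop(uid)
        done.set()


def second_request_wakes(unregister_on_failure, timeout=0.2):
    registry = DownloadRegistry(unregister_on_failure)
    first_owns, _ = registry.begin("2464")       # first request downloads
    second_owns, done = registry.begin("2464")   # second request waits
    assert first_owns and not second_owns
    registry.finish("2464", ok=False)            # first request fails early
    # The timeout exists only so that the demo terminates.
    return done.wait(timeout)
```

With `unregister_on_failure=False` the second request only returns because of the demo timeout; a wait without a timeout would block indefinitely, as in the backtrace.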
*** Bug 723888 has been marked as a duplicate of this bug. ***
♥! Looking forward to trying 3.16 soon.
Aaah, a true oldschool deadlock! Thanks for this, Milan. Hoping it's squashed for good—time will tell I guess!
*** Bug 688661 has been marked as a duplicate of this bug. ***
I just managed to reproduce another variant of this bug. The symptoms were slightly different. When multiple commands were ongoing and the active command failed with some error, only the pending command and the active one were cancelled, but there could also be a scheduled job which didn't have any commands set yet. That job was not cancelled either, thus it kept waiting to be served by the stopped server. (The "description" probably doesn't make sense.) The fix is to cancel any leftover jobs after the commands are cancelled.
Created commit 565ba6c in eds master (3.17.3+)
Created commit b425140 in eds gnome-3-16 (3.16.3+)
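A toy model of this second variant, under the assumption that the server keeps separate lists of active commands, pending commands and registered jobs. All class and function names here are hypothetical and do not correspond to the Camel data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    cancelled: bool = False


@dataclass
class Command:
    job: Job


@dataclass
class Server:
    active: list = field(default_factory=list)   # commands being processed
    pending: list = field(default_factory=list)  # commands queued to be sent
    jobs: list = field(default_factory=list)     # all registered jobs

    def fail(self, cancel_leftover_jobs):
        """Stop the server after the active command hit an error."""
        for command in self.active + self.pending:
            command.job.cancelled = True
        self.active.clear()
        self.pending.clear()
        if cancel_leftover_jobs:
            # The fix described above: also cancel jobs that had no
            # command assigned yet.
            for job in self.jobs:
                job.cancelled = True


def stranded_jobs(cancel_leftover_jobs):
    a = Job("active")
    b = Job("pending")
    c = Job("scheduled-no-command")
    server = Server(active=[Command(a)], pending=[Command(b)], jobs=[a, b, c])
    server.fail(cancel_leftover_jobs)
    return [job.name for job in server.jobs if not job.cancelled]
```

Without the extra loop, `stranded_jobs(False)` still lists the job that had no command; that job is the one left waiting on a server that no longer serves anything.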
*** Bug 733775 has been marked as a duplicate of this bug. ***
A downstream bug report claims a similar issue: https://bugzilla.redhat.com/show_bug.cgi?id=1440734
This time the message list together with the mail reader prevents the message download by frequently cancelling an ongoing download (from mail_reader_message_selected_cb()), possibly due to message list regeneration (non-threaded view in this case). A patch was offered for it, which I extended with a change in e-table-selection-model.c to make the call order consistent, and committed to the sources:
Created commit 80c5a32 in evo master (3.25.2+) [1]
Created commit 1b3d8b0 in evo gnome-3-24 (3.24.3+)
[1] https://git.gnome.org/browse/evolution/commit/?id=80c5a32