GNOME Bugzilla – Bug 629916
imapx fails to handle errors in imapx_command_idle_stop()
Last modified: 2013-09-14 16:54:26 UTC
There are some other 'deadlock on exit' bugs but they all seem to have different diagnoses and even patches already committed. I had been running with only one connection at a time, and things had been OK. Then for *one* account I increased the number of concurrent connections to 5. I hadn't restarted Evolution, and I think it was only actually using *one* connection still. I tried to close Evolution, and it didn't close -- say there saying 'Storing folder...'. [imapx:A] Found connection for bugzilla and connection number 1 [imapx:A] checking/storing on flags '\SEEN' [imapx:A] adding command, fmt = 'UID STORE ' [imapx:A] uidset add '15907' [imapx:A] start [imapx:A] adding command, fmt = '%d' [imapx:A] got gint '15907' [imapx:A] adding command, fmt = ' %tFLAGS.SILENT (%t)' [imapx:A] completing command buffer is [37] 'UID STORE 15907 +FLAGS.SILENT (\SEEN)' [imapx:A] enqueue job 'UID STORE 15907 +FLAGS.SILENT (\SEEN)' [imapx:A] ** Starting next command [imapx:A] camel_imapx_write: 'DONE ' [imapx:A] Stopping idle after 5 seconds [imapx:A] waiting for idle to stop Investigation showed that it really *was* waiting for idle to stop. I logged into the server and ran strace on the dovecot process, and it was still idling, occasionally saying '* OK Still here...'. Did we fail to actually write the DONE command somehow?
(In reply to comment #0) > Did we fail to actually write the DONE command somehow? An added printf confirms that hypothesis: [imapx:B] camel_imapx_write: 'DONE ' Failed to issue DONE!
Hah, it failed because in camel-tcp-stream-raw.c::write_to_prfd(), the call to camel_operation_cancel_check() returns true -- and thus we refuse to write the DONE to the stream. And then we can't store the outstanding changed flags, and thus Evolution never actually exits.
Ah, that realisation makes it easier to reproduce. Select a folder, change a flag (any flag), then wait until it enters idle. And then quit evolution, and it'll hang.
The patch in https://bugzilla.gnome.org/show_bug.cgi?id=620609#c4 fixes the fact that this bug is so easy to reproduce at exit. But we still have issues with the error handling here -- we still get stuck if the operation gets cancelled while we're in IDLE. Even if we remove the code sequence which *reliably* triggers that situation, it can still happen and still needs to be fixed. We need to handle errors from imapx_command_idle_stop(), and ensure that any pending commands get queued up on a new connection. Because those errors might happen for reasons *other* than cancellation. We *also* ideally would avoid having to do that just because of cancellation -- we shouldn't have to throw away the IMAP connection and make a new one.
I put in some basic error handling for the case where imapx_command_idle_stop() fails to write the DONE for whatever reason: --- a/camel/providers/imapx/camel-imapx-server.c +++ b/camel/providers/imapx/camel-imapx-server.c @@ -2098,6 +2101,10 @@ imapx_command_idle_stop (CamelIMAPXServer *is, GError **e g_set_error ( error, CAMEL_IMAPX_ERROR, 1, "Unable to issue DONE"); + c(is->tagprefix, "Failed to issue DONE to terminate IDLE\n"); + is->state = IMAPX_SHUTDOWN; + camel_operation_cancel(is->op); + is->parser_quit = TRUE; return FALSE; } However, it still sucks -- for *every* folder we still get an imapx_sync() call, which calls camel_imapx_store_get_server() even if it doesn't have any outstanding changes to write. And camel_imapx_store_get_server() will *fail*, with symptoms very similar to those in bug 621890.
Created attachment 170663 [details] [review] patch
Review of attachment 170663 [details] [review]: for 2.32 too...