After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 629916 - imapx fails to handle errors in imapx_command_idle_stop()
imapx fails to handle errors in imapx_command_idle_stop()
Status: RESOLVED FIXED
Product: evolution-data-server
Classification: Platform
Component: Mailer
2.32.x (obsolete)
Other Linux
: Normal normal
: ---
Assigned To: evolution-mail-maintainers
Evolution QA team
evolution[imapx]
Depends on:
Blocks:
 
 
Reported: 2010-09-17 11:39 UTC by David Woodhouse
Modified: 2013-09-14 16:54 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
patch (2.55 KB, patch)
2010-09-20 14:21 UTC, David Woodhouse
accepted-commit_after_freeze Details | Review

Description David Woodhouse 2010-09-17 11:39:08 UTC
There are some other 'deadlock on exit' bugs but they all seem to have different diagnoses and even patches already committed.

I had been running with only one connection at a time, and things had been OK. Then for *one* account I increased the number of concurrent connections to 5. I hadn't restarted Evolution, and I think it was only actually using *one* connection still. I tried to close Evolution, and it didn't close -- say there saying 'Storing folder...'.

[imapx:A] Found connection for bugzilla and connection number 1 
[imapx:A] checking/storing on flags '\SEEN'
[imapx:A] adding command, fmt = 'UID STORE '
[imapx:A] uidset add '15907'
[imapx:A]  start
[imapx:A] adding command, fmt = '%d'
[imapx:A] got gint '15907'
[imapx:A] adding command, fmt = ' %tFLAGS.SILENT (%t)'
[imapx:A] completing command buffer is [37] 'UID STORE 15907 +FLAGS.SILENT (\SEEN)'
[imapx:A] enqueue job 'UID STORE 15907 +FLAGS.SILENT (\SEEN)'
[imapx:A] ** Starting next command
[imapx:A] camel_imapx_write: 'DONE
'
[imapx:A] Stopping idle after 5 seconds
[imapx:A] waiting for idle to stop 

Investigation showed that it really *was* waiting for idle to stop. I logged into the server and ran strace on the dovecot process, and it was still idling, occasionally saying '* OK Still here...'. Did we fail to actually write the DONE command somehow?
Comment 1 David Woodhouse 2010-09-17 11:42:39 UTC
(In reply to comment #0)
> Did we fail to actually write the DONE command somehow?

An added printf confirms that hypothesis:

[imapx:B] camel_imapx_write: 'DONE
'
Failed to issue DONE!
Comment 2 David Woodhouse 2010-09-17 12:18:10 UTC
Hah, it failed because in camel-tcp-stream-raw.c::write_to_prfd(), the call to camel_operation_cancel_check() returns true -- and thus we refuse to write the DONE to the stream. And then we can't store the outstanding changed flags, and thus Evolution never actually exits.
Comment 3 David Woodhouse 2010-09-17 12:22:05 UTC
Ah, that realisation makes it easier to reproduce. Select a folder, change a flag (any flag), then wait until it enters idle. And then quit evolution, and it'll hang.
Comment 4 David Woodhouse 2010-09-17 12:47:48 UTC
The patch in https://bugzilla.gnome.org/show_bug.cgi?id=620609#c4 fixes the fact that this bug is so easy to reproduce at exit.

But we still have issues with the error handling here -- we still get stuck if the operation gets cancelled while we're in IDLE. Even if we remove the code sequence which *reliably* triggers that situation, it can still happen and still needs to be fixed.

We need to handle errors from imapx_command_idle_stop(), and ensure that any pending commands get queued up on a new connection. Because those errors might happen for reasons *other* than cancellation.

We *also* ideally would avoid having to do that just because of cancellation -- we shouldn't have to throw away the IMAP connection and make a new one.
Comment 5 David Woodhouse 2010-09-17 15:54:10 UTC
I put in some basic error handling for the case where imapx_command_idle_stop() fails to write the DONE for whatever reason:

--- a/camel/providers/imapx/camel-imapx-server.c
+++ b/camel/providers/imapx/camel-imapx-server.c
@@ -2098,6 +2101,10 @@ imapx_command_idle_stop (CamelIMAPXServer *is, GError **e
                g_set_error (
                        error, CAMEL_IMAPX_ERROR, 1,
                        "Unable to issue DONE");
+               c(is->tagprefix, "Failed to issue DONE to terminate IDLE\n");
+               is->state = IMAPX_SHUTDOWN;
+               camel_operation_cancel(is->op);
+               is->parser_quit = TRUE;
                return FALSE;
        }
 

However, it still sucks -- for *every* folder we still get an imapx_sync() call, which calls camel_imapx_store_get_server() even if it doesn't have any outstanding changes to write. And camel_imapx_store_get_server() will *fail*, with symptoms very similar to those in bug 621890.
Comment 6 David Woodhouse 2010-09-20 14:21:14 UTC
Created attachment 170663 [details] [review]
patch
Comment 7 David Woodhouse 2010-09-20 14:22:01 UTC
Review of attachment 170663 [details] [review]:

for 2.32 too...