GNOME Bugzilla – Bug 667725
camel-imapx:ERROR:camel-imapx-server.c:1628:imapx_untagged: code should not be reached
Last modified: 2012-05-22 11:21:37 UTC
[imapx:A] camel_imapx_read: buffer is ') * 13002 FETCH (UID 752042 RFC822.SIZE 6581 RFC822.HEADER {2970} Return-path: <linux-kernel-owner@vger.kernel.org> Envelope-to: dwmw2@twosheds.infradead.org Delivery-date: Thu, 04 Aug 2011 03:15:00 +0000 Received: from casper.infradead.org ([2001:770:15f::2]) by twosheds.infradead.org with esmtps (Exim 4.76 #1 (Red Hat Linux)) id 1QooOh-0000lz-N4 for dwmw2@twosheds.infradead.org; Thu, 04 Aug 2011 03:14:59 +0000 Received: from vger.kernel.org ([209.132.180.67]) by casper.infradead.org with esmtp (Exim 4.76 #1 (Red Hat Linux)) id 1QooOe-00083F-GU; Thu, 04 Aug 2011 03:14:56 +0000 Received: (major'
[imapx:A] token ')'
** camel-imapx:ERROR:camel-imapx-server.c:1628:imapx_untagged: code should not be reached
Aborted

This seems to be aborting because it's receiving message information without flags (which means a message body fetch) for a message it doesn't know exists (see the sketch below this comment). Will upload some debugging information shortly...

This seems to be happening while I run my mail archival script at http://david.woodhou.se/archivemail.sh, which basically means that a *lot* of messages are being removed from the folders while Evolution is running.
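For reference, the "code should not be reached" abort is what GLib's g_assert_not_reached() prints. A minimal, hypothetical sketch (the struct, enum and function names are illustrative, not the actual camel-imapx-server.c internals) of how an untagged FETCH that carries header/body data but no flags, for a UID the client does not know about, can fall through to such an assertion:

#include <glib.h>

typedef enum {
    GOT_FLAGS  = 1 << 0,
    GOT_HEADER = 1 << 1,
    GOT_BODY   = 1 << 2
} FetchGot;

typedef struct {
    guint32  uid;
    FetchGot got;
} FetchInfo;

/* linear lookup of a UID in the list of messages we believe exist */
static gboolean
uid_is_known (GArray *known_uids, guint32 uid)
{
    guint i;

    for (i = 0; i < known_uids->len; i++)
        if (g_array_index (known_uids, guint32, i) == uid)
            return TRUE;
    return FALSE;
}

static void
handle_untagged_fetch (const FetchInfo *finfo, GArray *known_uids)
{
    if (finfo->got & GOT_FLAGS) {
        /* a flag update, solicited or not: merge it into the summary */
        g_message ("flags update for UID %u", finfo->uid);
    } else if (finfo->got & (GOT_HEADER | GOT_BODY)) {
        /* header/body data is assumed to answer one of our own fetches */
        if (uid_is_known (known_uids, finfo->uid))
            g_message ("header/body for expected UID %u", finfo->uid);
        else
            g_assert_not_reached ();  /* "code should not be reached" */
    }
}

int
main (void)
{
    GArray *known = g_array_new (FALSE, FALSE, sizeof (guint32));
    FetchInfo finfo = { 752042, GOT_HEADER };

    /* UID 752042 is not in the known list, so this aborts, as in the log */
    handle_untagged_fetch (&finfo, known);

    g_array_free (known, TRUE);
    return 0;
}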
Log at http://david.woodhou.se/gnome-bug-667725.txt.gz (35MiB) in all its glory, starting from the moment we select the offending folder.

We try QRESYNC, but we still end up out of sync. Not entirely sure why; it may be because we already *crashed* a couple of times, so our cache is already slightly confused. But that's OK; we can just fall back to the non-QRESYNC method of fetching all flags for all messages. And thus we do, at about line 219920 of the log.

At line 427334 we see the server inform us about message #752042.
At line 427649 that information from the server is parsed.
At line 1475206 we add it to the uidset for a FETCH command.
At line 1475714 that FETCH command is sent.

(Note: there is weirdness here with the uidset handling. We're giving bloody odd ranges; 752039:752040,752040:752537 should just be '752039:752537'. That may be unrelated. It's even weirder at line 1456135, where we seem to use a one-UID range for *dozens* of consecutive UIDs! It's something to do with the same UID being added more than once, which may be relevant to our other issues too. See the sketch after this comment.)

At line 1475944 we receive the header for message #752042, quite happily.
At line 1781818 the server tells us that the message has gained the \Deleted flag.
At line 7375477 it's added to the uidset of a fetch command *again*, for some reason.
At line 7475743 that FETCH command is sent.
At line 7375835 the server gives us the body of message #752042 again, as we asked for, and we promptly crap ourselves.

I don't know why we're refetching the *same* mail, and I certainly don't know why we're doing so when it doesn't seem to *exist* any more in our uid list. Confused.
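On the uidset oddity noted above: a minimal sketch, under the assumption that the UID list is sorted ascending (build_uid_set and its types are illustrative, not the real imapx uidset code), of coalescing duplicate and consecutive UIDs into a single IMAP sequence-set range instead of emitting overlapping pairs like 752039:752040,752040:752537:

#include <glib.h>

/* Build an IMAP sequence-set string from a sorted UID array, merging
 * duplicates and consecutive UIDs into one range. */
static gchar *
build_uid_set (const guint32 *uids, guint n)
{
    GString *out = g_string_new (NULL);
    guint i = 0;

    while (i < n) {
        guint32 start = uids[i], end = uids[i];
        guint j = i + 1;

        /* extend the range over duplicates and consecutive values */
        while (j < n && (uids[j] == end || uids[j] == end + 1))
            end = uids[j++];

        if (out->len > 0)
            g_string_append_c (out, ',');
        if (start == end)
            g_string_append_printf (out, "%u", start);
        else
            g_string_append_printf (out, "%u:%u", start, end);
        i = j;
    }

    return g_string_free (out, FALSE);
}

int
main (void)
{
    /* duplicates and consecutive UIDs collapse into a single range */
    const guint32 uids[] = { 752039, 752040, 752040, 752041, 752537 };
    gchar *set = build_uid_set (uids, G_N_ELEMENTS (uids));

    g_print ("%s\n", set);  /* prints "752039:752041,752537" */
    g_free (set);
    return 0;
}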
Another similar crash with 3.4.1 at http://david.woodhou.se/evo-3.4-imapx-crash.txt

I think I've fixed the weirdness with UID ranges at some point in the past, but the real bug still exists.
This is the set of commands in the latest dump, before it crashes because it didn't expect to see a FETCH without flags for message 15839.

[imapx:C] camel_imapx_write: 'C00152 SELECT lists.arch
[imapx:C] camel_imapx_write: 'C00153 UID FETCH 15401:* (UID FLAGS)   (returns flags for messages 15839-18151)
[imapx:C] camel_imapx_write: 'C00151 UID FETCH 12738:* (UID FLAGS)   (returns flags for messages 15839-18151 again)
[imapx:C] camel_imapx_write: 'C00154 NOOP
[imapx:C] camel_imapx_write: 'C00156 UID FETCH 15839:16338 (RFC822.SIZE RFC822.HEADER)
[imapx:C] camel_imapx_write: 'C00158 NOOP
[imapx:C] camel_imapx_write: 'C00159 STATUS lists.arm (MESSAGES UNSEEN UIDVALIDITY UIDNEXT HIGHESTMODSEQ)
[imapx:C] camel_imapx_write: 'C00161 UID FETCH 16338:16837 (RFC822.SIZE RFC822.HEADER)
[imapx:C] camel_imapx_write: 'C00164 NOOP
[imapx:C] camel_imapx_write: 'C00166 UID FETCH 16837:17336 (RFC822.SIZE RFC822.HEADER)
[imapx:C] camel_imapx_write: 'C00168 NOOP
[imapx:C] camel_imapx_write: 'C00172 UID FETCH 17336:17835 (RFC822.SIZE RFC822.HEADER)
[imapx:C] camel_imapx_write: 'C00176 NOOP
[imapx:C] camel_imapx_write: 'C00180 UID FETCH 17835:18151,15839:16021 (RFC822.SIZE RFC822.HEADER)

You'll note we fetch message 15839 twice. The first time it arrives, we *accept* it. But then the second time we abort. We seem to have 'discovered' the full set of messages 15839-18151 *twice*, and we're *fetching* them all twice, and crashing as soon as we start receiving the second copy.

(There's also an off-by-one in our uidset handling, or in the code that limits it to 500 messages at once, and for each block of 500 we fetch one message twice *too*: messages 16837, 17336 and 17835 are each fetched as the last message of one set *and* the first message of the next. But that doesn't seem to hurt us.)
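On that off-by-one in the 500-message batches: a simplified, hypothetical illustration (the real code batches an explicit UID list rather than a contiguous range, and these function names are made up) of why starting each batch at the previous batch's end re-fetches the boundary UID, and how stepping to end + 1 avoids it:

#include <glib.h>

#define BATCH_SIZE 500

/* Queue header fetches for an inclusive UID range in batches of at most
 * BATCH_SIZE messages.  An inclusive range low:high holds high - low + 1
 * UIDs, so the next batch must start at end + 1; starting it at end is
 * what makes 16338, 16837, ... appear in two consecutive batches. */
static void
queue_header_fetches (guint32 first_uid, guint32 last_uid)
{
    guint32 start = first_uid;

    while (start <= last_uid) {
        guint32 end = MIN (start + BATCH_SIZE - 1, last_uid);

        g_message ("UID FETCH %u:%u (RFC822.SIZE RFC822.HEADER)", start, end);
        start = end + 1;  /* not start = end */
    }
}

int
main (void)
{
    /* the range seen in the dump above */
    queue_header_fetches (15839, 18151);
    return 0;
}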
Created attachment 214473 [details] [review]
Proposed fix

We ended up fetching the same set of messages more than once, and then aborting because we got confused and ended up fetching a message we didn't *think* we were fetching.

It turns out this happens when we queue a complete scan (i.e. 'UID FETCH 1:* (UID FLAGS)') but the SELECT of the folder 'notices' that UIDNEXT has changed and automatically calls fetch_new_messages() to queue *another* similar FETCH. The set of uids to fetch ends up with a bunch of duplicates, and it all kind of goes downhill from there.

Fix that by disabling the fetch_new_messages() call on successful select if an existing fetch for scan_changes is already queued.

Also fix an off-by-one error in the code which limits batches to 500 messages, which was causing the boundary message to be fetched twice: once as the last in one set, and then again as the first in the next.
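A minimal sketch of the guard described above (illustrative only, not the literal patch and not the real camel-imapx data structures): skip the automatic fetch-new-messages step after SELECT when a full scan_changes refresh is already queued for the folder, so the same UIDs are not queued for fetching twice:

#include <glib.h>

typedef struct {
    gboolean scan_changes_queued;  /* a full 'UID FETCH 1:* (UID FLAGS)' scan is pending */
    guint32  uidnext_on_server;    /* UIDNEXT reported by the SELECT */
    guint32  uidnext_in_summary;   /* next UID we expect locally */
} FolderState;

static void
maybe_fetch_new_messages (FolderState *state)
{
    if (state->uidnext_on_server <= state->uidnext_in_summary)
        return;  /* nothing new on the server */

    if (state->scan_changes_queued) {
        /* the queued full scan will discover the new messages anyway;
         * queuing a second fetch here is what produced the duplicates */
        return;
    }

    g_message ("queue UID FETCH %u:* (UID FLAGS)", state->uidnext_in_summary);
}

int
main (void)
{
    FolderState state = { TRUE, 18152, 15839 };

    /* a full scan is already queued, so no extra fetch gets issued */
    maybe_fetch_new_messages (&state);
    return 0;
}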
Created attachment 214475 [details] [review]
more patch

The above patch is working, but I've found another case in which we can trigger the same crash.

We call imapx_job_scan_changes_start() to fetch the full set of message flags, and then imapx_command_step_fetch_done() is called once the results are in, to fetch the headers of new messages. But *while* we're fetching the headers, a flag changes on an existing message, and the server gives us an unsolicited FETCH response with the new flags. In my case, the server is *including* the UID of the changed message. There's no reason why it shouldn't. But we don't understand that. We *assume* that if the UID is given, then it's not unsolicited; that it was something we *asked* for. And thus we append the UID to our list of to-be-fetched messages, fetch it, and then crash because we don't find it in the uids list (which happens because the list isn't sorted and we do a binary chop search on it).

This patch should probably fix that too, but I'm increasingly convinced that imapx shouldn't *care* about solicited vs. unsolicited stuff, and that anywhere in imapx_untagged() that knows *anything* about what jobs/commands are outstanding should be considered a design flaw. The IMAP server tells us stuff in untagged responses, period. We should learn from that. We should *use* that to trigger new commands, by all means, but it's almost never sane to interpret the untagged responses differently according to what we asked the server. IMAP just isn't designed that way.
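To make the "binary chop on an unsorted list" failure concrete, here is a self-contained illustration (not the imapx code itself; the UID values are just examples) of how a binary search over a UID array that was appended to out of order can miss a UID that is actually present. Keeping the array sorted on insert, or sorting it before searching, avoids this:

#include <glib.h>

/* classic binary search: only correct if the array is sorted ascending */
static gboolean
uid_binary_search (GArray *uids, guint32 uid)
{
    gint lo = 0, hi = (gint) uids->len - 1;

    while (lo <= hi) {
        gint mid = lo + (hi - lo) / 2;
        guint32 v = g_array_index (uids, guint32, mid);

        if (v == uid)
            return TRUE;
        else if (v < uid)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return FALSE;
}

int
main (void)
{
    GArray *uids = g_array_new (FALSE, FALSE, sizeof (guint32));
    guint32 values[] = { 15839, 18151, 16338 };  /* appended out of order */
    guint i;

    for (i = 0; i < G_N_ELEMENTS (values); i++)
        g_array_append_val (uids, values[i]);

    /* 16338 is in the array, but the unsorted binary search misses it */
    g_print ("found 16338: %d\n", uid_binary_search (uids, 16338));

    g_array_free (uids, TRUE);
    return 0;
}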
I trust your judgement over mine; I can barely make heads or tails of that code. Go ahead and commit.
To ssh://dwmw2@git.gnome.org/git/evolution-data-server
   8dbcc64..f86152c  gnome-3-4 -> gnome-3-4
   e689ff0..17f3fa1  master -> master
David, what you have mentioned in comment #5 is also very valid. It would be nice if you could file a separate bug for it, and I will take care of it.
(In reply to comment #8)
> David, what you have mentioned in comment #5 is also very valid. It would
> be nice if you could file a separate bug for it, and I will take care of it.

Bug 676548