GNOME Bugzilla – Bug 713530
Less intrusive background sync'ing
Last modified: 2018-02-19 01:04:09 UTC
---- Reported by geary-maint@gnome.bugs 2013-08-09 18:20:00 -0700 ---- Original Redmine bug id: 7325 Original URL: http://redmine.yorba.org/issues/7325 Searchable id: yorba-bug-7325 Original author: Brendan Long Original description: I just left Geary running in the background, and the log is full of messages like this: [deb] 19:15:31 imap-engine-account-synchronizer.vala:353: Background sync'ing Gmail account self@brendanlong.com:INBOX to 1875-06-19T11:49:28-0659 [deb] 19:15:32 imap-engine-account-synchronizer.vala:353: Background sync'ing Gmail account self@brendanlong.com:INBOX to 1875-05-19T11:49:28-0659 [deb] 19:15:32 imap-engine-account-synchronizer.vala:353: Background sync'ing Gmail account self@brendanlong.com:INBOX to 1875-04-19T11:49:28-0659 [deb] 19:15:33 imap-engine-account-synchronizer.vala:353: Background sync'ing Gmail account self@brendanlong.com:INBOX to 1875-03-19T11:49:28-0659 [deb] 19:15:33 imap-engine-account-synchronizer.vala:353: Background sync'ing Gmail account self@brendanlong.com:INBOX to 1875-02-19T11:49:28-0659 [deb] 19:15:33 imap-engine-account-synchronizer.vala:353: Background sync'ing Gmail account self@brendanlong.com:INBOX to 1875-01-19T11:49:28-0659 [deb] 19:15:34 imap-engine-account-synchronizer.vala:353: Background sync'ing Gmail account self@brendanlong.com:INBOX to 1874-12-19T11:49:28-0659 [deb] 19:15:34 imap-engine-account-synchronizer.vala:353: Background sync'ing Gmail account self@brendanlong.com:INBOX to 1874-11-19T11:49:28-0659 [deb] 19:15:35 imap-engine-account-synchronizer.vala:353: Background sync'ing Gmail account self@brendanlong.com:INBOX to 1874-10-19T11:49:28-0659 [deb] 19:15:35 imap-engine-account-synchronizer.vala:353: Background sync'ing Gmail account self@brendanlong.com:INBOX to 1874-09-19T11:49:28-0659 [deb] 19:15:36 imap-engine-account-synchronizer.vala:353: Background sync'ing Gmail account self@brendanlong.com:INBOX to 1874-08-19T11:49:28-0659 [deb] 19:15:36 imap-engine-account-synchronizer.vala:353: Background sync'ing Gmail account self@brendanlong.com:INBOX to 1874-07-19T11:49:28-0659 [deb] 19:15:37 imap-engine-account-synchronizer.vala:353: Background sync'ing Gmail account self@brendanlong.com:INBOX to 1874-06-19T11:49:28-0659 [deb] 19:15:37 imap-engine-account-synchronizer.vala:353: Background sync'ing Gmail account self@brendanlong.com:INBOX to 1874-05-19T11:49:28-0659 [deb] 19:15:38 imap-engine-account-synchronizer.vala:353: Background sync'ing Gmail account self@brendanlong.com:INBOX to 1874-04-19T11:49:28-0659 [deb] 19:15:38 imap-engine-account-synchronizer.vala:353: Background sync'ing Gmail account self@brendanlong.com:INBOX to 1874-03-19T11:49:28-0659 [deb] 19:15:38 imap-engine-account-synchronizer.vala:353: Background sync'ing Gmail account self@brendanlong.com:INBOX to 1874-02-19T11:49:28-0659 [deb] 19:15:39 imap-engine-account-synchronizer.vala:353: Background sync'ing Gmail account self@brendanlong.com:INBOX to 1874-01-19T11:49:28-0659 [deb] 19:15:39 imap-engine-account-synchronizer.vala:353: Background sync'ing Gmail account self@brendanlong.com:INBOX to 1873-12-19T11:49:28-0659 I'm fairly certain I didn't send any emails in 1873. I stopped geary and restarted, and it looks like this set of messages starts with: [deb] 19:19:51 imap-engine-account-synchronizer.vala:282: Oldest local email in Gmail account self@brendanlong.com:INBOX not old enough (2011-10-19T11:49:28-0600 vs. 1-01-01T00:00:00-0659), email_total=3516 vs. local_count=3515, synchronizing... [deb] 19:19:51 imap-engine-generic-folder.vala:490: Not opening Gmail account self@brendanlong.com:INBOX: already open (open_count=3) [deb] 19:19:51 imap-engine-account-synchronizer.vala:339: Background sync'ing Gmail account self@brendanlong.com:INBOX [deb] 19:19:51 imap-engine-account-synchronizer.vala:353: Background sync'ing Gmail account self@brendanlong.com:INBOX to 2011-09-19T11:49:28-0600 [deb] 19:19:52 imap-engine-account-synchronizer.vala:353: Background sync'ing Gmail account self@brendanlong.com:INBOX to 2011-08-19T11:49:28-0600 [deb] 19:19:52 imap-engine-account-synchronizer.vala:353: Background sync'ing Gmail account self@brendanlong.com:INBOX to 2011-07-19T11:49:28-0600 [deb] 19:19:52 imap-engine-account-synchronizer.vala:353: Background sync'ing Gmail account self@brendanlong.com:INBOX to 2011-06-19T11:49:28-0600 [deb] 19:19:52 imap-engine-account-synchronizer.vala:353: Background sync'ing Gmail account self@brendanlong.com:INBOX to 2011-05-19T11:49:28-0600 [deb] 19:19:53 imap-engine-account-synchronizer.vala:353: Background sync'ing Gmail account self@brendanlong.com:INBOX to 2011-04-19T11:49:28-0600 [deb] 19:19:53 imap-engine-account-synchronizer.vala:353: Background sync'ing Gmail account self@brendanlong.com:INBOX to 2011-03-19T11:49:28-0600 [deb] 19:19:53 imap-engine-account-synchronizer.vala:353: Background sync'ing Gmail account self@brendanlong.com:INBOX to 2011-02-19T11:49:28-0700 ---- Additional Comments From geary-maint@gnome.bugs 2013-09-27 19:11:00 -0700 ---- ### History #### #1 Updated by Jim Nelson 3 months ago * **Target version** set to _0.4.0_ Huh -- it looks like the GDateTime was initialized to a bogus value in the first run. In gear menu -> Accounts, what do you have your Download mail time set to? The second set is normal; Geary doesn't pull all your old mail at once, rather it pulls in batches to keep the application usable. #### #2 Updated by Brendan Long 3 months ago Jim Nelson wrote: > Huh -- it looks like the GDateTime was initialized to a bogus value in the first run. In gear menu -> Accounts, what do you have your Download mail time set to? It's set to "Everything".. So I suppose you could say this isn't a bug, but I would expect an email client to realize that I don't have any messages from 1 AD (that's how far back it appears to want to go, according to the logs: `not old enough (2011-10-19T11:49:28-0600 vs. 1-01-01T00:00:00-0659)`). If I set it to "1 year back", it works as expected. Is there really no better way to tell if there's older email in an IMAP account? :( At the very least, you should be able to stop at [1971](https://en.wikipedia.org/wiki/Email#Email_networks) #### #3 Updated by Jim Nelson 3 months ago * **Assignee** set to _Jim Nelson_ * **Priority** changed from _Normal_ to _Urgent_ That's what I thought; yes, it should be able to detect when it's hit the end of the folder and doesn't need to go further. (I thought you were saying it was starting in 1875 and going back from there, i.e. an initialization problem.) I'll get on this Monday. #### #4 Updated by Jim Nelson 3 months ago * **Status** changed from _Open_ to _Fixed_ Applied in changeset ec10e06213f8b3fd3155a3d8333f7ec47181226b. #### #5 Updated by Brendan Long 3 months ago This fix doesn't seem to work for me. I still get the same messages, so I added a tiny bit more debugging: int local_count = yield folder.local_folder.get_email_count_async(ImapDB.Folder.ListFlags.NONE, bg_cancellable); debug("Got %d of %d emails", local_count, folder.properties.email_total); if (local_count >= folder.properties.email_total) { And I get output like: [deb] 17:32:48 imap-engine-account-synchronizer.vala:339: Background sync'ing Gmail account self@brendanlong.com:INBOX [deb] 17:32:48 imap-engine-account-synchronizer.vala:353: Background sync'ing Gmail account self@brendanlong.com:INBOX to 2011-09-19T11:49:28-0600 [deb] 17:32:49 imap-engine-account-synchronizer.vala:377: Got 3709 of 3710 emails [deb] 17:32:49 imap-engine-account-synchronizer.vala:353: Background sync'ing Gmail account self@brendanlong.com:INBOX to 2011-08-19T11:49:28-0600 [deb] 17:32:49 imap-engine-account-synchronizer.vala:377: Got 3709 of 3710 emails [deb] 17:32:49 imap-engine-account-synchronizer.vala:353: Background sync'ing Gmail account self@brendanlong.com:INBOX to 2011-07-19T11:49:28-0600 [deb] 17:32:49 imap-engine-account-synchronizer.vala:377: Got 3709 of 3710 emails [deb] 17:32:49 imap-engine-account-synchronizer.vala:353: Background sync'ing Gmail account self@brendanlong.com:INBOX to 2011-06-19T11:49:28-0600 # etc [deb] 17:40:54 imap-engine-account-synchronizer.vala:377: Got 3709 of 3710 emails [deb] 17:40:54 imap-engine-account-synchronizer.vala:353: Background sync'ing Gmail account self@brendanlong.com:INBOX to 1906-05-19T11:49:28-0700 [deb] 17:40:54 imap-engine-account-synchronizer.vala:377: Got 3709 of 3710 emails [deb] 17:40:54 imap-engine-account-synchronizer.vala:353: Background sync'ing Gmail account self@brendanlong.com:INBOX to 1906-04-19T11:49:28-0700 [deb] 17:40:54 imap-engine-account-synchronizer.vala:377: Got 3709 of 3710 emails [deb] 17:40:54 imap-engine-account-synchronizer.vala:353: Background sync'ing Gmail account self@brendanlong.com:INBOX to 1906-03-19T11:49:28-0700 [deb] 17:40:54 imap-engine-account-synchronizer.vala:377: Got 3709 of 3710 emails [deb] 17:40:54 imap-engine-account-synchronizer.vala:353: Background sync'ing Gmail account self@brendanlong.com:INBOX to 1906-02-19T11:49:28-0700 [deb] 17:40:55 imap-engine-account-synchronizer.vala:377: Got 3709 of 3710 emails [deb] 17:40:55 imap-engine-account-synchronizer.vala:353: Background sync'ing Gmail account self@brendanlong.com:INBOX to 1906-01-19T11:49:28-0700 Note: Thunderbird says there's 3710 messages in the inbox and the oldest one is from 2011. #### #6 Updated by Jim Nelson 3 months ago * **Status** changed from _Fixed_ to _Open_ Huh -- it looks like it's not fetching one email and that's preventing the terminus check from completing. Let me take a look and see if I can figure out what the issue is. #### #7 Updated by Jim Nelson about 1 month ago * **Status** changed from _Open_ to _Fixed_ Applied in changeset 28a355a8e8db5b1c1d901c56392246ed45b30858. #### #8 Updated by Jim Nelson about 1 month ago Brendan, I've not been able to reproduce your problem, but I did find something very similar to it that I've patched. That, plus some other work that's been performed this cycle, I'm hoping will solve your problem, if you can try again. Additionally, I set the epoch to 1971 as you suggested, just for sanity's sake. --- Bug imported by chaz@yorba.org 2013-11-21 20:21 UTC --- This bug was previously known as _bug_ 7325 at http://redmine.yorba.org/show_bug.cgi?id=7325 Unknown milestone "unknown in product geary. Setting to default milestone for this product, "---". Setting qa contact to the default for this product. This bug either had no qa contact or an invalid one.
*** Bug 725800 has been marked as a duplicate of this bug. ***
A user has sent us a log with this problem, so re-opening.
Mike, I've put a branch up at wip/713530-sync. Can you check if this helps your issues? At the least I'm hoping it won't thrash so hard and cause your system to be unusable. I've also added some code to spend less time synchronizing. If you could run this and send another redacted log to geary@yorba.org, that would help me greatly.
Just sent the log through then.
I've pushed to master (commit 1a41622) changes which should make background synchronization not work the CPU or the disk so hard. However, it appears from Mike's log that it's still not pulling in all the emails from certain folders, which is something that needs to be investigated further.
The patch helped the problem but Mike reported that it's not completely resolved. He has three accounts configured and there's still too much I/O occurring as they synchronize. The background synchronizer should only be running on one account at a time. It also causes delays for him to view messages (I've seen this too), and that's something else that needs to be addressed.
*** Bug 772948 has been marked as a duplicate of this bug. ***
Using the additional log: > debug("Got %d of %d emails", local_count, folder.properties.email_total); I can see on my accounts that: - my personnal account is missing 3 emails in Sent folder. I think I found them: these are 3 emails that I have sent twice at the same time. Everything is the identical (timestamp, etc.) except the boundary and the Message-ID. Maybe Geary is skipping one of them because it thinks they are the same one? - my Gmail account is missing 2 emails in Important folder as well as 2 emails in "All emails", but I think the same emails are missing in both folders. However I found 4 missing emails yet that are all Google Calendar reminders from the same date/time (again!): > Content-Transfer-Encoding: 7bit > Content-Type: text/html; > charset=us-ascii > From: Gautier Pelloux-Prayer <gautier.pelloux@gmail.com> > Mime-Version: 1.0 (Mac OS X com.apple.MailServiceAgent 9.1 \(3096.5\)) > Subject: Alert - Plop > Message-Id: <BF2ED1E4-091E-4E71-8766-138BF499999A@gmail.com> > To: gautier.pelloux@gmail.com > 30 December 2014 12:30 : Plop But since I found 4 missing emails, there must be 2 invalid emails somewhere… it's probably another issue. Anyway if you have any idea where email might be skip due to "already in database", I'm listening!
Here is the code responsible of NOT downloading what Geary thinks is a duplicate: https://github.com/GNOME/geary/blob/master/src/engine/imap-db/imap-db-folder.vala#L1230-L1233 Basically the same mail was sent twice by Gmail for whatever reason, but with different message IDs. And since Geary only checks for the same size and date, then it will ignores one of them... I workaround the problem by modifying the code as it: https://github.com/bagage/geary/blob/bg-launch-713530/src/engine/imap-db/imap-db-folder.vala#L1230-L1250 So basically checking that message IDs are the same before ignoring a mail - but since I am not sure why in the first place such a code exists, I cannot ensure that this is the right thing to do. Nevertheless, so far my mail are getting all downloaded but it's painfully slow. I'll update this once all my folders have been treated by Geary.
Created attachment 342421 [details] [review] patch proposal v1 OK so there was a bug in my code, mails were being stored multiple times ;-). It's now much faster to get them all. Here is a proposal patch that should fix this issue! If review approves it, I'm not against some extra testing…!
Note: this patch only fixes problem for the first 51 mails of the current opened folder. It does not fix the issue in background synchronization mode - I have to look at it again...
Created attachment 343016 [details] [review] patch proposal v2 Finally was able to find the root cause (basically Message-ID not being fetched when background sync'ing). I added PROPERTIES as a mandatory field (as PROPERTIES already is), which does the trick :).
Review of attachment 343016 [details] [review]: Nice catch! This looks good, and I'll look at committing it as-is. Two questions though: - So with patch, are you seeing the debug statements you updated printing out "(already got m of n emails)", where m is actually equal to n? I'm still seeing for all of my folders m being less than n (in most cases by 10-20 messages), and that not changing after a sync. How have you been going about debugging this? ::: src/engine/imap-db/imap-db-folder.vala @@ +1238,3 @@ + if (email.message_id != null) + stmt.bind_string(2, email.message_id.to_string()); + This is interesting, so have you found cases where there is a null message id here? If it is null, what does the DB query look like, "... AND message_id=null"?
Well, I guess there is still some cleanup to do there. - Debug traces "alrgeady got m of n emails" (with m = n) should not appear more than once (eg. the first time background sync'ing is done). After that time, this trace should never appear again (since sync is done). But to have this first total sync, it may take a looong time (especially if you have more than 10K mails per folder). - Regarding your second question, the request is conditionnal: + if (email.message_id != null) + stmt = cx.prepare("SELECT id FROM MessageTable WHERE internaldate=? AND rfc822_size=? AND message_id=?"); + else + stmt = cx.prepare("SELECT id FROM MessageTable WHERE internaldate=? AND rfc822_size=?"); So if message_id is null, then we will not filter by message_id. Here are a few notes about cleanup now: - first, you are right: I should check if I still have some case when email.message_id is null. In any case I believe it should NOT, so yeah I'm going to remove this if/else condition anyway. - second, the patch does not fix actually all folders sync… I was a bit optimistic but it looks like there are still some emails missing
Oops, missclick. Anyway, for my accounts: - Gmail missing one "Important" and one in "All mails". This particular mail is the one that was sent twice at the same time with diffent ID: it is actually fetched and present, but in another folder (INBOX). I think my patch fixed the first part (fetching the 2 "identical" mails), but fails to fix the second. It should not be much difficult to fix though (hopefully!). - Personnal provider INBOX is stranger: Icedove downloaded 16366 mails. Geary tells me that server told it that there are 16370. And it fetched 16355… All other folders are fine though. Still some work to do on it. By the way, I'm attaching some debugging scripts I am using to compare missing mails between geary and icedove.
Created attachment 343138 [details] displayGearyDownloadProgression.sh
Created attachment 343139 [details] showMissingGearyMails.sh
Seems like you're making some progress, but it strikes me as being a pretty difficult issue to debug. What's your feeling about it in general? It just occurred to me that when we fix Bug 776129, we won't be able to compare the server's Message-Id with the one in the database, since they might be different. So if we're going to rely on it here, we would need tofetch the Message-Id header instead of the IMAP Message-Id when requesting Geary.Email.Field.REFERENCES, or we'd need to store both in the db. I'll make a note of this over there.
You know though, in the end I kind of wonder if we're making our lives hard by relying on only internaldate/size and maybe now Message-ID. It seems the suggested approach[0] is to rely purely on message UIDs, and if it changes (it can, but should be uncommon) then simply deal with it by downloading all messages again. That would suck, but maybe in that case there's a way to only re-sync some of the data to be able to stitch up the existing messages in the DB with their new UIDs. That's a /lot/ of work though, so maybe something to do only if we can't get these existing bugs ironed out. [0] - https://tools.ietf.org/html/rfc4549
Yeah maybe relying on the message UID is the only thing to do reliably. I'll try to give these 2 last issues a shot, but actually I doubt we'll ever be able to solve it reliably. For at least one reason: Message-ID is not mandatory so you might end-up with my issue (twice the same mails except content boundaries changed…). I have many mails (thanks Apple) with no Message-ID at all. Plus ill-formated other ones… I wonder how Icedove is handling all of that.
I was wondering that exact same thing. There's a few clues on their wiki: - https://wiki.mozilla.org/MailNews:Better_Faster_IMAP_Plan - https://wiki.mozilla.org/Thunderbird:IMAP_RFC_4551_Implementation Also, I just thought of a case that the background sync might not account for, when the following is true: - Geary is closed - A message gets delivered to a mailbox (sieve filter, another mail client is used to move a message, etc.) - The new message is newer than the oldest message in the mailbox - The oldest message is newer than the download epoch When geary is next opened and starts doing a background sync, it will hit the "Oldest local email in %s not old enough" condition, and start syncing from the oldest into the past. Since the new message is newer than the oldest, the new message won't get picked up, and the remote and local counts won't match afterwards. Does that sound plausible? To test this out, in the "Oldest local email" section, I simply nulled out oldest_local and oldest_local_id. The result was a full sync back to the epoch for each out-of-sync mailbox, but now at least a few more of my mailboxes are now actually in sync.
Well nulling "Oldest local email" didn't help on my side but that was expected since I can reproduce the problem by wiping all Geary's data. Do you think it makes sense to try to fix these problems or should we consider using message UID and give up on this?
Well, I'm not sure if the current approach is every going to perform particularly well, but switching to using UIDs might be a lot of work - that won't get done before 0.12. In the meantime there's definitely some issues that people are still reporting that could use getting fixed, where we're not picking up all/many messages in certain folders: e.g. Bug 776029 comment 5, and this issue you're seeing after a database wipe. Even if we can't get the performance fixed right away, it would be good to at least get it correct, if possible. Have you managed to track down what messages aren't being sync'ed even after you have deleted the db?
Using current master and my patch, the situation today is much better: - for my gmail account, 100% of my emails are present. Yeah! - for my personnal account, 100% of non-INBOX emails are present. Yeah! - for my personnal account, it misses 15 emails in INBOX (14300 / 14315). For these 15 missing: - 1 email is present twice. Everything exactly the same, except the UID. No way to fix this yet since Geary does not handle UID, right? - the rest (14 emails) are emails that are duplicates, except the UID, X-DSPAM-SIGNATURE and id in "Received" field. They are all from the exact same host, so I believe this host was sending mails twice from 2 different servers at the same time? Again, I don't see any solution but relying on UID. The proposed patch definitively helps to increase the situation though. It would be great to have it for 0.12.0 release!
Fair enough. I've rebased it off master and pushed as commit d33febb. Even with it I'm also still getting quite a few message missing, but also now have a bunch folders that are up to date. At least this improves things before we can switch to using UIDs. Ta!
I wonder if Bug 714802 and the patch there is relevant to this?
I've pushed a WIP branch to wip/713530-background-sync which I think fixes the issues with the duplicates. The fix isn't perfect, and I think the AccountSynchroniser needs to be reworked and maybe merged with the FlagWatcher (which I might get to as part of Bug 778276), but at least now all of my folders are getting sync'ed. Going to request for testing on the mailing list then I'll merge this.
Actually, after testing it myself and fixing up a few issues, I think it's pretty solid, so I've landed this on master as commit b1d3d49, will look at getting it in 0.12.1 after it's been tested there a bit.
Ah, this is way to bigger change to land for 0.12, so just going to mark that as fixed.
*** Bug 773508 has been marked as a duplicate of this bug. ***