GNOME Bugzilla – Bug 564339
imap syncing performstoo much local I/O
Last modified: 2009-08-21 22:22:52 UTC
Please describe the problem: A regression from the previous version, syncing an imap folder reads in the entirety of every already present local email, which is extremely slow. This patch puts back the behaviour from the prior release where each already local mail is not read-entirely into memory. Steps to reproduce: 1. 2. 3. Actual results: Expected results: Does this happen every time? Other information:
Created attachment 124559 [details] [review] Reduce the IO done during sync by not reading into memory locally cached files. This adds a new virtal method on CamelFolder, with the same parameters as get_message but no return value. The default implementation for provides with no sync_message method is to call get_message and discard the result.
Created attachment 124622 [details] [review] read the cache directory to avoid opening files that represent complete messages. This further reduces the IO done during sync, though its certainly not minimised yet. evo will still open UID.HEADER/UID.X/UID.X.MIME files for some uids, and still works with the entire folder uid count rather than just those unsynced, but its not harmful AFAICT and is faster by my testing (with 70K messages).
Robert, your patch is awesome. Just a note, can't you use uid to lookup in cache->parts instread of re-reading the dir? This should be even better, I would say. Just test out.
I've updated the patch to use cache->parts. I had to reverse engineer what was in cache->parts so I've added some comments about this to the code. Its ~ the same as reading the directory from what I can see (though obviously it doesn't read the directory). There is something subtle with MIME parts going on that I don't [yet] understand - I'll attach a strace snippet with more discussion shortly.
Created attachment 124771 [details] [review] incremental patch to read from the cache->parts hash.
Created attachment 124772 [details] [review] Overall patch to fixup syncing to be 'better'. This is the current overall patch against HEAD.
Some mails still get opened and read, if I can get a pointer to the right part of the code I can look at fixing this. This is what strace shows for such a mail: open(".../14546.", O_RDONLY) = -1 ENOENT (No such file or directory) open(".../14546.", O_RDONLY) = -1 ENOENT (No such file or directory) open(".../14546.HEADER", O_RDONLY) = 40 open(".../14546.1.MIME", O_RDONLY) = 40 open(".../14546.1", O_RDONLY) = 40 open(".../14546.2.MIME", O_RDONLY) = 40 open(".../14546.2", O_RDONLY) = -1 ENOENT (No such file or directory) My uneducated guess is that there is something about the representation of multipart messages that means *even when fully cached* they don't actually have a "%d." message part, only some number of "%d.%d" parts.
I'd like to sketch out the next revision and have you tell me if it seems plausible: in the imap provider folder implementation, in imap_get_uncached_uids use a loop roughly (python syntax) needed_cache_mi = [] not_cached = [] for uid in uids: mi = (CamelImapMessageInfo *)camel_folder_summary_uid (summary, uid) if mi: if not content_info_incomplete(mi): needed_cache_mi.append(mi) else: not_cached.append(uid) else: not_cached.append(uid) missing_parts_uids = camel_imap_message_cache_missing_parts(folder->cache, needed_cache_mi) return not_cached + missing_parts_uids This would check the cache for the cache parts we expect to have in the cache, and ask for the completely absent uids and incomplete uids plus the parts that are missing from the cache. Thoughts?
Robert, I would look at the code and reply to you tonight or so. Thanks a lot foryour work. Its pretty good with your patch.
Robert, the design looks neat. Just a thing, we might endup downloading some parts that are already there, for partial downloaded messages's parts. we might have to address it in a better way where we download.
Robert, Im looking for a 2.24.2.1 later this week, and if you can do it before that, I can the release with this. Thanks
Thank you for the feedback Srinivasa. I would suggest applying the current patch for 2.24.2.1; I have looked into the work the system will do using the content_info approach, and it will query the sqlite db for every message, so we will end up doing a lot of database access - it will add a lot of disk IO. So the approach is right, but we have to push the query for 'what parts are incomplete' to the folder summary as a single method, and that needs a single method that can query the db using an index. So I'll poke at it, find out about folder summary schema updates and so on, to do this :).
Awesome, lets commit to stable/trunk. We are breaking CamelFolder ABI here. But my feel is that nothing apart from Evolution would be affected and too we don't construct CamelFolder outside of of EDS, so it shouldn't matter. I'm making Evo to depend on latest EDS and we will commit this without a ABI bump. Matt: FYI, I'm doing the 'avoid-abi-real-bump' and keeping you updated on the loop
Just be aware: # Debian Testing $ apt-cache rdepends libcamel1.2-11 libcamel1.2-11 Reverse Depends: mail-notification-evolution libtotem-plparser10 libebook1.2-9 libcamel1.2-dev evolution-plugins evolution-jescs evolution-exchange evolution-data-server evolution mail-notification and evolution-jescs both use CamelFolder, and may be affected. libtotem-plparser just uses Camel for the broken date parsing, so no issue there. Is bumping the libcamel soname really that big a deal? Looks like there's barely a handful of people on the planet we'd be inconveniencing.
Matt, No body subclass it afaik. The biggest think I'm trying to avoid is that, I dont want to bump Camel ABI on stable (again). For 2.24.1 I bumped it and it was needed as I had done quite a lot of changes.
Okay. I'm fine with it.
I don't have an opinion about the ABI change; I'll note that it is needed, and that the future work to use the summary to detect missing MIME parts and so on will also need new virtual methods.
In GPtrArray * camel_folder_get_uncached_uids (CamelFolder *folder, GPtrArray * uids, CamelException *ex) { g_return_if_fail (CAMEL_IS_FOLDER (folder)); ... } I assumed it was a typo and changed it to g_return_val_if_fail (CAMEL_IS_FOLDER (folder), NULL); Patch committed to stable (gnome-2-24) branch as r9910 http://svn.gnome.org/viewvc/evolution-data-server?view=revision&revision=9910 Patch committed to SVN trunk as r9911 http://svn.gnome.org/viewvc/evolution-data-server?view=revision&revision=9911 BTW, there are some serious compilation warnings in camel (r9911) that have come from a few previous commits.
The main patch for this has landed a while back