After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 564339 - imap syncing performstoo much local I/O
imap syncing performstoo much local I/O
Status: RESOLVED FIXED
Product: evolution-data-server
Classification: Platform
Component: Mailer
2.24.x (obsolete)
Other All
: Normal normal
: ---
Assigned To: evolution-mail-maintainers
Evolution QA team
Depends on:
Blocks:
 
 
Reported: 2008-12-12 23:45 UTC by Robert Collins
Modified: 2009-08-21 22:22 UTC
See Also:
GNOME target: ---
GNOME version: 2.23/2.24


Attachments
Reduce the IO done during sync by not reading into memory locally cached files. (8.31 KB, patch)
2008-12-13 00:05 UTC, Robert Collins
none Details | Review
read the cache directory to avoid opening files that represent complete messages. (14.68 KB, patch)
2008-12-14 04:54 UTC, Robert Collins
reviewed Details | Review
incremental patch to read from the cache->parts hash. (6.11 KB, patch)
2008-12-16 07:02 UTC, Robert Collins
none Details | Review
Overall patch to fixup syncing to be 'better'. (16.87 KB, patch)
2008-12-16 07:04 UTC, Robert Collins
committed Details | Review

Description Robert Collins 2008-12-12 23:45:25 UTC
Please describe the problem:
A regression from the previous version, syncing an imap folder reads in the entirety of every already present local email, which is extremely slow. This patch puts back the behaviour from the prior release where each already local mail is not read-entirely into memory.

Steps to reproduce:
1. 
2. 
3. 


Actual results:


Expected results:


Does this happen every time?


Other information:
Comment 1 Robert Collins 2008-12-13 00:05:36 UTC
Created attachment 124559 [details] [review]
Reduce the IO done during sync by not reading into memory locally cached files.

This adds a new virtal method on CamelFolder, with the same parameters as get_message but no return value. The default implementation for provides with no sync_message method is to call get_message and discard the result.
Comment 2 Robert Collins 2008-12-14 04:54:01 UTC
Created attachment 124622 [details] [review]
read the cache directory to avoid opening files that represent complete messages.

This further reduces the IO done during sync, though its certainly not minimised yet. evo will still open UID.HEADER/UID.X/UID.X.MIME files for some uids, and still works with the entire folder uid count rather than just those unsynced, but its not harmful AFAICT and is faster by my testing (with 70K messages).
Comment 3 Srinivasa Ragavan 2008-12-15 09:24:39 UTC
Robert, your patch is awesome. Just a note, can't you use uid to lookup in cache->parts instread of re-reading the dir? This should be even better, I would say. Just test out.
Comment 4 Robert Collins 2008-12-16 07:01:07 UTC
I've updated the patch to use cache->parts. I had to reverse engineer what was in cache->parts so I've added some comments about this to the code. Its ~ the same as reading the directory from what I can see (though obviously it doesn't read the directory). There is something subtle with MIME parts going on that I don't [yet] understand - I'll attach a strace snippet with more discussion shortly.
Comment 5 Robert Collins 2008-12-16 07:02:19 UTC
Created attachment 124771 [details] [review]
incremental patch to read from the cache->parts hash.
Comment 6 Robert Collins 2008-12-16 07:04:17 UTC
Created attachment 124772 [details] [review]
Overall patch to fixup syncing to be 'better'.

This is the current overall patch against HEAD.
Comment 7 Robert Collins 2008-12-16 07:10:26 UTC
Some mails still get opened and read, if I can get a pointer to the right part of the code I can look at fixing this. 

This is what strace shows for such a mail:
open(".../14546.", O_RDONLY) = -1 ENOENT (No such file or directory)
open(".../14546.", O_RDONLY) = -1 ENOENT (No such file or directory)
open(".../14546.HEADER", O_RDONLY) = 40
open(".../14546.1.MIME", O_RDONLY) = 40
open(".../14546.1", O_RDONLY) = 40
open(".../14546.2.MIME", O_RDONLY) = 40
open(".../14546.2", O_RDONLY) = -1 ENOENT (No such file or directory)

My uneducated guess is that there is something about the representation of multipart messages that means *even when fully cached* they don't actually have a "%d." message part, only some number of "%d.%d" parts.
Comment 8 Robert Collins 2008-12-16 07:35:44 UTC
I'd like to sketch out the next revision and have you tell me if it seems plausible:

in the imap provider folder implementation, in imap_get_uncached_uids
use a loop roughly (python syntax)

needed_cache_mi = []
not_cached = []
for uid in uids:
  mi = (CamelImapMessageInfo *)camel_folder_summary_uid (summary, uid)
  if mi:
    if not content_info_incomplete(mi):
      needed_cache_mi.append(mi)
    else:
      not_cached.append(uid)
  else:
    not_cached.append(uid)
missing_parts_uids = camel_imap_message_cache_missing_parts(folder->cache, needed_cache_mi)
return not_cached + missing_parts_uids


This would check the cache for the cache parts we expect to have in the cache, and ask for the completely absent uids and incomplete uids plus the parts that are missing from the cache.

Thoughts?
Comment 9 Srinivasa Ragavan 2008-12-16 11:15:44 UTC
Robert, I would look at the code and reply to you tonight or so. Thanks a lot foryour work. Its pretty good with your patch.
Comment 10 Srinivasa Ragavan 2008-12-16 11:36:02 UTC
Robert, the design looks neat. Just a thing, we might endup downloading some parts that are already there, for partial downloaded messages's parts. we might have to address it in a better way where we download.
Comment 11 Srinivasa Ragavan 2008-12-16 17:04:47 UTC
Robert, Im looking for a 2.24.2.1 later this week, and if you can do it before that, I can the release with this. Thanks
Comment 12 Robert Collins 2008-12-16 20:03:22 UTC
Thank you for the feedback Srinivasa. I would suggest applying the current patch for 2.24.2.1; I have looked into the work the system will do using the content_info approach, and it will query the sqlite db for every message, so we will end up doing a lot of database access - it will add a lot of disk IO.

So the approach is right, but we have to push the query for 'what parts are incomplete' to the folder summary as a single method, and that needs a single method that can query the db using an index. So I'll poke at it, find out about folder summary schema updates and so on, to do this :).

Comment 13 Srinivasa Ragavan 2008-12-17 03:14:23 UTC
Awesome, lets commit to stable/trunk.

We are breaking CamelFolder ABI here. But my feel is that nothing apart from Evolution would be affected and too we don't construct CamelFolder outside of of EDS, so it shouldn't matter. I'm making Evo to depend on latest EDS and we will commit this without a ABI bump. 


Matt: FYI, I'm doing the 'avoid-abi-real-bump' and keeping you updated on the loop
Comment 14 Matthew Barnes 2008-12-17 04:18:06 UTC
Just be aware:

   # Debian Testing

   $ apt-cache rdepends libcamel1.2-11
   libcamel1.2-11
   Reverse Depends:
      mail-notification-evolution
      libtotem-plparser10
      libebook1.2-9
      libcamel1.2-dev
      evolution-plugins
      evolution-jescs
      evolution-exchange
      evolution-data-server
      evolution

mail-notification and evolution-jescs both use CamelFolder, and may be affected. libtotem-plparser just uses Camel for the broken date parsing, so no issue there.

Is bumping the libcamel soname really that big a deal?  Looks like there's barely a handful of people on the planet we'd be inconveniencing.
Comment 15 Srinivasa Ragavan 2008-12-17 14:27:58 UTC
Matt, No body subclass it afaik. The biggest think I'm trying to avoid is that, I dont want to bump Camel ABI on stable (again). For 2.24.1 I bumped it and it was needed as I had done quite a lot of changes. 
Comment 16 Matthew Barnes 2008-12-17 15:14:21 UTC
Okay.  I'm fine with it.
Comment 17 Robert Collins 2008-12-18 01:17:14 UTC
I don't have an opinion about the ABI change; I'll note that it is needed, and that the future work to use the summary to detect missing MIME parts and so on will also need new virtual methods.
Comment 18 Suman Manjunath 2009-01-12 03:49:39 UTC
In 

GPtrArray *
camel_folder_get_uncached_uids (CamelFolder *folder, GPtrArray * uids, CamelException *ex)
{
	g_return_if_fail (CAMEL_IS_FOLDER (folder));
...
}

I assumed it was a typo and changed it to 

g_return_val_if_fail (CAMEL_IS_FOLDER (folder), NULL);

Patch committed to stable (gnome-2-24) branch as r9910
http://svn.gnome.org/viewvc/evolution-data-server?view=revision&revision=9910

Patch committed to SVN trunk as r9911
http://svn.gnome.org/viewvc/evolution-data-server?view=revision&revision=9911

BTW, there are some serious compilation warnings in camel (r9911) that have come from a few previous commits. 
Comment 19 Robert Collins 2009-08-21 22:22:52 UTC
The main patch for this has landed a while back