GNOME Bugzilla – Bug 605126
Writing ArtworkDB too slow
Last modified: 2010-04-01 09:32:49 UTC
I have a new iPhone 3GS and successfully synchronized it using the iFuse filesystem and gtkpod and after that I tried the gvfs-afc-backend. So I plug in my iPhone and it appears *magically* in Rhythmbox. I can copy song from my Library to it and everything is fine, except that the Rhythmbox ipod plugin starts synchronizing just after a few song, not at the end of the whole process and the speed of the copy drops dramatically. After a few minutes the sync process is done and the speed returns to normal but, as there is still some file copying, a new sync process starts a few seconds after that. So there is maybe a bug inside Rhythmbox ipod plugin and maybe it should not behave like that (synchronizing and synchronizing, ... ) but the real problem here is that the sync process takes a lot of time. I have on my iPhone 158 albums, 1695 songs and all of them have the album cover set by Rhythmbox. A sync process using the iFuse mount and gtkpod takes 25 seconds while using Rhythmbox and gvfs-afc it takes 2 minutes 40 seconds. Using lsof I found that all but 5 seconds of those times are used by libgpod to read and write the ArtworkDB files. The rest of the process is really fast (about 5 seconds) in either case. Here are the size of the files total 104M 304K ArtworkDB 34M F3001_1.ithmb 8,5M F3002_1.ithmb 2,2M F3003_1.ithmb 53M F3005_1.ithmb 1,6M F3006_1.ithmb 4,1M F3007_1.ithmb Thats all the information I have. I did not find how to activate the debug of gvfs. So here I am, before reporting a bug to Rhythmbox telling them that they should not start a sync until the end of the process (I patched my version to do that because it was really unusable with gvfs-afc), maybe something can be done to speed up gvfs-afc.
This is a known bug in libgpod, it's doing tons of stat all the time and this is really slow when done through libiphone or through the gvfs afc backend. Fwiw, you are using libgpod git, so it's normal not to have an optimal experience ;)
As this is not an issue with iFuse I wonder if there is not something that could be done inside gvfs-afc to speed up this process. In either case I am the same libgpod/libiphone.
It's not obvious from your bug report that your test with gtkpod+iFuse and rhythmbox+gvfs can be compared. Might be worth testing with gtkpod pointed at ~/.gvfs/... and pointed at an iFuse mountpoint.
Well it's not obvious from my bug report. :) At first I thought it was a bug in libgpod. I used gtkpod+iFuse previously without an album cover (SNL joke aside:) and I started using gvfs-afc+rhythmbox just for the covers, and maybe gaining in speed (5Mbit/s on USB2 is not what I was expecting). So when I saw the sync times growing and growing I thought it was a problem in libgpod as I could not guess gvfs-afc could behave differently that iFuse. I was ready to start looking at the libgpod source code and try to figure it out. And then I made a final test with gtkpod+iFuse and I did not have this problem. Remounted with gvfs and the problem was there. This is clearly a gvfs-afc issue.
1) gtkpod with my ~/.gvfs/iPhone mount point using the gvfs-afc backend for the gvfs fuse module 2) gtkpod with my /media/iphone mount point using iFuse without gvfs Same libgpod, same ligiphone, same gtkpod, usbmuxd, etc... : 1) 2m40s 2) 25s
You won't be able to get the exact same performance from gvfs' fuse as you would from a bare iFuse mount. The possible solutions are: - for the AFC backend to provide a cache (unlikely, it's enough of a mess already) - for libiphone to provide a cache (better, but it would need to be per client, not such a huge problem if gvfs is used everywhere) - for gvfs itself to provide caching - for gvfs' fuse backend I think the first step would be to greatly reduce libgpod's stat'ing(). Adding a simple hash-based cache in the gvfs-afc backend might help short term, but it would need to implement a reverse-lookup mechanism to evict old entries from the list. I'm not sure that'd be terribly efficient, or a good idea in the long-term. I'll put this on NEEDINFO until somebody can provide some information about the pattern of stat()'s libgpod does when saving its artwork database. We can't optimise until we know that.
This apparently was also an issue with libgpod. As Christophe pointed out there was some recursive stat()'ing going on where it was not required. However, this has been fixed/optimized in the latest libgpod releases thus the sync times should now be re-evaluated. Regarding adding caching; first of all why is the bare iFuse mount that much faster stat()'ing than the GVFS fuse mount? Essentially, they are doing the same operations (along GVFS's slight overhead), aren't they? Won't caching introduce VFS inconsistency here?
I'm going to make another benchmark tonight with all the latest versions, after I compiled them, and let you all know how are things now.
So here is my latest benchmark with the libgpod4=0.7.90git20100210-0ubuntu2~ppa1. This version is less than a month old. There is a newer libgpod but there is still no package for it and I doubt it does more for this problem (see the changelog). I will update this bench as soon as I find a newer version. My test is done with 38 audio files for 264MB and an iPhone 3GS 32GB. I have on it already 2390 files for 18.2GB . ====== GTKPod ====== First let's take a look at using GTKPod. I have a Music Libray containing only those files. I will copy them, press the "Save" button, wait for the transfert to finish and wait for the sync process to finish writing down both the times it took. I delete those files and sync between each test of course. Using GTKPod + GVFS-AFC ----------------------- Copy process : 3:21 (two second gap between the two process) Sync process : 3:02 Total Process Time : 6:25 Using GTKPod + iFuse -------------------- Copy process : 4:38 (two second gap between the two process) Sync process : 0:29 Total Process Time : 5:09 Conclusion ---------- Strange, iFuse is slower to transfert the data but the sync process is way faster (6 times faster). I checked again and the benchmark were similar. ========= Rhythmbox ========= Now for the most annoying problem. Using Rhythmbox. Rhythmbox unpatched ------------------- Rhythmbox ipod plugin does not just transfert the files and sync after, it start a sync 15 seconds after the first transfert is started. The transfert rate drops 3 to 4 times slower and when the sync process is over, if others file were copied a new sync process is started 15 seconds later, slowing down again the process, etc... When the last file is transfered, if the device is still sync'ing a last sync process is started... 15 seconds later. So for this bench, two times : 1-the duration of the transfert to the end of the last sync process which started during the file transfert 2-the duration of the sync process that started 15 seconds after the last file transfert and ongoing sync process Copy+Ongoing Sync : 12:24 (~15 seconds gap) Last Sync : 3:07 Total process : 15:47 15 minutes and 47 seconds. For just 38 tracks and 264 MB. At this rate it will take me more than 24 hours to fill my device (using the 12:24 times). Rhythmbox patched ----------------- Now I patched Rhythmbox for resheduling the sync process for 15 more seconds if a file started to be transfered since the process was scheduled. Now no sync process is done durint file transfert and this process is not slowed down. Still a last sync process occurs after few seconds at the end of the transfert process. Copy process : 2:59 (~25 seconds gap) Sync process : 3:07 Total Process : 6:32 The sync process time did not changed at all but the transfert process was hugely improved. ========== CONCLUSION ========== GTKPod + GVFS-AFC : 6:25 GTKPod + iFuse : 5:09 Rhythmbox unpatched : 15:47 Rhythmbox patched : 6:32 Even if GVFS-AFC could be improved with much coding and suffering and caching and stuff the problem will still be here !!!!! I repeat it would take about 24 hours to fill my device using Rhythmbox without patching it first!!! For huge device like those huge iPod Touch, up to 64GB which would take two days to fill, and iPhone 3GS there should not have a continuous sync during transfert!!!
(In reply to comment #7) > This apparently was also an issue with libgpod. As Christophe pointed out there > was some recursive stat()'ing going on where it was not required. > > However, this has been fixed/optimized in the latest libgpod releases thus the > sync times should now be re-evaluated. Hmm, this was patched in ubuntu, but I haven't added the patch to the main repository for now.
(In reply to comment #9) > > Even if GVFS-AFC could be improved with much coding and suffering and caching > and stuff the problem will still be here !!!!! > > I repeat it would take about 24 hours to fill my device using Rhythmbox without > patching it first!!! Let's add more !!!!!!!!!!!!!!!!! since it seems it will help fixing this bug faster!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! So what you are saying is that gvfs-afc performance is good enough but that the way rhythmbox writes to the device with this writing every 15 seconds is slowing things down too much, but if it wrote only once, performance would be acceptable?
I mean what do you prefer? A process that takes 24h and 5 minutes for a ending sync or a process that takes 6 hours and a 5 minutes sync? Of course a 6 hours + 30 seconds sync would be perfect and I am not against it... I first reported a bug for Rhythmbox, asking to correct the continuous sync process policy and filled a bug to gvfs-afc because it was a bit related but not the main issue. And as my benchmark proved there is not much difference between gtkpod+gvfs or gtkpod+ifuse for ~40 songs (~3 albums) but a huge difference with Rhythmbox+gvfs. The only different factor here is Rhythmbox.
My question was an honest one, you seem to have made lots of investigation and measures, but in the end I'm a bit drowned in all the information, and I wanted to be 100% sure that what I think is the conclusion about which place should be fixed is what you had in mind too. So a "Yes that sums up what I noticed and thinks need to be done" or a "no, that's not it, I think X and Y needs to be fixed instead" to my "So what you are saying is that gvfs-afc performance is good enough but that the way rhythmbox writes to the device with this writing every 15 seconds is slowing things down too much, but if it wrote only once, performance would be acceptable?" question would be useful :) Thanks
Christophe, I think it's Rhythmbox now. ;) You told me once that RB waits for a few minutes before writing the iTunesDB aswell. The overall workflow RB uses seems wrong. It must copy tracks, once finished write the database contents immediately and all in one go. I see lot of issues for people complaining for the new iPhone/iPod Touch sync that while RB appeared to have copied their tracks to the device, those tracks never "show up" on their device. For most of the time this is due to them plugging the device before RB every triggered an iTunesDB sync as they think "oh, it copied my tracks so it's done". Same from my tests, the tracks in my tests where finished coping quite a few minutes already before RB "decided" some minutes later to sync. I had plenty of opportunity to plug the device and end up having tracks on the device without entries in my iTunesDB (or respectively sqlite dbs).
(In reply to comment #14) > Christophe, I think it's Rhythmbox now. ;) Yep, that's my feeling too, but I want to be 100% sure Simon agrees with this conclusion > > You told me once that RB waits for a few minutes before writing the iTunesDB > aswell. a few minutes = 15 seconds > The overall workflow RB uses seems wrong. It must copy tracks, once > finished write the database contents immediately and all in one go. When multiple tracks are being copied, the ipod plugin doesn't know it. It's asked to copy one track, then another one and so on. So there's no "once finished" notion. The delay is here to avoid writing the database over and over. Imo, what needs to be done is * make this delay shorter (let's say 3 seconds) * don't write the DB every 3 seconds while transferring but only once at the end * add itdb_start_sync/itdb_stop_sync functions to libgpod (as suggested by Marcan) which would show the "transfer in progress screen". These functions would be called when the 1st track is added and when the gtimeout callback triggers PS: at this point we should stop flooding the gvfs guys with this talk and move it to the rhythmbox bug :) or reassign this one to rb
Created attachment 155992 [details] [review] proof of concept patch This patch should achieve what is described in the previous comment. It uses new API that I added to my master at http://gitorious.org/~teuf/libgpod/teuf-sandbox and hasn't been tested except for compilation (doesn't compile with older libgpod though)
First let me assure you Christophe I really appreciate all the work you have done and I am still amazed to be able to sync my iPhone on Linux. But I really thought there was some improvement to be done as the AFC protocol from Apple slows down everything giving USB 1.1 like speed (1.5MB/s) over a USB 2 connexion. The way iTunes is doing it is a copy process and a sync process not a copy and sync simultaneously. The way I see RB doing it is : - copying the files while incrementing the status bar - start the sync process at the end of the last file transfert. - The device should be visible as locked inside RB (icon changed, impossible to copy files not played, etc...) - the status/progress bar should stay at 100% with a message (maybe the same as the iPhone/iPad : "Synchronizing..." or whatever). That's it. Like that would be perfect.
@Simon: First comes a working implementation, performance optimization comes later. Currently I notice transfer speeds along 2.5MB/s for me using the original USB cable. Agree with the way you describe how RB should be syncing is the way it should be. Regarding sync performance, I changed my packages to use this patch: http://gitorious.org/~teuf/libgpod/teuf-sandbox/commit/ab969a6ece7c674074b40ef08c0f62762fb00c20 It brings the "sync" step from around 26 seconds down to 6 seconds for me using gvfs-afc and gtkpod. You might try again with this patch applied and report if it is faster for you. If yes, this bug will be resolved. Please open a new bug regarding the Rythmbox sync workflow issue.
@Martin: I tested the latest git version of libgpod (git clone git://gitorious.org/libgpod/libgpod.git) with and without the patch. Same set of files as before. ( 38 songs/264MB ). My iPhone has more files now than in the best before so the sync times should be higher... I tested the patched version with gtkpod+gvfs-afc and gtkpod+ifuse : - gtkpod+gvfs-afc copy : 3min07s (1.41 MB/s) sync : 47s total: 3min54s - gtkpod+gvfs-ifuse copy : 4min36s (0.96 MB/s) sync : 34s total: 5min10s The sync process has improved quite a lot. When ifuse was 6 times better, it is now only 38% better. Of course I tested Rhythmbox (with my patch to do one sync) with and without the patched libgpod : - RB & libgpod git copy : 3min02s sync : 3min56s total: 6min58s - RB & libgpod git+patch copy : 2min58s sync : 46s total: 3min44s I only have one thing to say : commit !!! :)
(In reply to comment #19) > > I only have one thing to say : commit !!! :) It's not that easy, it might break file copying to some FAT32 iPods (the code was added after some case-sensitivity issues with such iPods). I asked the ubuntu guys to put it in their package to try to get some testing to see if it's safe to remove it now (it might have been a bug in the kernel fat modules).
I must say I was completely wrong. It is true that libgpod is doing a lot of stat() and that's where the optimization must be done. Sorry Christophe, you were right. I looked at the patch mentioned by Martin Szulecki and trace the usage of itdb_resolve_path(). I have traced that most of the call to itdb_resolve_path() are done when ithmb_rearrange_existing_thumbnails() in ithumb-writer.c is calling itdb_thumb_ipod_get_filename() to resolve the filenames of the thumbnails database files. On my device I only have 6 thumbnails database files. Each call to itdb_thumb_ipod_get_filename() calls also itdb_get_artwork_dir() and itdb_get_control_dir() which generates at the end about 4 calls to itdb_resolve_path() for each song. For 2000 songs with cover that's about 8000 calls to itdb_resolve_path() when in fact I only have 6 thumbnails database file. So I patched ithmb_rearrange_existing_thumbnails() in ithumb-writer.c to cache the result of itdb_thumb_ipod_get_filename(). So now instead of calling itdb_resolve_path() 2000 times 4, it only need to call it 6 times 4. Also I recoded the patch previously mentioned. Now itdb_resolve_path() does : - Test and return the path if it exist (like the previous patch did). - Test if we are using gvfs and return NULL if that's the case. - Test the path like previously done for the vfat problem fix. I tested this patch while filling my iPhone from 0 to 29GB. When it was empty the sync process took 8sec and now the sync process takes about 12sec. All of this using GVFS-AFC. http://dl.dropbox.com/u/904402/libgpod/libgpod_optimized-stat.patch So here you go, if Christophe could have a look at this patch. I know it needs to do some clean up I didn't code but it works fine.
Created attachment 156841 [details] [review] attach patch mentioned above by Simon
The patch Martin linked to would be the best solution to that issue, but I'm still unsure if it's still needed or if it was just papering over a bug in older kernels. If it's not possible to have that patch, then yours make sense (but the if (g_strrstr(root, ".gvfs")) part is ugly :-/ ). I'm not sure why you return NULL and not absolute_path in the .gvfs case though.
I tried to work to keep the vfat code but without penalizing gvfs. If we can get rid of this vfat code that would be the best otherwise I put this g_strrstr(root, ".gvfs") to avoid doing this vfat code if we are using GVFS. 1) Test the path like the patch Martin made. Return absolute_path if found. 2) Test for gvfs (the g_strrstr is the ugliest but fastest way doing so in this function, at that point). If we are using gvfs then the vfat code won't find anything more, the path will not be found and anyway NULL will be returned at the end. So better return NULL here (if we are using gvfs). 3) Well if we are not using gvfs then maybe this vfat code can apply, returning the correct path or NULL at the end.
Ah yeah, after thinking more about it, I understand the return NULL; Adding a comment like /* gvfs fuse mount, we're handling an AFC device, so filesystem is case sensitive. We won't find a variant of the tested path with another case, so bail out early (ie file won't be found) */ would make things clearer I think. This implies apps have been using the correct case for music dirs (F00 VS f00) (correct case is defined as "whatever iTunes does) even if I don't think the iPod will be overly strict with it and can live with having both f00 and F00.
If itdb_resolve_path() resolve path is called that often, maybe it could do with an object/struct being associated to the root. So you'd create your root object, and get its properties (including checking what type of filesystem it's on), so that the data is cached, rather than computed each time. RootObject *itdb_root_new(const char *path); char *itdb_resolve_path_for_root (RootObject *object, const gchar * const * components) itdb_root_free (RootObject *object);
Yep, having an itdb_device_resolve_path (Itdb_Device *device, const gchar * const * components) is an option that has been considered if we don't find any better way of handling the issue. It just that it needs to be done, and we wanted to 1st release something that works even if it's a bit slow ;) Currently I'm hoping we can get rid of itdb_resolve_path altogether (ie go with Martin's patch)
(In reply to comment #27) > Yep, having an itdb_device_resolve_path (Itdb_Device *device, const gchar * > const * components) is an option that has been considered if we don't find any > better way of handling the issue. It just that it needs to be done, and we > wanted to 1st release something that works even if it's a bit slow ;) Currently > I'm hoping we can get rid of itdb_resolve_path altogether (ie go with Martin's > patch) I don't think the original problem of mixed cases on VFAT has gone away. mount.vfat has different options to select what type of case you want, and unless the mount was forcefully mounted with a particular one, we cannot guess what the case of those directories will be. Add an itdb_device_resolve_path() is probably the best idea, while marking itdb_resolve_path() as deprecated (and documenting it as slow).
I agree with Bastien. Variation about models should be in a structure. On a related note. I was amazed that the function itdb_get_control_dir() which return iTunes top directory is guessed each time. gchar *itdb_get_control_dir (const gchar *mountpoint) { gchar *p_ipod[] = {"iPod_Control", NULL}; gchar *p_mobile[] = {"iTunes", "iTunes_Control", NULL}; gchar *p_iphone[] = {"iTunes_Control", NULL}; /* Use an array with all possibilities, so further extension will be easy */ gchar **paths[] = {p_ipod, p_mobile, p_iphone, NULL}; gchar ***ptr; gchar *result = NULL; g_return_val_if_fail (mountpoint, NULL); for (ptr=paths; *ptr && !result; ++ptr) { result = itdb_resolve_path (mountpoint, (const gchar **)*ptr); } return result; } Me using an iPhone means that two calls to itdb_resolve_path is done before it get it right. That's why Martin patch is efficient because it removed rapidly failed attempt to discover where iTunes is. And this iTunes directory is needed for everything. This value (itdb_get_control_dir) should not be guessed each time but cached as it doesn't change over time. In functions like ithmb_rearrange_existing_thumbnails() values should be cached. From a sync process taking 46sec (with Martin patch) with caching it only takes 12sec.
Fwiw, I've started moving libgpod to an itdb_device_resolve_path function, about 80% of the work is already done in http://gitorious.org/~teuf/libgpod/teuf-sandbox/commits/itdbresolvepath
Thanks Christophe for all this work. I just tried this branch and it's faster than the master branch. Nevertheless "ithmb_rearrange_existing_thumbnails" still does not implements caching it's data. Please take a look at this patch where I put some debug to show on the console and implements catching : http://dl.dropbox.com/u/904402/libgpod/libgpod_rearrange-existing-thumbnails_caching.patch This is what goes on the console : Starting ithmb_rearrange_existing_thumbnails. Resolved filename: /home/simon/.gvfs/iPhone de Simon HUET/iTunes_Control/Artwork/F3003_1.ithmb. Stopping ithmb_rearrange_existing_thumbnails, done 3772. Starting ithmb_rearrange_existing_thumbnails. Resolved filename: /home/simon/.gvfs/iPhone de Simon HUET/iTunes_Control/Artwork/F3002_1.ithmb. Stopping ithmb_rearrange_existing_thumbnails, done 3772. Starting ithmb_rearrange_existing_thumbnails. Resolved filename: /home/simon/.gvfs/iPhone de Simon HUET/iTunes_Control/Artwork/F3001_1.ithmb. Stopping ithmb_rearrange_existing_thumbnails, done 3772. Starting ithmb_rearrange_existing_thumbnails. Resolved filename: /home/simon/.gvfs/iPhone de Simon HUET/iTunes_Control/Artwork/F3007_1.ithmb. Stopping ithmb_rearrange_existing_thumbnails, done 3772. Starting ithmb_rearrange_existing_thumbnails. Resolved filename: /home/simon/.gvfs/iPhone de Simon HUET/iTunes_Control/Artwork/F3006_1.ithmb. Stopping ithmb_rearrange_existing_thumbnails, done 3772. Starting ithmb_rearrange_existing_thumbnails. Resolved filename: /home/simon/.gvfs/iPhone de Simon HUET/iTunes_Control/Artwork/F3005_1.ithmb. Stopping ithmb_rearrange_existing_thumbnails, done 3772. This patch cache the result of each thumbnail file path resolution and prints "Resolved filename...." message each time a new file is found. As you can see I only have 6 thumbnails files. The function "ithmb_rearrange_existing_thumbnails" is called once for each thumbnail file and this function resolve 3772 times the SAME path to the thumbnail file each time it is called to a total of 3772*6=22632 functions calls for only 6 files. Without a caching process like this patch is doing, using this branch code, each calls to "ithmb_rearrange_existing_thumbnails" takes 10 seconds to complete while with a caching it takes only a second for all the "ithmb_rearrange_existing_thumbnails" calls to complete. 1) Why does it's the same artwork filename for each calls to "ithmb_rearrange_existing_thumbnails" and why not resolve it once for each song? 2) If it's needed to resolve the path for each song, even if it seems it's always the same path, why not implemening a cache like this patch is doing.
(In reply to comment #31) > As you can see I only have 6 thumbnails files. The function > "ithmb_rearrange_existing_thumbnails" is called once for each thumbnail file > and this function resolve 3772 times the SAME path to the thumbnail file each > time it is called to a total of 3772*6=22632 functions calls for only 6 files. > > Without a caching process like this patch is doing, using this branch code, > each calls to "ithmb_rearrange_existing_thumbnails" takes 10 seconds to > complete while with a caching it takes only a second for all the > "ithmb_rearrange_existing_thumbnails" calls to complete. > > 1) Why does it's the same artwork filename for each calls to > "ithmb_rearrange_existing_thumbnails" and why not resolve it once for each > song? > > 2) If it's needed to resolve the path for each song, even if it seems it's > always the same path, why not implemening a cache like this patch is doing. Mainly because I hoped the bottleneck would go away after these changes. I hope the situation improved a bit :) And actually, it's good I didn't add the cache just yet, because looking again at the code, it seems it's possible to remove more code to get the same behaviour instead of adding new code paths ;) http://gitorious.org/libgpod/teuf-sandbox/commit/6f44f2e8fb49dd3e4c2c344190282ca38e36781b will do the trick hopefully. If a cache is really needed in the end, the good news is that we can do something more generic with the new API instead of having ad-hoc caching code here and there.
The situation improved a bit. The latest libgpod/libgpod.git version gives me a 10 minutes sync. Your last ~teuf/libgpod/teuf-sandbox:itdbresolvepath version gives me a 1 minute sync. Your last commit seems not to have change the time it takes to complete the sync process. I find my previous comment a bit incomprehensible. :) I will try to reformulate it. 1) ithmb_rearrange_existing_thumbnails starts 2) it enters a loop on all the songs (for me right now 3772 files) 3) it resolves the path to the SAME thumbnail file 3772 times. 4) ithmb_rearrange_existing_thumbnails stops This process is done 6 times, each time it is ONE another thumbnail file that is resolved 3772 times. This results in 3772*6=22632 calls to itdb_thumb_ipod_get_filename and now takes about a minute to complete. When caching that value and making only 6 calls to itdb_thumb_ipod_get_filename the process takes less than a second to complete. Maybe it is not needed to resolve 3772 times the SAME thumbnail file? Either resolving it for the first file suffice or caching all the results. I hope I am clear. :) Once more great work on implementing the *device* functions. :)
(In reply to comment #33) > I find my previous comment a bit incomprehensible. :) I will try to reformulate > it. Nope, your comment was fine, don't worry :) > When caching that value and making only 6 calls to itdb_thumb_ipod_get_filename > the process takes less than a second to complete. > > Maybe it is not needed to resolve 3772 times the SAME thumbnail file? Either > resolving it for the first file suffice or caching all the results. It can change on large databases... You get a filename like F1234_1.ithmb, if the file grows larger than 256MB (?), there will be a F1234_2.ithmb and so on. And if the resolving is fast enough, why not? Would be better refactoring the whole thing to be less wasteful instead of adding a cache as a bandaid. Looking at the code again, there's a g_file_test remaining in the inner loop (and only one per iteration I think), it's in itdb_thumb_ipod_get_filename with a /* FIXME: Hack */ before it... I guess it's time for a bit more refactoring ;) If you have a bit of time, can you drop the whole block (if (!g_file_test...) {...} ) and see if it helps? Thanks for all the tests!
I dropped the block and the code is faster. It takes about 20 seconds to do all the "ithmb_rearrange_existing_thumbnails". The sync process drops to 30 seconds in total. But again adding a cache makes the calls to ithmb_rearrange_existing_thumbnails takes about a second and the all process takes about 10 seconds. I understand your point of view of trying to make a clean API that can be used and relied on for everything. But I don't see caching API calls as a bandaid but more as an optimization. :) It is the "not calling the API when it is not needed" approach as of those 22632 calls, for my case, only 6 are needed.
(In reply to comment #35) > I understand your point of view of trying to make a clean API that can be used > and relied on for everything. But I don't see caching API calls as a bandaid > but more as an optimization. :) Well, I still don't know *why* this code is slow, most of the g_file_test calls should be gone now so it should not be noticeably slower than with the cache approach. So as long as I don't know exactly what makes this code slow despites the changes that were made, the caching will just hide things and will not be a valid optimization. It will be a worthwhile optimization when we get down to "ah yeah, we really have to do this here but this is slow, let's add a cache to fix this"
(In reply to comment #36) > (In reply to comment #35) > > I understand your point of view of trying to make a clean API that can be used > > and relied on for everything. But I don't see caching API calls as a bandaid > > but more as an optimization. :) > > Well, I still don't know *why* this code is slow, most of the g_file_test calls > should be gone now so it should not be noticeably slower than with the cache > approach. So as long as I don't know exactly what makes this code slow despites > the changes that were made, the caching will just hide things and will not be a > valid optimization. It will be a worthwhile optimization when we get down to > "ah yeah, we really have to do this here but this is slow, let's add a cache to > fix this" I will add some more serious benchmarking using GTimers and find where things get slowed down. Nevertheless you can't make 22632 calls as fast as 6. And we have a good implementation, I think 2263 calls/seconds is fast. There's just too much calls to be done.
(In reply to comment #37) > I think 2263 calls/seconds is fast. There's just too much calls > to be done. Your CPU can chew 1E9 instructions per second, 20.000 calls is nothing. I just made a test, and doing a malloc/free within a function called 1 million time in a loop takes 0.1 second to run on this CPU which is a few years old.
As expected, there was a bug remaining which explains why things didn't work as I expected. Can you fetch the branch again (I force-pushed it, sorry for the inconvenience), apply the hack I mentioned in comment #34 and try again? With the tests I made locally, I'm down to maybe 10 or 20 g_file_test calls for all the ithumb_rearrange_thumbnails calls.
(In reply to comment #38) > Your CPU can chew 1E9 instructions per second, 20.000 calls is nothing. I just > made a test, and doing a malloc/free within a function called 1 million time in > a loop takes 0.1 second to run on this CPU which is a few years old. I am sorry but we are not talking about the same thing. I was not talking about my CPU, which of course can do zillions of thing at the same time, we are talking about making network like connections over a slow USB cable to a slow device. You cannot expect it to behave like a local filesystem. It's more like querying an SQL server running on a PentiumII processor over a 10Mbps half-duplex Ethernet link. You will never be able to make 20.000 calls under one second. I am not saying that because I think so. I added some timers in ithmb_rearrange_existing_thumbnails. Two timers : 1-One in ithmb_rearrange_existing_thumbnails, continued and stop around itdb_thumb_ipod_get_filename so I get all the time it spend in that function 2-One passed as an argument to itdb_thumb_ipod_get_filename (more exactly a copy of that function that accept a GTimer argument). This timer is passed down the chain and let me know where is the bottleneck. http://dl.dropbox.com/u/904402/libgpod/libgpod_benchmark.patch And what do I get : Timer 1 : 2.845s (around ithmb_rearrange_existing_thumbnails calls to itdb_thumb_ipod_get_filename) Timer 2 : 2.838s (around g_file_test in itdb_device_get_file_in_dir) So let's be clear, you implementation is the fastest it can be. The problem comes when using g_file_test(). As we are in a bug report for GVFS I don't know what we can do. Again either something can be made to GVFS-AFC to speed those stat() or you have to remove unneeded stat() from libgpod. It's where we were before. Now the only thing remaining to remove those unneeded stat() is either never use g_file_test() which will be stupid or make sure those function calling it does not make unneeded stat() and a cache for ithmb_rearrange_existing_thumbnails seems to be for me a good solution.
(In reply to comment #39) > As expected, there was a bug remaining which explains why things didn't work as > I expected. Can you fetch the branch again (I force-pushed it, sorry for the > inconvenience), apply the hack I mentioned in comment #34 and try again? With > the tests I made locally, I'm down to maybe 10 or 20 g_file_test calls for all > the ithumb_rearrange_thumbnails calls. I will try that. :)
(In reply to comment #40) > It's where we were before. Now the only thing remaining to remove those > unneeded stat() is either never use g_file_test() which will be stupid or make > sure those function calling it does not make unneeded stat() and a cache for > ithmb_rearrange_existing_thumbnails seems to be for me a good solution. Blindly adding a cache would have been a bad solution in that case since the bug in itdb_device_get_file_in_dir would have remained hidden and would have slowed other code paths as well...
(In reply to comment #39) > As expected, there was a bug remaining which explains why things didn't work as > I expected. Can you fetch the branch again (I force-pushed it, sorry for the > inconvenience), apply the hack I mentioned in comment #34 and try again? With > the tests I made locally, I'm down to maybe 10 or 20 g_file_test calls for all > the ithumb_rearrange_thumbnails calls. Well now I feel bad about my lengthy comment. :) I have only one thing to say : Timer 1 elapsed : 0.005 Timer 1 elapsed : 0.003 Timer 1 elapsed : 0.003 Timer 1 elapsed : 0.003 Timer 1 elapsed : 0.003 Timer 1 elapsed : 0.003 Almost perfect :) Well you nailed it good. I think now we might close this bug report. :)
To be honest, we spammed everyone working on an issue unrelated to this bug, now I don't know if this bug can be closed or not ;)
Now what would be great, if you can, is to push this code to Ubuntu 10.4 before the freeze. :)
I can't do anything about that. Especially since itdb_device_resolve_path needs some slight polish before I'm fine with merging it in the main libgpod git repository...
(In reply to comment #44) > To be honest, we spammed everyone working on an issue unrelated to this bug, > now I don't know if this bug can be closed or not ;) Closing then, thanks for the work.