GNOME Bugzilla – Bug 136185
FileChooser performance...
Last modified: 2011-02-04 16:16:07 UTC
Bringing up the file chooser for gnumeric in its src directory takes far too long, something like 1s wall clock. Some info: 1. That directory is under NFS. 2. Four files match the filter. 3. There are seven subdirectories. 4. There are ~420 files in total. I am going to attach Quantify logs for the specific time period from clicking "Open" and until the dialog is drawn. Some points: a. There are 1927 calls to stat(), allmost five calls per file! b. 666k calls to strlen seems excessive. c. Letext is a gcc (2.95) bug.
Created attachment 25163 [details] Quantify log
Not sure we can do any real performance work for 2.4.0, but thanks for the log. It would be great if you could do some further investigation of where all the stat() calls are coming from. It is very likely that GtkFileSystemUnix needs to do more caching, as right now it stat()s almost every time you call gtk_file_folder_unix_get_info(). In contrast, GtkFileSystemGnomeVFS does read the entire folder when it is requested, asynchronously, and keeps the info structures around. It has no problems with files changing underneath as it uses FAM, so we get notification about such changes and deal with them properly. Reducing the number of stat() calls should be a priority. I don't think strcmp() is very important to look at right now. The bare framework to do caching in GtkFileSystemUnix is in place, as it does keep the open folders alive. It does this to aggregate requested info types in ::get_folder(). So, implementing a scheme to stat things just once should not be horribly hard. It should be something along these lines: 1. Create an internal FileInfo structure that holds a struct stat buffer and flags to indicate which info types have been read for that file. Add an array of FileInfo structures to GtkFileFolderUnix. 2. get_folder() - No changes should be needed, I think; it already aggregates new info types to the folder's mask. 3. folder_unix_list_children() - This is essentially a readdir(). Store the list in the folder's array of FileInfo structures -- just filenames, with the "available info" flags turned off. * Do we need to opendir() and readdir() every time, so that we can pick up changes to the file sytsem? If we do that, we should update the array of files in such a way that we can emit the appropriate files-added, files-removed signals. 4. folder_unix_get_info() - Return the available info from the FileInfo array if available. If the requested file is not in the array, stat() it, and emit files-added if appropriate. If the requested file is in the array but its available-info flags don't match the folder's potentially more current requested-info flags, re-read the file info and update the flags. 5. render_icon() - we have a simple cache for icons; do we need to move some of this into the FileInfo structures? In summary, GtkFileSystemGnomeVFS should be a good reference for how to structure all of this. The only difference is that there we can get notifications from the file system, and in GtkFileSystemUnix we have to assume that the underlying file system will not notify us about anything.
If you start in /home/foo/bar, it will stat all files in /home/foo/bar [ok], /home/foo, /home, and / [3 * not ok]. NFS based sites can have a _huge_ /home directory and NFS is not a sprinter to begin with --> no good. AFS based sites will be totally killed by this. You just don't stat all the directories in /afs. Given downed sites that will take ages (probably days).
Even with all the caching stuff in place, we should not need to stat() all the way up to the root. What is doing that in the code? [If it is the GtkFileSystemModel used for the folder tree, ignore it; that's going to be changed by a plain list very soon. In any case, you may want to see what's going on in GtkFileSystemModel that is causing everything to be stat()ed.]
(gdb) cond 6 (filename[0] == '/' && filename[1] == 'h' && filename[2] == 'o' && filename[3] == 'm' && filename[4] == 'e' && filename[5] == '/' && filename[6] == 'w' && filename[7] == 'e' && filename[8] == 'l' && filename[9] == 'i' && filename[10] == 'n' && filename[11] == 'd' && filename[12] == 'e' && filename[13] == 'r' && filename[14] == '/' && filename[15] == 'R' && filename[16] == 'M' && filename[17] == 'A' && filename[18] == 'I' && filename[19] == 'L' && filename[20] == 0) (gdb) c Continuing. Breakpoint 6, filename_get_info (filename=0x6a4ca8 "/home/welinder/RMAIL", types=7, error=0x0) at gtkfilesystemunix.c:1388 1388 info = gtk_file_info_new (); (gdb) where
+ Trace 44842
(gdb) p (char*)0x6a2430 $2 = 0x6a2430 "/home/welinder/private" (gdb) p (char*)0x6a3db0 $3 = 0x6a3db0 "/home/welinder/private/gnome" (gdb) p (char*)0x6a4240 $4 = 0x6a4240 "/home/welinder/private/gnome/gnumeric" (gdb) p (char*)0x6a1ba0 $5 = 0x6a1ba0 "/home/welinder/private/gnome/gnumeric/src" ...so it's find_and_ref_path that walks up the tree.
Created attachment 25345 [details] [review] Patch 1
Patch 1 adds a "name_only" argument to get_file_info and makes the latter only get file information when and as needed.
Created attachment 25350 [details] [review] Patch 1a -- variant that moves path checking up from the pathbar
Patch 1 is to be ignored. I already committed Patch 1a.
Created attachment 25505 [details] [review] Patch 2 -- FYI only -- read and cache whole directories
Created attachment 25524 [details] [review] Patch 3 -- another patch to try
Created attachment 25527 [details] [review] Patch 4 -- fully functional patch
Apart from the g_print statements, I think that's an official patch.
Further steps needed (comments welcome): 1. Persist mime type in the same way stat-info is being done in gtkfilesystemunix.c 2. Make get_icon_type use the stat result we already have. The easiest way probably is to persist it at the time we stat.
Created attachment 25532 [details] [review] Patch 5 -- caching mime_type and icon_type too
Those further steps implemented.
Created attachment 25544 [details] [review] Patch 5a -- updated to resolve conflicts with jrb's recent check-in
Created attachment 25655 [details] [review] Path 6 -- Updated and merged patch
Committed to CVS. I'm going to close the bug and open new ones for the missing notification about changes.