GNOME Bugzilla – Bug 322195
Yelp startup time is greatly increased when man pages are enabled
Last modified: 2005-12-12 04:33:35 UTC
I'm copying an email I sent to gnome-doc-devel-list: I've have been looking at fixing the startup time for yelp when man page support is enabled (as it is by default in Ubuntu). During my research I found one major bottleneck in when yelp starts up and there are a large number of man pages to index. (In my case 40,503 - needless to say I haven't reinstalled in quite awhile). The bottleneck occurs in the yelp_doc_info_new function, in a call to gnome_vfs_make_uri_from_input_with_dirs(). Everytime this function is called, a system call is made to access the current working directory and then an attempt to the access the file. If you do an "strace -e file" on yelp HEAD, you will system calls like the following: getcwd("/extra/cvs/gnome2/yelp-head5", 4096) = 29 access("/extra/cvs/gnome2/yelp-head5/man:/usr/share/man/man1/msgcomm.1.gz", F_OK) = -1 ENOENT (No such file or directory) More disk access occurs in the convert_man_uri function, due to a g_file_test() call: stat64("/usr/share/man/man1/msgcomm.1.gz", {st_mode=S_IFREG|0644, st_size=1648, ...}) = 0 Since we are reading all the man files from the disk anyway, there's no reason to check for relative paths, and to see if the file exists; we know it does - therefore, I made a change to the API for the following functions: -static gchar * convert_man_uri (gchar *uri); +static gchar * convert_man_uri (gchar *uri, gboolean trust_uri); -yelp_doc_info_new (const gchar *uri) +yelp_doc_info_new (const gchar *uri, gboolean trust_uri) -yelp_doc_info_get (const gchar *uri) +yelp_doc_info_get (const gchar *uri, gboolean trust_uri) Basically, if we know the uri we are passing exists and is an absolute uri, we set trust_uri = TRUE; this in turn bypasses the call to gnome_vfs_make_uri_from_input_with_dirs() with a call to gnome_vfs_make_uri_from_input(), which doesn't suffer the performance issues of getting the cwd and checking for the existence of a relative path. The g_file_test() in convert_man_uri is also bypassed. In addition to this simple change, I've been working on a cache patch, which on first run creates an index of man pages in ~/.gnome2/yelp.d/manindex.xml. Any subsequent running of yelp will use this file instead of scanning the disk. In conjunction with the API changes above, on reboot yelp comes up pretty quickly (knocked about 8 seconds off from my tests) The disadvantage with the cache patch is that it doesn't check the mtime of the directories to see if any new manpages have been added, and recreate the index. However, I designed the patch with this in mind, so it shouldn't be too hard to add this functionality. I've attached three patches to this email. [1] yelp-head-timing.patch [2] yelp-trust-timing.patch [3] yelp-mancache-timing.patch [1] simply adds a function to print timing information on how long process_mandir_pending takes. [2] adds the API changes discussed above and the timing print stuff [3] is the full blown cache, API changes, and timing print stuff I've included the results from my tests below. I've performed a reboot before each test, and than ran yelp twice and recorded the results. The cache + API + timing patch is definitely the best since on reboot it doesn't have to touch the disk to create the TOC for the man pages. [1] yelp-head-timing.patch [2] yelp-trust-timing.patch [3] yelp-mancache-timing.patch (no cache at ~/.gnome2/yelp.d/manindex.xml) [4] yelp-mancache-timing.patch (cache present at ~/.gnome2/yelp.d/manindex.xml) Actual timings after a reboot: [1] 1st run: elapsed: sec=9, usec=117386 2nd run: elapsed: sec=0, usec=805525 [2] 1st run: elapsed: sec=1, usec=278251 2nd run: elapsed: sec=0, usec=652525 [3] 1st run: elapsed: sec=1, usec=497169 (cache file gets created during this run) 2nd run: elapsed: sec=0, usec=664619 (cache file exists for this run) [4] 1st run: elapsed: sec=0, usec=766423 2nd run: elapsed: sec=0, usec=644318 All patches are against yelp HEAD. Again sorry for the length. Any comments/testing are greatly appreciated.
Created attachment 55120 [details] [review] simple patch to just print out timing information about process_mandir_pending
Created attachment 55121 [details] [review] patch to add trust API
Created attachment 55122 [details] [review] full blown man page cache patch This applies to yelp HEAD as of Nov-23-2005; Still needs some work; The cache file needs to have the directory listing in it updated when the mtime of that directory has changed. This will allow the user to install new man pages and still have an up to date cache file. Also I've been lazy and the memory structures need to be de-allocated at the end of process_mandir_pending I hope to get the above fixed within the next two weeks...
Created attachment 55850 [details] [review] updated version of the mancache patch
Here is the updated patch I promised for man page speedups. [This is a copy of the email I sent to gnome-doc-devel-list@gnome.org] As of right now, it handles three cases: 1) no index file exists at ~/.gnome2/yelp.d/manindex.xml Yelp will traverse the directory structure and create an index file at this location. This is the most time intensive case - but I would say that it is an order of magnitude faster than current yelp HEAD since with the patch it only takes about 1.5 seconds for 46,000 man pages on cold start, where as yelp HEAD takes about 9-10 seconds. 2) index file exists at ~/.gnome2/yelp.d/manindex.xml and is up to date Yelp pulls the location for each man page from the index file and stores it in the TOC. This is the fastest case since yelp doesn't have to touch the disk (except for checking mtimes of directories, see next case) 3) index file exists at ~/.gnome2/yelp.d/manindex.xml, but it is out of date Yelp compares the modified times for the directories specified in the file to those on disk. If the time on disk is newer, yelp will reindex that directory and update the index file. While it does this, it also stores the location of each man page in the TOC. I would ask that people please test this patch and make sure that it performs correctly, doesn't crash, etc. I would like to commit to head before the 2.13.3 release which is on Monday at 23:59 UTC so as to get maximum testing from the community (yeah, I know man pages aren't turned on by default but I'm trying to change that). Specifically, the following would be a good test case: 1) compile with --enable-man 2) run yelp once to create the index at ~/.gnome2/yelp.d/manindex.xml 3) install/modify a man page somewhere on your system 4) run yelp once again to create the updated index. Yelp will also backup the old index to manindex.xml.<mtime> - this will be removed before committing to HEAD. 5) run a diff -u manindex.xml.<mtime> manindex.xml to make sure that a) the directory mtime is updated b) the man page that you installed is added to the index (this won't happen if you just modified a man page in step 3) 6) Let me know of any problems! There are still some minor considerations I would like some feedback on: 1) Is ~/.gnome2/yelp.d/manindex.xml an appropriate location for the index? I think this is the mozilla profile directory, so I'm not sure if it makes sense to put it in there. Yelp stores bookmarks, printing, and geometry files in ~/.gnome2/ so maybe it makes more sense to put it in there, or create a new directory ~/.gnome2/yelp/ for all these files... 2) In the XML file each man page name is enclosed in <page></page> tags. Should the tag name be something smaller, like <p></p> in the interest of saving space? With 46,000 man pages, using <page></page> over <p></p> is an extra 6 bytes per man page. This is 276,000 byte overhead for the index file... not sure if this is significant. 3) ... I'm sure I'll think of more. Other notes: * All the timing stuff needs to be removed before committing to head. * ditto for creating a backup of the index file when writing a new one Thanks, -- Brent Smith <gnome@nextreality.net> IRC: smitten
Created attachment 55877 [details] [review] final mancache patch this was committed to cvs HEAD