After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 322195 - Yelp startup time is greatly increased when man pages are enabled
Yelp startup time is greatly increased when man pages are enabled
Status: RESOLVED FIXED
Product: yelp
Classification: Applications
Component: Man Pages
git master
Other All
: Normal normal
: ---
Assigned To: Yelp maintainers
Yelp maintainers
Depends on:
Blocks:
 
 
Reported: 2005-11-23 01:23 UTC by Brent Smith (smitten)
Modified: 2005-12-12 04:33 UTC
See Also:
GNOME target: ---
GNOME version: 2.13/2.14


Attachments
simple patch to just print out timing information about process_mandir_pending (1.45 KB, patch)
2005-11-23 01:26 UTC, Brent Smith (smitten)
none Details | Review
patch to add trust API (7.52 KB, patch)
2005-11-23 01:27 UTC, Brent Smith (smitten)
none Details | Review
full blown man page cache patch (20.87 KB, patch)
2005-11-23 01:30 UTC, Brent Smith (smitten)
none Details | Review
updated version of the mancache patch (24.42 KB, patch)
2005-12-10 20:40 UTC, Brent Smith (smitten)
none Details | Review
final mancache patch (23.09 KB, patch)
2005-12-12 04:32 UTC, Brent Smith (smitten)
committed Details | Review

Description Brent Smith (smitten) 2005-11-23 01:23:58 UTC
I'm copying an email I sent to gnome-doc-devel-list:

I've have been looking at fixing the startup time for yelp when
man page support is enabled (as it is by default in Ubuntu).

During my research I found one major bottleneck in when yelp
starts up and there are a large number of man pages to index.
(In my case 40,503 - needless to say I haven't reinstalled
in quite awhile).

The bottleneck occurs in the yelp_doc_info_new function, in a
call to gnome_vfs_make_uri_from_input_with_dirs().  Everytime
this function is called, a system call is made to access the
current working directory and then an attempt to the access
the file.  If you do an "strace -e file" on yelp HEAD, you
will system calls like the following:

getcwd("/extra/cvs/gnome2/yelp-head5", 4096) = 29
access("/extra/cvs/gnome2/yelp-head5/man:/usr/share/man/man1/msgcomm.1.gz",
F_OK) = -1 ENOENT (No such file or directory)

More disk access occurs in the convert_man_uri function, due
to a g_file_test() call:

stat64("/usr/share/man/man1/msgcomm.1.gz", {st_mode=S_IFREG|0644, st_size=1648,
...}) = 0

Since we are reading all the man files from the disk anyway,
there's no reason to check for relative paths, and to see if
the file exists; we know it does - therefore, I made a change to the API
for the following functions:

-static gchar *      convert_man_uri    (gchar   *uri);
+static gchar *      convert_man_uri    (gchar   *uri, gboolean trust_uri);

-yelp_doc_info_new (const gchar *uri)
+yelp_doc_info_new (const gchar *uri, gboolean trust_uri)

-yelp_doc_info_get (const gchar *uri)
+yelp_doc_info_get (const gchar *uri, gboolean trust_uri)

Basically, if we know the uri we are passing exists and is an
absolute uri, we set trust_uri = TRUE; this in turn bypasses
the call to gnome_vfs_make_uri_from_input_with_dirs() with a
call to gnome_vfs_make_uri_from_input(), which doesn't
suffer the performance issues of getting the cwd and checking
for the existence of a relative path.  The g_file_test()
in convert_man_uri is also bypassed.

In addition to this simple change, I've been working on a
cache patch, which on first run creates an index of man pages
in ~/.gnome2/yelp.d/manindex.xml.  Any subsequent running
of yelp will use this file instead of scanning the disk.  In
conjunction with the API changes above, on reboot yelp comes
up pretty quickly (knocked about 8 seconds off from my tests)

The disadvantage with the cache patch is that it doesn't
check the mtime of the directories to see if any new manpages
have been added, and recreate the index.  However, I designed
the patch with this in mind, so it shouldn't be too hard to
add this functionality.

I've attached three patches to this email.

[1] yelp-head-timing.patch
[2] yelp-trust-timing.patch
[3] yelp-mancache-timing.patch

[1] simply adds a function to print timing information on
how long process_mandir_pending takes.

[2] adds the API changes discussed above and the timing print stuff

[3] is the full blown cache, API changes, and timing print stuff

I've included the results from my tests below.  I've performed a
reboot before each test, and than ran yelp twice and recorded the
results.  The cache + API + timing patch is definitely the best
since on reboot it doesn't have to touch the disk to create the
TOC for the man pages.

[1] yelp-head-timing.patch
[2] yelp-trust-timing.patch
[3] yelp-mancache-timing.patch (no cache at ~/.gnome2/yelp.d/manindex.xml)
[4] yelp-mancache-timing.patch (cache present at ~/.gnome2/yelp.d/manindex.xml)

Actual timings after a reboot:
[1] 1st run: elapsed: sec=9, usec=117386
    2nd run: elapsed: sec=0, usec=805525

[2] 1st run: elapsed: sec=1, usec=278251
    2nd run: elapsed: sec=0, usec=652525

[3] 1st run: elapsed: sec=1, usec=497169  (cache file gets created during this run)
    2nd run: elapsed: sec=0, usec=664619  (cache file exists for this run)

[4] 1st run: elapsed: sec=0, usec=766423
    2nd run: elapsed: sec=0, usec=644318


All patches are against yelp HEAD.


Again sorry for the length.  Any comments/testing are greatly
appreciated.
Comment 1 Brent Smith (smitten) 2005-11-23 01:26:44 UTC
Created attachment 55120 [details] [review]
simple patch to just print out timing information about process_mandir_pending
Comment 2 Brent Smith (smitten) 2005-11-23 01:27:24 UTC
Created attachment 55121 [details] [review]
patch to add trust API
Comment 3 Brent Smith (smitten) 2005-11-23 01:30:21 UTC
Created attachment 55122 [details] [review]
full blown man page cache patch

This applies to yelp HEAD as of Nov-23-2005;

Still needs some work;	The cache file needs to have the directory listing in
it
updated when the mtime of that directory has changed.  This will allow the user

to install new man pages and still have an up to date cache file.

Also I've been lazy and the memory structures need to be de-allocated at the
end of process_mandir_pending

I hope to get the above fixed within the next two weeks...
Comment 4 Brent Smith (smitten) 2005-12-10 20:40:34 UTC
Created attachment 55850 [details] [review]
updated version of the mancache patch
Comment 5 Brent Smith (smitten) 2005-12-10 20:44:11 UTC
Here is the updated patch I promised for man page speedups.  

[This is a copy of the email I sent to gnome-doc-devel-list@gnome.org]

As of right now, it handles three cases:

1) no index file exists at ~/.gnome2/yelp.d/manindex.xml

Yelp will traverse the directory structure and create an index file
at this location.  This is the most time intensive case - but I would
say that it is an order of magnitude faster than current yelp HEAD 
since with the patch it only takes about 1.5 seconds for 46,000 man
pages on cold start, where as yelp HEAD takes about 9-10 seconds.

2) index file exists at ~/.gnome2/yelp.d/manindex.xml and is up
   to date

Yelp pulls the location for each man page from the index file and
stores it in the TOC.  This is the fastest case since yelp doesn't
have to touch the disk (except for checking mtimes of directories,
see next case)

3) index file exists at ~/.gnome2/yelp.d/manindex.xml, but it is
   out of date

Yelp compares the modified times for the directories specified
in the file to those on disk.  If the time on disk is newer,
yelp will reindex that directory and update the index file.
While it does this, it also stores the location of each man page
in the TOC.

I would ask that people please test this patch and make sure that
it performs correctly, doesn't crash, etc.  I would like to
commit to head before the 2.13.3 release which is on Monday at
23:59 UTC so as to get maximum testing from the community (yeah,
I know man pages aren't turned on by default but I'm trying to 
change that).

Specifically, the following would be a good test case:
1) compile with --enable-man
2) run yelp once to create the index at ~/.gnome2/yelp.d/manindex.xml
3) install/modify a man page somewhere on your system
4) run yelp once again to create the updated index. Yelp will also  
   backup the old index to manindex.xml.<mtime> - this will be removed
   before committing to HEAD.
5) run a diff -u manindex.xml.<mtime> manindex.xml to make sure that
   a) the directory mtime is updated
   b) the man page that you installed is added to the index (this
      won't happen if you just modified a man page in step 3)
6) Let me know of any problems!

There are still some minor considerations I would like some
feedback on:

1) Is ~/.gnome2/yelp.d/manindex.xml an appropriate location for
   the index?  I think this is the mozilla profile directory, so
   I'm not sure if it makes sense to put it in there.  Yelp 
   stores bookmarks, printing, and geometry files in ~/.gnome2/
   so maybe it makes more sense to put it in there, or create a
   new directory ~/.gnome2/yelp/ for all these files...

2) In the XML file each man page name is enclosed in <page></page>
   tags.  Should the tag name be something smaller, like <p></p>
   in the interest of saving space?  With 46,000 man pages, using
   <page></page> over <p></p> is an extra 6 bytes per man page.
   This is 276,000 byte overhead for the index file... not sure if
   this is significant.

3) ... I'm sure I'll think of more.

Other notes:
 * All the timing stuff needs to be removed before committing to head.
 * ditto for creating a backup of the index file when writing a new one

Thanks,

-- 
Brent Smith <gnome@nextreality.net>
IRC: smitten 
Comment 6 Brent Smith (smitten) 2005-12-12 04:32:45 UTC
Created attachment 55877 [details] [review]
final mancache patch

this was committed to cvs HEAD