GNOME Bugzilla – Bug 655417
build, tinderbox, list, info commands are slow
Last modified: 2015-01-04 23:23:21 UTC
# time jhbuild info vala real 0m19.755s info & list should take less than 0.5s, not 20s. build and tinderbox are slow too. I have 'use_local_modulesets = True'. But with use_local_modulesets True or False, JHBuild should not be this slow. run & shell commands are OK, they are not slow.
Hmmm...it's not that slow here on my laptop - about 3s. I assume you are on a laptop/workstation too? It looks to me from perf that we spend most of the CPU time in Python parsing XML. Let's look at some profiling: $ python -m cProfile -s cumulative /src/build/jhbuild/bin/jhbuild list ... 5404707 function calls (5403632 primitive calls) in 4.296 seconds Ordered by: cumulative time ncalls tottime percall cumtime percall filename:lineno(function) 2/1 0.000 0.000 4.209 4.209 jhbuild:3(<module>) 2/1 0.000 0.000 4.207 4.207 main.py:91(main) 2/1 0.000 0.000 4.207 4.207 __init__.py:110(run) 5 0.000 0.000 4.032 0.806 moduleset.py:52(__init__) 5 0.000 0.000 4.032 0.806 packagedb.py:88(__init__) 5 0.001 0.000 4.032 0.806 packagedb.py:93(_read_cache) 249 0.600 0.002 3.841 0.015 {built-in method Parse} 5 0.000 0.000 3.789 0.758 ElementTree.py:1175(parse) 5 0.001 0.000 3.789 0.758 ElementTree.py:644(parse) 230 0.001 0.000 3.781 0.016 ElementTree.py:1620(feed) 1 0.000 0.000 3.411 3.411 __init__.py:50(execute) 1 0.001 0.001 3.411 3.411 base.py:494(run) 1 0.002 0.002 1.667 1.667 moduleset.py:344(load) 3/1 0.002 0.001 1.666 1.666 moduleset.py:388(_parse_module_set) 158192 0.080 0.000 1.574 0.000 pyexpat.c:566(StartElement) Yep, 95% of the time is parsing the XML file. I'm curious what it is for you, though. So if we want this to be faster, I think we may need to switch to sqlite (or some data format designed for local storage and not interchange).
I have: moduleset = ['gnome-apps-3.2', 'gnome-world-3.2'] python -m cProfile -s cumulative jhbuild list 19280456 function calls (19279157 primitive calls) in 43.798 seconds Ordered by: cumulative time ncalls tottime percall cumtime percall filename:lineno(function) 8 0.000 0.000 42.039 5.255 moduleset.py:52(__init__) 8 0.000 0.000 42.038 5.255 packagedb.py:88(__init__) 8 0.006 0.001 42.038 5.255 packagedb.py:93(_read_cache) 794 4.646 0.006 39.681 0.050 {built-in method Parse} 8 0.000 0.000 39.214 4.902 ElementTree.py:1175(parse) 8 0.003 0.000 39.213 4.902 ElementTree.py:644(parse) 752 0.003 0.000 39.178 0.052 ElementTree.py:1620(feed) 2/1 0.001 0.001 38.475 38.475 jhbuild.py:3(<module>) 2/1 0.006 0.003 33.147 33.147 main.py:91(main) 2/1 0.000 0.000 22.562 22.562 __init__.py:110(run) 2/1 0.001 0.001 22.562 22.562 __init__.py:50(execute) 8/2 0.016 0.002 21.590 10.795 moduleset.py:388(_parse_module_set) 2/1 0.001 0.000 17.253 17.253 base.py:494(run) 2/1 0.007 0.003 16.334 16.334 moduleset.py:344(load)
I now understand the profile numbers (I think). 42 seconds in ModuleSet.__init__ which just loads the packagedb.xml. My packagedb.xml is 5.9M big. 1. jhbuild list does not need any info out of the packagedb.xml and should not load it (I raised bug 655501 for this issue). 2. jhbuild info does need install data info from packagedb.xml, but JHBuild design should not be 'parse 5.9M file to get the 1 line of information'. Split the file, use a database, whatever. 3. jhbuild buildone needs to update the filelist, install date, version etc but JHBuild shouldn't need to load the metadata for *every* package. This bug is about point 2 & 3.
Created attachment 192919 [details] [review] Keep manifest in external files Alternatively the manifest part could be moved out of packagedb.xml, to get that file back to a size that didn't cause problems. Attached is a quickly made patch; review most welcome.
Review of attachment 192919 [details] [review]: ::: jhbuild/utils/packagedb.py @@ +82,3 @@ if self.manifest is not None: + if not os.path.exists(self.manifests_dir): + os.makedirs(self.manifests_dir) Why isn't this in PackageDB.__init__()? @@ +84,3 @@ + os.makedirs(self.manifests_dir) + fd = file(os.path.join(self.manifests_dir, self.package), 'w') + fd.write('\n'.join(self.manifest)) We should test that no filenames have '\n' in them (and just throw if they do). Otherwise it's possible for us to attempt to unlink() files at arbitrary paths. And also since you're calling .strip() at read time above, we should do it at write time too for consistency. @@ +85,3 @@ + fd = file(os.path.join(self.manifests_dir, self.package), 'w') + fd.write('\n'.join(self.manifest)) + fd.close() I *strongly* prefer writing data files like this atomically. That means create a temporary file, write it out, call os.fdatasync(fd.fileno()) on it. fd.close(), then os.rename() the temporary file over the real one. (Yes, my laptop crashes periodically, and this is a real problem) See what we do for the main PackageDB XML. @@ +91,3 @@ def __init__(self, dbfile, config): self.dbfile = dbfile + self.manifests_dir = os.path.join(os.path.dirname(dbfile), 'manifests') The main thing this is missing is the transition code; we need to specially detect this case and do a one-time transition out of the XML and into files.
Note from Craig's profiling there is one other bug not addressed here, and what is seriously aggravating the XML - we load the packagedb once per moduleset right now. We should make it a global singleton.
Created attachment 192937 [details] [review] Keep manifests in external files Addressed comments. Loading the packagedb a few times is much less of a problem when it's 15 kilobytes, and not 5 megabytes. (just like bug 655501, it becomes nice to have, not necessary).
Review of attachment 192937 [details] [review]: Looks good, thanks! One procedural note - the file you attached here is the output from "git show", but in order for git to be able to apply the patch it has to be from "git format-patch". You might also try git-bz which makes this easier: http://git.fishsoup.net/cgit/git-bz It wants the bug link at the bottom, like I've been doing. More and more people committing to GNOME are using git-bz since it's just a relatively easy way to put patches in bugzilla for review.
Craig; I am closing this one as performance is back, but if you want you could file another one to implement comment 6.
(In reply to comment #9) > Craig; I am closing this one as performance is back, but if you want you could > file another one to implement comment 6. Thanks for fixing this Frederic. Colin is working on comment 6 in Bug 312910. Great!
This commit causes failures such as: Traceback (most recent call last):
+ Trace 228644
jhbuild.main.main(sys.argv[1:])
rc = jhbuild.commands.run(command, config, args, help=lambda: print_help(parser))
return cmd.execute(config, args, help)
return self.run(config, options, args, help)
return build.build()
error, altphases = module.run_phase(self, phase)
method(buildscript)
self.process_install(buildscript, self.get_revision())
buildscript.moduleset.packagedb.add(self.name, revision or '', destdir)
self._write_cache()
node = entry.to_xml(doc)
os.fdatasync(fd.fileno())
fdatasync(fd) Force write of file with filedescriptor fd to disk. Does not force update of metadata. Availability: Unix. You need to put that in !win !mac code ... i dunno why that isn't implemented for mac, but /shrug