After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 655417 - build, tinderbox, list, info commands are slow
build, tinderbox, list, info commands are slow
Status: RESOLVED FIXED
Product: jhbuild
Classification: Infrastructure
Component: general
unspecified
Other All
: Normal normal
: ---
Assigned To: Jhbuild maintainers
Jhbuild QA
Depends on:
Blocks:
 
 
Reported: 2011-07-27 12:40 UTC by Craig Keogh
Modified: 2015-01-04 23:23 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Keep manifest in external files (4.43 KB, patch)
2011-07-30 19:26 UTC, Frederic Peters
reviewed Details | Review
Keep manifests in external files (4.84 KB, patch)
2011-07-31 06:34 UTC, Frederic Peters
committed Details | Review

Description Craig Keogh 2011-07-27 12:40:09 UTC
# time jhbuild info vala
real	0m19.755s

info & list should take less than 0.5s, not 20s. build and tinderbox are slow too.

I have 'use_local_modulesets = True'. But with use_local_modulesets True or False, JHBuild should not be this slow.

run & shell commands are OK, they are not slow.
Comment 1 Colin Walters 2011-07-27 18:19:41 UTC
Hmmm...it's not that slow here on my laptop - about 3s.  I assume you are on a laptop/workstation too?

It looks to me from perf that we spend most of the CPU time in Python parsing XML.  Let's look at some profiling:

$ python -m cProfile -s cumulative /src/build/jhbuild/bin/jhbuild list
...
         5404707 function calls (5403632 primitive calls) in 4.296 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      2/1    0.000    0.000    4.209    4.209 jhbuild:3(<module>)
      2/1    0.000    0.000    4.207    4.207 main.py:91(main)
      2/1    0.000    0.000    4.207    4.207 __init__.py:110(run)
        5    0.000    0.000    4.032    0.806 moduleset.py:52(__init__)
        5    0.000    0.000    4.032    0.806 packagedb.py:88(__init__)
        5    0.001    0.000    4.032    0.806 packagedb.py:93(_read_cache)
      249    0.600    0.002    3.841    0.015 {built-in method Parse}
        5    0.000    0.000    3.789    0.758 ElementTree.py:1175(parse)
        5    0.001    0.000    3.789    0.758 ElementTree.py:644(parse)
      230    0.001    0.000    3.781    0.016 ElementTree.py:1620(feed)
        1    0.000    0.000    3.411    3.411 __init__.py:50(execute)
        1    0.001    0.001    3.411    3.411 base.py:494(run)
        1    0.002    0.002    1.667    1.667 moduleset.py:344(load)
      3/1    0.002    0.001    1.666    1.666 moduleset.py:388(_parse_module_set)
   158192    0.080    0.000    1.574    0.000 pyexpat.c:566(StartElement)


Yep, 95% of the time is parsing the XML file.

I'm curious what it is for you, though.

So if we want this to be faster, I think we may need to switch to sqlite (or some data format designed for local storage and not interchange).
Comment 2 Craig Keogh 2011-07-28 11:46:14 UTC
I have:
moduleset = ['gnome-apps-3.2', 'gnome-world-3.2']

python -m cProfile -s cumulative jhbuild list

         19280456 function calls (19279157 primitive calls) in 43.798 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        8    0.000    0.000   42.039    5.255 moduleset.py:52(__init__)
        8    0.000    0.000   42.038    5.255 packagedb.py:88(__init__)
        8    0.006    0.001   42.038    5.255 packagedb.py:93(_read_cache)
      794    4.646    0.006   39.681    0.050 {built-in method Parse}
        8    0.000    0.000   39.214    4.902 ElementTree.py:1175(parse)
        8    0.003    0.000   39.213    4.902 ElementTree.py:644(parse)
      752    0.003    0.000   39.178    0.052 ElementTree.py:1620(feed)
      2/1    0.001    0.001   38.475   38.475 jhbuild.py:3(<module>)
      2/1    0.006    0.003   33.147   33.147 main.py:91(main)
      2/1    0.000    0.000   22.562   22.562 __init__.py:110(run)
      2/1    0.001    0.001   22.562   22.562 __init__.py:50(execute)
      8/2    0.016    0.002   21.590   10.795 moduleset.py:388(_parse_module_set)
      2/1    0.001    0.000   17.253   17.253 base.py:494(run)
      2/1    0.007    0.003   16.334   16.334 moduleset.py:344(load)
Comment 3 Craig Keogh 2011-07-28 12:13:22 UTC
I now understand the profile numbers (I think).
42 seconds in ModuleSet.__init__ which just loads the packagedb.xml. My packagedb.xml is 5.9M big.

1. jhbuild list does not need any info out of the packagedb.xml and should not load it (I raised bug 655501 for this issue).
2. jhbuild info does need install data info from packagedb.xml, but JHBuild design should not be 'parse 5.9M file to get the 1 line of information'. Split the file, use a database, whatever.
3. jhbuild buildone needs to update the filelist, install date, version etc but JHBuild shouldn't need to load the metadata for *every* package.

This bug is about point 2 & 3.
Comment 4 Frederic Peters 2011-07-30 19:26:54 UTC
Created attachment 192919 [details] [review]
Keep manifest in external files

Alternatively the manifest part could be moved out of packagedb.xml, to get that file back to a size that didn't cause problems. Attached is a quickly made patch; review most welcome.
Comment 5 Colin Walters 2011-07-30 20:11:00 UTC
Review of attachment 192919 [details] [review]:

::: jhbuild/utils/packagedb.py
@@ +82,3 @@
         if self.manifest is not None:
+            if not os.path.exists(self.manifests_dir):
+                os.makedirs(self.manifests_dir)

Why isn't this in PackageDB.__init__()?

@@ +84,3 @@
+                os.makedirs(self.manifests_dir)
+            fd = file(os.path.join(self.manifests_dir, self.package), 'w')
+            fd.write('\n'.join(self.manifest))

We should test that no filenames have '\n' in them (and just throw if they do).  Otherwise it's possible for us to attempt to unlink() files at arbitrary paths.

And also since you're calling .strip() at read time above, we should do it at write time too for consistency.

@@ +85,3 @@
+            fd = file(os.path.join(self.manifests_dir, self.package), 'w')
+            fd.write('\n'.join(self.manifest))
+            fd.close()

I *strongly* prefer writing data files like this atomically.  That means create a temporary file, write it out, call os.fdatasync(fd.fileno()) on it. fd.close(), then os.rename() the temporary file over the real one.

(Yes, my laptop crashes periodically, and this is a real problem)

See what we do for the main PackageDB XML.

@@ +91,3 @@
     def __init__(self, dbfile, config):
         self.dbfile = dbfile
+        self.manifests_dir = os.path.join(os.path.dirname(dbfile), 'manifests')

The main thing this is missing is the transition code; we need to specially detect this case and do a one-time transition out of the XML and into files.
Comment 6 Colin Walters 2011-07-30 20:12:15 UTC
Note from Craig's profiling there is one other bug not addressed here, and what is seriously aggravating the XML - we load the packagedb once per moduleset right now.

We should make it a global singleton.
Comment 7 Frederic Peters 2011-07-31 06:34:33 UTC
Created attachment 192937 [details] [review]
Keep manifests in external files

Addressed comments.

Loading the packagedb a few times is much less of a problem when it's 15 kilobytes, and not 5 megabytes. (just like bug 655501, it becomes nice to have, not necessary).
Comment 8 Colin Walters 2011-07-31 13:53:03 UTC
Review of attachment 192937 [details] [review]:

Looks good, thanks!

One procedural note - the file you attached here is the output from "git show", but in order for git to be able to apply the patch it has to be from "git format-patch".

You might also try git-bz which makes this easier: http://git.fishsoup.net/cgit/git-bz

It wants the bug link at the bottom, like I've been doing.  More and more people committing to GNOME are using git-bz since it's just a relatively easy way to put patches in bugzilla for review.
Comment 9 Frederic Peters 2011-07-31 16:55:12 UTC
Craig; I am closing this one as performance is back, but if you want you could file another one to implement comment 6.
Comment 10 Craig Keogh 2011-07-31 23:34:24 UTC
(In reply to comment #9)
> Craig; I am closing this one as performance is back, but if you want you could
> file another one to implement comment 6.

Thanks for fixing this Frederic. Colin is working on comment 6 in Bug 312910. Great!
Comment 11 Jeremy Huddleston 2011-09-29 16:49:22 UTC
This commit causes failures such as:


Traceback (most recent call last):
  • File "/usr/local/bin/jhbuild", line 32 in <module>
    jhbuild.main.main(sys.argv[1:])
  • File "/Users/jeremy/src/freedesktop/src/jhbuild/jhbuild/main.py", line 147 in main
    rc = jhbuild.commands.run(command, config, args, help=lambda: print_help(parser))
  • File "/Users/jeremy/src/freedesktop/src/jhbuild/jhbuild/commands/__init__.py", line 123 in run
    return cmd.execute(config, args, help)
  • File "/Users/jeremy/src/freedesktop/src/jhbuild/jhbuild/commands/__init__.py", line 52 in execute
    return self.run(config, options, args, help)
  • File "/Users/jeremy/src/freedesktop/src/jhbuild/jhbuild/commands/autobuild.py", line 84 in run
    return build.build()
  • File "/Users/jeremy/src/freedesktop/src/jhbuild/jhbuild/frontends/buildscript.py", line 152 in build
    error, altphases = module.run_phase(self, phase)
  • File "/Users/jeremy/src/freedesktop/src/jhbuild/jhbuild/modtypes/__init__.py", line 306 in run_phase
    method(buildscript)
  • File "/Users/jeremy/src/freedesktop/src/jhbuild/jhbuild/modtypes/autotools.py", line 292 in do_install
    self.process_install(buildscript, self.get_revision())
  • File "/Users/jeremy/src/freedesktop/src/jhbuild/jhbuild/modtypes/__init__.py", line 239 in process_install
    buildscript.moduleset.packagedb.add(self.name, revision or '', destdir)
  • File "/Users/jeremy/src/freedesktop/src/jhbuild/jhbuild/utils/packagedb.py", line 198 in add
    self._write_cache()
  • File "/Users/jeremy/src/freedesktop/src/jhbuild/jhbuild/utils/packagedb.py", line 145 in _write_cache
    node = entry.to_xml(doc)
  • File "/Users/jeremy/src/freedesktop/src/jhbuild/jhbuild/utils/packagedb.py", line 108 in to_xml
    os.fdatasync(fd.fileno())
AttributeError: 'module' object has no attribute 'fdatasync'

Comment 12 Jeremy Huddleston 2011-09-29 16:51:40 UTC
fdatasync(fd)
Force write of file with filedescriptor fd to disk. Does not force update of metadata. Availability: Unix.

You need to put that in !win !mac code ... i dunno why that isn't implemented for mac, but /shrug