Bug 646820 - Increase speed of gmmproc
Status: RESOLVED WONTFIX
Product: glibmm
Classification: Bindings
Component: build
Version: 2.27.x
Hardware: Other All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: gtkmm-forge
gtkmm-forge
Depends on:
Blocks:
 
 
Reported: 2011-04-05 14:09 UTC by Kjell Ahlstedt
Modified: 2011-07-19 14:42 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
patch: Increased speed of gmmproc (glibmm) (33.05 KB, patch)
2011-04-16 19:45 UTC, Kjell Ahlstedt
patch: Increased speed of gmmproc (mm-common) (5.18 KB, patch)
2011-04-16 19:49 UTC, Kjell Ahlstedt

Description Kjell Ahlstedt 2011-04-05 14:09:53 UTC
I've noticed that when you build e.g. gtkmm, the generation of .h/.cc files
from .hg/.ccg files is a slow process. It takes more time than the compilation
of the .h/.cc files. I suspected that gmmproc spends most of its time reading the
.defs and _docs.xml files. This is done once for each processed .hg file.

I made a test, where I let gmmproc process all .hg/.ccg files in gtkmm/gtk/src
(181 pairs of files). I compared the present version of gmmproc and a version
where gmmproc is given a file containing a list of filenames, names of the
files to process. It then reads the .defs and _docs.xml files only once.
The result, when run on my PC, is:

   Present gmmproc:  13 min 25 s
   Modified gmmproc:  1 min 17 s

The total size of the .defs and _docs.xml files in gtkmm/gtk/src is about 4 MB,
so it's perhaps not surprising that you can save time by reading them once
instead of 181 times.
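
The difference between the two approaches can be sketched as follows. This is an illustration in Python, not gmmproc's Perl code: parse_defs() and generate() are hypothetical stand-ins for reading the .defs/_docs.xml files and for producing one output file, and the file names are made up. It only counts how often the expensive read happens in each mode.

```python
# Hypothetical stand-ins, not gmmproc's real functions.
counter = {"parses": 0}

def parse_defs():
    """Pretend to read the shared .defs and _docs.xml data (the slow step)."""
    counter["parses"] += 1
    return {"gtk_button_new": "Gtk::Button"}

def generate(hg_file, defs):
    """Pretend to turn one .hg file into a .h file name."""
    return hg_file.replace(".hg", ".h")

hg_files = [f"widget{i}.hg" for i in range(181)]

# Present behaviour: one invocation per file, each re-reading the defs.
counter["parses"] = 0
per_file = [generate(f, parse_defs()) for f in hg_files]
calls_per_file = counter["parses"]

# Batch behaviour: one invocation, defs read once and reused for every file.
counter["parses"] = 0
defs = parse_defs()
batch = [generate(f, defs) for f in hg_files]
calls_batch = counter["parses"]

print(calls_per_file, calls_batch)  # 181 1
```

Both modes produce the same outputs; only the number of reads differs.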

My modified version of gmmproc is a quick and dirty fix, not suitable for
distribution. And to make use of such a feature in gmmproc, the Makefiles must
also be modified.

This matter has been discussed in a thread on gtkmm-list:
http://mail.gnome.org/archives/gtkmm-list/2011-April/msg00018.html

Krzesimir Nowak has mentioned a few points that must be considered:
- 'make' can start concurrent jobs (argument --jobs).
- The stamp files are now handled by make. With gmmproc processing several
  files in one invocation, gmmproc must probably take over the responsibility
  for the stamp files.
- gmmproc must handle interrupts (e.g. Ctrl+C) gracefully.
Comment 1 Kjell Ahlstedt 2011-04-11 14:50:55 UTC
There is another way of speeding up gmmproc, mentioned by Krzysztof Kosiński at
http://mail.gnome.org/archives/gtkmm-list/2011-April/msg00019.html.

  "To get the best of both worlds, there could be an extra program which
  reads the XML files and outputs a binary file which maps directly to
  the data structures used in gmmproc. Each instance of gmmproc could
  then use the binary files, which would avoid the parsing overhead on
  every run."

I tested with the Perl functions Storable::store() and Storable::retrieve().
Processing times on my PC for this test, and the 2 tests mentioned earlier:

   Present gmmproc:                                13 min 25 s
   All files in one invocation:                     1 min 17 s
   One file per inv., defs and xml in binary file:  2 min 13 s

Using Storable::store() and Storable::retrieve() is a good compromise between
high speed and only moderate complication of gmmproc and the Makefiles.

It would be complicated to combine the many-files-per-invocation version of
gmmproc with letting make start concurrent jobs. I don't want to ruin the
possibility to use concurrent jobs for those who can make good use of it in a
multi-core processor. 

I'd prefer to let gmmproc itself write the binary file, rather than adding an
extra program. That can be part of 'make' (='make all') in the src directories.
New flags can be added to gmmproc, telling it what to do. In gtkmm/gtk/src it
takes about 4 s to read the defs and xml files once and write the binary file.
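
A minimal sketch of the store-once, retrieve-per-run idea, written in Python with pickle standing in for Perl's Storable::store() and Storable::retrieve(). The data structure, the function names write_bin()/read_bin() and the file name are assumptions made for illustration; the real structures are the ones built by gmmproc's Perl modules.

```python
import pickle
import tempfile
from pathlib import Path

def write_bin(defs, bin_path):
    """Serialise already-parsed data (analogous to Storable::store)."""
    with open(bin_path, "wb") as fh:
        pickle.dump(defs, fh)

def read_bin(bin_path):
    """Load the data back without re-parsing (analogous to Storable::retrieve)."""
    with open(bin_path, "rb") as fh:
        return pickle.load(fh)

# Hypothetical parsed data, invented for this example.
parsed = {"functions": {"gtk_button_new": {"returns": "GtkWidget*"}}}

with tempfile.TemporaryDirectory() as tmp:
    bin_file = Path(tmp) / "gtk.bin"   # hypothetical file name
    write_bin(parsed, bin_file)        # done once per build
    restored = read_bin(bin_file)      # done by every later invocation

print(restored == parsed)  # True
```

Because each invocation only deserialises a file, independent invocations can still be scheduled concurrently by make.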
Comment 2 Kjell Ahlstedt 2011-04-16 19:45:55 UTC
Created attachment 186101
patch: Increased speed of gmmproc (glibmm)

Here's a patch that will increase the speed of gmmproc. It writes and reads a
binary file with the information from the .defs and .xml files.

New options in gmmproc:
  --bin dir   Read defs info from a binary file, if possible.
  --writebin  Write a binary file with info read from name.defs,
              name_docs.xml, name_docs_override.xml.

If gmmproc is called without any of these options, it functions as before,
reading from name.defs, name_docs.xml, name_docs_override.xml.
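
The "if possible" fallback can be pictured roughly like this. It is a Python sketch under assumptions, not the code in the patch: load_defs(), parse_text_sources() and the "<name>.bin" file name are hypothetical, and pickle again stands in for Storable.

```python
import pickle
import tempfile
from pathlib import Path

def parse_text_sources(name):
    """Hypothetical slow path: stands in for parsing name.defs and the XML docs."""
    return {"name": name, "source": "text"}

def load_defs(name, bin_dir=None):
    """Prefer the binary file in bin_dir; otherwise read the text sources."""
    if bin_dir is not None:
        bin_path = Path(bin_dir) / f"{name}.bin"  # hypothetical naming
        try:
            with open(bin_path, "rb") as fh:
                return pickle.load(fh)
        except (OSError, pickle.UnpicklingError, EOFError):
            pass  # missing or unreadable file: behave as without the option
    return parse_text_sources(name)

with tempfile.TemporaryDirectory() as tmp:
    without_cache = load_defs("gtk", tmp)   # no binary file written yet
    with open(Path(tmp) / "gtk.bin", "wb") as fh:
        pickle.dump({"name": "gtk", "source": "binary"}, fh)
    with_cache = load_defs("gtk", tmp)      # binary file now available
    no_option = load_defs("gtk")            # option not given at all

print(without_cache["source"], with_cache["source"], no_option["source"])
# text binary text
```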

In order to activate the new function, a new version of mm-common/build/
generate-binding.am is necessary. See comment 3.
Comment 3 Kjell Ahlstedt 2011-04-16 19:49:56 UTC
Created attachment 186106
patch: Increased speed of gmmproc (mm-common)

This is the new version of mm-common/build/generate-binding.am.

This patch is not as backwards compatible as the glibmm patch in comment 2.
When this version of generate-binding.am is copied to a module, e.g. by
mm-common-prepare in autogen.sh, the new function is activated in that module.
There's no easy way to avoid that. Is this behaviour acceptable?

The glibmm patch can be installed without installing the mm-common patch. Then
no one will notice any change.
The mm-common patch shall not be installed without also installing the glibmm
patch.

There are three new overridable make variables with default values:
  defs_name        Base name of defs files. Default: binding_name with the
                   trailing mm removed (e.g. gtkmm -> gtk).
  defs_bindir      Where to put the binary defs file. Default: binding_stampdir
  files_codegen_pm Perl source files for gmmproc.
                   Default: $(GMMPROC_DIR)/gmmproc $(GMMPROC_DIR)/pm/*.pm

The default value of files_codegen_pm is probably correct for all modules
except glibmm, which uses uninstalled tools files.

The default value of defs_bindir is probably acceptable for all modules. Or
isn't it?

The default value of defs_name is correct for many modules, but not for all.
E.g. in gstreamermm the correct value is defs_name=gst, and in goocanvasmm it's
defs_name=libgoocanvas. When those modules are built, there will be some
warnings about not finding the files to read, and gmmproc will continue to read
from the defs and xml files, until the correct value of defs_name has been set
in the Makefile.am that includes generate-binding.am.
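
To show why the default works for some modules and not others, here is a small Python sketch of the rule "binding_name with the trailing mm removed". The function default_defs_name() is hypothetical (the real default is computed in the makefile fragment); the required values are the ones quoted above.

```python
def default_defs_name(binding_name):
    """Strip one trailing 'mm', as the described default does."""
    return binding_name[:-2] if binding_name.endswith("mm") else binding_name

# Values named in this comment for each module.
required = {"gtkmm": "gtk", "gstreamermm": "gst", "goocanvasmm": "libgoocanvas"}

for binding, needed in required.items():
    guess = default_defs_name(binding)
    status = "ok" if guess == needed else f"override with defs_name={needed}"
    print(f"{binding}: default {guess} -> {status}")
```

For gstreamermm and goocanvasmm the derived names (gstreamer, goocanvas) do not match, which is the case where the Makefile.am has to set defs_name explicitly.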
Comment 4 Murray Cumming 2011-04-16 19:59:24 UTC
How much faster is this, for instance, when building gtkmm from scratch? Actual times for the complete "make all" would be nice, please.
Comment 5 Kjell Ahlstedt 2011-04-18 08:14:00 UTC
(In reply to comment #4)
> How much faster is this, for instance, when building gtkmm from scratch? Actual
> times for the complete "make all" would be nice, please.

Present gmmproc, but without 'use open IO => ":utf8";' in GtkDefs.pm:
  real  24m 31.252s
  user  22m 35.077s
  sys    1m  7.412s

gmmproc in comments 2 and 3, without 'use open IO => ":utf8";' in GtkDefs.pm:
  real  13m 57.591s
  user  12m 14.002s
  sys    1m  6.072s

According to an ongoing email conversation, 'use open IO => ":utf8";' in
GtkDefs.pm (added in bug #644037) can cause an enormous increase in execution
time with one particular version of Perl. In my system (Perl v5.10.1 on Ubuntu
10.10) it has only a negligible influence on the execution time. (Perhaps +10 s
to the present gmmproc, +0.2 s to the proposed new gmmproc)
Comment 6 Murray Cumming 2011-05-05 09:51:47 UTC
Do these numbers need to be updated now that the change in bug #644037 has gone in?
Comment 7 Kjell Ahlstedt 2011-05-06 18:27:38 UTC
Here are the results from new time measurements.

Present gmmproc (with patch in bug 644037 comment 10):
  real  24m 27.937s
  user  21m 28.873s
  sys    1m 39.110s

gmmproc in comments 2 and 3:
  real  15m  5.547s
  user  12m 58.533s
  sys    1m 24.241s

Between the two sets of measurements (the one in comment 5 and the one in this
comment) I have upgraded from Ubuntu 10.10 to 11.04. The Perl version did not
change (still 5.10.1), but the gcc compiler was upgraded from 4.4.5 to 4.5.2.
Thus the differences between comments 5 and 7 are not due entirely to the
change of gmmproc in bug 644037.

The difference between the present gmmproc and the proposed future one has
decreased. The patch in bug 644037 comment 10 made gmmproc slightly faster,
but it has a negligible effect on the execution time after the patches in this
bug are applied (at least with my version of Perl).
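
For reference, the savings implied by the 'real' times in comment 5 and in this comment can be worked out with a few lines of Python. The numbers are copied from the measurements; the helper seconds() and the variable names are only for this example.

```python
def seconds(minutes, secs):
    """Convert a 'real' time given as minutes and seconds to seconds."""
    return minutes * 60 + secs

# (present gmmproc, patched gmmproc) for a complete "make all"
runs = {
    "comment 5": (seconds(24, 31.252), seconds(13, 57.591)),
    "comment 7": (seconds(24, 27.937), seconds(15, 5.547)),
}

savings = {}
for label, (present, patched) in runs.items():
    saved = present - patched
    savings[label] = saved
    print(f"{label}: saved {saved / 60:.1f} min ({saved / present:.0%} of the build)")
```

That is roughly 10.6 minutes saved in the first set of measurements and 9.4 minutes in the second.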
Comment 8 Murray Cumming 2011-07-19 09:48:30 UTC
I'm hesitant to add the complexity. It's nice to win 10 minutes back, but I don't really feel it's worth it. Does anyone want to persuade me that this makes a big difference to them?
Comment 9 Kjell Ahlstedt 2011-07-19 14:42:56 UTC
(In reply to comment #8)
> I'm hesitant to add the complexity. It's nice to win 10 minutes back, but I
> don't really feel it's worth it.

I'm inclined to agree.

My first measurement of execution times, when the time for gmmproc was
reduced from 13 minutes to 1 minute, looked very encouraging. But I couldn't
combine that version of gmmproc with concurrent jobs in 'make'. The version of
gmmproc that _can_ be executed concurrently is not quite as fast, and when
the time for compilation is added, the difference between the present gmmproc
and the one we could get looks much more moderate.