GNOME Bugzilla – Bug 543489
slow xslt processing
Last modified: 2014-02-08 20:32:52 UTC
xsltproc is very slow -- the slowest part of building all of glib, for example. it's also the only part that has to be done completely serially. gtk-doc should somehow support parallel building so that it can use all my cores
There is some help coming. For example, I could cut the build time for gtk's gtk-doc from 20 to about 9 minutes. Regarding the -j option, I take patches :)
Also to clarify, if you want xsltproc to use your cores, file a bug against libxslt please.
Right now there are bugs in the Makefile dependencies:

  ./autogen.sh --enable-gtk-doc
  make -j2
  ...
  make[2]: Entering directory `/xxx/docs'
  Making all in api
  make[3]: Entering directory `/xxx/docs/api'
  make[3]: *** No rule to make target `xxx-docs.sgml', needed by `html-build.stamp'.  Stop.
  make[3]: *** Waiting for unfinished jobs....
  make[3]: Leaving directory `/xxx/docs/api'
  make[2]: *** [all-recursive] Error 1
  make[2]: Leaving directory `/xxx/docs'
*** Bug 592355 has been marked as a duplicate of this bug. ***
This change fixes parallel builds for me - not sure if it is correct though

diff --git a/tests/gtk-doc.make b/tests/gtk-doc.make
index 61a9eac..9d59689 100644
--- a/tests/gtk-doc.make
+++ b/tests/gtk-doc.make
@@ -102,7 +102,7 @@ sgml-build.stamp: tmpl.stamp $(DOC_MODULE)-sections.txt $(srcdir)/tmpl/*.sgml $(
 	gtkdoc-mkdb --module=$(DOC_MODULE) --source-dir=$(DOC_SOURCE_DIR) --output-format=xml --expand-content-files="$(expand_content_files)" --main-sgml-file=$(DOC_MAIN_SGML_FILE) $(MKDB_OPTIONS)
 	@touch sgml-build.stamp
 
-sgml.stamp: sgml-build.stamp
+sgml.stamp $(DOC_MAIN_SGML_FILE): sgml-build.stamp
 	@true
 
 #### html ####
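The failure in the log and the effect of the patch can be illustrated with a toy model of how make decides whether a target is buildable. This is only a rough Python sketch, not make's actual algorithm. The `can_build` helper and both rule tables are hypothetical; the file names come from the log and the diff, and the assumption that `html-build.stamp` lists `sgml.stamp` before the main SGML file is mine. It models the `-j` case, where make looks at `xxx-docs.sgml` before the job that creates it as a side effect has finished.

```python
def can_build(target, rules, existing, _seen=None):
    """Return True if `target` already exists, or has a rule whose
    prerequisites can all be built. Simplified: no timestamps, no
    implicit rules, no pattern rules."""
    seen = _seen or set()
    if target in existing:
        return True
    if target not in rules or target in seen:
        return False  # "No rule to make target", or a dependency cycle
    seen = seen | {target}
    return all(can_build(p, rules, existing, seen) for p in rules[target])


# Before the patch: xxx-docs.sgml is only a side effect of the
# gtkdoc-mkdb recipe and has no rule of its own.
before = {
    "sgml-build.stamp": ["tmpl.stamp"],
    "sgml.stamp": ["sgml-build.stamp"],
    "html-build.stamp": ["sgml.stamp", "xxx-docs.sgml"],  # assumed prerequisites
}

# After the patch: the main SGML file shares the sgml.stamp rule.
after = dict(before)
after["xxx-docs.sgml"] = ["sgml-build.stamp"]

existing = {"tmpl.stamp"}  # assumption: a clean tree with only this stamp present

print(can_build("html-build.stamp", before, existing))  # False
print(can_build("html-build.stamp", after, existing))   # True
```

In a serial build the same Makefile can work by accident, because the file already exists on disk by the time make considers it.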
when i filed this bug i was thinking more like some way of breaking up the XSL templates so that you can generate the .html output one file at a time. that way you could run multiple html files in parallel. it's probably a fairly substantial change.
> you can generate the .html output one file at a time

This would mean, for example, that you have to resolve all cross-references in the XML manually and find a way to correctly generate the index, toc, etc. without processing the entire document. That's not a `fairly substantial change', that's pretty insane.
Ryan, I have already made an attempt at this, but it is quite complex. I wanted to rebuild a single html page when the related xml file has changed, to avoid rebuilding all files. This can only work under some assumptions:
- one has to use the xi-include indexes (not the xsl generated ones)
- there should be no autogenerated ids
Getting the makefile rules right is tricky too. Before we go there, it would still be good to poke the libxml/libxslt people more and tell them we are not satisfied with the performance. E.g. having a means to cache a parsed style sheet could help us (as we can't keep it loaded like webapps do).
There are many more assumptions. If the title, indexable stuff (including Since status, added functions, ...), object hierarchy, ..., changes, it is not sufficient to [re]build a single HTML page.
David, if you change one header, gtkdoc-mkdb would rebuild:
- one or more xml/<section>.xml files
- possibly also indexes such as xml/{tree_index,api_index,...}.xml
In most cases it would be enough to rebuild the html files. If the hierarchy changes, gtkdoc-mkdb would change the xml files of objects that are affected (new prerequisite iface, new object in hierarchy).
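To make the idea concrete, here is a minimal Python sketch of how a build could detect which generated xml files actually changed after a gtkdoc-mkdb run, by comparing content hashes against a stored manifest. It is not part of gtk-doc. The names `digest`, `changed_inputs`, `plan_rebuild` and `INDEX_FILES`, as well as the section file names, are hypothetical, and it deliberately ignores the cross-reference and title problems raised earlier in this thread.

```python
import hashlib

# Index files named in the comment above; the exact set is illustrative.
INDEX_FILES = {"xml/tree_index.xml", "xml/api_index.xml"}


def digest(text):
    """Content hash used to tell whether a generated file really changed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changed_inputs(manifest, current):
    """manifest maps file name -> digest from the previous run,
    current maps file name -> content from this run.
    Returns the sorted names that are new or whose content differs."""
    return sorted(
        name for name, text in current.items()
        if manifest.get(name) != digest(text)
    )


def plan_rebuild(manifest, current):
    """Split the changed files into section pages and indexes."""
    changed = changed_inputs(manifest, current)
    return {
        "sections": [n for n in changed if n not in INDEX_FILES],
        "indexes": [n for n in changed if n in INDEX_FILES],
    }


# Hypothetical example: one section and one index changed.
previous = {
    "xml/foo-widget.xml": digest("<refentry>v1</refentry>"),
    "xml/foo-button.xml": digest("<refentry>b</refentry>"),
    "xml/api_index.xml": digest("<index>v1</index>"),
}
current = {
    "xml/foo-widget.xml": "<refentry>v2</refentry>",
    "xml/foo-button.xml": "<refentry>b</refentry>",
    "xml/api_index.xml": "<index>v2</index>",
}
plan = plan_rebuild(previous, current)
print(plan["sections"])  # ['xml/foo-widget.xml']
print(plan["indexes"])   # ['xml/api_index.xml']
```

Hashing content rather than comparing timestamps matters here because gtkdoc-mkdb may rewrite a file without changing it.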
The trouble is that at this moment you can use any inference and content generation mechanisms DocBook offers. This will be lost. The processing is inherently global and I don't think it is reasonable to fight this. Consider also the single-page, PDF, man and other processing options... The questions should be:
1) why can't xsltproc itself make use of multiple threads?
2) why does it take so long -- even compared to typical compilation of other DocBook documents?
We cannot do much about 1) but maybe we can do something about 2).
I will resubscribe to xml-devel list and try to get some answers.
I renamed the bug, as imho there is nothing wrong with our makefile rules. Also a small status update:

libxml/libxslt profiling
- some functions show up high in the profile
- no easy candidates for optimization
- I tried adding G_LIKELY/UNLIKELY macros to libxml2, but it speeds things up by 1% if at all
- there is a *lot* of memcpy and strcmp, as expected

docbook stylesheet
- it would be nice to have a means to generate a variant of the official stylesheets, with customizations preapplied

xslt compilers
- I tried some xslt compilers, but did not even succeed in building them, old crufty c++ code :/

multithreading
- after chunking, the outputs could be generated by multiple worker threads (one shared readonly source document, multiple output files). I've asked on the libxslt list - no reply.
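The multithreading idea (one shared read-only source document, one output per chunk) might look roughly like the following Python sketch. It does not use libxslt and is not how xsltproc works today. `render_chunk`, `render_all` and the sample document are hypothetical, and the per-chunk "transformation" is a stand-in for a real stylesheet. CPython threads also do not give real CPU parallelism for this kind of work, so it only illustrates how the work is divided.

```python
from concurrent.futures import ThreadPoolExecutor
import xml.etree.ElementTree as ET

# Hypothetical, already "chunked" source: one chapter per output page.
SOURCE = (
    "<book>"
    "<chapter id='intro'><title>Intro</title></chapter>"
    "<chapter id='api'><title>API</title></chapter>"
    "</book>"
)


def render_chunk(chapter):
    """Render one chunk to (file name, HTML text).
    Only reads the shared tree and never modifies it."""
    title = chapter.findtext("title")
    html = "<html><body><h1>%s</h1></body></html>" % title
    return chapter.get("id") + ".html", html


def render_all(doc, workers=4):
    """Parse once, then let worker threads render the chunks independently."""
    chunks = doc.findall("chapter")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(render_chunk, chunks))


doc = ET.fromstring(SOURCE)  # the shared source document, treated as read-only
pages = render_all(doc)
print(sorted(pages))  # ['api.html', 'intro.html']
```

The key property is that workers share the parsed tree without locking because none of them writes to it; each one produces only its own output.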
Closing this now. If we want faster doc builds, we need to speed up xsltproc (e.g. make the chunker multithreaded).