GNOME Bugzilla – Bug 544906
Very slow (4 min) transforming one element XML file with xsltproc
Last modified: 2008-07-27 16:13:01 UTC
This bug may possibly be related to the reports in bugs 165107 and 311857, although the only real apparent commonality is the slowness. The problem occurs in both 32-bit and 64-bit Linux plus Windows (at least CP) systems. The information herein was obtained on a 64-bit Athlon 3500+ running Fedora 9. The bug was first observed using XML Copy Editor (http://xml-copy-editor.sourceforge.net/) in conjunction witht eh IBM developerWorks XML author templates (download using Download link in http://www.ibm.com/developerworks/library/i-dwauthors/), where it took over 4 minutes to transform the basic article template (obtain this by running the dw-article.vbs or dw-article.sh in the above download). I have stripped down the set of stylesheets from the author package and run xsltproc with various options. I have also stripped down the XML file to the point that it contains only a single element. <?xml-stylesheet type="text/xsl" href="../xsl/5.10/dw-document-html-5.10.xsl"?> <dw-document xmlns:dw="http://www.ibm.com/developerWorks/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="../schema/5.10/dw-document-5.10.xsd"> </dw-document> The stripped down stylesheets now only emit a single comment as processing for the top-level element in the XML file. After all this reduction, it appears that xsltproc parses the stylesheet almost instantly, then spends over 4 minutes doing something (compiling the stylesheet perhaps?), then processes the input XML, again nearly instantaneously. I wrote a basic script to put timer ticks on the console every 30 seconds while the xsltproc is running to illustrate where the problem occurs, and to give comparison time with IBM Java and Xalan. #!/bin/bash echo "" echo "$(date) Xalan Java transform start" echo "$(date) Running time /opt/java2-i386-50/bin/java org.apache.xalan.xslt.Process -IN index.xml -XSL main-stylesheet.xsl -OUT index.html" time /opt/ibm/java2-i386-50/bin/java org.apache.xalan.xslt.Process -IN index.xml -XSL main-stylesheet.xsl -OUT index.html echo "$(date) Xalan Java transform end" echo "$(date) background timer start" for n in $(seq 1 10); do echo "$(date) bg tick $n";sleep 30;done& echo "$(date) xsltproc transform start" echo "$(date) Running time xsltproc --timing --profile main-stylesheet.xsl index.xml " time xsltproc --timing --profile main-stylesheet.xsl index.xml echo "$(date) xsltproc transform end" Output from thsi si as follows: Sat Jul 26 18:08:30 EDT 2008 Xalan Java transform start Sat Jul 26 18:08:30 EDT 2008 Running time /opt/java2-i386-50/bin/java org.apache.xalan.xslt.Process -IN index.xml -XSL main-stylesheet.xsl -OUT index.html real 0m2.703s user 0m2.391s sys 0m0.277s Sat Jul 26 18:08:33 EDT 2008 Xalan Java transform end Sat Jul 26 18:08:33 EDT 2008 background timer start Sat Jul 26 18:08:33 EDT 2008 xsltproc transform start Sat Jul 26 18:08:33 EDT 2008 bg tick 1 Sat Jul 26 18:08:33 EDT 2008 Running time xsltproc --timing --profile main-stylesheet.xsl index.xml Parsing stylesheet main-stylesheet.xsl took 0 ms Sat Jul 26 18:09:03 EDT 2008 bg tick 2 Sat Jul 26 18:09:33 EDT 2008 bg tick 3 Sat Jul 26 18:10:03 EDT 2008 bg tick 4 Sat Jul 26 18:10:33 EDT 2008 bg tick 5 Sat Jul 26 18:11:03 EDT 2008 bg tick 6 Sat Jul 26 18:11:33 EDT 2008 bg tick 7 Sat Jul 26 18:12:03 EDT 2008 bg tick 8 Sat Jul 26 18:12:33 EDT 2008 bg tick 9 Sat Jul 26 18:13:03 EDT 2008 bg tick 10 Parsing document index.xml took 0 ms number match name mode Calls Tot 100us Avg 0 / 1 2 2 Total 1 2 Applying stylesheet took 4 ms <!--Do nothing much--> Saving result took 0 ms real 4m36.223s user 4m22.007s sys 0m0.521s Sat Jul 26 18:13:09 EDT 2008 xsltproc transform end You can see the 4.5 minute delay between the xsltproc output times for parsing stylesheet and parsing document. Note that this process is using about 98% CPU during all this time. The stripped-down XML files and XSL files are at http://www.cybershields.com/testcase1.zip Ian Shields.
There is a loop when compiling the stylesheet about excluding result prefix xsl and fo . Just run with xsltproc -v to trace the loop (I gave that option on the oo.o mailing-list BTW). So clearly there is a bug there happening at stylesheet compilation time. After a bit of debugging, it's just that in the stylesheets there is lots of redundants xsl:exclude-result-prefixes="xsl fo" , not a mistake in itself but which generated a quadratic behaviour in libxslt at stylesheet compile time. Adding just a little check to avoid redefining duplicated excluded namespaces fixes this: paphio:~/tmp/544906 -> ~veillard/XSLT/xsltproc/xsltproc --timing -o res main-stylesheet.xsl index.xml Parsing stylesheet main-stylesheet.xsl took 0 ms Parsing document index.xml took 0 ms Running stylesheet and saving result took 2 ms paphio:~/tmp/544906 -> fixed in revision 1485. To work around the problem in existing libxslt, just clean up your dw-common-5.10.xsl, you have exclude-result-prefixes="xsl fo" on the top stylesheet element, you're not redefining the namespaces in the XSL so there is absolutely no reason to carry those extra declarations. I removed them from dw-common-5.10.xsl and reran with the old xsltproc paphio:~/tmp/544906 -> /usr/bin/xsltproc --timing -o res2 main-stylesheet.xsl index.xml Parsing stylesheet main-stylesheet.xsl took 0 ms Parsing document index.xml took 0 ms Running stylesheet and saving result took 2 ms paphio:~/tmp/544906 -> diff res res2 paphio:~/tmp/544906 -> voila, back to 2ms and no difference ... No idea why someone though it was useful to redeclare them all over the place. Daniel
Thanks for the prompt fix and workaround. I can't explain the redeclarations, but I'll ask our developers to remove them. Ian.