After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 544906 - Very slow (4 min) transforming one element XML file with xsltproc
Very slow (4 min) transforming one element XML file with xsltproc
Status: RESOLVED FIXED
Product: libxslt
Classification: Platform
Component: general
1.1.x
Other Linux
: Normal normal
: ---
Assigned To: Daniel Veillard
libxml QA maintainers
Depends on:
Blocks:
 
 
Reported: 2008-07-26 22:34 UTC by Ian Shields
Modified: 2008-07-27 16:13 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Ian Shields 2008-07-26 22:34:59 UTC
This bug may possibly be related to the reports in bugs 165107 and 311857, although the only real apparent commonality is the slowness.

The problem occurs in both 32-bit and 64-bit Linux plus Windows (at least CP) systems. The information herein was obtained on a 64-bit Athlon 3500+ running Fedora 9.

The bug was first observed using XML Copy Editor (http://xml-copy-editor.sourceforge.net/) in conjunction witht eh IBM developerWorks XML author templates (download using Download link in http://www.ibm.com/developerworks/library/i-dwauthors/), where it took over 4 minutes to transform
the basic article template (obtain this by running the dw-article.vbs or dw-article.sh in the above download).

I have stripped down the set of stylesheets from the author package and run xsltproc with various options. I have also stripped down the XML file to the point that it contains only a single element.

<?xml-stylesheet type="text/xsl" href="../xsl/5.10/dw-document-html-5.10.xsl"?>
<dw-document xmlns:dw="http://www.ibm.com/developerWorks/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="../schema/5.10/dw-document-5.10.xsd">
</dw-document>

The stripped down stylesheets now only emit a single comment as processing for the top-level element in the XML file.

After all this reduction, it appears that xsltproc parses the stylesheet almost instantly, then spends over 4 minutes doing something (compiling the stylesheet perhaps?), then processes the input XML, again nearly instantaneously.

I wrote a basic script to put timer ticks on the console every 30 seconds while the xsltproc is running to illustrate where the problem occurs, and to give comparison time with IBM Java and Xalan.

#!/bin/bash
echo ""
echo "$(date) Xalan Java transform start"
echo "$(date) Running time /opt/java2-i386-50/bin/java org.apache.xalan.xslt.Process -IN index.xml -XSL main-stylesheet.xsl -OUT index.html"
time /opt/ibm/java2-i386-50/bin/java org.apache.xalan.xslt.Process -IN index.xml -XSL main-stylesheet.xsl -OUT index.html
echo "$(date) Xalan Java transform end"
echo "$(date) background timer start"
for n in $(seq 1 10); do echo "$(date) bg tick $n";sleep 30;done&
echo "$(date) xsltproc transform start"
echo "$(date) Running time xsltproc --timing --profile main-stylesheet.xsl index.xml "
time xsltproc --timing --profile main-stylesheet.xsl index.xml 
echo "$(date) xsltproc transform end"

Output from thsi si as follows:
Sat Jul 26 18:08:30 EDT 2008 Xalan Java transform start
Sat Jul 26 18:08:30 EDT 2008 Running time /opt/java2-i386-50/bin/java org.apache.xalan.xslt.Process -IN index.xml -XSL main-stylesheet.xsl -OUT index.html

real	0m2.703s
user	0m2.391s
sys	0m0.277s
Sat Jul 26 18:08:33 EDT 2008 Xalan Java transform end
Sat Jul 26 18:08:33 EDT 2008 background timer start
Sat Jul 26 18:08:33 EDT 2008 xsltproc transform start
Sat Jul 26 18:08:33 EDT 2008 bg tick 1
Sat Jul 26 18:08:33 EDT 2008 Running time xsltproc --timing --profile main-stylesheet.xsl index.xml 
Parsing stylesheet main-stylesheet.xsl took 0 ms
Sat Jul 26 18:09:03 EDT 2008 bg tick 2
Sat Jul 26 18:09:33 EDT 2008 bg tick 3
Sat Jul 26 18:10:03 EDT 2008 bg tick 4
Sat Jul 26 18:10:33 EDT 2008 bg tick 5
Sat Jul 26 18:11:03 EDT 2008 bg tick 6
Sat Jul 26 18:11:33 EDT 2008 bg tick 7
Sat Jul 26 18:12:03 EDT 2008 bg tick 8
Sat Jul 26 18:12:33 EDT 2008 bg tick 9
Sat Jul 26 18:13:03 EDT 2008 bg tick 10
Parsing document index.xml took 0 ms
number               match                name      mode  Calls Tot 100us Avg

    0                    /                                    1      2      2

                         Total                                1      2
Applying stylesheet took 4 ms
<!--Do nothing much-->
Saving result took 0 ms

real	4m36.223s
user	4m22.007s
sys	0m0.521s
Sat Jul 26 18:13:09 EDT 2008 xsltproc transform end

You can see the 4.5 minute delay between the xsltproc output times for parsing stylesheet and parsing document. Note that this process is using about 98% CPU during all this time.

The stripped-down XML files and XSL files are at http://www.cybershields.com/testcase1.zip

Ian Shields.
Comment 1 Daniel Veillard 2008-07-27 12:01:52 UTC
There is a loop when compiling the stylesheet about excluding result
prefix xsl and fo . Just run with xsltproc -v to trace the loop (I
gave that option on the oo.o mailing-list BTW).
So clearly there is a bug there happening at stylesheet compilation time.
After a bit of debugging, it's just that in the stylesheets there is 
lots of redundants xsl:exclude-result-prefixes="xsl fo" , not a mistake
in itself but which generated a quadratic behaviour in libxslt at stylesheet
compile time.
Adding just a little check to avoid redefining duplicated excluded namespaces
fixes this:

paphio:~/tmp/544906 ->  ~veillard/XSLT/xsltproc/xsltproc --timing -o res  main-stylesheet.xsl index.xml
Parsing stylesheet main-stylesheet.xsl took 0 ms
Parsing document index.xml took 0 ms
Running stylesheet and saving result took 2 ms
paphio:~/tmp/544906 -> 

fixed in revision 1485.

To work around the problem in existing libxslt, just clean up your 
dw-common-5.10.xsl, you have exclude-result-prefixes="xsl fo" on the
top stylesheet element, you're not redefining the namespaces in the XSL
so there is absolutely no reason to carry those extra declarations. I
removed them from dw-common-5.10.xsl and reran with the old xsltproc

paphio:~/tmp/544906 -> /usr/bin/xsltproc --timing -o res2  main-stylesheet.xsl index.xml
Parsing stylesheet main-stylesheet.xsl took 0 ms
Parsing document index.xml took 0 ms
Running stylesheet and saving result took 2 ms
paphio:~/tmp/544906 -> diff res res2
paphio:~/tmp/544906 -> 

  voila, back to 2ms and no difference ...
No idea why someone though it was useful to redeclare them all over the place.

Daniel
Comment 2 Ian Shields 2008-07-27 16:13:01 UTC
Thanks for the prompt fix and workaround. I can't explain the redeclarations, but I'll ask our developers to remove them.

Ian.