Bug 133289 – xsl:key degenerate performance

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 133289 - xsl:key degenerate performance


Summary:	xsl:key degenerate performance


Status:	VERIFIED FIXED

Product:	libxslt
Classification:	Platform
Component:	general
Version:	1.1.2
Hardware:	Other Windows

Importance:	Normal normal
Target Milestone:	---
Assigned To:	Daniel Veillard
QA Contact:	libxml QA maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2004-02-03 09:44 UTC by Chris Trengove
Modified:	2009-08-15 18:40 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Test files to demonstrate key behaviour. (566 bytes, application/octet-stream) 2004-02-28 19:42 UTC, Chris Trengove	Details

Description Chris Trengove 2004-02-03 09:44:38 UTC

I have come across a performance issue with the implementation of xsl:key 
when dealing with a large number of nodes. To demonstrate this, if one 
processes documents like

<KeyTest>
<Cell><Value>0</Value></Cell>
<Cell><Value>1</Value></Cell>
<Cell><Value>2</Value></Cell>
<!--Lots more similar lines follow-->
</KeyTest>

using 

<xsl:key name="cells-key" match="/KeyTest/Cell" use="Value"/>

the processing time seems to grow exponentially/geometrically as the 
number of nodes grows. For 1000 nodes, the processing is virtually 
instantaneous, for 10000 nodes, it takes about 6 seconds, while for 
100000 nodes processing time is in the order of 1 hour!

I have been able to check what is happening in a profiler, and it turns 
that virtually all of this time is spent in the single call to 
xmlXPathNodeSetSort() which occurs as part of the compilation of 
the "match" expression, the idea being to put the resulting node set into 
document order. For example, for the 10000 node case, this sort operation 
takes all but about 0.1 second of the total processing time. This is 
particularly unfortunate in this case, since the nodes are already 
sorted, and the whole operation is unnecessary!

The next question is what to do about this. Some possibilities might be:

(a) Don't sort the result of the xsl:key "match" expression. From my 
reading of the XSLT standard, it does not suggest that the result 
returned by the key() function should be in document order (or any 
particular order).

(b) Since implementing (a) might break some existing stylesheets which 
relied on the document order assumption, introduce some global setting to 
force sorting on or off.

(c) Try and be a bit smarter in detecting when it is actually necessary 
to sort, since in almost all cases, the nodes will be in document order 
in any case.

Along the lines of (c), I have added the following code to the start of 
xmlXPathNodeSetSort().

    /* See if already sorted */
    len = set->nodeNr;
    sorted = 1;
    for (i = 0; i < len - 1; ++i) {
	if (xmlXPathCmpNodes(set->nodeTab[i],
	                     set->nodeTab[i + 1]) == -1) {
	    sorted = 0;
	    break;
	}
    }
    if (sorted)
	return;

In other words, you just compare each node with the next one in the list, 
and don't bother sorting if they are already in ascending order. In my 
test case (and in many other situations too, I imagine), this works 
especially quickly because the node comparison routine (xmlXPathCmpNodes) 
is optimised for sibling nodes (using the ->next and ->previous 
pointers). Anyway, the performance improvement by just adding this is 
staggering!

Comment 1 Daniel Veillard 2004-02-27 15:17:01 UTC

What version of libxslt are you using, is it really 1.1.2, because
in recent versions nodes in the input are numbered so such sorting
should be just fine ?
Can you provide full stylesheets and a limited input example ?

Daniel

Comment 2 Chris Trengove 2004-02-28 19:42:43 UTC

Created attachment 24908 [details]
Test files to demonstrate key behaviour.

Comment 3 Chris Trengove 2004-02-28 19:57:23 UTC

I have now compiled up libxml2 and libxslt using the latest sources
from the CVS, and the behaviour is definitely still there. I have
attached a zip file containing a small Python script I use to generate
the input XML documents. You would run it as

python xmlcreate.py 10000

to create keytest10000.xml, for example. Then transform the file with

xsltproc --timing keytest.xsl keytest10000.xml

You should find that the performance deteriorates spectacularly as you
increase the number of nodes. On my machine, the transform is
instantaneous for 1000 nodes, takes 7 seconds for 10000, and 18
minutes for 100000 nodes! However, when I modify xpath.c along the
lines of my earlier suggestion (to test whether the nodes are already
sorted), the transform proceeds quickly.

Incidentally, when tracing through this stuff using a debugger, I
noticed the test for "numbered" nodes, but they did not in fact appear
to be numbered - the comparison function always ends up iterating
through the "next" pointers to find the relative ordering of the
sibling nodes.

Comment 4 Daniel Veillard 2004-07-06 09:16:19 UTC

libxslt sorts all document nodes now by default,
I assume this closed the bug,

Daniel

Comment 5 Chris Trengove 2004-07-07 03:06:43 UTC

Daniel,

I am afraid that this does not resolve the issue. Once again, I have tested this
with the latest sources from the CVS.

The problem is that, although the document nodes are sorted, the implementation
of  xsltInitCtxtKey() (in keys.c) ends up calling xmlXPathNodeSetSort() (in
xpath.c) on the entire document node set. This happens via the call to
xmlXPathCompiledEval() (at line 564 of keys.c).

This attempt to "sort" the (already sorted) node set is extremely expensive. To
take the example of my 10,000 node test document, the Shell Sort implementation
in xmlXPathNodeSetSort() results in 120,000 calls to xmlXPathCmpNodes(). In
turn, this results in the following loop (at lines 1641 to 1643 if xpath.c)

    for (cur = node1->next;cur != NULL;cur = cur->next)
        if (cur == node2)
            return(1);

being executed over 133 million times! Over 99 per cent of program execution
time is spent in this loop.

As far as I can see, there are only two ways to fix this.

(1) Stop the evaluation of the xsl::key "match" expression from attempting to
sort the results. I don't understand the code well enough to see if this is
feasible or not. It appears that the XPATH_OP_SORT token gets routinely added to
all of these XPath expressions.

(2) Put a check inside of xmlXPathCmpNodes() to test whether the node set is
already sorted (along the lines of my earlier posting). This obviously has a
minor impact on all XPath expressions, but it certainly gets rid of the
degenerate behaviour.

Comment 6 William M. Brack 2004-08-16 19:34:37 UTC

Well, all of the above was generally true.  The problems were within 
XPATH_OP_SORT, and libxslt generally sorts all documents by default.  However, 
there was one instance in which the document was not sorted, and that was for 
the main input doc :-\.  The routine which accomplishes this optimisation is 
xmlXPathOrderDocElems, and it was (correctly) called for all new documents 
opened by the library.  However, when the user supplied a "pre-parsed" document 
(which is the normal usage for the library), that document was not given to 
xmlXPathOrderDocElems, and that's where the problem arose.

I fixed that, and on my system got the following results:

bill@bbrack bug133289 $ time ./xsltproc keytest.xsl keytest10000.xml

real    0m0.278s
user    0m0.280s
sys     0m0.000s
bill@bbrack bug133289 $ python xmlcreate.py 100000
bill@bbrack bug133289 $ time ./xsltproc keytest.xsl keytest100000.xml

real    0m2.961s
user    0m2.870s
sys     0m0.090s

Thanks for the report, and for your patience in waiting for a solution.  The 
corrected source (transform.c) is in CVS.

Bill

Comment 7 Chris Trengove 2004-08-18 01:36:24 UTC

Well done, I think you have nailed it! My test gives 0.109s to apply the
stylesheet to the 10,000 element document, and 1.125s for the 100,000 element
case. In other words, the time taken is now roughly linear, which is what it
should be.

Comment 8 Daniel Veillard 2004-08-22 20:50:47 UTC

 This should be fixed in release libxslt-1.1.9.
                                                                                
   thanks,
                                                                                
Daniel