GNOME Bugzilla – Bug 133289
xsl:key degenerate performance
Last modified: 2009-08-15 18:40:50 UTC
I have come across a performance issue with the implementation of xsl:key when dealing with a large number of nodes. To demonstrate this, if one processes documents like <KeyTest> <Cell><Value>0</Value></Cell> <Cell><Value>1</Value></Cell> <Cell><Value>2</Value></Cell> <!--Lots more similar lines follow--> </KeyTest> using <xsl:key name="cells-key" match="/KeyTest/Cell" use="Value"/> the processing time seems to grow exponentially/geometrically as the number of nodes grows. For 1000 nodes, the processing is virtually instantaneous, for 10000 nodes, it takes about 6 seconds, while for 100000 nodes processing time is in the order of 1 hour! I have been able to check what is happening in a profiler, and it turns that virtually all of this time is spent in the single call to xmlXPathNodeSetSort() which occurs as part of the compilation of the "match" expression, the idea being to put the resulting node set into document order. For example, for the 10000 node case, this sort operation takes all but about 0.1 second of the total processing time. This is particularly unfortunate in this case, since the nodes are already sorted, and the whole operation is unnecessary! The next question is what to do about this. Some possibilities might be: (a) Don't sort the result of the xsl:key "match" expression. From my reading of the XSLT standard, it does not suggest that the result returned by the key() function should be in document order (or any particular order). (b) Since implementing (a) might break some existing stylesheets which relied on the document order assumption, introduce some global setting to force sorting on or off. (c) Try and be a bit smarter in detecting when it is actually necessary to sort, since in almost all cases, the nodes will be in document order in any case. Along the lines of (c), I have added the following code to the start of xmlXPathNodeSetSort(). /* See if already sorted */ len = set->nodeNr; sorted = 1; for (i = 0; i < len - 1; ++i) { if (xmlXPathCmpNodes(set->nodeTab[i], set->nodeTab[i + 1]) == -1) { sorted = 0; break; } } if (sorted) return; In other words, you just compare each node with the next one in the list, and don't bother sorting if they are already in ascending order. In my test case (and in many other situations too, I imagine), this works especially quickly because the node comparison routine (xmlXPathCmpNodes) is optimised for sibling nodes (using the ->next and ->previous pointers). Anyway, the performance improvement by just adding this is staggering!
What version of libxslt are you using, is it really 1.1.2, because in recent versions nodes in the input are numbered so such sorting should be just fine ? Can you provide full stylesheets and a limited input example ? Daniel
Created attachment 24908 [details] Test files to demonstrate key behaviour.
I have now compiled up libxml2 and libxslt using the latest sources from the CVS, and the behaviour is definitely still there. I have attached a zip file containing a small Python script I use to generate the input XML documents. You would run it as python xmlcreate.py 10000 to create keytest10000.xml, for example. Then transform the file with xsltproc --timing keytest.xsl keytest10000.xml You should find that the performance deteriorates spectacularly as you increase the number of nodes. On my machine, the transform is instantaneous for 1000 nodes, takes 7 seconds for 10000, and 18 minutes for 100000 nodes! However, when I modify xpath.c along the lines of my earlier suggestion (to test whether the nodes are already sorted), the transform proceeds quickly. Incidentally, when tracing through this stuff using a debugger, I noticed the test for "numbered" nodes, but they did not in fact appear to be numbered - the comparison function always ends up iterating through the "next" pointers to find the relative ordering of the sibling nodes.
libxslt sorts all document nodes now by default, I assume this closed the bug, Daniel
Daniel, I am afraid that this does not resolve the issue. Once again, I have tested this with the latest sources from the CVS. The problem is that, although the document nodes are sorted, the implementation of xsltInitCtxtKey() (in keys.c) ends up calling xmlXPathNodeSetSort() (in xpath.c) on the entire document node set. This happens via the call to xmlXPathCompiledEval() (at line 564 of keys.c). This attempt to "sort" the (already sorted) node set is extremely expensive. To take the example of my 10,000 node test document, the Shell Sort implementation in xmlXPathNodeSetSort() results in 120,000 calls to xmlXPathCmpNodes(). In turn, this results in the following loop (at lines 1641 to 1643 if xpath.c) for (cur = node1->next;cur != NULL;cur = cur->next) if (cur == node2) return(1); being executed over 133 million times! Over 99 per cent of program execution time is spent in this loop. As far as I can see, there are only two ways to fix this. (1) Stop the evaluation of the xsl::key "match" expression from attempting to sort the results. I don't understand the code well enough to see if this is feasible or not. It appears that the XPATH_OP_SORT token gets routinely added to all of these XPath expressions. (2) Put a check inside of xmlXPathCmpNodes() to test whether the node set is already sorted (along the lines of my earlier posting). This obviously has a minor impact on all XPath expressions, but it certainly gets rid of the degenerate behaviour.
Well, all of the above was generally true. The problems were within XPATH_OP_SORT, and libxslt generally sorts all documents by default. However, there was one instance in which the document was not sorted, and that was for the main input doc :-\. The routine which accomplishes this optimisation is xmlXPathOrderDocElems, and it was (correctly) called for all new documents opened by the library. However, when the user supplied a "pre-parsed" document (which is the normal usage for the library), that document was not given to xmlXPathOrderDocElems, and that's where the problem arose. I fixed that, and on my system got the following results: bill@bbrack bug133289 $ time ./xsltproc keytest.xsl keytest10000.xml real 0m0.278s user 0m0.280s sys 0m0.000s bill@bbrack bug133289 $ python xmlcreate.py 100000 bill@bbrack bug133289 $ time ./xsltproc keytest.xsl keytest100000.xml real 0m2.961s user 0m2.870s sys 0m0.090s Thanks for the report, and for your patience in waiting for a solution. The corrected source (transform.c) is in CVS. Bill
Well done, I think you have nailed it! My test gives 0.109s to apply the stylesheet to the 10,000 element document, and 1.125s for the 100,000 element case. In other words, the time taken is now roughly linear, which is what it should be.
This should be fixed in release libxslt-1.1.9. thanks, Daniel