GNOME Bugzilla – Bug 165547
Performance issue when matching text()|*
Last modified: 2021-07-05 10:59:34 UTC
Please describe the problem: We encountered a strange performance problem related to the cost of building large nodesets. In particular, it seems that the cost of building a nodeset is not linear in the size. Steps to reproduce: To demonstrate this we created three XML files: <?xml version="1.0" ?> <topnode> <inner-node/> <inner-node/> ... </topnode> The first had 1,000 of the inner nodes (and the corresponding newlines), the second 3,000 and the last 10,000. In the tests, we use the following stylesheet: <?xml version="1.0" ?> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"; version="1.0" > <xsl:template match="topnode"> <xsl:call-template name="nodes-only"/> <xsl:call-template name="text-only"/> <xsl:call-template name="nodes-or-text"/> </xsl:template> <xsl:template name="nodes-only"> <xsl:value-of select="count(*)"/> <xsl:value-of select="' '"/> </xsl:template> <xsl:template name="text-only"> <xsl:value-of select="count(text())"/> <xsl:value-of select="' '"/> </xsl:template> <xsl:template name="nodes-or-text"> <xsl:value-of select="count(*|text())"/> <xsl:value-of select="' '"/> </xsl:template> </xsl:stylesheet> >From the profiling using version 1.1.12, we see the following times: Template Name 1000 nodes 3000 nodes 10,000 nodes nodes-or-text 4890 77834 1870993 text-only 1274 12357 213519 nodes-only 27 257 1526 which corresponds to the following speed ratios Template Name 1000 nodes 3000 nodes 10,000 nodes nodes-or-text 1 15.9 times 382.6 times text-only 1 9.7 times 167.6 times nodes-only 1 9.5 times 56.5 times This is definitely not scaling linearly. And, a second interesting point, is that count *|text() is 5 times slower than counting * and then counting text() separately. Regards, Jerome Pesenti Actual results: Expected results: Does this happen every time? Other information:
"This is definitely not scaling linearly. And, a second interesting point, is that count *|text() is 5 times slower than counting * and then counting text() separately." That's normal ! When you evaluate * | text() you must return a sorted list, and sorting is not linear. There are a lot of optimizations possibles, but the basic default processing won't be linear and that's normal ! Discussions about optimizations should go on the list, not in a bugzilla I will be the only one to look at. So I'm closing it, followup on the list if interested Daniel
Daniel, the count(*|text()) is just the second point. The main point is that even doing count(*) or count(text()) is NOT linear, it's even worse than quadratic!!! Looks like a bug to me so I am reopening the bug.
I've tried to optimize the function xmlXPathCmpNodes() which is called by xmlXPathNodeSetSort() (I'll commit the code later today). I'll describe here the issues/thoughts/solutions I came accross. The analysis of the expression "count(/foo/text())" revealed the simplified XPath callstack shown below. Not that we have 2 XPATH_OP_SORT here. I used a similar scenario as the one already reported (about 3000 elements and 3001 text-nodes): xmlXPathRunEval --> xmlXPathCompOpEval() XPATH_OP_SORT --> xmlXPathCompOpEval() XPATH_OP_FUNCTION --> xmlXPathCompOpEval() XPATH_OP_ARG --> xmlXPathCompOpEval() XPATH_OP_SORT --> xmlXPathCompOpEval() XPATH_OP_COLLECT --> xmlXPathCompOpEval() XPATH_OP_NODE --> xmlXPathNewNodeSet() --> valuePush() --> xmlXPathNodeCollectAndTest(): Here 3001 text-nodes are put into a node-set --> xmlXPathNodeSetSort() --> xmlXPathCmpNodes() --> xmlXPathCountFunction The sorting is responsible for the slowdown of the evaluation of your scenario. The reasons are: 1) Text-nodes (and all other non-element nodes) are not indexed. 2) All the text-nodes (3001) in this scenario are siblings. Together with 1) this results in a worst-case scenario for the comparison algorithm called by the sorting algorithm. This means: that the node-comparison algorithm cannot use an index, so it has to compare many nodes (in the worst case 3000) in xmlXPathCmpNodes() for every call of this function. I don't know exactly how the sorting algorithm works, but if we frankly assume that xmlXPathCmpNodes() is called about 15000 times and has to compare about 15000 nodes, then this sums up and takes much time. On the other hand, for element-nodes, indexes can be used; such a comparison will always be much faster. The worst-case scenario, as described above for text-nodes, does not apply to sorting of nodes with indexes. An other question is why a sorting is performed at all for an expression like "count(/foo/text())". First we just want to count the nodes; so an internal optimization of the expression could recognize this fact and would not sort. Further an expression like "/foo/text()" also need not be sorted, since the returned nodes are put in document order into the resulting node-set. So there's also room for optimization at compilation time of XPath expressions.
To solve this wort-case scenario a similar technique as for attributes was applied for text-nodes, cdata-section-nodes, comment-nodes and processing-instruction-nodes: we'll resolve to the nearest element of such nodes in order to use the indexes of those elements for faster comparison. Note that the fix won't have an effect for non-indexed trees, so always call xmlXPathOrderDocElems() (xpath.c) if you have big trees with many siblings. Libxslt automatically creates indexes for input trees. Current results ---------------- - run in a VMWare with win - run with the refactored code (defined XSLT_REFACTORED); but this should not have impact wrt to speed in this scenario Basic structure of the scenario: <foo> <bar/> </foo> 1000 "bar" elements ------------------- xsltproc --profile bug165547.xsl big-list-1000.xml number match name mode Calls Tot 100us Avg 0 nodes-or-text 1 1130 1130 1 nodes-only 1 73 73 2 text-only 1 73 73 3 foo 1 2 2 Total 4 1278 3000 "bar" elements ------------------- xsltproc --profile bug165547.xsl big-list-3000.xml number match name mode Calls Tot 100us Avg 0 nodes-or-text 1 9826 9826 1 text-only 1 219 219 2 nodes-only 1 197 197 3 foo 1 3 3 Total 4 10245 10000 "bar" elements ------------------- xsltproc --profile bug165547.xsl big-list-1 0000.xml Max inodes: 4 Max LREs : 2 number match name mode Calls Tot 100us Avg 0 nodes-or-text 1 593353 593353 1 nodes-only 1 1831 1831 2 text-only 1 1723 1723 3 foo 1 4 4 Total 4 596911
The optimization is in CVS now, xpath.c, revisiion 1.304.
Eliminated sorting of node-sets if used for the XPath count() function. Committed to CVS, xpath.c, revision 1.306 The union-expression "count(*|text())" still takes longer. This is due to the fact that those 2 result node-sets need to be merged, so (for 10000 "bar" elements) 10000 text-nodes need to be pointer-compared with 10000 element-nodes, thus we get 10000000 comparisons and additional node-type checks. One (theoretical) way I see to optimize this is to rewrite the XPath module to use a tree-traversal/node-testing/predicate-applying mechanism which can evaluate each of the unioned expressions against the same node, so we wouldn't have to merge. But I don't think we want to do that ;-) An other way to enhance this is to expand the xmlXPathObj to hold more information. E.g. a field like nodeSetType would be used to identify if a node-list consist of nodes of a specific type or of mixed types. For our scenario this would mean that we wouldn't need to take care of duplicates during the merge, since one node-set consists of element-nodes and the other of text-nodes. (By the way, I would like the xmlXPathObj to hold a flag indicating if a node-set was already sorted) Current results ---------------- - still run in a VMWare with win, memory-debug dll - still run with the refactored code (defined XSLT_REFACTORED); but this should not have impact wrt to speed in this scenario 1000 "bar" elements ------------------- xsltproc --profile bug165547.xsl big-list-1000.xml number match name mode Calls Tot 100us Avg 0 nodes-or-text 1 754 754 1 nodes-only 1 43 43 2 text-only 1 33 33 3 foo 1 3 3 Total 4 833 3000 "bar" elements ------------------- xsltproc --profile bug165547.xsl big-list-3000.xml number match name mode Calls Tot 100us Avg 0 nodes-or-text 1 8513 8513 1 nodes-only 1 75 75 2 text-only 1 65 65 3 foo 1 2 2 Total 4 8655 10000 "bar" elements -------------------- xsltproc --profile bug165547.xsl big-list-10000.xml number match name mode Calls Tot 100us Avg 0 nodes-or-text 1 528952 528952 1 nodes-only 1 279 279 2 text-only 1 250 250 3 foo 1 5 5 Total 4 529486
Is there still some work to do with regard to this bug/rfe?
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new ticket at https://gitlab.gnome.org/GNOME/libxslt/-/issues/ Thank you for your understanding and your help.