GNOME Bugzilla – Bug 620102
xsltApplyStripSpaces() modifies the *input* document
Last modified: 2021-07-05 11:00:59 UTC
This bug was found while investigating a crash that a user reported in lxml.etree during XSLT processing. https://bugs.launchpad.net/lxml/+bug/583249 The crashing stylesheet (appended to the bug above) uses <xsl:strip-space>, and valgrind reveals that a node was freed during processing that is still in use. I copied the section that frees that node from transform.c, it's in line 5743: ------------------------------ /** * xsltApplyStripSpaces: * @ctxt: a XSLT process context * @node: the root of the XML tree * * Strip the unwanted ignorable spaces from the input tree */ void xsltApplyStripSpaces(xsltTransformContextPtr ctxt, xmlNodePtr node) { xmlNodePtr current; #ifdef WITH_XSLT_DEBUG_PROCESS int nb = 0; #endif current = node; while (current != NULL) { /* * Cleanup children empty nodes if asked for */ if ((IS_XSLT_REAL_NODE(current)) && (current->children != NULL) && (xsltFindElemSpaceHandling(ctxt, current))) { xmlNodePtr delete = NULL, cur = current->children; while (cur != NULL) { if (IS_BLANK_NODE(cur)) delete = cur; cur = cur->next; if (delete != NULL) { xmlUnlinkNode(delete); xmlFreeNode(delete); /**** <== *****/ delete = NULL; #ifdef WITH_XSLT_DEBUG_PROCESS nb++; #endif } } } ------------------------------ This code seems to modify the input document in place. I can't see any reason why something like this should *ever* happen during XSLT processing, and I don't see any way to work around this in lxml.etree. If the node is in use, e.g. because it was found by a previously executed XPath selection, there is no way to prevent user code from crashing.
This bug (still unfixed) seems to be caused by lxml, not libxslt. Reproducing this crash using any other libxslt binding (Python, Perl, JavaScript) wasn't possible.
Well, the crash might be specific to lxml, because it makes certain assumptions about documents. That doesn't means it is *caused* by lxml. The cause is that libxslt modifies the *input* document in place when applying an XSLT. That is definitely a bug in libxslt, whether it results in a crash or not. I'm sure there are ways to make other wrappers crash, too. Maybe it's more difficult to do than for lxml, but knowing the described behaviour of libxslt, I'm sure it's possible to write code also in Perl or JavaScript that crashes because of this. At least the input modification should be visible in all wrappers. See line 6008 in current transform.c, that's where xsltApplyStripSpaces() is used to modify the input document before running the actual transform.
> I'm sure it's possible to write code also in Perl or JavaScript that crashes because of this. Please do it. Debugging anything related to lxml is a mess...
Do we actually agree that the current behaviour is wrong, or do you consider it acceptable behaviour that an XSLT transformation modifies the input document in place?
Specifically, do you consider the following behaviour correct? In [1]: import libxml2 as lx, libxslt as xsl In [2]: d = b"""<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"><xsl:strip-space elements="*" /><xsl:template match="/"><xsl:copy-of select="*" /></xsl:template></xsl:stylesheet>""" In [3]: xs = lx.parseMemory(d, len(d)) In [4]: t = xsl.parseStylesheetDoc(xs) In [5]: xml = lx.parseFile("tests/XSLTMark/trend.xml") In [6]: c = xml.get_children() In [7]: text = c.get_children() In [8]: text.get_type(), text.get_name() Out[8]: ('text', 'text') In [9]: out = t.applyStylesheet(xml, {}) In [10]: text.get_type(), text.get_name() Out[10]: ('element', 'val') This was executed in the project root directory of libxslt, using the libxml2/libxslt Python bindings.
I agree that this behavior is highly surprising. But there's no crashes...
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new ticket at https://gitlab.gnome.org/GNOME/libxslt/-/issues/ Thank you for your understanding and your help.