GNOME Bugzilla – Bug 354980
out of memory crash on large-ish input
Last modified: 2009-08-23 14:24:11 UTC
Steps to reproduce:
1. xsltproc -o out.xml style.xsl in.xml

Stack trace:
Frame     Function  Args
0022C128  77E6BA42  (000006AC, 0000EA60, 000000A4, 0022C170)
0022C248  61096A1C  (00000000, 77E6BAFD, 77E6BA42, 000000A4)
0022C338  6109459B  (00000000, 003B0023, 00230000, 00000000)
0022C398  61094A7B  (0022C3B0, 00000000, 00000094, 610A00AA)
0022C458  61094C32  (00001238, 00000006, 0022C488, 61094E32)
0022C468  61094C5C  (00000006, 600301DC, 0022C598, 61096ADC)
0022C488  61094E32  (00000000, 61105B14, 00000001, 00000001)
 483873 [sig] xsltproc 4664 C:\unix\cygwin\bin\xsltproc.exe: *** fatal error - C:\unix\cygwin\bin\xsltproc.exe: *** called with threadlist_ix -1

Other information:
I'm using xsltproc compiled for Cygwin.

Using libxml 20622, libxslt 10115-CVS1027 and libexslt 812-CVS1027
xsltproc was compiled against libxml 20622, libxslt 10115 and libexslt 812
libxslt 10115 was compiled against libxml 20622
libexslt 812 was compiled against libxml 20622

Running under a Cygwin 1.5.21 bash produces "parser error : out of memory error". Running the same executable under a regular cmd.exe shell produces the stack trace above.

The input is a file like this, where the page element is repeated many times. The size of the input is 450MB.

<root>
  <page>
    <p>This is a plain english sentence.</p>
    <images>
      <img attr1="foo" attr2="bar" attr3="baz" />
      <img attr1="asdf" attr2="jklj" attr3="sdf" />
    </images>
  </page>
  ...
</root>

The stylesheet is this:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:strip-space elements="*"/>
  <xsl:param name="Term" select="'barf'" />
  <xsl:template match="root">
    <root>
      <xsl:apply-templates/>
    </root>
  </xsl:template>
  <xsl:template match="page">
    <xsl:choose>
      <xsl:when test="child::images/img[@attr1 = $Term]">
        <page>
          <xsl:element name="p">
            <xsl:value-of select="child::p" />
          </xsl:element>
          <xsl:element name="images">
            <xsl:for-each select="child::images/img[@attr1 = $Term]">
              <xsl:element name="img">
                <xsl:copy>
                  <xsl:copy-of select="@*" />
                </xsl:copy>
              </xsl:element>
            </xsl:for-each>
          </xsl:element>
        </page>
      </xsl:when>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>

The same input and stylesheet are processed just fine by Xalan 2.7.0, albeit not quickly, with a 1GB heap, i.e., java -Xmx1024m. (A smaller heap may also work; I haven't tried.) xsltproc allocates 1.7GB of RAM and then crashes.
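For anyone trying to reproduce this, an input of the shape described in the report can be generated with a short script. This is only a sketch: the names `PAGE` and `write_input` are made up for illustration, and the real 450MB file presumably had varying content rather than one page repeated verbatim.

```python
# Hypothetical helper for building a test input shaped like the one in the report.
# PAGE is the sample <page> element from the report; the text is pure ASCII,
# so character count and byte count are the same.
PAGE = (
    "<page>"
    "<p>This is a plain english sentence.</p>"
    "<images>"
    '<img attr1="foo" attr2="bar" attr3="baz" />'
    '<img attr1="asdf" attr2="jklj" attr3="sdf" />'
    "</images>"
    "</page>\n"
)


def write_input(out, target_bytes):
    """Write a <root> document to `out`, repeating PAGE until at least
    `target_bytes` have been written. Returns the number of pages written."""
    header = "<root>\n"
    out.write(header)
    written = len(header)
    pages = 0
    while written < target_bytes:
        out.write(PAGE)
        written += len(PAGE)
        pages += 1
    out.write("</root>\n")
    return pages
```

Opening in.xml for writing and calling write_input(f, 450 * 1024 * 1024) should give a file of about the size mentioned above. Note that none of the generated img elements has attr1 equal to 'barf', so the stylesheet's default parameter would match nothing.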
Hi Jack,

I tried to reproduce your problem with a Linux version of xsltproc from Ubuntu and got this error message:

error : out of memory error
input.xml:1306388: error: (null)<img attr1="asdf" attr2="jklj" attr3="sdf" />
^
parser error : out of memory error
input.xml:1306389: parser error : out of memory error
</images>
^
unable to parse input.xml

This is my understanding of the problem: in order to apply an XSLT template, we have to build the complete tree of the input XML. The internal representation of this tree in libxml2/libxslt is not as efficient as in Xalan, therefore xsltproc requires (much) more memory than Xalan.

Now I am not sure what you would like to "fix". Is it the problem that out-of-memory is not reported by the Cygwin executable run from cmd (it crashes instead)? Is it the memory inefficiency?
> Now I am not sure what you would like to "fix". Is it the problem that
> out-of-memory is not reported by the Cygwin executable run from cmd (it crashes
> instead)? Is it the memory inefficiency?

Yes. :) Both of those seem like problems to me.

Let's leave this bug to focus on the memory inefficiency. I'm not sure if the Cygwin issue is still relevant given the age of this bug.
(In reply to comment #2)
> Let's leave this bug to focus on the memory inefficiency.

In that case, I would suggest changing the severity of this bug to ENHANCEMENT.
(In reply to comment #3)
> (In reply to comment #2)
> > Let's leave this bug to focus on the memory inefficiency.
>
> In that case, I would suggest changing the severity of this bug to ENHANCEMENT.

I don't see inefficient usage of memory as just an enhancement, but more like leaking.
I'm setting the severity to normal to let the maintainer decide. I don't think it's a real issue because you have such a huge input. Working with XML is known to be demanding on memory. It might even be closed as WONTFIX because it's pretty hard to implement your own memory management sneaking around the OS. You might try to get a Valgrind log to see whether it actually leaks memory (I don't believe it does, btw).

I'd leave it to Daniel to decide whether this is an issue. Volunteers are of course welcome to try to make it work anyway.
Yup, sorry, I'm not changing the libxml2 tree format because you don't want to buy a couple of sticks of RAM. And "I don't see inefficient usage of memory as just an enhancement, but more like leaking" is just the last proof of why I should not try to help, sorry!

Let's face it: using XML to verbosely encode completely repetitive data is just bad design, and using XSLT to just extract some field from it is plain stupid. You can probably do the same in less code with the libxml2 Python binding using the Reader, and then that would use less than a megabyte, or a few hundred K if you coded it directly in C. Now, complaining after years because you want to shave some memory when doing that insanity, and claiming that not focusing on it is just "leaking", would characterize a good troll at best.

NOTABUG!

Daniel
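To illustrate the streaming approach suggested above, here is a minimal sketch. It is not the libxml2 Reader binding Daniel mentions: it swaps in `iterparse` from Python's standard `xml.etree.ElementTree`, which follows the same idea of handling one page at a time and then discarding it. The function name `filter_pages` is hypothetical, and the output uses a flat img element rather than reproducing the stylesheet's exact nesting.

```python
import xml.etree.ElementTree as ET


def filter_pages(source, out, term):
    """Stream the document in `source` and write to `out` only the pages
    that have an images/img child whose attr1 equals `term`."""
    out.write("<root>")
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of <root>
    for event, elem in context:
        if event != "end" or elem.tag != "page":
            continue
        # Same selection as child::images/img[@attr1 = $Term] in the stylesheet.
        hits = [img for img in elem.findall("images/img")
                if img.get("attr1") == term]
        if hits:
            page = ET.Element("page")
            ET.SubElement(page, "p").text = elem.findtext("p", default="")
            images = ET.SubElement(page, "images")
            for img in hits:
                ET.SubElement(images, "img", dict(img.attrib))
            out.write(ET.tostring(page, encoding="unicode"))
        # Drop the finished page from the tree so it can be freed.
        root.clear()
    out.write("</root>")
```

Because each page is removed from the root as soon as it has been handled, the tree held in memory should stay at about one page in size regardless of how large the input is. The actual footprint with the libxml2 Reader would have to be measured separately.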
Oh, I didn't see the 450MB part in the report. Makes sense, though I don't consider myself a troll with 30 bugzilla points. :)
Makes sense to me too, but really... a troll? My apologies, and I'll never file another bug against your software.
The trolling comment was about the comparison to "leaking", and I stand by that, heh! I take bugs, I fix a bunch of them, and I don't care about Bugzilla points (actually they seem to have disappeared). If you don't want to use my code or report bugs about it, fine, but the fact that I'm offering code, help and fixes should never be used as a means to shut me up. It's the person using my code who complains, because I expressed the fact that the report was wrong and the way the software was being used was inappropriate.

I know people get used to getting everything for free and complain when it's still not provided exactly the way they want, but look, where is my reward here for providing the stuff in the first place, for the expertise, and for answering those bugs when I have time? Why should I continue doing so? Beware: if you bite the hand which feeds you, it may just stop, you know!

Daniel