Bug 354980 - out of memory crash on large-ish input
Status: RESOLVED NOTABUG
Product: libxslt
Classification: Platform
Component: general
Version: 1.1.15
Hardware: Other All
Importance: Normal normal
Target Milestone: ---
Assigned To: Daniel Veillard
QA Contact: libxml QA maintainers
Depends on:
Blocks:
 
 
Reported: 2006-09-08 15:07 UTC by Jack Tanner
Modified: 2009-08-23 14:24 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Jack Tanner 2006-09-08 15:07:07 UTC
Steps to reproduce:
1. xsltproc -o out.xml style.xsl in.xml


Stack trace:
Frame     Function  Args
0022C128  77E6BA42  (000006AC, 0000EA60, 000000A4, 0022C170)
0022C248  61096A1C  (00000000, 77E6BAFD, 77E6BA42, 000000A4)
0022C338  6109459B  (00000000, 003B0023, 00230000, 00000000)
0022C398  61094A7B  (0022C3B0, 00000000, 00000094, 610A00AA)
0022C458  61094C32  (00001238, 00000006, 0022C488, 61094E32)
0022C468  61094C5C  (00000006, 600301DC, 0022C598, 61096ADC)
0022C488  61094E32  (00000000, 61105B14, 00000001, 00000001)
 483873 [sig] xsltproc 4664 C:\unix\cygwin\bin\xsltproc.exe: *** fatal error - C:\unix\cygwin\bin\xsltproc.exe: *** called with threadlist_ix -1

Other information:
I'm using xsltproc compiled for Cygwin. 

Using libxml 20622, libxslt 10115-CVS1027 and libexslt 812-CVS1027
xsltproc was compiled against libxml 20622, libxslt 10115 and libexslt 812
libxslt 10115 was compiled against libxml 20622
libexslt 812 was compiled against libxml 20622

Running under a Cygwin 1.5.21 bash shell produces "parser error : out of memory error". Running the same executable under a regular cmd.exe shell produces the stack trace above.

The input is a 450MB file like this, with the page element repeated many times.
<root>
<page>
<p>This is a plain english sentence.</p>
<images>
<img attr1="foo" attr2="bar" attr3="baz" />
<img attr1="asdf" attr2="jklj" attr3="sdf" />
</images>
</page>
...
</root>
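
For anyone trying to reproduce this, a minimal sketch of a generator for input of the shape shown above. The names `PAGE`, `write_sample` and `pages_for_size` are hypothetical helpers, not part of the original report, and the attribute values are simply the ones from the sample.

```python
import io

# One repeated unit of the input, copied from the sample in the report.
PAGE = (
    "<page>\n"
    "<p>This is a plain english sentence.</p>\n"
    "<images>\n"
    '<img attr1="foo" attr2="bar" attr3="baz" />\n'
    '<img attr1="asdf" attr2="jklj" attr3="sdf" />\n'
    "</images>\n"
    "</page>\n"
)


def write_sample(out, pages):
    """Write a <root> document containing `pages` copies of PAGE to a text stream."""
    out.write("<root>\n")
    for _ in range(pages):
        out.write(PAGE)
    out.write("</root>\n")


def pages_for_size(target_bytes):
    """Number of PAGE copies needed to reach roughly `target_bytes` (hypothetical helper)."""
    return max(1, target_bytes // len(PAGE.encode("utf-8")))


# Example (not run here, since it writes a large file):
#   with open("in.xml", "w", encoding="utf-8") as f:
#       write_sample(f, pages_for_size(450 * 1024 * 1024))
```

The file name `in.xml` matches the command in the steps to reproduce; the real input presumably had more varied content than this repetition.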

The stylesheet is this:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:strip-space elements="*"/>
<xsl:param name="Term" select="'barf'" />

<xsl:template match="root">
  <root>
    <xsl:apply-templates/>
  </root>
</xsl:template>

<xsl:template match="page">
  <xsl:choose>
    <xsl:when test="child::images/img[@attr1 = $Term]">
      <page>
	<xsl:element name="p">
	  <xsl:value-of select="child::p" />
	</xsl:element>
	<xsl:element name="images">
	  <xsl:for-each select="child::images/img[@attr1 = $Term]">
	    <xsl:element name="img">
	      <xsl:copy>
		<xsl:copy-of select="@*" />
	      </xsl:copy>
	    </xsl:element>
	  </xsl:for-each>
	</xsl:element>
      </page>
    </xsl:when>
  </xsl:choose>
</xsl:template>
</xsl:stylesheet>

The same input and stylesheet get processed just fine by xalan 2.7.0, albeit not quickly, with a 1GB heap, i.e., java -Xmx1024m. (A smaller heap may also work; I haven't tried.) xsltproc allocates 1.7GB of RAM and then crashes.
Comment 1 Miroslav Bajtoš 2009-05-30 17:38:20 UTC
Hi Jack,
I tried to reproduce your problem with a Linux version of xsltproc from Ubuntu and I got this error message:

  error : out of memory error
  input.xml:1306388: error: (null)<img attr1="asdf" attr2="jklj" attr3="sdf" />
                                           ^
  parser error : out of memory error
  input.xml:1306389: parser error : out of memory error
  </images>
  ^
  unable to parse input.xml

This is my understanding of the problem: to apply an XSLT template, we first have to build the complete tree of the input XML. The internal representation of this tree in libxml2/libxslt is not as efficient as in xalan, so xsltproc requires (much) more memory than xalan.

Now I am not sure what you would like to "fix". Is the problem that the out-of-memory error is not reported by the Cygwin executable when run from cmd (it crashes instead)? Or is it the memory inefficiency?
Comment 2 Jack Tanner 2009-06-01 15:21:39 UTC
> Now I am not sure what you would like to "fix". Is the problem that the
> out-of-memory error is not reported by the Cygwin executable when run from
> cmd (it crashes instead)? Or is it the memory inefficiency?

Yes.

:)

Both of those seem like problems to me. Let's leave this bug to focus on the memory inefficiency. I'm not sure whether the Cygwin issue is still relevant given the age of this bug.
Comment 3 Miroslav Bajtoš 2009-06-04 13:39:24 UTC
(In reply to comment #2)
> Let's leave this bug to focus on the memory inefficiency.

In that case, I would suggest changing the severity of this bug to ENHANCEMENT.
Comment 4 André Klapper 2009-06-04 16:09:15 UTC
(In reply to comment #3)
> (In reply to comment #2)
> > Let's leave this bug to focus on the memory inefficiency.
> 
> In that case, I would suggest changing the severity of this bug to ENHANCEMENT.

I don't see inefficient usage of memory as just an enhancement, but more like leaking.
Comment 5 Tobias Mueller 2009-06-04 16:45:36 UTC
I'm setting the severity to normal to let the maintainer decide. I don't think it's a real
issue because you have such a huge input. Working with XML is known to be
demanding on memory. It might even be closed as WONTFIX because it's pretty
hard to implement your own memory management that works around the OS. You might try to get a Valgrind log to see whether it actually leaks memory (I don't believe it does, btw).
I'd leave it to Daniel to decide whether this is an issue. Volunteers are of course welcome to try to make it work anyway.
Comment 6 Daniel Veillard 2009-08-13 20:50:55 UTC
Yup, sorry, I'm not changing the libxml2 tree format because you don't want to buy a couple of sticks of RAM. And
  "I don't see inefficient usage of memory as just an enhancement,
   but more like leaking"

is just the final proof of why I should not try to help, sorry!
Let's face it: using XML to verbosely encode completely repetitive data
is just bad design, and using XSLT just to extract some field from it is plain
stupid. You can probably do the same in less code with the libxml2 Python bindings
using the Reader, and that would use less than a megabyte, or a few hundred
K if you coded it directly in C.
Now, complaining after years because you want to shave some memory when doing
that insanity, and claiming that not focusing on it is just "leaking", would characterize
a good troll at best.

  NOTABUG !

Daniel
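
The streaming approach suggested in comment 6 can be sketched roughly as follows. This is not the libxml2 Reader binding Daniel refers to: it swaps in `xml.etree.ElementTree.iterparse` from the Python standard library, which follows the same idea of handling one page at a time instead of building the whole tree. The function name `filter_pages` is hypothetical, and the output is what the stylesheet appears to intend (one `img` per match), not the nested `img` that `xsl:element` plus `xsl:copy` would literally produce.

```python
import io
import xml.etree.ElementTree as ET


def filter_pages(source, out, term):
    """Stream `source`, writing to `out` only pages that have an <img attr1=term>."""
    out.write("<root>")
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of <root>
    for event, elem in context:
        if event != "end" or elem.tag != "page":
            continue
        hits = [img for img in elem.findall("images/img")
                if img.get("attr1") == term]
        if hits:
            page = ET.Element("page")
            ET.SubElement(page, "p").text = elem.findtext("p", default="")
            images = ET.SubElement(page, "images")
            for img in hits:
                ET.SubElement(images, "img", dict(img.attrib))
            out.write(ET.tostring(page, encoding="unicode"))
        # Drop pages already handled so memory does not grow with input size.
        root.clear()
    out.write("</root>\n")


# Example (assumed file names, matching the report):
#   with open("in.xml", "rb") as src, open("out.xml", "w", encoding="utf-8") as dst:
#       filter_pages(src, dst, "barf")
```

Because each finished page is cleared from the root, memory use should depend on the size of a single page, not on the size of the whole file; no measurements are claimed here.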
Comment 7 André Klapper 2009-08-13 22:57:11 UTC
Oh, I didn't see the 450MB part in the report.
Makes sense, though I don't consider myself a troll with 30 bugzilla points. :)
Comment 8 Jack Tanner 2009-08-14 01:00:48 UTC
Makes sense to me too, but really... a troll? My apologies, and I'll never file another bug against your software.
Comment 9 Daniel Veillard 2009-08-23 14:24:11 UTC
The trolling comment was about the comparison to "leaking", and I stand by that, heh!
I take bugs, I fix a bunch of them, I don't care about Bugzilla points (actually
they seem to have disappeared). If you don't want to use my code or report bugs
about it, fine, but the fact that I'm offering code, help, and fixes should
never be used as a means to shut me up.

It's the person using my code who complains, because I stated that
the report was wrong and that the software was being used inappropriately.
I know people get used to getting everything for free and complain when it's still
not provided exactly the way they want, but look, where is my reward here
for providing the stuff in the first place, for the expertise, and for answering those bugs
when I have time? Why should I continue doing so?

Beware: if you bite the hand which feeds you, it may just stop, you know!

Daniel