GNOME Bugzilla – Bug 78729
Validation bug for strange content model
Last modified: 2009-08-15 18:40:50 UTC
When trying to validate (e.g. using xmllint --valid) the following invalid document, libxml seems to hang in an endless loop, eating more and more memory, and finally crashes with

realloc failed !nSegmentation fault

(I had .5 G physical plus 1G virtual memory, so I don't think that this is an effect of too little memory.)

---
<!DOCTYPE x [
<!ELEMENT x (a+ | ((b), (c?, d*)+)+)>
<!ELEMENT a EMPTY>
<!ELEMENT b EMPTY>
<!ELEMENT c EMPTY>
<!ELEMENT d EMPTY>
]>
<x><b/><a/></x>
---

Of course the form of the content model is a bit strange and can be replaced by the equivalent

<!ELEMENT x (a+ | ((b), (c | d )*)+)>

in which case xmllint correctly finds that the document is invalid.

It seems that the realloc fails in valid.c:vstateVPush, since this is the only error message missing the \ before the 'n' (so typos can be helpful ;-). I didn't manage to track the problem down, though.

As I said, the content model is a bit strange, but it is perfectly legal. Actually I came across this problem when playing around with the xmlValidGetValidElements function on some TEI document using the xteilite.dtd, which contains the element declaration

<!ELEMENT publicationStmt (p+ | ((publisher | distributor | authority) , (pubPlace?, address?, idno*, availability?, date?)+)+)>

I checked this with libxml 2.4.17 and 2.4.18.
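The claim that the two content models are equivalent, and that <x><b/><a/></x> is invalid under both, can be sanity-checked outside libxml. The following is only a sketch in Python: it assumes each element name is written as a single letter, so that a sequence of children becomes a string and a content model becomes an ordinary regular expression. Python's re module stands in for the validator here; the names ORIGINAL, REWRITTEN, accepts and find_disagreements are made up for this illustration, and a brute-force check up to a fixed length is evidence rather than a proof.

```python
import itertools
import re

# One letter per element name, so a child sequence becomes a string.
# Assumption: Python's `re` is used as a stand-in matcher; this is not
# the libxml validation code.
ORIGINAL = re.compile(r"(?:a+|(?:b(?:c?d*)+)+)")   # (a+ | ((b), (c?, d*)+)+)
REWRITTEN = re.compile(r"(?:a+|(?:b(?:c|d)*)+)")   # (a+ | ((b), (c | d)*)+)


def accepts(model, children):
    """Return True if the whole child sequence matches the content model."""
    return model.fullmatch(children) is not None


def find_disagreements(max_len=6):
    """Try every child sequence over {a, b, c, d} up to max_len children
    and collect those on which the two models give different answers."""
    disagreements = []
    for length in range(max_len + 1):
        for combo in itertools.product("abcd", repeat=length):
            children = "".join(combo)
            if accepts(ORIGINAL, children) != accepts(REWRITTEN, children):
                disagreements.append(children)
    return disagreements


if __name__ == "__main__":
    # An empty list means no counterexample was found up to max_len.
    print(find_disagreements())
    # The reported document has the children b, a.
    print(accepts(ORIGINAL, "ba"), accepts(REWRITTEN, "ba"))
```

The inner group (c?, d*)+ can match the empty sequence, which is the kind of nesting that a matcher has to handle without looping; Python's engine copes with it on inputs this short.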
Hum, that's serious! There are two things:
1/ the unbounded use of memory, which has to be fixed ASAP
2/ the problem in the validation engine

For the first, I just committed a fix to prevent the memory usage explosion:
http://cvs.gnome.org/bonsai/cvsquery.cgi?module=gnome-xml&branch=HEAD&branchtype=match&dir=gnome-xml&file=&filetype=match&who=veillard&whotype=match&sortby=Date&hours=&date=explicit&mindate=04%2F15%2F02+06%3A14&maxdate=04%2F15%2F02+06%3A16&cvsroot=%2Fcvs%2Fgnome

For the second part, I think the right approach will be to drop the current ad-hoc algorithm and use the regexp engine which I'm developing for XML Schemas. That integration will not make it into the next release, but I expect the following one to have the fix, based on a far more reliable core.

Daniel
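To illustrate what treating a content model as a regular expression can look like, here is a small sketch in Python. It is not libxml's regexp engine and makes no claim about how that engine works internally; it uses a different, textbook technique (Brzozowski derivatives), and every name in it (Elem, Seq, Alt, Star, deriv, valid, X_MODEL) is hypothetical. The point is only that the children are consumed one at a time and a single "remaining model" is kept, with no backtracking over alternatives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Empty:
    """Matches no sequence at all."""


@dataclass(frozen=True)
class Eps:
    """Matches only the empty sequence."""


@dataclass(frozen=True)
class Elem:
    name: str


@dataclass(frozen=True)
class Seq:
    left: object
    right: object


@dataclass(frozen=True)
class Alt:
    left: object
    right: object


@dataclass(frozen=True)
class Star:
    inner: object


def seq(left, right):
    """Sequence constructor that simplifies trivial cases."""
    if isinstance(left, Empty) or isinstance(right, Empty):
        return Empty()
    if isinstance(left, Eps):
        return right
    if isinstance(right, Eps):
        return left
    return Seq(left, right)


def alt(left, right):
    """Choice constructor that simplifies trivial cases."""
    if isinstance(left, Empty):
        return right
    if isinstance(right, Empty) or left == right:
        return left
    return Alt(left, right)


def opt(model):      # model?
    return alt(model, Eps())


def plus(model):     # model+
    return seq(model, Star(model))


def nullable(model):
    """True if the model accepts the empty sequence."""
    if isinstance(model, (Eps, Star)):
        return True
    if isinstance(model, Seq):
        return nullable(model.left) and nullable(model.right)
    if isinstance(model, Alt):
        return nullable(model.left) or nullable(model.right)
    return False


def deriv(model, name):
    """The model that remains after consuming one child called `name`."""
    if isinstance(model, Elem):
        return Eps() if model.name == name else Empty()
    if isinstance(model, Seq):
        first = seq(deriv(model.left, name), model.right)
        if nullable(model.left):
            return alt(first, deriv(model.right, name))
        return first
    if isinstance(model, Alt):
        return alt(deriv(model.left, name), deriv(model.right, name))
    if isinstance(model, Star):
        return seq(deriv(model.inner, name), model)
    return Empty()   # Empty and Eps cannot consume a child


def valid(model, children):
    """Consume the children left to right, then check for acceptance."""
    for name in children:
        model = deriv(model, name)
    return nullable(model)


# (a+ | ((b), (c?, d*)+)+) from the report
X_MODEL = alt(
    plus(Elem("a")),
    plus(seq(Elem("b"), plus(seq(opt(Elem("c")), Star(Elem("d")))))),
)

if __name__ == "__main__":
    print(valid(X_MODEL, ["b", "a"]))            # children of the reported document
    print(valid(X_MODEL, ["b", "c", "d", "d"]))
```

Each step is a structural recursion over the model, so the nested (c?, d*)+ group, which can match nothing, does not cause a loop in this sketch.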
Okay, this should be fixed for good now. The code in CVS now uses the regexp implementation designed for XML Schemas to do the DTD validation of the element content model. This is a large change that you may get either from CVS or by waiting for the next release. Thanks for the bug report and the example!

paphio:~/XML -> xmllint --valid tst.xml
<?xml version="1.0"?>
<!DOCTYPE x [
<!ELEMENT x (a+ | (b , c? , d*+)+)>
<!ELEMENT a EMPTY>
<!ELEMENT b EMPTY>
<!ELEMENT c EMPTY>
<!ELEMENT d EMPTY>
]>
<x><b/><a/></x>
paphio:~/XML -> cat .memdump
11:45:56 PM
MEMORY ALLOCATED : 0, MAX was 23985
BLOCK NUMBER SIZE TYPE
paphio:~/XML ->

fixed !

Daniel
Should be closed in the last release, thanks,

Daniel