GNOME Bugzilla – Bug 684774
xmlTextReader reports "validity error" with external DTD
Last modified: 2012-10-25 07:43:15 UTC
In libxml2-2.9.0, xmlTextReader reports validation errors when parsing a file that has an external DTD. Cannot reproduce with libxml-2.8.0. Also, an inline DTD does not stimulate the errors. The testReader application shipped with libxml2 can be used to reproduce the problem: libxml2-2.9.0/testReader --valid test/valid/REC-xml-19980210.xml test/valid/REC-xml-19980210.xml:52: element title: validity error : No declaration for element title <title>Extensible Markup Language (XML) 1.0</title> ^ test/valid/REC-xml-19980210.xml:53: element version: validity error : No declaration for element version <version></version> ^ test/valid/REC-xml-19980210.xml:54: element w3c-designation: validity error : No declaration for element w3c-designation <w3c-designation>REC-xml-&iso6.doc.date;</w3c-designation> ^ test/valid/REC-xml-19980210.xml:55: element spec: validity error : No declaration for element spec <snip>
The push parser suffers from the same bug in libxml2 2.9.0. Can be tested with xmllint. xmllint --valid test/valid/objednavka.xml # No validation errors xmllint --push --valid test/valid/objednavka.xml # Lots of "No declaration" xmllint --stream --valid test/valid/objednavka.xml # Lots of "No declaration" This error was reported on the libxmlplusplus-list, https://mail.gnome.org/archives/libxmlplusplus-list/2012-October/msg00000.html Since it was not immediately clear whether the bug is in libxml++ or libxml2, I made some investigations. Fedora 16, mentioned in the mailing list post, uses libxml2 2.7.8 + a subset of later patches. http://download.fedoraproject.org/pub/fedora/linux/updates/16/SRPMS/libxml2-2.7.8-8.fc16.src.rpm When I tested with different subsets of the 33 patches in the rpm, I could see that the error shows up in the push parser when Patch28 libxml2-More-fixups-on-the-push-parser-behaviour.patch is applied. That patch is identical (except for line numbers) to http://git.gnome.org/browse/libxml2/commit/?id=5353bbf7dda0a01462109337c5fa34859d3e6d0b
(In reply to comment #1) > Patch28 [details] libxml2-More-fixups-on-the-push-parser-behaviour.patch Ignore the links here. I didn't expect Bugzilla to add them.
That's something I need to fix, this was reported to me last week, but I hadn't time yet to investigate what is going on, certainly worth a 2.9.1 release, but I'm fairly busy at this point.
The commit that I mentioned in comment 1 adds the lines ctxt->progressive = 1; ctxt->checkIndex = 0; to xmlParseTryOrFinish() where the string "<!DOCTYPE ...>" has been found. If I delete ctxt->progressive = 1; the validation succeeds. When ctxt->progressive != 0, GROW and SHRINK do nothing. Does that mean that not all of the DTD file is read? I'm not very familiar with the code in libxml2. I don't know if deleting "ctxt->progressive = 1;" is an acceptable fix for this bug, or if it has bad side effects. I suppose there was a reason for adding it. Another thing that I find strange with this commit is that when "<!DOCTYPE" has been found, but '>' is not yet available in the input stream, ctxt->progressive = XML_PARSER_DTD; but when ctxt->progressive == XML_PARSER_DTD, then xmlParseCheckTransition() searches for ']' (i.e. not '>') in the input stream. I can't say if this is a bug, or if it's correct. I just find it strange.
Thanks Kjell, I really need to get my head on it again, there is a bug. For the ] search instead of > i think that's somehow normal as the internal subset should finish with ']' and then '>' http://www.w3.org/TR/REC-xml/#NT-doctypedecl Daniel
(In reply to comment #4) > If I delete > ctxt->progressive = 1; > the validation succeeds. No, deleting "ctxt->progressive = 1;" is not a solution to this bug. If the whole string "<!DOCTYPE ...>", including '>' is not delivered in one call to xmlParseChunk(), ctxt->progressive is set to XML_PARSER_DTD, and it remains so for too long. Replacing ctxt->progressive = 1; with ctxt->progressive = 0; works in the cases I have tested, but I don't know if there are other cases where it would not work. "xmllint --push ..." calls xmlParseChunk() with 4096-byte chunks. If you want to test with xmllint, you need an xml file where the string "<!DOCTYPE ...>" is more than 4096 bytes long, e.g. by inserting about 4100 blanks before '>'.
Okay, I really investigated that time :-) So yes i agree ctxt->progressive should be set to 0 there, that's the proper fix. There is a couple of other extra chunk in parser.c and SAX2.c which are needed to avoid some existing bugs too. The hard part is make sure this kind of mistake don't show up again. So first I added the test/VC/ and test/valid/ to reader validation testing within runtest. And second I added an xmllint option --pushsmall operating like --push but using 10 bytes chunks at a time. That works: thinkpad:~/XML -> xmllint --noout --pushsmall --valid test/valid/REC-xml-19980210.xml thinkpad:~/XML -> There is actually a new test called "testlimits" which test the behaviour of libxml2 SAX and reader handling of very large sections inserted at various places, I don't run it by default in "make check' because it's too heavy. http://git.gnome.org/browse/libxml2/commit/?id=6c91aa384f48ff6d406553a6dd47fd556c1ef2e6 http://git.gnome.org/browse/libxml2/commit/?id=1abd221be56e535fc2e3b2de27b55a8538a41fc6 http://git.gnome.org/browse/libxml2/commit/?id=a7982ce272016182afafb6f74f3b312c6e5f7faf I think it's all set in git now :-) Thanks a lot for the help ! Daniel