GNOME Bugzilla – Bug 684774
xmlTextReader reports "validity error" with external DTD
Last modified: 2012-10-25 07:43:15 UTC
In libxml2-2.9.0, xmlTextReader reports validation errors when parsing a file that has an external DTD.
Cannot reproduce with libxml-2.8.0. Also, an inline DTD does not stimulate the errors.
The testReader application shipped with libxml2 can be used to reproduce the problem:
libxml2-2.9.0/testReader --valid test/valid/REC-xml-19980210.xml
test/valid/REC-xml-19980210.xml:52: element title: validity error : No declaration for element title
<title>Extensible Markup Language (XML) 1.0</title>
test/valid/REC-xml-19980210.xml:53: element version: validity error : No declaration for element version
test/valid/REC-xml-19980210.xml:54: element w3c-designation: validity error : No declaration for element w3c-designation
test/valid/REC-xml-19980210.xml:55: element spec: validity error : No declaration for element spec
The push parser suffers from the same bug in libxml2 2.9.0.
Can be tested with xmllint.
xmllint --valid test/valid/objednavka.xml # No validation errors
xmllint --push --valid test/valid/objednavka.xml # Lots of "No declaration"
xmllint --stream --valid test/valid/objednavka.xml # Lots of "No declaration"
This error was reported on the libxmlplusplus-list,
Since it was not immediately clear whether the bug is in libxml++ or libxml2,
I made some investigations. Fedora 16, mentioned in the mailing list post,
uses libxml2 2.7.8 + a subset of later patches.
When I tested with different subsets of the 33 patches in the rpm, I could see
that the error shows up in the push parser when
is applied. That patch is identical (except for line numbers) to
(In reply to comment #1)
> Patch28 [details] libxml2-More-fixups-on-the-push-parser-behaviour.patch
Ignore the links here. I didn't expect Bugzilla to add them.
That's something I need to fix, this was reported to me last week,
but I hadn't time yet to investigate what is going on, certainly
worth a 2.9.1 release, but I'm fairly busy at this point.
The commit that I mentioned in comment 1 adds the lines
ctxt->progressive = 1;
ctxt->checkIndex = 0;
to xmlParseTryOrFinish() where the string "<!DOCTYPE ...>" has been found.
If I delete
ctxt->progressive = 1;
the validation succeeds. When ctxt->progressive != 0, GROW and SHRINK do
nothing. Does that mean that not all of the DTD file is read? I'm not very
familiar with the code in libxml2. I don't know if deleting
"ctxt->progressive = 1;" is an acceptable fix for this bug, or if it has bad
side effects. I suppose there was a reason for adding it.
Another thing that I find strange with this commit is that when "<!DOCTYPE"
has been found, but '>' is not yet available in the input stream,
ctxt->progressive = XML_PARSER_DTD;
but when ctxt->progressive == XML_PARSER_DTD, then xmlParseCheckTransition()
searches for ']' (i.e. not '>') in the input stream. I can't say if this is a
bug, or if it's correct. I just find it strange.
Thanks Kjell, I really need to get my head on it again, there is
a bug. For the ] search instead of > i think that's somehow normal
as the internal subset should finish with ']' and then '>'
(In reply to comment #4)
> If I delete
> ctxt->progressive = 1;
> the validation succeeds.
No, deleting "ctxt->progressive = 1;" is not a solution to this bug.
If the whole string "<!DOCTYPE ...>", including '>' is not delivered in one
call to xmlParseChunk(), ctxt->progressive is set to XML_PARSER_DTD, and it
remains so for too long. Replacing
ctxt->progressive = 1;
ctxt->progressive = 0;
works in the cases I have tested, but I don't know if there are other cases
where it would not work.
"xmllint --push ..." calls xmlParseChunk() with 4096-byte chunks. If you want
to test with xmllint, you need an xml file where the string "<!DOCTYPE ...>"
is more than 4096 bytes long, e.g. by inserting about 4100 blanks before '>'.
Okay, I really investigated that time :-)
So yes i agree ctxt->progressive should be set to 0 there, that's the
proper fix. There is a couple of other extra chunk in parser.c and
SAX2.c which are needed to avoid some existing bugs too.
The hard part is make sure this kind of mistake don't show up again.
So first I added the test/VC/ and test/valid/ to reader validation
testing within runtest. And second I added an xmllint option --pushsmall
operating like --push but using 10 bytes chunks at a time. That works:
thinkpad:~/XML -> xmllint --noout --pushsmall --valid test/valid/REC-xml-19980210.xml
There is actually a new test called "testlimits" which test the behaviour
of libxml2 SAX and reader handling of very large sections inserted at various
places, I don't run it by default in "make check' because it's too heavy.
I think it's all set in git now :-)
Thanks a lot for the help !