After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 684774 - xmlTextReader reports "validity error" with external DTD
xmlTextReader reports "validity error" with external DTD
Product: libxml2
Classification: Platform
Component: general
git master
Other Linux
: Normal normal
: ---
Assigned To: Daniel Veillard
libxml QA maintainers
Depends on:
Reported: 2012-09-25 10:58 UTC by John Ellis
Modified: 2012-10-25 07:43 UTC
See Also:
GNOME target: ---
GNOME version: ---

Description John Ellis 2012-09-25 10:58:27 UTC
In libxml2-2.9.0, xmlTextReader reports validation errors when parsing a file that has an external DTD.

Cannot reproduce with libxml-2.8.0. Also, an inline DTD does not stimulate the errors.

The testReader application shipped with libxml2 can be used to reproduce the problem:

libxml2-2.9.0/testReader --valid test/valid/REC-xml-19980210.xml

test/valid/REC-xml-19980210.xml:52: element title: validity error : No declaration for element title
<title>Extensible Markup Language (XML) 1.0</title>
test/valid/REC-xml-19980210.xml:53: element version: validity error : No declaration for element version
test/valid/REC-xml-19980210.xml:54: element w3c-designation: validity error : No declaration for element w3c-designation
test/valid/REC-xml-19980210.xml:55: element spec: validity error : No declaration for element spec

Comment 1 Kjell Ahlstedt 2012-10-10 07:46:30 UTC
The push parser suffers from the same bug in libxml2 2.9.0.

Can be tested with xmllint.

xmllint --valid test/valid/objednavka.xml          # No validation errors
xmllint --push --valid test/valid/objednavka.xml   # Lots of "No declaration"
xmllint --stream --valid test/valid/objednavka.xml # Lots of "No declaration"

This error was reported on the libxmlplusplus-list,

Since it was not immediately clear whether the bug is in libxml++ or libxml2,
I made some investigations. Fedora 16, mentioned in the mailing list post,
uses libxml2 2.7.8 + a subset of later patches.

When I tested with different subsets of the 33 patches in the rpm, I could see
that the error shows up in the push parser when
Patch28 libxml2-More-fixups-on-the-push-parser-behaviour.patch
is applied. That patch is identical (except for line numbers) to
Comment 2 Kjell Ahlstedt 2012-10-10 07:51:53 UTC
(In reply to comment #1)
> Patch28 [details] libxml2-More-fixups-on-the-push-parser-behaviour.patch

Ignore the links here. I didn't expect Bugzilla to add them.
Comment 3 Daniel Veillard 2012-10-10 08:27:11 UTC
That's something I need to fix, this was reported to me last week,
but I hadn't time yet to investigate what is going on, certainly
worth a 2.9.1 release, but I'm fairly busy at this point.
Comment 4 Kjell Ahlstedt 2012-10-23 15:14:51 UTC
The commit that I mentioned in comment 1 adds the lines
  ctxt->progressive = 1;
  ctxt->checkIndex = 0;
to xmlParseTryOrFinish() where the string "<!DOCTYPE ...>" has been found.

If I delete 
  ctxt->progressive = 1;
the validation succeeds. When ctxt->progressive != 0, GROW and SHRINK do
nothing. Does that mean that not all of the DTD file is read? I'm not very
familiar with the code in libxml2. I don't know if deleting
"ctxt->progressive = 1;" is an acceptable fix for this bug, or if it has bad
side effects. I suppose there was a reason for adding it.

Another thing that I find strange with this commit is that when "<!DOCTYPE"
has been found, but '>' is not yet available in the input stream,
  ctxt->progressive = XML_PARSER_DTD;
but when ctxt->progressive == XML_PARSER_DTD, then xmlParseCheckTransition()
searches for ']' (i.e. not '>') in the input stream. I can't say if this is a
bug, or if it's correct. I just find it strange.
Comment 5 Daniel Veillard 2012-10-23 15:25:44 UTC
Thanks Kjell, I really need to get my head on it again, there is
a bug. For the ] search instead of > i think that's somehow normal
as the internal subset should finish with ']' and then '>'

Comment 6 Kjell Ahlstedt 2012-10-24 08:42:11 UTC
(In reply to comment #4)
> If I delete 
>   ctxt->progressive = 1;
> the validation succeeds.

No, deleting "ctxt->progressive = 1;" is not a solution to this bug.
If the whole string "<!DOCTYPE ...>", including '>' is not delivered in one
call to xmlParseChunk(), ctxt->progressive is set to XML_PARSER_DTD, and it
remains so for too long. Replacing
  ctxt->progressive = 1;
  ctxt->progressive = 0;
works in the cases I have tested, but I don't know if there are other cases
where it would not work.

"xmllint --push ..." calls xmlParseChunk() with 4096-byte chunks. If you want
to test with xmllint, you need an xml file where the string "<!DOCTYPE ...>"
is more than 4096 bytes long, e.g. by inserting about 4100 blanks before '>'.
Comment 7 Daniel Veillard 2012-10-25 07:43:15 UTC
Okay, I really investigated that time :-)

So yes i agree ctxt->progressive should be set to 0 there, that's the
proper fix. There is a couple of other extra chunk in parser.c and
SAX2.c which are needed to avoid some existing bugs too.

The hard part is make sure this kind of mistake don't show up again.
So first I added the test/VC/ and test/valid/ to reader validation
testing within runtest. And second I added an xmllint option --pushsmall
operating like --push but using 10 bytes chunks at a time. That works:

thinkpad:~/XML -> xmllint --noout --pushsmall --valid test/valid/REC-xml-19980210.xml
thinkpad:~/XML ->

There is actually a new test called "testlimits" which test the behaviour
of libxml2 SAX and reader handling of very large sections inserted at various
places, I don't run it by default in "make check' because it's too heavy.

  I think it's all set in git now :-)
Thanks a lot for the help !