Bug 440415 - libxml2 fails to load an external DTD with a UTF-8 BOM
Status: RESOLVED FIXED
Product: libxml2
Classification: Platform
Component: general
Version: git master
Hardware: Other All
Importance: Normal normal
Target Milestone: ---
Assigned To: Daniel Veillard
QA Contact: libxml QA maintainers
Depends on:
Blocks:
 
 
Reported: 2007-05-22 10:59 UTC by Mark Rowe
Modified: 2009-08-21 16:08 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
XML file and DTD test case mentioned in description (1.78 KB, application/zip)
2007-05-22 11:01 UTC, Mark Rowe

Description Mark Rowe 2007-05-22 10:59:28 UTC
Please describe the problem:
When loading an external DTD, libxml2 fails to account for a BOM at the beginning of the content. It bails out with the error "APDataView.xsl.dtd:1: parser error : Content error in the external subset".


Steps to reproduce:
1. Extract the attached zip file.
2. Run xmllint --loaddtd LoadingActivity.xsl


Actual results:
The "Content error in the external subset" error is displayed, followed by many errors about undefined entities.

Expected results:
No errors.

Does this happen every time?
Yes, it happens all the time.

Other information:
I can't find any solid information about the acceptability of BOMs in DTDs, but the information I did find leads me to believe they should probably be supported.
Comment 1 Mark Rowe 2007-05-22 11:01:07 UTC
Created attachment 88592
XML file and DTD test case mentioned in description
Comment 2 Mark Rowe 2007-05-22 11:04:14 UTC
The patch below appears to address the issue, but I am not familiar with this part of the code, so there may be a better fix:

diff --git a/libxml2/parser.c b/libxml2/parser.c
index 2d84a74..7c12ef9 100644
--- a/libxml2/parser.c
+++ b/libxml2/parser.c
@@ -5840,6 +5840,19 @@ xmlParseExternalSubset(xmlParserCtxtPtr ctxt, const xmlChar *ExternalID,
                        const xmlChar *SystemID) {
     xmlDetectSAX2(ctxt);
     GROW;
+
+    if (ctxt->input->end - ctxt->input->cur >= 4) {
+        xmlChar start[4];
+        xmlCharEncoding enc;
+        start[0] = RAW;
+        start[1] = NXT(1);
+        start[2] = NXT(2);
+        start[3] = NXT(3);
+        enc = xmlDetectCharEncoding(start, 4);
+        if (enc != XML_CHAR_ENCODING_NONE)
+            xmlSwitchEncoding(ctxt, enc);
+    }
+
     if (CMP5(CUR_PTR, '<', '?', 'x', 'm', 'l')) {
        xmlParseTextDecl(ctxt);
        if (ctxt->errNo == XML_ERR_UNSUPPORTED_ENCODING) {
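
For readers unfamiliar with the parser internals, the idea behind the added check can be sketched outside libxml2: look at the first few bytes of the external subset and, if they form a byte order mark, note the encoding and how many bytes to skip before parsing. This is only an illustration. The names sketch_encoding and sketch_detect_bom are hypothetical and are not part of libxml2, and the sketch covers only the UTF-8 and UTF-16 BOMs, whereas the patch passes four bytes to xmlDetectCharEncoding, which may recognise more cases.

```c
#include <stddef.h>

/* Hypothetical names for illustration only; this is not libxml2's API. */
typedef enum {
    SKETCH_ENC_NONE,     /* no BOM recognised */
    SKETCH_ENC_UTF8,     /* EF BB BF */
    SKETCH_ENC_UTF16LE,  /* FF FE */
    SKETCH_ENC_UTF16BE   /* FE FF */
} sketch_encoding;

/*
 * Inspect the start of a buffer for a byte order mark.
 * Returns the encoding implied by the BOM and stores the BOM length
 * (the number of bytes to skip before the real content) in *bom_len.
 * Assumption: UTF-32 BOMs and BOM-less detection are out of scope here.
 */
static sketch_encoding sketch_detect_bom(const unsigned char *buf,
                                         size_t len, size_t *bom_len)
{
    *bom_len = 0;

    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        *bom_len = 3;
        return SKETCH_ENC_UTF8;
    }
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *bom_len = 2;
        return SKETCH_ENC_UTF16LE;
    }
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *bom_len = 2;
        return SKETCH_ENC_UTF16BE;
    }
    return SKETCH_ENC_NONE;
}
```

A caller would run this once on the first bytes of the DTD and start parsing at buf + bom_len, so that the first thing the parser sees is the text declaration or the first markup declaration rather than the BOM bytes.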

Comment 3 Daniel Veillard 2008-04-03 13:02:39 UTC
Sorry for the delay, this got buried in the pile of bug reports...

Hum, right, good catch, it's a parser bug.
The fix is fine, that's the same kind of thing we
do in xmlParseDocument for the main entity.
I just added one test for ctxt->encoding and
committed it as SVN revision 3730,

  thanks a lot,

Daniel

P.S.: in the future try to put patches as attachments to bugzilla
  and flagged as patches, that way it's easier to find bugs
  with patches and process them quickly!

  
Comment 4 Daniel Veillard 2009-08-21 16:08:16 UTC
That was fixed last year