GNOME Bugzilla – Bug 440415
libxml2 fails to load an external DTD with a UTF-8 BOM
Last modified: 2009-08-21 16:08:16 UTC
Please describe the problem:
When loading an external DTD libxml2 fails to account for the presence of a BOM at the beginning of the content. It bails out with an error of "APDataView.xsl.dtd:1: parser error : Content error in the external subset".

Steps to reproduce:
1. Extract the attached zip file.
2. Run xmllint --loaddtd LoadingActivity.xsl

Actual results:
The "Content error in the external subset" error is displayed, followed by many errors about undefined entities.

Expected results:
No errors.

Does this happen every time?
Yes, it happens all the time.

Other information:
I can't find any solid information about the acceptability of BOMs in DTDs, but the information I did find leads me to believe they should probably be supported.
Created attachment 88592
XML file and DTD test case mentioned in description
The patch below appears to address the issue, but I am not familiar with this part of code so there may be a better fix:

diff --git a/libxml2/parser.c b/libxml2/parser.c
index 2d84a74..7c12ef9 100644
--- a/libxml2/parser.c
+++ b/libxml2/parser.c
@@ -5840,6 +5840,19 @@ xmlParseExternalSubset(xmlParserCtxtPtr ctxt, const xmlChar *ExternalID,
                        const xmlChar *SystemID) {
     xmlDetectSAX2(ctxt);
     GROW;
+
+    if (ctxt->input->end - ctxt->input->cur >= 4) {
+        xmlChar start[4];
+        xmlCharEncoding enc;
+        start[0] = RAW;
+        start[1] = NXT(1);
+        start[2] = NXT(2);
+        start[3] = NXT(3);
+        enc = xmlDetectCharEncoding(start, 4);
+        if (enc != XML_CHAR_ENCODING_NONE)
+            xmlSwitchEncoding(ctxt, enc);
+    }
+
     if (CMP5(CUR_PTR, '<', '?', 'x', 'm', 'l')) {
         xmlParseTextDecl(ctxt);
         if (ctxt->errNo == XML_ERR_UNSUPPORTED_ENCODING) {
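For readers unfamiliar with the idea behind the patch, here is a minimal, self-contained sketch of sniffing a byte order mark from the first few bytes of an input. It does not use libxml2 and is not how xmlDetectCharEncoding is implemented; the real function recognizes more cases. The names sniff_bom and bom_kind are hypothetical and exist only for this illustration, and UTF-32 marks are deliberately left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for this sketch; not part of libxml2. */
typedef enum {
    BOM_NONE = 0,
    BOM_UTF8,
    BOM_UTF16_BE,
    BOM_UTF16_LE
} bom_kind;

/*
 * Look at the start of a buffer and report which byte order mark,
 * if any, it begins with. When bom_len is non-NULL it receives the
 * number of bytes the mark occupies (0 when no mark is found).
 * UTF-32 marks are not handled in this sketch.
 */
static bom_kind sniff_bom(const unsigned char *buf, size_t len,
                          size_t *bom_len)
{
    bom_kind kind = BOM_NONE;
    size_t n = 0;

    assert(buf != NULL || len == 0);

    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        kind = BOM_UTF8;      /* EF BB BF */
        n = 3;
    } else if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        kind = BOM_UTF16_BE;  /* FE FF */
        n = 2;
    } else if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        kind = BOM_UTF16_LE;  /* FF FE */
        n = 2;
    }

    if (bom_len != NULL)
        *bom_len = n;
    return kind;
}
```

A caller would run this check before looking for the "<?xml" text declaration, skip bom_len bytes, and decode the rest accordingly. In the patch, that role is played by xmlDetectCharEncoding followed by xmlSwitchEncoding.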
Sorry for the delay, this got buried in the pile of bug reports... Hum, right, good catch, it's a parser bug. The fix is fine; that's the same kind of thing we do in xmlParseDocument for the main entity. I just added one test for ctxt->encoding and committed it as SVN revision 3730. Thanks a lot,

Daniel

P.S.: In the future, try to put patches as attachments in Bugzilla, flagged as patches; that way it's easier to find bugs with patches and process them quickly!
That was fixed last year