GNOME Bugzilla – Bug 595128
Regression when using read callbacks with xmlReadIO breaking inkscape
Last modified: 2009-09-15 18:34:09 UTC
With libxml2 2.7.4, this is what the startup of inkscape looks like: /usr/share/inkscape/extensions/render_barcode.inx:1: parser error : XML / declaration allowed only at the start of the document <?xml version="1.0"?> ^ ** (inkscape:32006): CRITICAL **: Inkscape::Extension::Extension* ** Inkscape::Extension::build_from_reprdoc(Inkscape::XML::Document*, ** Inkscape::Extension::Implementation::Implementation*): assertion `doc != ** NULL' failed ** (inkscape:32006): WARNING **: Unable to create extension from definition ** file /usr/share/inkscape/extensions/render_barcode.inx. /usr/share/inkscape/templates/default.svg:1: parser error : XML declaration / allowed only at the start of the document <?xml version="1.0" encoding="UTF-8" standalone="no"?> ^ ** (inkscape:32006): CRITICAL **: SPDesktop* sp_file_new(const ** Glib::ustring&): assertion `doc != NULL' failed The reason for this happening is that only 4 bytes of input are in the input buffer when checking for <?xml<blank> in parser.c:10135. This, in turn, is caused by a change introduced in commit 7e385bd: diff --git a/parser.c b/parser.c index 0a10b0f..0d856b7 100644 --- a/parser.c +++ b/parser.c @@ -10109,8 +10109,9 @@ xmlParseDocument(xmlParserCtxtPtr ctxt) { /* * Check for the XMLDecl in the Prolog. + * do not GROW here to avoid the detected encoder to decode more + * than just the first line */ - GROW; if ((CMP5(CUR_PTR, '<', '?', 'x', 'm', 'l')) && (IS_BLANK_CH(NXT(5)))) { /* Since the buffer is not grown, and the input only has 4 bytes at this moment, the l and the following blank character can't be detected. I don't exactly understand why this GROW has been removed here, since we possibly have not enough input for the test to be done properly, but the added comment leads to think there is some reason behind this, except it breaks the read callback case. If it is not possible to add the GROW back, maybe more than 4 bytes should be read through the callback, then. What do you think?
Patched in git ... http://git.gnome.org/cgit/libxml2/commit/?id=9d3d141c412baa5c713ad3df48f1a4d179e 07b05 in practice if we didn't got the full first line it's a balance between the risk of corrupting data by using the wrong encoding as in https://bugzilla.gnome.org/show_bug.cgi?id=566012 and possibly failing to parse. Somehow I prefer a parsing failure error which is easy to detect than a silent data corruption. I think the code in HEAD is the right balance. Daniel