GNOME Bugzilla – Bug 787895
Incorrect usage of libxml2 parser API
Last modified: 2017-10-04 18:22:51 UTC
librsvg seems to use the libxml2 API in an unorthodox way. It creates a (memory) push parser with xmlCreatePushParserCtxt, then pushes its own parser input with xmlPushInput on top of the internal parser input. This confuses several parser checks and, with libxml2 2.9.5, completely breaks documents with internal subsets. I committed a fix for the 2.9.5 regression, but librsvg's API usage can still cause problems, including with older libxml2 versions. I'd suggest switching to xmlCreateIOParserCtxt instead: http://xmlsoft.org/html/libxml-parser.html#xmlCreateIOParserCtxt
libxml2 workaround: https://git.gnome.org/browse/libxml2/commit/?id=b90d8989d3dc486519686f01490379c963bd1145
Thanks; this is interesting. I had no idea that one can stack input providers for libxml2; the "read some xml" code has been basically untouched for years. I'll modify librsvg to use xmlCreateIOParserCtxt(), since it seems to match our wish to be in control of the input data in some situations.

From the libxml2 docs, it's not obvious how to set up parsers and how to feed them in various situations:

* Create a parser and push memory buffers to it. Is this the correct sequence?

    ctx = xmlCreatePushParserCtxt(...);
    while (have_more_data) {
        xmlParseChunk (ctx, ..., FALSE);
    }
    xmlParseChunk (ctx, "", 0, TRUE); /* terminate = TRUE */
    xmlFreeDoc (ctx->myDoc);
    xmlFreeParserCtxt (ctx);

* Create a parser with custom callbacks for reading. Is this the correct sequence?

    ctx = xmlCreateIOParserCtxt(...);
    result = xmlParseDocument (ctx);
    xmlFreeDoc (ctx->myDoc);
    xmlFreeParserCtxt (ctx);
Yes, that's the correct sequence of API calls.
Perfect, thanks. I'll start working on this.
Pushed to master and librsvg-2.40.
Looks good. A small note regarding this commit: https://git.gnome.org/browse/librsvg/commit/?id=18773041f17c8de0a8bdd6a02135fd842fddc8f1 You're right that poking into the parser context is dangerous. I think that you should simply set the XML_PARSE_NOENT option if you want entities to be expanded (both internal and external).
I have filed bug #788528 about that.