After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 787895 - Incorrect usage of libxml2 parser API
Incorrect usage of libxml2 parser API
Status: RESOLVED FIXED
Product: librsvg
Classification: Core
Component: general
git master
Other All
: Normal normal
: ---
Assigned To: Federico Mena Quintero
librsvg maintainers
Depends on:
Blocks:
 
 
Reported: 2017-09-19 14:45 UTC by Nick Wellnhofer
Modified: 2017-10-04 18:22 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Nick Wellnhofer 2017-09-19 14:45:36 UTC
librsvg seems to use the libxml2 API in an unorthodox way. It creates a (memory) push parser with xmlCreatePushParserCtxt, then pushes its own parser input with xmlPushInput on top of the internal parser input. This confuses several parser checks and, with libxml2 2.9.5, completely breaks documents with internal subsets. I committed a fix for the 2.9.5 regression, but librsvg's API usage can still result in problems, also with older libxml2 versions. I'd suggest to switch to xmlCreateIOParserCtxt instead:

http://xmlsoft.org/html/libxml-parser.html#xmlCreateIOParserCtxt
Comment 2 Federico Mena Quintero 2017-10-03 15:05:50 UTC
Thanks; this is interesting.  I had no idea that one can stack input providers for libxml2 - the "read some xml" code is basically untouched for years.

I'll modify librsvg to use xmlCreateIOParserCtxt(), since it seems to match our wish to be in control of the input data in some situations.

From the libxml2 docs, it's not obvious how to set up parsers and how to feed them in various situations:

* Create a parser and push memory buffers to it.  Is this the correct sequence?

  ctx = xmlCreatePushParserCtxt(...);
  while (have_more_data) {
     xmlParseChunk (ctx, ..., FALSE);
  }
  xmlParseChunk (ctx, "", 0, TRUE); /* terminate = TRUE */
  xmlFreeDoc (ctx->myDoc);
  xmlFreeParserCtxt (ctx);

* Create a parser with custom callbacks for reading.  Is this the correct sequence?

  ctx = xmlCreateIOParserCtxt(...);
  result = xmlParseDocument (ctx);
  xmlFreeDoc (ctx->myDoc);
  xmlFreeParserCtxt (ctx);
Comment 3 Nick Wellnhofer 2017-10-03 15:22:34 UTC
Yes, that's the correct sequence of API calls.
Comment 4 Federico Mena Quintero 2017-10-03 16:38:20 UTC
Perfect, thanks.  I'll start working on this.
Comment 5 Federico Mena Quintero 2017-10-04 01:07:56 UTC
Pushed to master and librsvg-2.40.
Comment 6 Nick Wellnhofer 2017-10-04 11:40:36 UTC
Looks good. A small note regarding this commit:

https://git.gnome.org/browse/librsvg/commit/?id=18773041f17c8de0a8bdd6a02135fd842fddc8f1

You're right that poking into the parser context is dangerous. I think that you should simply set the XML_PARSE_NOENT option if you want entities to be expanded (both internal and external).
Comment 7 Federico Mena Quintero 2017-10-04 18:22:51 UTC
I have filed bug #788528 about that.