GNOME Bugzilla – Bug 762110
Parsing without entity substitution broken
Last modified: 2021-06-18 16:04:18 UTC
Created attachment 321320: XML file that exhibits problem

Commit 9cd1c3cfbd32655d60572c0a413e017260c854df broke parsing when not substituting entities or validating. This broke a regression test in itstool for the --keep-entities feature. Basically, this feature does not substitute entity values in translations. However, it still loads all DTDs, because libxml2 will fail if it encounters an unrecognized entity, even if not substituting.

The basic Python is:

    ctxt = libxml2.createFileParserCtxt(filename)
    ctxt.loadSubset(1)
    ctxt.replaceEntities(0)
    ctxt.parseDocument()

With this commit, it seems libxml2 is no longer even reading the external entity definitions, which causes it to fail when it runs across an entity reference it doesn't recognize.

Attaching a test XML file that references DocBook 4.5 (best if your catalogs are correct), as well as a C file. You can use the XML file either with the C file or with the Python snippet above.
Created attachment 321321: C test program
Ping? This breaks an itstool feature, and causes its test suite to fail. There seems to no longer be a way to keep entities that are potentially defined in an external resource. If there were a parser option I could set in itstool, I would do it. But I don't see anything from my reading of the source. xmllint also can't parse the sample file without either substituting entities or spewing errors.
You must pass the XML_PARSE_DTDLOAD option now:

    xmlCtxtUseOptions(ctxt, XML_PARSE_DTDLOAD);
Given that the Python 3 itstool is out, presumably this bug can be closed?
Closing per last comment