GNOME Bugzilla – Bug 162776
Importing libxslt affects libxml2's parseDoc
Last modified: 2021-07-05 10:59:52 UTC
I am using the Python bindings for libxml2-2.6.15 and libxslt-1.1.9. With the following file in foo.dtd:
-----
<!ENTITY bar "baz">
-----
and the following script:
-----
#!/usr/bin/env python
import libxml2
libxml2.substituteEntitiesDefault(1)
doc = libxml.parseDoc('''
<!DOCTYPE foo SYSTEM "foo.dtd">
<foo>&bar;</foo>
''')
doc.freeDoc()
-----
I would have expected no errors, but I get:
-----
Entity: line 3: parser error : Entity 'bar' not defined
<foo>&bar;</foo>
^
-----
But adding an "import libxslt" to the script enables it to run without errors! Importing a module without using it should have no effect, shouldn't it? Also surprising to me was that the substituteEntitiesDefault call has no effect on whether the parser attempts to load the DTD.
Using the global variables is a bad idea. To pass specific options to the parser, use the new APIs such as readDoc().
Daniel
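Below is a minimal sketch of that advice. It assumes the libxml2 Python bindings are installed. The parser options go to a single readDoc() call instead of the process-wide substituteEntitiesDefault() switch. The option constants mirror the C xmlParserOption enum. The temporary directory and the foo.xml base URL are hypothetical. They exist only so that the relative SYSTEM identifier "foo.dtd" resolves to a real file.

-----
import os
import tempfile

import libxml2  # assumes the libxml2 Python bindings are installed

# Same document as in the original report (leading newline kept).
XML_TEXT = '''
<!DOCTYPE foo SYSTEM "foo.dtd">
<foo>&bar;</foo>
'''


def parse_with_options(text, base_url, options):
    """Parse an in-memory document with explicit per-call options and
    return the text content of its root element."""
    doc = libxml2.readDoc(text, base_url, None, options)
    try:
        return doc.getRootElement().getContent()
    finally:
        doc.freeDoc()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        # Write the external DTD next to a hypothetical document URL so the
        # relative SYSTEM identifier "foo.dtd" resolves to it.
        with open(os.path.join(tmp, "foo.dtd"), "w") as f:
            f.write('<!ENTITY bar "baz">\n')
        base_url = os.path.join(tmp, "foo.xml")  # hypothetical; file need not exist
        # Load the external DTD, substitute entities, and refuse network access.
        opts = (libxml2.XML_PARSE_DTDLOAD
                | libxml2.XML_PARSE_NOENT
                | libxml2.XML_PARSE_NONET)
        return parse_with_options(XML_TEXT, base_url, opts)


if __name__ == "__main__":
    print(demo())
-----

The options travel with the call, so the result should not depend on global defaults that another module may have changed.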
You reported this bug a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue for you. Can you please check again if the issue you reported here still happens in a recent version and update this report by adding a comment and adjusting the 'Version' field? Again thank you for reporting this and sorry that it could not be fixed for the version you originally used here. Set to NEEDINFO, so without feedback this report will be closed as INCOMPLETE after 6 weeks.
I tried exactly the same code (except the "libxml.parseDoc" should have been "libxml2.parseDoc" above) with the current versions on Ubuntu, and the outcome is the same. I've updated the version field for this bug. That said, I haven't played with libxml in a long time, so the existence of this bug hasn't been a problem for me in the past five years.
Importing libxslt still seems to have at least two side effects on libxml2. These are the examples I have encountered.
Without importing libxslt:
- libxml2.registerErrorHandler has no effect
- libxml2.parseDoc does not attempt to load DTDs (presumably XML_PARSER_LOADDTD is excluded from the default options)
After importing libxslt:
- libxml2.registerErrorHandler works as expected
- libxml2.parseDoc attempts to load DTDs
We ran into two difficulties:
1) We could not redirect error output from /dev/stderr to a different stream in code without additional processes and FIFOs.
2) External DTDs triggered unexpected network requests. These were often very slow, though of course that is not a libxml2 issue, and W3C has apparently been inundated with this sort of request recently.
I have a workaround for each. In the first case there is no need to exclude libxslt. In the second case, as mentioned earlier, libxml2.readDoc respects the options it is given. Even so, I would argue that silently making undocumented network requests without user input in one use case but not in another (which ought to be identical) is a defect, and the first problem is clearly a defect. In fact, I would argue the qualifier "in one use case but not in another" is unnecessary. There was, however, valid discussion in both directions when a related effect was found in SAX; see http://bugs.python.org/issue2124
This behavior is observed in libxml2-2.8.0 + libxslt-1.1.26 (both x86_64 builds) on a RHEL-based x86_64 platform.
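Below is a sketch of the second workaround, again assuming the libxml2 Python bindings. XML_PARSE_DTDLOAD is left out of the readDoc() options, so the parser should not fetch the external subset. XML_PARSE_NONET is added as a further guard against network access. The XHTML document is only an illustration of a DTD hosted on w3.org.

-----
import libxml2  # assumes the libxml2 Python bindings are installed

# Illustrative document whose external DTD lives on w3.org.
XHTML = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><title>t</title></head><body/></html>\n'
)


def root_name_without_network(text):
    # XML_PARSE_DTDLOAD is deliberately omitted, so readDoc() should not try
    # to load the external subset; XML_PARSE_NONET additionally forbids any
    # network access the parser might otherwise attempt.
    doc = libxml2.readDoc(text, None, None, libxml2.XML_PARSE_NONET)
    try:
        return doc.getRootElement().name
    finally:
        doc.freeDoc()


if __name__ == "__main__":
    print(root_name_without_network(XHTML))
-----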
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new ticket at https://gitlab.gnome.org/GNOME/libxslt/-/issues/ Thank you for your understanding and your help.