GNOME Bugzilla – Bug 683109
xmlDocDumpFormatMemoryEnc and xmlNodeDumpOutput do not remove characters that are illegal in an XML document
Last modified: 2021-07-05 13:20:44 UTC
Created attachment 223071
Example source file (line 36 is relevant)

Some characters may never appear in an XML document:
http://www.w3.org/TR/2008/REC-xml-20081126/#dt-character

While xmlSaveFormatFileEnc does remove such characters from the output, the functions xmlDocDumpFormatMemoryEnc and xmlNodeDumpOutput, which dump XML to memory, do not, at least not when an encoding is specified.

To reproduce the problem, see the attached example.c, which is a slight modification of the io2 example:

gcc -I./include example.c ./.libs/libxml2.so && ./a.out | ./xmllint -

Note: When no encoding is specified (the encoding parameter is NULL), we get the warning "xmlEscapeEntities : char out of range" on the console and the text node is removed from the tree.
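For reference, the Char production linked above allows only #x9, #xA, #xD, [#x20-#xD7FF], [#xE000-#xFFFD] and [#x10000-#x10FFFF]. The following is a minimal sketch of a caller-side workaround, assuming the text is UTF-8 and is filtered before it is put into the tree. It is not libxml2 code: the helper names is_xml_char, utf8_decode and strip_non_xml_chars are hypothetical, and dropping malformed bytes is a simplification chosen for this example.

```c
#include <stddef.h>
#include <string.h>

/* XML 1.0 Char production (hypothetical helper, not a libxml2 function). */
static int is_xml_char(unsigned long cp)
{
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
           (cp >= 0x20 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}

/* Decode one UTF-8 sequence from s (len bytes available).
 * Returns the number of bytes consumed, or 0 if the sequence is
 * malformed, truncated, or an overlong encoding. */
static size_t utf8_decode(const unsigned char *s, size_t len, unsigned long *cp)
{
    size_t n, i;
    unsigned long c, min;

    if (len == 0)
        return 0;
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {
        n = 2; c = s[0] & 0x1F; min = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) {
        n = 3; c = s[0] & 0x0F; min = 0x800;
    } else if ((s[0] & 0xF8) == 0xF0) {
        n = 4; c = s[0] & 0x07; min = 0x10000;
    } else {
        return 0;
    }
    if (n > len)
        return 0;
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        c = (c << 6) | (unsigned long)(s[i] & 0x3F);
    }
    if (c < min)
        return 0;
    *cp = c;
    return n;
}

/* Remove, in place, every character of a NUL-terminated UTF-8 string
 * that is not an XML 1.0 Char. Malformed bytes are dropped as well.
 * Returns the new length in bytes. */
static size_t strip_non_xml_chars(char *buf)
{
    unsigned char *src = (unsigned char *)buf;
    unsigned char *dst = src;
    size_t remaining = strlen(buf);

    while (remaining > 0) {
        unsigned long cp;
        size_t n = utf8_decode(src, remaining, &cp);

        if (n == 0) {            /* malformed byte: skip it */
            src++;
            remaining--;
            continue;
        }
        if (is_xml_char(cp)) {   /* keep legal characters */
            memmove(dst, src, n);
            dst += n;
        }
        src += n;
        remaining -= n;
    }
    *dst = '\0';
    return (size_t)(dst - (unsigned char *)buf);
}
```

A caller could run strip_non_xml_chars on a writable copy of the string before creating the text node, so that the dump functions never see the illegal characters. Whether silently dropping them or reporting an error is preferable depends on the application.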
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new ticket at https://gitlab.gnome.org/GNOME/libxml2/-/issues/ Thank you for your understanding and your help.