GNOME Bugzilla – Bug 721239
xmlSaveTree() will ignore encoding from context
Last modified: 2021-07-05 13:24:42 UTC
Hello,

suppose a document like this:

***********************************************************
<?xml version="1.0"?>
<h>
  <c tooltip="Nürburg Grand Prix Kurs"/>
  <c tooltip="Autódromo José Carlos Pace"/>
</h>
***********************************************************

is loaded into pxdocin. Here pxdocin->encoding is NULL, because the XML declaration does not specify an encoding.

If you create a save context with the output encoding set to utf8 and call xmlSaveDoc(), the document encoding is switched automatically, and the output in a Windows shell looks like this:

***********************************************************
<?xml version="1.0" encoding="utf8"?>
<h>
  <c tooltip="N├╝rburg Grand Prix Kurs"/>
  <c tooltip="Aut├│dromo Jos├® Carlos Pace"/>
</h>
***********************************************************

Here is the code for this output:

***********************************************************
xmlNodePtr pRoot=xmlDocGetRootElement(pxdocin);
xmlSaveCtxtPtr pxsctxt=xmlSaveToFd(1,"utf8",XML_SAVE_FORMAT|XML_SAVE_AS_XML);
xmlSaveDoc(pxsctxt,pRoot->doc);
xmlSaveClose(pxsctxt);
***********************************************************

which is OK (note that the Windows shell defaults to cp850, so the UTF-8 bytes are displayed as shown above).

BUT: if you dump a node or a node list with xmlSaveTree(), the output is this:

***********************************************************
<h>
  <c tooltip="Nürburg Grand Prix Kurs"/>
  <c tooltip="Autódromo José Carlos Pace"/>
</h>
***********************************************************

Here is the code for this (note the absence of the XML declaration, because only the root node is dumped):

***********************************************************
xmlNodePtr pRoot=xmlDocGetRootElement(pxdocin);
xmlSaveCtxtPtr pxsctxt=xmlSaveToFd(1,"utf8",XML_SAVE_FORMAT|XML_SAVE_AS_XML);
xmlSaveTree(pxsctxt,pRoot);
xmlSaveClose(pxsctxt);
***********************************************************

So it is the same save context, only with xmlSaveTree() called in place of xmlSaveDoc().

The problem is in xmlsave.c @ line 2101:

***********************************************************
} else if ((*cur >= 0x80) && ((doc == NULL) || (doc->encoding == NULL))) {
***********************************************************

because:
- calling xmlSaveDoc() sets doc->encoding to ctxt->encoding, so the output is dumped correctly encoded
- calling xmlSaveTree() does not set doc->encoding, so it escapes the characters using numeric entities.

SO: my idea for a fix is in xmlsave.c @ line 1953:

***********************************************************
long
xmlSaveTree(xmlSaveCtxtPtr ctxt, xmlNodePtr node)
{
    long ret = 0;
    const xmlChar* pSavedDocEncoding=NULL;

    if ((ctxt == NULL) || (node == NULL)) return(-1);
    // node->doc may be NULL for a node that is not attached to a document
    if (node->doc != NULL) {
        // save document encoding
        pSavedDocEncoding=node->doc->encoding;
        // set the ctxt encoding
        node->doc->encoding=ctxt->encoding;
    }
    // dump
    xmlNodeDumpOutputInternal(ctxt, node);
    // restore document encoding
    if (node->doc != NULL)
        node->doc->encoding=pSavedDocEncoding;
    return(ret);
}
***********************************************************

This ensures that the context encoding is set on the document while the node is dumped, and that the original value is restored afterwards.

Bye.
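The following is a minimal, self-contained sketch of the behaviour described above. It does not use libxml2: `mock_doc`, `mock_save_ctxt`, `decode_utf8`, `dump_attr_value` and `save_tree_sketch` are hypothetical names made up for illustration, and the escape format is only an approximation of what the library writes. It shows how the quoted condition turns bytes >= 0x80 into numeric character references when the document encoding is NULL, and how temporarily borrowing the context encoding avoids that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for xmlDoc and xmlSaveCtxt; only the field
 * relevant to this report is modelled. */
typedef struct { const char *encoding; } mock_doc;
typedef struct { const char *encoding; } mock_save_ctxt;

/* Decode one UTF-8 sequence starting at s (input is ASSUMED well-formed).
 * Stores the code point in *cp and returns the number of bytes consumed. */
static size_t decode_utf8(const unsigned char *s, unsigned long *cp)
{
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0) {
        *cp = ((unsigned long)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0) {
        *cp = ((unsigned long)(s[0] & 0x0F) << 12) |
              ((unsigned long)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        return 3;
    }
    *cp = ((unsigned long)(s[0] & 0x07) << 18) |
          ((unsigned long)(s[1] & 0x3F) << 12) |
          ((unsigned long)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
    return 4;
}

/* Mirrors the condition quoted from xmlsave.c: a byte >= 0x80 is escaped
 * as a numeric character reference only when no document encoding is set;
 * otherwise the bytes are copied through unchanged. */
void dump_attr_value(const mock_doc *doc, const char *value,
                     char *out, size_t outsz)
{
    const unsigned char *cur = (const unsigned char *)value;
    size_t used = 0;

    assert(out != NULL && outsz > 0);
    out[0] = '\0';
    /* Keep 12 bytes free: the longest reference ("&#x10FFFF;") plus NUL. */
    while (*cur != 0 && used + 12 < outsz) {
        if ((*cur >= 0x80) && ((doc == NULL) || (doc->encoding == NULL))) {
            unsigned long cp;
            cur += decode_utf8(cur, &cp);
            used += (size_t)snprintf(out + used, outsz - used, "&#x%lX;", cp);
        } else {
            out[used++] = (char)*cur++;
            out[used] = '\0';
        }
    }
}

/* The save/restore idea from the proposed fix: borrow the context encoding
 * for the duration of the dump, then put the old value back. */
void save_tree_sketch(const mock_save_ctxt *ctxt, mock_doc *doc,
                      const char *value, char *out, size_t outsz)
{
    const char *saved = doc->encoding;

    doc->encoding = ctxt->encoding;
    dump_attr_value(doc, value, out, outsz);
    doc->encoding = saved;
}
```

With a `mock_doc` whose encoding is NULL, `dump_attr_value` escapes the "ü" in "Nürburg", while `save_tree_sketch` with a context encoding of "utf8" passes the UTF-8 bytes through and leaves the document encoding NULL afterwards.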
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new ticket at https://gitlab.gnome.org/GNOME/libxml2/-/issues/ Thank you for your understanding and your help.