GNOME Bugzilla – Bug 159547
escaping versus UTF8 in xmlNodeDump
Last modified: 2011-02-25 02:33:35 UTC
Calling xmlNodeDump on the root node of a UTF-8 encoded document para0.xml

    <?xml version='1.0' encoding='utf-8'?>
    <para>...some UTF8 characters here...</para>

behaves differently under libxml2 <= 2.6.8 and >= 2.6.15:

    $ LD_PRELOAD=/net/su/h/local2-rh8/lib/libxml2.so.2.6.8 ./parseprint para0.xml
    <para>ì¹èø¾øýáíùú»òï</para>
    $ LD_PRELOAD=/net/su/h/local2-rh8/lib/libxml2.so.2.6.15 ./parseprint para0.xml
    <para>ěščřžřýáíůúťňď</para>

i.e. up to 2.6.8 non-ASCII characters are output as raw UTF-8; from 2.6.15 on they are escaped as character references.

parseprint.c is as follows:

    #include <stdio.h>
    #include <libxml/parser.h>
    #include <libxml/parserInternals.h>
    #include <libxml/tree.h>

    int main(int argc, char **argv)
    {
        xmlDoc *doc = NULL;
        xmlNode *root_element = NULL;
        const xmlChar *ret = NULL;
        xmlParserCtxtPtr ctxt;
        xmlBufferPtr buffer;

        if (argc != 2)
            return(1);

        LIBXML_TEST_VERSION

        /* libxml2-2.4 API, so that we can link against older versions too */
        ctxt = xmlCreateFileParserCtxt(argv[1]);
        if (ctxt == NULL) {
            printf("error: could not open file %s\n", argv[1]);
            return(1);
        }
        xmlParseDocument(ctxt);
        doc = ctxt->myDoc;
        ctxt->myDoc = NULL;
        xmlFreeParserCtxt(ctxt);

        if (doc == NULL) {
            printf("error: could not parse file %s\n", argv[1]);
            return(1);
        }

        /* Get the root element node */
        root_element = xmlDocGetRootElement(doc);

        /* Serialize the root node into an in-memory buffer */
        buffer = xmlBufferCreate();
        xmlNodeDump(buffer, doc, root_element, 0, 0);
        if (xmlBufferLength(buffer) > 0) {
            ret = xmlBufferContent(buffer);
        }
        if (ret != NULL)
            printf("%s\n", ret);

        xmlBufferFree(buffer);
        xmlFreeDoc(doc);
        xmlCleanupParser();
        return 0;
    }
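To make the "escaped" form in the report concrete, here is a minimal, standalone sketch of turning non-ASCII UTF-8 text into numeric character references. It is not libxml2's serializer: `decode_utf8` and `escape_non_ascii` are hypothetical helper names, and the hexadecimal `&#x...;` notation is an assumption chosen for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Decode one UTF-8 sequence at s into *cp.
 * Returns the number of bytes consumed, or 0 if the sequence is malformed. */
static size_t decode_utf8(const unsigned char *s, unsigned long *cp)
{
    size_t len, i;

    if (s[0] < 0x80)                { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; len = 4; }
    else return 0;

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)      /* also catches a premature NUL */
            return 0;
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    if (*cp < 0x80)                     /* reject overlong ASCII encodings */
        return 0;
    return len;
}

/* Copy UTF-8 text into out, replacing every non-ASCII character with a
 * hexadecimal character reference. Returns 0 on success, -1 if the input
 * is malformed or out is too small. */
static int escape_non_ascii(const char *utf8, char *out, size_t outsz)
{
    const unsigned char *s = (const unsigned char *)utf8;
    size_t used = 0;

    assert(utf8 != NULL && out != NULL);
    while (*s != '\0') {
        unsigned long cp;
        size_t n = decode_utf8(s, &cp);
        int w;

        if (n == 0 || used >= outsz)
            return -1;
        if (cp < 0x80)
            w = snprintf(out + used, outsz - used, "%c", (int)cp);
        else
            w = snprintf(out + used, outsz - used, "&#x%lX;", cp);
        if (w < 0 || (size_t)w >= outsz - used)
            return -1;                  /* output would be truncated */
        used += (size_t)w;
        s += n;
    }
    if (used >= outsz)
        return -1;
    out[used] = '\0';
    return 0;
}
```

The result is pure ASCII, so it reads the same under any ASCII-compatible encoding label; the raw UTF-8 form is more compact but depends on the consumer knowing the encoding.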
In practice, character references are safer, since they work even if the encoding is mislabelled, for example in the HTTP headers (which is the case most of the time). Defaulting to UTF-8 makes some sense but also carries some risks. The real solution is to use the APIs that define the encoding. I tentatively reverted the behaviour in CVS to follow your suggestion, but this is risky business and it may be changed again if it breaks too many users.
Daniel
This should be closed by the release of libxml2-2.6.21, thanks,
Daniel
Hi, I'm trying to move my servers from CentOS 4 (libxml2-2.6.16-12.8.i386.rpm) to CentOS 5 (libxml2-2.6.26-2.1.2.8.el5_5.1.i386.rpm) and I have problems with applications that use libxml2. These problems are caused by the behaviour change made for this bug. In my opinion this is a very serious change and shouldn't have been made: it ended compatibility between versions. It was too late to modify it for compatibility with the much older version 2.6.8.
Regards,
Rodrigo Kellermann Ferreira
Sorry, no, I won't backport any of this on RHEL, especially this late, and since there is no RHEL bug against this behaviour. Just grab a more recent source rpm and rebuild it locally; that's my answer.
Daniel