GNOME Bugzilla – Bug 731065
Inconsistent namespace URI normalization breaks XPointer evaluation
Last modified: 2021-07-05 13:25:12 UTC
Trying to XInclude parts from OpenDocument files fails due to a bug in libxml2 (snapshot 2014-05-31) in xmlXPathNodeCollectAndTest(). The cause is that xmlXPtrEvalXPtrPart() normalizes namespace URIs using xmlSaveUri() before registering them, whereas the normal XML parser does no normalization. Thus namespace URI comparison with xmlStrEqual() fails for some URIs, e.g. ones containing ':' in their path part.

The following shows how to trigger and locate the bug in gdb.

$ > cat data.xml
<rootB xmlns="abc://d/e:f"/>

$ > cat xinc.xml
<rootA xmlns="wxy://z"
       xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="data.xml" xi:xpointer="xmlns(b=abc://d/e:f) xpointer(/b:rootB)"/>
</rootA>

$ > xmllint --xinclude xinc.xml
xinc.xml:3: element include: XInclude error : XPointer evaluation failed: #xmlns(b=abc://d/e:f) xpointer(/b:rootB)
xinc.xml:3: element include: XInclude error : could not load data.xml, and no fallback was found
<?xml version="1.0"?>
<rootA xmlns="wxy://z"
       xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="data.xml" xi:xpointer="xmlns(b=abc://d/e:f) xpointer(/b:rootB)"/>
</rootA>

$ > gdb xmllint
...
(gdb) start --xinclude xinc.xml
...
(gdb) break xmlStrEqual if (str1 && strcmp(str1,"abc://d/e%3Af")==0)
Breakpoint 2 at 0xb7f10fed: file xmlstring.c, line 158.
(gdb) cont
...
(gdb) finish
...
(gdb) print URI
$2 = (const xmlChar *) 0x8001ea28 "abc://d/e%3Af"
(gdb) print cur->ns->href
$3 = (const xmlChar *) 0x8001e4b0 "abc://d/e:f"

Here in xmlXPathNodeCollectAndTest() the value of URI comes from xmlXPathNsLookup() whose values are normalized in xmlXPtrEvalXPtrPart() using xmlSaveUri() before being fed to xmlXPathRegisterNs(). Because the value of cur->ns->href is never normalized in the same manner, xmlStrEqual() fails if the normalized value differs from the non-normalized value.

xpointer.c: 979 xmlXPtrEvalXPtrPart(...) {
xpointer.c: ...
xpointer.c:1085     value = xmlParseURI((const char *)ctxt->cur);
xpointer.c: ...
xpointer.c:1092     URI = xmlSaveUri(value);
xpointer.c: ...
xpointer.c:1101     xmlXPathRegisterNs(ctxt->context, prefix, URI);

xpath.c:12075 xmlXPathNodeCollectAndTest(...
xpath.c: ...
xpath.c:12149     URI = xmlXPathNsLookup(xpctxt, prefix);
xpath.c: ...
xpath.c:12433     (xmlStrEqual(URI, cur->ns->href)))
xpath.c: ...
xpath.c:12450     (xmlStrEqual(URI, cur->ns->href)))
xpath.c: ...
xpath.c:12482     (xmlStrEqual(URI, cur->ns->href)))
xpath.c: ...
xpath.c:12501     (xmlStrEqual(URI,

I see several approaches to this:

1. Change xmlXPtrEvalXPtrPart() so it doesn't call xmlSaveUri() when registering a new namespace.
2. Add a new function for comparing URIs which ensures normalization and use it everywhere URIs are compared. It could use a hash table of already seen URIs for speed.
3. Ensure everywhere that only normalized URIs make it into all trees during parsing. Or store normalized URIs alongside the non-normalized ones.
4. Maybe something else entirely.

Personally, I'd prefer a solution with URI normalization. Otherwise documents like the following won't be recognized as invalid even though the xi:include element has the same attribute twice.

<rootA xmlns="abc://a"
       xmlns:xi="http://www.w3.org/2001/XInclude"
       xmlns:xo="http://www.w3.org/2001/XInclud%65">
  <xi:include href="data.xml" xi:xpointer=" " xo:xpointer=" "/>
</rootA>
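
To make approach 2 concrete, here is a minimal sketch of a normalizing comparison in plain C. It is not libxml2 code: uri_equal_normalized(), next_decoded() and hex_value() are hypothetical names, and the only "normalization" assumed here is decoding every well-formed %XX escape before comparing, which is a simplification of what xmlSaveUri() does. Keep in mind that the Namespaces in XML recommendation compares namespace names as plain strings, so a comparison like this would be a deliberate design choice rather than something the spec asks for.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of a hexadecimal digit, or -1 if c is not one. */
static int hex_value(unsigned char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = (unsigned char)tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/*
 * Return the next character of *p and advance *p past it.
 * A well-formed %XX escape is decoded and consumed as one character;
 * a stray '%' is returned literally.
 */
static int next_decoded(const char **p)
{
    const unsigned char *s = (const unsigned char *)*p;

    if (s[0] == '%') {
        int hi = hex_value(s[1]);
        /* Only look at s[2] if s[1] was a hex digit (and so not '\0'). */
        int lo = (hi >= 0) ? hex_value(s[2]) : -1;
        if (lo >= 0) {
            *p += 3;
            return hi * 16 + lo;
        }
    }
    *p += 1;
    return s[0];
}

/*
 * Hypothetical stand-in for xmlStrEqual() at the call sites listed
 * above: returns 1 if a and b are equal once percent-escapes are
 * decoded, 0 otherwise. Two NULL pointers compare equal.
 */
int uri_equal_normalized(const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return a == b;
    while (*a != '\0' && *b != '\0') {
        if (next_decoded(&a) != next_decoded(&b))
            return 0;
    }
    return *a == '\0' && *b == '\0';
}
```

With the two values printed in the gdb session, uri_equal_normalized("abc://d/e%3Af", "abc://d/e:f") returns 1, where a byte-wise comparison returns 0. The same holds for the "XInclud%65" spelling of the XInclude namespace in the last example.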
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new ticket at https://gitlab.gnome.org/GNOME/libxml2/-/issues/ Thank you for your understanding and your help.