GNOME Bugzilla – Bug 620195
xpointer doesn't use the string values of elements consistently
Last modified: 2021-07-05 13:21:20 UTC
Created attachment 162412: test case

As per the W3C XPointer draft, "Element boundaries, as well as entire embedded nodes such as processing instructions and comments, are ignored as specified by the definition of string-value in [XPath]."
http://www.w3.org/TR/xptr-xpointer/#stringrange

It turns out that embedded nodes disrupt the string calculations, however. Below is the source and the output of

$ xmllint --xinclude xpointer-nested_element.xml > output.xml

[some of the strange effects of running this command are described separately in bug #562541 and bug #620190]

SOURCE:

<div xmlns="http://example.org/">
  <p>XXAAXXAAXX <b>YYBBYYBBYY</b> ZZCCZZCCZZ WWDDWWDDWW.</p>
  <p>XXAAXXAAXX YYBBYYBBYY ZZCCZZCCZZ WWDDWWDDWW.</p>
  <p>XXAAXXAAXX YYBBYYBBYY <!-- gizmo -->ZZCCZZCCZZ WWDDWWDDWW.</p>
  <p>Thomas <em>Pyn</em>chon</p>
</div>

OUTPUT:

<body xmlns="http://example.org/">
  <!-- Although the entire content of the <p> element should be seen and
       handled as a single string to the exclusion of non-element nodes
       (this is what "string value of <p>" is), we can see in the first
       three <div>s that an embedded node disrupts calculations (the second
       example is the reference [if you're surprised about the indexes, see
       bug #620190 - this seems an independent issue]). Notice that in the
       third <div>, the comment is not included, but nevertheless the
       calculations change. The last example is modified from the W3C draft,
       it should match the entire name. It actually does, but it returns
       more than the requested 10 characters. -->
  <div>
    <seg>XXAAXXAAXX</seg>
    <seg><b xmlns="http://example.org/">YBBYYBBYY</b> Z</seg>
    <seg>CCZZCCZZ W</seg>
    <seg>DDWWD</seg>
  </div>
  <div>
    <seg>XXAAXXAAXX</seg>
    <seg>YYBBYYBBYY</seg>
    <seg>ZZCCZZCCZZ</seg>
    <seg>WWDDW</seg>
  </div>
  <div>
    <seg>XXAAXXAAXX</seg>
    <seg>YYBBYYBBYY</seg>
    <seg>ZCCZZCCZZ </seg>
    <seg>WDDWW</seg>
  </div>
  <seg>Thomas <em xmlns="http://example.org/">Pyn</em>cho</seg>
</body>
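To make the expectation concrete, here is a minimal Python sketch of the XPath string-value rule applied to the source document. It is only an illustration: `string_value` is a hypothetical helper written for this comment, not a libxml2 function, and it uses Python's standard `xml.dom.minidom` rather than xmllint. Under that rule the first three paragraphs should yield the same string, so any offsets into them should behave the same way.

```python
from xml.dom import minidom, Node

SOURCE = """<div xmlns="http://example.org/">
<p>XXAAXXAAXX <b>YYBBYYBBYY</b> ZZCCZZCCZZ WWDDWWDDWW.</p>
<p>XXAAXXAAXX YYBBYYBBYY ZZCCZZCCZZ WWDDWWDDWW.</p>
<p>XXAAXXAAXX YYBBYYBBYY <!-- gizmo -->ZZCCZZCCZZ WWDDWWDDWW.</p>
<p>Thomas <em>Pyn</em>chon</p>
</div>"""


def string_value(node):
    """Concatenate descendant text nodes in document order.

    Element boundaries are ignored; comments and processing
    instructions contribute nothing (hypothetical helper).
    """
    parts = []
    for child in node.childNodes:
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            parts.append(child.data)
        elif child.nodeType == Node.ELEMENT_NODE:
            parts.append(string_value(child))
    return "".join(parts)


doc = minidom.parseString(SOURCE)
values = [string_value(p) for p in doc.getElementsByTagName("p")]
for value in values:
    print(repr(value))
```

The first three printed strings are identical, which is why the second <div> of the output can serve as the reference for the other two.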
Ah, I have now realised the obvious concerning the "Thomas Pynchon" example: string-range() does return exactly 10 characters: "Thomas " = 7 + "cho" = 3. It completely skips the embedded element, but it should not be able to distinguish

<p>Thomas <em>Pyn</em>chon</p>

from

<p>Thomas Pynchon</p>

So, just to make matters clearer: the desired output of the last match is "Thomas Pyn" = 10 characters.
And one more remark (sigh): it doesn't "completely skip" the embedded element: its text value is apparently (properly!) used for the purpose of *matching* the string, but the retrieval (improperly) works on the full element content instead of merely the string value.
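A rough sketch of what retrieval over the string value would look like, assuming the range is mapped back onto the underlying text nodes. `text_nodes` and `range_pieces` are hypothetical names invented for this illustration; this is not how libxml2 is implemented, only a way to show which text nodes a 10-character range starting at position 1 should touch.

```python
from xml.dom import minidom, Node


def text_nodes(node):
    """Yield descendant text nodes in document order, skipping comments and PIs."""
    for child in node.childNodes:
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            yield child
        elif child.nodeType == Node.ELEMENT_NODE:
            yield from text_nodes(child)


def range_pieces(node, start, length):
    """Map a 1-based (start, length) range over the string-value
    onto (parent tag name, text fragment) pieces."""
    begin = start - 1
    end = begin + length
    pieces = []
    pos = 0  # offset of the current text node within the string-value
    for text in text_nodes(node):
        lo = max(begin, pos)
        hi = min(end, pos + len(text.data))
        if lo < hi:
            pieces.append((text.parentNode.tagName, text.data[lo - pos:hi - pos]))
        pos += len(text.data)
    return pieces


doc = minidom.parseString("<p>Thomas <em>Pyn</em>chon</p>")
pieces = range_pieces(doc.documentElement, 1, 10)
print(pieces)
```

The range covers all of "Thomas " in <p> and all of "Pyn" in <em>, and stops before "chon", which is the "Thomas Pyn" result asked for above.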
How extremely unfriendly of me not to have pasted at least part of the attachment that contains the XInclude directives. Here's the Thomas P. line, for starters:

<seg><include xmlns="http://www.w3.org/2001/XInclude" href="source-nested_element.xml" xpointer="xmlns(ex=http://example.org/) xpointer(string-range(//ex:p,'Thomas Pynchon',1,10))"/></seg>

What it says is:
- search the p elements for a match with 'Thomas Pynchon' -- this is *successful*
- return 10 characters starting with the first character of the match(es) -- this *fails*

------------

The crucial part of the attached file with XIncludes follows, for convenience.

<body xmlns="http://example.org/">
  <div>
    <seg><include xmlns="http://www.w3.org/2001/XInclude" href="source-nested_element.xml" xpointer="xmlns(ex=http://example.org/) xpointer(string-range(/ex:div/ex:p[1],'',1,9)[1])"/></seg>
    <seg><include xmlns="http://www.w3.org/2001/XInclude" href="source-nested_element.xml" xpointer="xmlns(ex=http://example.org/) xpointer(string-range(/ex:div/ex:p[1],'',12,10)[1])"/></seg>
    <seg><include xmlns="http://www.w3.org/2001/XInclude" href="source-nested_element.xml" xpointer="xmlns(ex=http://example.org/) xpointer(string-range(/ex:div/ex:p[1],'',23,10)[1])"/></seg>
    <seg><include xmlns="http://www.w3.org/2001/XInclude" href="source-nested_element.xml" xpointer="xmlns(ex=http://example.org/) xpointer(string-range(/ex:div/ex:p[1],'',34,5)[1])"/></seg>
  </div>
  <div>
    <seg><include xmlns="http://www.w3.org/2001/XInclude" href="source-nested_element.xml" xpointer="xmlns(ex=http://example.org/) xpointer(string-range(/ex:div/ex:p[2],'',1,9)[1])"/></seg>
    <seg><include xmlns="http://www.w3.org/2001/XInclude" href="source-nested_element.xml" xpointer="xmlns(ex=http://example.org/) xpointer(string-range(/ex:div/ex:p[2],'',12,10)[1])"/></seg>
    <seg><include xmlns="http://www.w3.org/2001/XInclude" href="source-nested_element.xml" xpointer="xmlns(ex=http://example.org/) xpointer(string-range(/ex:div/ex:p[2],'',23,10)[1])"/></seg>
    <seg><include xmlns="http://www.w3.org/2001/XInclude" href="source-nested_element.xml" xpointer="xmlns(ex=http://example.org/) xpointer(string-range(/ex:div/ex:p[2],'',34,5)[1])"/></seg>
  </div>
  <div>
    <seg><include xmlns="http://www.w3.org/2001/XInclude" href="source-nested_element.xml" xpointer="xmlns(ex=http://example.org/) xpointer(string-range(/ex:div/ex:p[3],'',1,9)[1])"/></seg>
    <seg><include xmlns="http://www.w3.org/2001/XInclude" href="source-nested_element.xml" xpointer="xmlns(ex=http://example.org/) xpointer(string-range(/ex:div/ex:p[3],'',12,10)[1])"/></seg>
    <seg><include xmlns="http://www.w3.org/2001/XInclude" href="source-nested_element.xml" xpointer="xmlns(ex=http://example.org/) xpointer(string-range(/ex:div/ex:p[3],'',23,10)[1])"/></seg>
    <seg><include xmlns="http://www.w3.org/2001/XInclude" href="source-nested_element.xml" xpointer="xmlns(ex=http://example.org/) xpointer(string-range(/ex:div/ex:p[3],'',34,5)[1])"/></seg>
  </div>
  <seg><include xmlns="http://www.w3.org/2001/XInclude" href="source-nested_element.xml" xpointer="xmlns(ex=http://example.org/) xpointer(string-range(//ex:p,'Thomas Pynchon',1,10))"/></seg>
</body>
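For reference, the string-range() calls above can be approximated on plain strings. The sketch below follows my reading of the draft (a non-empty string is matched at each non-overlapping occurrence; an empty string matches before every character and after the last one; the third and fourth arguments are a 1-based start relative to the match and a character count). `string_range` is a hypothetical Python helper, not an xmllint or libxml2 API, and it works on the string-value only. The (1,9) requests are left out because their index behaviour is the subject of bug #620190.

```python
def string_range(value, needle, offset=1, length=None):
    """Return the substring selected for each match of `needle` in `value`.

    `value` is assumed to be the XPath string-value of the context node.
    Offsets below 1 are not handled in this sketch.
    """
    if needle == "":
        # An empty string matches before each character and after the last.
        starts = list(range(len(value) + 1))
    else:
        starts = []
        pos = value.find(needle)
        while pos != -1:
            starts.append(pos)
            pos = value.find(needle, pos + len(needle))  # non-overlapping

    results = []
    for start in starts:
        begin = start + offset - 1
        # Default length: the range extends to the end of the match.
        end = start + len(needle) if length is None else begin + length
        results.append(value[begin:end])
    return results


# String-value shared by the first three <p> elements of the source.
value = "XXAAXXAAXX YYBBYYBBYY ZZCCZZCCZZ WWDDWWDDWW."

# The trailing [1] in the XPointer expressions picks the first match.
expected = [string_range(value, "", off, n)[0]
            for off, n in [(12, 10), (23, 10), (34, 5)]]
print(expected)
print(string_range("Thomas Pynchon", "Thomas Pynchon", 1, 10))
```

These results agree with the second (reference) <div> of the output, and the last call gives the "Thomas Pyn" that the final include should return.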
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new ticket at https://gitlab.gnome.org/GNOME/libxml2/-/issues/ Thank you for your understanding and your help.