GNOME Bugzilla – Bug 306081
XPointer range-to function loses the end point's children
Last modified: 2021-07-05 13:20:53 UTC
Given these files:

---base.xml---
<?xml version="1.0"?>
<sect1>
  <title>Test</title>
  <para>First Paragraph</para>
  <para>Second <quote>Paragraph</quote></para>
  <para>Third <quote>Paragraph</quote></para>
  <para>Other Text</para>
</sect1>
---

---include.xml---
<?xml version="1.0"?>
<sect1>
  <title>XInclude Test</title>
  <xi:include xmlns:xi="http://www.w3.org/2003/XInclude"
              href="base.xml"
              xpointer="xpointer(/sect1/para[1]/range-to(/sect1/para[3]))"/>
</sect1>
---

Processed with:

xmllint --xinclude include.xml

The output is:

<?xml version="1.0"?>
<sect1>
  <title>XInclude Test</title>
  <para>First Paragraph</para>
  <para>Second <quote>Paragraph</quote></para>
  <para>Third </para>
</sect1>

Note that in the third paragraph, <quote>Paragraph</quote> is missing.
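For anyone who wants to re-run the report, the following is a minimal Python sketch that writes the two files above to a temporary directory and calls xmllint on them. The helper names `write_repro_files` and `run_xmllint` are made up for this example, the indentation of the files is a guess (the report does not preserve it), and the output will depend on the installed libxml2 version.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

BASE_XML = """<?xml version="1.0"?>
<sect1>
  <title>Test</title>
  <para>First Paragraph</para>
  <para>Second <quote>Paragraph</quote></para>
  <para>Third <quote>Paragraph</quote></para>
  <para>Other Text</para>
</sect1>
"""

INCLUDE_XML = """<?xml version="1.0"?>
<sect1>
  <title>XInclude Test</title>
  <xi:include xmlns:xi="http://www.w3.org/2003/XInclude"
              href="base.xml"
              xpointer="xpointer(/sect1/para[1]/range-to(/sect1/para[3]))"/>
</sect1>
"""


def write_repro_files(directory):
    """Write base.xml and include.xml into directory; return the include.xml path."""
    directory = Path(directory)
    (directory / "base.xml").write_text(BASE_XML, encoding="utf-8")
    include_path = directory / "include.xml"
    include_path.write_text(INCLUDE_XML, encoding="utf-8")
    return include_path


def run_xmllint(include_path):
    """Return xmllint's stdout, or None if xmllint is not on PATH."""
    exe = shutil.which("xmllint")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--xinclude", str(include_path)],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        output = run_xmllint(write_repro_files(tmp))
        print(output if output is not None else "xmllint not found on PATH")
```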
Actually I don't think it is completely wrong, but it's weird:

http://www.w3.org/TR/xptr-xpointer/#b2b1b1b3b6b7

"A location of type [Definition: range is defined by two points], a [Definition: start point] and an [Definition: end point]. A range represents all of the XML structure and content between the start point and end point. This is distinct from any list of nodes and/or characters, in part because some nodes might be only partly included."

The quote in the third para is not in that range because in document order it comes after /sect1/para[3]. So in a sense the quote is not in the range and, not being selected, it could not make it into the inclusion, *but* the behaviour is inconsistent, since the text node "Third " also comes after para[3] in document order and is included...

I think XInclude should recurse from the selection provided by XPointer, so the behaviour you expect is the right one, but it's not that obvious.

Daniel
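The document-order argument above can be checked with a small Python sketch. It only enumerates the nodes of base.xml in document order using the standard `xml.dom.minidom` module; it does not model how libxml2 evaluates ranges. The helpers `document_order` and `label` are names invented for this illustration, and the document is written without inter-element whitespace to keep the node list short.

```python
from xml.dom import minidom

# base.xml from the report, without whitespace between elements.
BASE_XML = (
    "<sect1><title>Test</title>"
    "<para>First Paragraph</para>"
    "<para>Second <quote>Paragraph</quote></para>"
    "<para>Third <quote>Paragraph</quote></para>"
    "<para>Other Text</para></sect1>"
)


def document_order(node):
    """Yield node and all of its descendants in document order (pre-order)."""
    yield node
    for child in node.childNodes:
        yield from document_order(child)


def label(node):
    """Short printable description of an element or text node."""
    if node.nodeType == node.TEXT_NODE:
        return "text(%r)" % node.data
    return "<%s>" % node.tagName


doc = minidom.parseString(BASE_XML)
nodes = list(document_order(doc.documentElement))
third_para = doc.getElementsByTagName("para")[2]
start = nodes.index(third_para)

# The three nodes that immediately follow para[3] in document order.
after_third_para = [label(n) for n in nodes[start + 1 : start + 4]]
print(after_third_para)
# ["text('Third ')", '<quote>', "text('Paragraph')"]
```

Both the "Third " text node and the quote element follow para[3] in document order, so keeping one while dropping the other is the inconsistency described above.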
I read the W3C spec before submitting the bug, but it isn't very clear what the expected behaviour is when the end point is a tag. Should the end tag be included with all of its content? Or should the point before the < of the end tag be used as the end of the range (not including the end tag at all)? I don't care which of the two is used, but the current behaviour of the range-to function in libxml2 is unusable. I also noticed that the xml:base attribute isn't added to the resulting tree when using the range-to function.
*** Bug 355373 has been marked as a duplicate of this bug. ***
From near the top of the xpointer() draft:

"The range-to function may be applied with a context location of any location type, and identifies a range whose start-point is start-point of the context location, and whose end-point is the end-point of the location identified by the function's argument."

* start-point(/sect1/para[1]) is node-point 0 in the first <para/>, which means the point right before the text node in the <para/>.

1st <para/>: <para>0text()1</para>
                   ^      ^

* end-point(/sect1/para[3]) is the last node-point (=3) in the third <para/>, which means the node-point right before the closing </para> tag.

3rd <para/>: <para>0text()1<quote/>2text()3</para>
                   ^      ^        ^      ^

The range begins inside the 1st <para/> and ends inside the 3rd <para/>, so it looks like this:

[not in range <para>]First Paragraph</para>
<para>Second <quote>Paragraph</quote></para>
<para>Third <quote>Paragraph</quote>[</para> not in range]

And now XInclude kicks in, with its definition of range locations:

http://www.w3.org/TR/xinclude/#ranges

"[Definition: An information item is said to be partially selected by a range if it contains only the starting point of the range, or only the ending point of the range.]"

Thus, the 1st and the 3rd <para/> are partially selected.

"The set of top-level included items is the union, in document order with duplicates removed, of the information items either selected or partially selected by the range."

Thus, the top-level included items are:

/sect1/para[1] (partially selected)
/sect1/para[2] (selected)
/sect1/para[3] (partially selected)

The desired result should therefore be:

<?xml version="1.0"?>
<sect1>
  <title>XInclude Test</title>
  <para>First Paragraph</para>
  <para>Second <quote>Paragraph</quote></para>
  <para>Third <quote>Paragraph</quote></para>
</sect1>

HTH. I realise that XPointer is not the developers' priority, but given that xmllint is the only tool that supports so much of the spec, it would be so very good to see the bugs removed. Good luck and thanks.
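The reading of the XInclude rule above can be sketched in Python with the standard `xml.etree.ElementTree` module. This is an illustration under simplifying assumptions, not libxml2's algorithm: the function `top_level_included` is hypothetical, it assumes the elements holding the start and end points are children of the same parent, and it copies the partially selected elements whole, which is only valid here because the range starts at the first node-point of para[1] and ends at the last node-point of para[3].

```python
import xml.etree.ElementTree as ET

# base.xml from the report, without whitespace between elements.
BASE_XML = (
    "<sect1><title>Test</title>"
    "<para>First Paragraph</para>"
    "<para>Second <quote>Paragraph</quote></para>"
    "<para>Third <quote>Paragraph</quote></para>"
    "<para>Other Text</para></sect1>"
)


def top_level_included(parent, start_container, end_container):
    """Return the children of parent from the one containing the start point
    through the one containing the end point, in document order.

    Assumption: both containers are direct children of parent.
    """
    children = list(parent)
    first = children.index(start_container)
    last = children.index(end_container)
    return children[first : last + 1]


root = ET.fromstring(BASE_XML)
paras = root.findall("para")

# Range from start-point(para[1]) to end-point(para[3]).
included = top_level_included(root, paras[0], paras[2])
result = [ET.tostring(element, encoding="unicode") for element in included]
for line in result:
    print(line)
# <para>First Paragraph</para>
# <para>Second <quote>Paragraph</quote></para>
# <para>Third <quote>Paragraph</quote></para>
```

Under these assumptions the third paragraph keeps its <quote> child, which matches the desired result given above.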
I'd vote for this one if voting were enabled. This is really bad: xmllint is the only tool I know of that supports the xpointer() scheme, and range-to is unusable as it is, while it could be sooo useful.
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new ticket at https://gitlab.gnome.org/GNOME/libxml2/-/issues/ Thank you for your understanding and your help.