Bug 386013 - Node::find() does not return namespace nodes
Status: RESOLVED OBSOLETE
Product: libxml++
Classification: Bindings
Component: General
Version: unspecified
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: Christophe de Vienne
QA Contact: Christophe de Vienne
Depends on:
Blocks:
 
 
Reported: 2006-12-14 21:56 UTC by Max Kirillov
Modified: 2020-11-12 09:29 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
bug2.cc (1.03 KB, text/plain)
2007-02-05 14:53 UTC, Murray Cumming

Description Max Kirillov 2006-12-14 21:56:22 UTC
The versions (debian etch):
libxml++2.6-2   2.14.0-0.1
libxml++2.6-dev 2.14.0-0.1
libxml2 2.6.26.dfsg-3
libxml2-dev     2.6.26.dfsg-3
libxml2-doc     2.6.26.dfsg-3

The example xml data and source code

max@dom_xpath$cat bug.xml                                    
<?xml version="1.0" encoding="UTF-8"?>
<el1 xmlns="http://example.com/ns1"
    xmlns:ns2="http://example.com/ns2">
    <ns2:el1>
    </ns2:el1>
</el1>
max@dom_xpath$cat bug.cc 
#include <libxml++/libxml++.h>
#include <iostream>
int main()
{
    xmlpp::DomParser parser("bug.xml");
    const xmlpp::Node* root = parser.get_document()->get_root_node();

    xmlpp::NodeSet nodes = root->find("/*/namespace::*");

    std::cout << nodes.size() << std::endl;
    std::cout << nodes[0] << " " << nodes[1] << " " << nodes[2] << std::endl;
    std::cout << nodes[0]->get_name() << std::endl;
}
max@dom_xpath$g++ `pkg-config --cflags --libs libxml++-2.6` -g -O0 bug.cc -o bug

The result:

max@dom_xpath$./bug 
3
0x8053870 0x8053870 0x8053870
zsh: segmentation fault (core dumped)  ./bug
max@dom_xpath$

Note that all items in nodes have the same value. If I call Node::cobj() instead of Node::get_name(), it returns 0x1 for all 3 items.

I suppose (I did not look at the sources) that the underlying libxml2 call allocates the resulting nodes in the XPath context rather than in the document, and that this context is freed before Node::find() returns.
Comment 1 Murray Cumming 2007-02-05 14:53:21 UTC
Created attachment 81933
bug2.cc

Yes. This revised test code seems to show that the Node::impl_ has a nonsense value.
Comment 2 Murray Cumming 2007-06-10 17:08:19 UTC
Unfortunately, I can't get valgrind to give me a useful report.

xmlXPathEval() in libxml does not seem to be using our on_libxml_construct() function (xmlRegisterNodeDefaultValue in libxml). CCing Daniel Veillard in case he has some idea.
Comment 3 Murray Cumming 2010-06-13 21:16:23 UTC
I have _some_ explanation, but I don't have a general fix yet. See my comments in Node::find_impl here:
http://git.gnome.org/browse/libxml++/tree/libxml++/nodes/node.cc#n285

I think libxml is abusing xmlNode::_private, but only sometimes.

I could fix this test code by assuming that the xmlNodeSet's xmlNode had the real xmlNode* in xmlNode::_private, but that breaks our existing example
http://git.gnome.org/browse/libxml++/tree/examples/dom_xpath
for which the xmlNode* is the real xmlNode* as we'd expect.

This really needs advice from the libxml maintainer.
Comment 4 Murray Cumming 2010-06-13 21:57:31 UTC
Actually, I see now that we must cast the xmlNode* sometimes because it is sometimes not really an xmlNode*. libxml's weird struct inheritance is really no fun sometimes.

The code now ignores these xmlNs objects, because they do not map to our C++ Node object and there's no obvious shared base. Or should they?:
http://www.xmlsoft.org/html/libxml-tree.html#xmlNs
Elsewhere we just deal with namespaces in terms of prefix and URI strings.

I wonder what result you would expect. I don't know xpath so I don't even know what this is asking for:
  xmlpp::NodeSet nodes = root->find("/*/namespace::*");

Would it be enough to add a find_namespaces() function, or is there some xpath that might return both nodes and namespaces? Is there some other API (for instance, in Java) that does this?
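
To make the two ideas in this comment concrete (check the type tag before casting, and report namespaces as prefix and URI strings), here is a minimal sketch. It deliberately uses simplified stand-in structs rather than the real libxml2 definitions: Header, ModelNode, ModelNs, SplitResult and split_result() are hypothetical names, and the tag values 1 and 18 are assumed to mirror XML_ELEMENT_NODE and XML_NAMESPACE_DECL. It is not how libxml++ implements find().

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins, NOT the real libxml2 structs. The sketch assumes that
// both layouts start with one pointer slot followed by a type tag.
enum ModelType { kElement = 1, kNamespaceDecl = 18 };

struct Header {
  void* first;     // plays the role of _private in a node, "next" in a namespace
  ModelType type;
};
struct ModelNode { Header head; const char* name; };
struct ModelNs   { Header head; const char* href; const char* prefix; };

struct SplitResult {
  std::vector<const ModelNode*> nodes;
  std::vector<std::pair<std::string, std::string>> namespaces;  // (prefix, URI)
};

// Splits a mixed result into ordinary nodes and namespace (prefix, URI) pairs.
SplitResult split_result(const std::vector<const Header*>& raw) {
  SplitResult out;
  for (const Header* h : raw) {
    assert(h != nullptr);
    if (h->type == kNamespaceDecl) {
      // Only cast to the namespace layout after the tag confirms it.
      const ModelNs* ns = reinterpret_cast<const ModelNs*>(h);
      out.namespaces.emplace_back(ns->prefix ? ns->prefix : "",
                                  ns->href ? ns->href : "");
    } else {
      out.nodes.push_back(reinterpret_cast<const ModelNode*>(h));
    }
  }
  return out;
}
```

A caller that only wants nodes would read SplitResult::nodes; a find_namespaces()-style function would return SplitResult::namespaces.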
Comment 5 Max Kirillov 2010-06-14 07:01:12 UTC
First, I must say that I have never been an XML guru, and I no longer work on the things that got me interested in this area and led me to find this bug.

But a C++ library definitely must not segfault when it is passed a string parameter.

Then, as far as I remember and understand, "namespace" nodes are a valid request in xpath (or xquery?). At least the "xpath" utility gives me a valid response:

max@max$xpath -e '/*/namespace::*' 1.xml 
Found 2 nodes in 1.xml:
-- NODE --
 xmlns="http://example.com/ns1"
-- NODE --
 xmlns:ns2="http://example.com/ns2"

I think the most correct solution is to keep the request object (which is created inside that find() method) alive as long as the returned objects are alive. For example, by keeping a shared pointer in the objects.
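
The shared-pointer idea can be sketched in plain C++ as follows. QueryResult, ItemHandle and find_items() are hypothetical stand-ins for the XPath result object, the returned node wrappers and Node::find(). This only illustrates the lifetime pattern; it is not libxml++ code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the request/result object created inside find().
struct QueryResult {
  std::vector<std::string> items;
};

// Each handle shares ownership of the result it points into, so the result
// is destroyed only after the last handle has gone away.
class ItemHandle {
 public:
  ItemHandle(std::shared_ptr<const QueryResult> owner, std::size_t index)
      : owner_(std::move(owner)), index_(index) {
    assert(owner_ && index_ < owner_->items.size());
  }
  const std::string& value() const { return owner_->items[index_]; }

 private:
  std::shared_ptr<const QueryResult> owner_;
  std::size_t index_;
};

std::vector<ItemHandle> find_items() {
  auto result = std::make_shared<QueryResult>();
  result->items.push_back("xmlns=\"http://example.com/ns1\"");
  result->items.push_back("xmlns:ns2=\"http://example.com/ns2\"");

  std::vector<ItemHandle> handles;
  for (std::size_t i = 0; i < result->items.size(); ++i) {
    handles.emplace_back(result, i);
  }
  // The local shared_ptr dies here; the handles keep the result alive.
  return handles;
}
```

The cost of this design is one reference-counted pointer per returned object, and every handle then pins the whole result in memory.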
Comment 6 Murray Cumming 2010-06-14 07:37:03 UTC
The problem is that a namespace is not a node, even if the xpath utility lists it as a node.

Java seems to solve this by returning an Object - equivalent to a void* in C++, and requiring the return type to be specified as a parameter:
http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/xpath/XPath.html#evaluate%28java.lang.String,%20java.lang.Object,%20javax.xml.namespace.QName%29

C# (.Net)'s System.Xml.XPath seems to do the same:
http://msdn.microsoft.com/en-us/library/2c16b7x8.aspx

Presumably there is no xpath expression that could return both nodes and namespaces.
Comment 7 Murray Cumming 2010-06-14 07:39:05 UTC
This is no longer critical because the crash is fixed.
Comment 8 Max Kirillov 2010-06-14 15:23:35 UTC
> Presumably there is no xpath expression that could return both nodes and
> namespaces

This is wrong. "/*/namespace::* | /", for example.
Comment 9 Murray Cumming 2010-06-14 15:48:45 UTC
I wonder how those Java and .Net APIs could handle that. They seem to return only one type (or a set of items of the same type), not a set of various types.
Comment 10 Kjell Ahlstedt 2011-12-19 12:50:21 UTC
(In reply to comment #6)
> The problem is that a namespace is not a node, even if the xpath utility
> lists it as a node.

According to http://www.w3.org/TR/xpath-datamodel/#Node, "XQuery 1.0 and
XPath 2.0 Data Model (XDM)" there are 7 kinds of nodes:
    document nodes (called root nodes in XPath 1.0)
    element nodes
    text nodes
    attribute nodes
    namespace nodes
    processing instruction nodes
    comment nodes

A namespace node is a node, according to the XPath standard.
libxml2 handles it very differently from other nodes.

> Java seems to solve this by returning an Object - equivalent to a void* in
> C++, and requiring the return type to be specified as a parameter:
> http://docs.oracle.com/javase/1.5.0/docs/api/javax/xml/xpath/XPath.html#evaluate%28java.lang.String,%20java.lang.Object,%20javax.xml.namespace.QName%29

The returnType input parameter does not request namespace or node, it requests
NUMBER, STRING, BOOLEAN, NODE, or NODESET. This bug applies only to NODESETs,
and each node in the returned nodeset is of a specific type. Among the listed
subinterfaces of the Node interface at
http://docs.oracle.com/javase/1.5.0/docs/api/org/w3c/dom/Node.html there is no
Namespace interface. I don't know how Java handles namespace nodes, or indeed
if it does handle them.

> C# (.Net)'s System.Xml.XPath seems to do the same:
> http://msdn.microsoft.com/en-us/library/2c16b7x8.aspx

In C# there is an XPathNodeType Enumeration,
http://msdn.microsoft.com/en-us/library/system.xml.xpath.xpathnodetype.aspx,
with possible values Root, Element, Attribute, Namespace, Text,
SignificantWhitespace, Whitespace, ProcessingInstruction, Comment, and All.
Presumably this enumeration is used for indicating the type of node.


Ideally there should be a
  class Namespace : public Node
but the way namespace nodes are handled in libxml2 makes it difficult to add
such a class.

Most kinds of nodes are represented by an xmlNode struct. An attribute node is
an exception. It's represented by an xmlAttr struct. xmlAttr and xmlNode have
most elements in common, and it's possible to ignore the difference in many
xmlpp::Node methods.

A namespace node is represented by an xmlNs struct, which is very different
from xmlNode. All methods in the base class xmlpp::Node would be affected by
this difference.

An even more severe complication is that the copy semantics of xmlNs differs
from that of xmlNode. An xmlNodeSet contains a pointer to an array of xmlNode
pointers (xmlNode**).
When an xmlNode* really points to an xmlNode, that xmlNode is not owned by the
xmlNodeSet.
When an xmlNode* points to an xmlNs, that xmlNs is a copy which _is_ owned by
the xmlNodeSet. The functions used for making and freeing individual xmlNs
instances in an xmlNodeSet are not public. They are local to libxml2/xpath.c.
It would be difficult for libxml++ to take over ownership of the xmlNs nodes in
an xmlNodeSet. xmlpp::Node::find() always frees the xmlNodeSet by a call to
xmlXPathFreeObject(), which frees the xmlNs instances in the xmlNodeSet.
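
The ownership asymmetry described above can be modelled with a small stand-alone sketch. Kind, Item, ResultSet and free_result() are hypothetical names, not libxml2 API. The model only shows why a pointer to a namespace entry becomes invalid once the set is freed, while a pointer to an ordinary node does not.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Kind { Element, NamespaceCopy };

struct Item {
  Kind kind;
};

// Stand-in for an XPath result: a flat array of pointers with mixed ownership.
// Element entries belong to the document; namespace entries are heap copies
// that belong to the set itself.
struct ResultSet {
  std::vector<Item*> items;
};

// Releases only what the set owns and returns how many entries were freed.
std::size_t free_result(ResultSet& set) {
  std::size_t freed = 0;
  for (Item* item : set.items) {
    assert(item != nullptr);
    if (item->kind == Kind::NamespaceCopy) {
      delete item;  // owned by the set, so it dies with the set
      ++freed;
    }
    // Element entries are left alone: the document still owns them.
  }
  set.items.clear();
  return freed;
}
```

Any wrapper that stored an Item* of kind NamespaceCopy and outlived free_result() would be left holding a dangling pointer, which matches the symptom in the original report.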

The present solution to the problem is to skip the namespace nodes when
xmlpp::Node::find() converts the xmlNodeSet to an xmlpp::NodeSet.

If we want to keep the namespace nodes in xmlpp::NodeSet, one way would be to
let xmlpp::NodeSet be a class that stores the xmlNodeSet pointer and owns the
xmlNodeSet instance, i.e. calls xmlXPathFreeObject() in its destructor. That
would break the API, of course. And another question remains: Is it reasonable to
make a Namespace class as a subclass of Node, when very little of the logic in
Node applies to Namespace?
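
A minimal sketch of such an owning NodeSet, written against hypothetical stand-ins rather than libxml2: RawResult plays the role of the XPath result object, raw_free() the role of xmlXPathFreeObject(), and OwningNodeSet is an invented class name. It shows only the RAII shape and leaves open what the element type should be.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for the C result object and its free function.
struct RawResult {
  std::vector<std::string> entries;
};

static int raw_free_calls = 0;  // instrumentation for the sketch only

RawResult* raw_evaluate() {
  RawResult* r = new RawResult;
  r->entries.push_back("xmlns");
  r->entries.push_back("xmlns:ns2");
  return r;
}

void raw_free(RawResult* r) {
  ++raw_free_calls;
  delete r;
}

// Move-only owner: the result is freed exactly once, when the last
// OwningNodeSet that holds it is destroyed.
class OwningNodeSet {
 public:
  explicit OwningNodeSet(RawResult* raw) : raw_(raw, &raw_free) {}

  std::size_t size() const { return raw_ ? raw_->entries.size() : 0; }

  const std::string& operator[](std::size_t i) const {
    assert(raw_ && i < raw_->entries.size());
    return raw_->entries[i];
  }

 private:
  std::unique_ptr<RawResult, void (*)(RawResult*)> raw_;
};
```

Because the class is move-only, it cannot be a drop-in replacement for a copyable container of Node pointers, which is one concrete form of the API break mentioned above.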


I changed the title of this bug, since the segfault was fixed on 2010-06-13.
Comment 11 Murray Cumming 2012-01-30 10:11:57 UTC
> The present solution to the problem is to skip the namespace nodes when
> xmlpp::Node::find() converts the xmlNodeSet to an xmlpp::NodeSet.

That would prevent the crash, I guess, so it's the minimum that we should do.

And then:

> Ideally there should be a
>  class Namespace : public Node

I guess that we should focus on doing this somehow.
Comment 12 André Klapper 2020-11-12 09:29:08 UTC
libxml++ has moved to https://github.com/libxmlplusplus/libxmlplusplus

If this ticket is still valid in a recent version of libxml++, then please create a ticket at https://github.com/libxmlplusplus/libxmlplusplus/issues - thanks a lot!