GNOME Bugzilla – Bug 386013
Node::find() does not return namespace nodes
Last modified: 2020-11-12 09:29:08 UTC
The versions (Debian etch):

libxml++2.6-2 2.14.0-0.1
libxml++2.6-dev 2.14.0-0.1
libxml2 2.6.26.dfsg-3
libxml2-dev 2.6.26.dfsg-3
libxml2-doc 2.6.26.dfsg-3

The example XML data and source code:

max@dom_xpath$ cat bug.xml
<?xml version="1.0" encoding="UTF-8"?>
<el1 xmlns="http://example.com/ns1" xmlns:ns2="http://example.com/ns2">
  <ns2:el1>
  </ns2:el1>
</el1>
max@dom_xpath$ cat bug.cc
#include <libxml++/libxml++.h>
#include <iostream>

int main()
{
  xmlpp::DomParser parser("bug.xml");
  const xmlpp::Node* root = parser.get_document()->get_root_node();
  xmlpp::NodeSet nodes = root->find("/*/namespace::*");
  std::cout << nodes.size() << std::endl;
  std::cout << nodes[0] << " " << nodes[1] << " " << nodes[2] << std::endl;
  std::cout << nodes[0]->get_name() << std::endl;
}
max@dom_xpath$ g++ `pkg-config --cflags --libs libxml++-2.6` -g -O0 bug.cc -o bug

The result:

max@dom_xpath$ ./bug
3
0x8053870 0x8053870 0x8053870
zsh: segmentation fault (core dumped)  ./bug
max@dom_xpath$

Note that all items in nodes have the same value. If I call Node::cobj() instead of Node::get_name(), it returns 0x1 for all 3 items. I suppose (I did not look in the sources) that the underlying libxml2 call allocates the resulting nodes not in the document but in the XPath context, and that this context is freed before Node::find() returns.
Created attachment 81933: bug2.cc

Yes. This revised test code seems to show that the Node::impl_ has a nonsense value.
Unfortunately, I can't get valgrind to give me a useful report. xmlXPathEval() in libxml does not seem to be using our on_libxml_construct() function (xmlRegisterNodeDefaultValue in libxml). CCing Daniel Veillard in case he has an idea.
I have _some_ explanation, but I don't have a general fix yet. See my comments in Node::find_impl here: http://git.gnome.org/browse/libxml++/tree/libxml++/nodes/node.cc#n285

I think libxml is abusing xmlNode::_private, but only sometimes. I could fix this test code by assuming that the xmlNodeSet's xmlNode had the real xmlNode* in xmlNode::_private, but that breaks our existing example http://git.gnome.org/browse/libxml++/tree/examples/dom_xpath, for which the xmlNode* is the real xmlNode*, as we'd expect. This really needs advice from the libxml maintainer.
Actually, I see now that we must cast the xmlNode* sometimes, because it is sometimes not really an xmlNode*. libxml's weird struct inheritance is really no fun sometimes. The code now ignores these xmlNs objects, because they do not map to our C++ Node object and there's no obvious shared base. Or should it?: http://www.xmlsoft.org/html/libxml-tree.html#xmlNs Elsewhere we just deal with namespaces in terms of prefix and URI strings. I wonder what result you would expect. I don't know XPath, so I don't even know what this is asking for: xmlpp::NodeSet nodes = root->find("/*/namespace::*"); Would it be enough to add a find_namespaces() function, or is there some XPath expression that might return both nodes and namespaces? Is there some other API (for instance, in Java) that does this?
First, I must say that I have never been an XML guru, and I no longer even work on the things that got me interested in XML and led me to this bug. But a C++ library definitely must not segfault when passed some string parameter. Second, as far as I remember and understand, requesting "namespace" nodes is valid in XPath (or XQuery?). At least the "xpath" utility gives me a valid response:

max@max$ xpath -e '/*/namespace::*' 1.xml
Found 2 nodes in 1.xml:
-- NODE --
xmlns="http://example.com/ns1"
-- NODE --
xmlns:ns2="http://example.com/ns2"

I think the most correct solution is to keep the request object (which is created inside find()) alive as long as the returned objects are alive, for example by keeping a shared pointer in those objects.
The problem is that a namespace is not a node, even if the xpath utility lists it as a node. Java seems to solve this by returning an Object (the equivalent of a void* in C++) and requiring the return type to be specified as a parameter: http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/xpath/XPath.html#evaluate%28java.lang.String,%20java.lang.Object,%20javax.xml.namespace.QName%29 C# (.Net)'s System.Xml.XPath seems to do the same: http://msdn.microsoft.com/en-us/library/2c16b7x8.aspx Presumably there is no XPath expression that could return both nodes and namespaces.
This is no longer critical because the crash is fixed.
> Presumably there is no xpath expression that could return both nodes and namespaces

This is wrong. "/*/namespace::* | /", for example.
I wonder how those Java and .Net APIs could handle that. They seem to return only one (or a set of the same) types, not a set of various types.
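One way a C++ API could return "a set of various types" is sketched below with std::variant (C++17). DocumentRef, ElementRef, NamespaceRef and XPathItem are hypothetical types, not libxml++ classes; a real binding would wrap xmlDoc, xmlNode and xmlNs rather than copy strings. A result for "/*/namespace::* | /" would then hold one DocumentRef plus NamespaceRef entries in a single container.

```cpp
// Sketch only: hypothetical result types for a heterogeneous XPath node-set (C++17).
#include <cassert>
#include <string>
#include <variant>
#include <vector>

struct DocumentRef {};                               // document (root) node
struct ElementRef { std::string name; };             // element node
struct NamespaceRef { std::string prefix, uri; };    // namespace node; empty prefix = default

using XPathItem = std::variant<DocumentRef, ElementRef, NamespaceRef>;

// Common helper for building a visitor out of lambdas.
template <class... Fs> struct overloaded : Fs... { using Fs::operator()...; };
template <class... Fs> overloaded(Fs...) -> overloaded<Fs...>;

// Dispatches on the node kind actually stored in the item.
std::string describe(const XPathItem& item) {
    return std::visit(overloaded{
        [](const DocumentRef&) { return std::string("document"); },
        [](const ElementRef& e) { return "element " + e.name; },
        [](const NamespaceRef& n) {
            return "namespace " + (n.prefix.empty() ? std::string("(default)") : n.prefix) +
                   "=" + n.uri;
        },
    }, item);
}
```

Unlike the Java and .Net approaches, the caller does not pick a single return type up front; each entry carries its own kind.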
(In reply to comment #6)
> The problem is that a namespace is not a node, even if the xpath utility
> lists it as a node.

According to "XQuery 1.0 and XPath 2.0 Data Model (XDM)", http://www.w3.org/TR/xpath-datamodel/#Node, there are 7 kinds of nodes:

  document nodes (called root nodes in XPath 1.0)
  element nodes
  text nodes
  attribute nodes
  namespace nodes
  processing instruction nodes
  comment nodes

A namespace node is a node, according to the XPath standard. libxml2 handles it very differently from other nodes.

> Java seems to solve this by returning an Object - equivalent to a void* in
> C++, and requiring the return type to be specified as a parameter:
> http://docs.oracle.com/javase/1.5.0/docs/api/javax/xml/xpath/XPath.html#evaluate%28java.lang.String,%20java.lang.Object,%20javax.xml.namespace.QName%29

The returnType input parameter does not request namespace or node; it requests NUMBER, STRING, BOOLEAN, NODE, or NODESET. This bug applies only to NODESETs, and each node in the returned nodeset is of a specific type. Among the listed subinterfaces of the Node interface at http://docs.oracle.com/javase/1.5.0/docs/api/org/w3c/dom/Node.html there is no Namespace interface. I don't know how Java handles namespace nodes, or indeed whether it handles them at all.

> C# (.Net)'s System.Xml.XPath seems to do the same:
> http://msdn.microsoft.com/en-us/library/2c16b7x8.aspx

In C# there is an XPathNodeType enumeration, http://msdn.microsoft.com/en-us/library/system.xml.xpath.xpathnodetype.aspx, with possible values Root, Element, Attribute, Namespace, Text, SignificantWhitespace, Whitespace, ProcessingInstruction, Comment, and All. Presumably this enumeration is used for indicating the type of node.

Ideally there should be a class Namespace : public Node, but the way namespace nodes are handled in libxml2 makes it difficult to add such a class. Most kinds of nodes are represented by an xmlNode struct. An attribute node is an exception: it's represented by an xmlAttr struct.
xmlAttr and xmlNode have most elements in common, and it's possible to ignore the difference in many xmlpp::Node methods. A namespace node is represented by an xmlNs struct, which is very different from xmlNode. All methods in the base class xmlpp::Node would be affected by this difference.

An even more severe complication is that the copy semantics of xmlNs differ from those of xmlNode. An xmlNodeSet contains a pointer to an array of xmlNode pointers (xmlNode**). When an xmlNode* really points to an xmlNode, that xmlNode is not owned by the xmlNodeSet. When an xmlNode* points to an xmlNs, that xmlNs is a copy which _is_ owned by the xmlNodeSet. The functions used for making and freeing individual xmlNs instances in an xmlNodeSet are not public; they are local to libxml2/xpath.c. It would be difficult for libxml++ to take over ownership of the xmlNs nodes in an xmlNodeSet. xmlpp::Node::find() always frees the xmlNodeSet by a call to xmlXPathFreeObject(), which frees the xmlNs instances in the xmlNodeSet.

The present solution to the problem is to skip the namespace nodes when xmlpp::Node::find() converts the xmlNodeSet to an xmlpp::NodeSet. If we want to keep the namespace nodes in xmlpp::NodeSet, one way would be to let xmlpp::NodeSet be a class that stores the xmlNodeSet pointer and owns the xmlNodeSet instance, i.e. calls xmlXPathFreeObject() in its destructor. That would break API, of course. And another question remains: is it reasonable to make a Namespace class a subclass of Node, when very little of the logic in Node applies to Namespace?

I changed the title of this bug, since the segfault was fixed on 2010-06-13.
> The present solution to the problem is to skip the namespace nodes when
> xmlpp::Node::find() converts the xmlNodeSet to an xmlpp::NodeSet.

That would prevent the crash, I guess, so it's the minimum that we should do. And then:

> Ideally there should be a
> class Namespace : public Node

I guess that we should focus on doing this somehow.
libxml++ has moved to https://github.com/libxmlplusplus/libxmlplusplus If this ticket is still valid in a recent version of libxml++, then please create a ticket at https://github.com/libxmlplusplus/libxmlplusplus/issues - thanks a lot!