GNOME Bugzilla – Bug 343302
appending element with xml:id attribute to new document breaks attribute
Last modified: 2006-05-30 19:52:26 UTC
Please describe the problem: xmlReconciliateNs seems to be broken for the 'xml' namespace (http://www.w3.org/XML/1998/namespace) Steps to reproduce: 1. create two element nodes 2. add an xml:id attribute to one of them 3. add it as child of the other element 4. call xmlReconciliateNs on the added child to fix its namespaces Actual results: Name and namespace of the xml:id attribute will mutate into unreadable strings. Expected results: The attribute should stay unchanged. Does this happen every time? yes Other information:
Can you supply some code that produces the problem? I havent been able to reproduce this.
Ok, I'll try to come up with code. I think the problem may be related to the shared dictionary that we are using here. Both documents have a reference to the same dictionary. The document of the appended element is freed after the transition. Note also that exactly the same order of API calls works nicely for any other namespace I tried, so it may well be a problem in the initial dictionary setup.
Two hints: 1) have you tried to use xmlDOMWrapReconcileNamespaces() instead of xmlReconciliateNs()? The latter is not working correctly in every case, but don't ask me what it does wrong; I did already look too often into that function; it should be in the mailing list archive somewhere. 2) Stefan, if lxml shares just one string dict, then be aware that the dict is not threadsafe with respect to adding/lookup of strings. So one dict per thread/fiber/whatever is recommended.
Another hint: xmlNs->href and xmlNs->prefix is not put in dicts in Libxml2; i.e., it is always xmlStrdup()ed.
Ah, and another one: IIRC, lxml is using those temporary fake xmlDoc(s), in some cases, right? This might be the source of the error you observe, since the XML namespace is stored (normally) internally in xmlDoc->oldNs (since it is an implicit namespace declaration); you might end up referencing an already freed (together with the xmlDoc) xmlNs from your attribute.
Kasimier, thanks for the comments. Starting with comment #5: Fake documents are not used here, so that's not an issue. They are only used for deep copying documents and for running things like XSLT and XPath on them. And fake documents are created by shallow copying the original document node, so I hope that does all that's needed (except for children etc.). Regarding comment #3: I tried xmlDOMWrapReconcileNamespaces() and it does not fix the problem. Anyway, I don't like the sentence "This function is in a experimental state." in its documentation. Namespace fixing is a crucial operation in lxml. Also, we try to stay compatible with libxml2 2.6.16, so any newer fixes are of limited help. BTW, threading is currently declared non-working in lxml, mainly because of the GIL. Comment #4 makes it unlikely that this is a dictionary issue, so I still feel tempted to blame xmlReconciliateNs. Thanks, Stefan
I think I found the problem, it's in xmlSearchNsByHref (as Kasimier suggested in comment #5). lxml calls this function to find the namespace and libxml2 happily returns xmlDoc->oldNs. We then set this as the namespace of the new attribute and when the element is moved and the original document freed, oldNS is freed with it. So, a possible work-around would be to check the result of xmlSearchNsByHref if it's identical to doc->oldNs and copy the namespace in that case. Is this the right thing to do or is there a better work-around?
You mean declaring the XML namespace locally on the element holding the attribute? Yes, this should work, although it will add to lxmls ns-decl pollution; but I don't see an easier way to workaround the problem. Normally I use xmlDOMWrapAdoptNode() if moving trees to other docs. But this is also a new function. Ah, and its also marked "experimental" - so danger ahead ;-) Regarding xmlDOMWrapReconcileNamespaces(): I don't like "This function is in a experimental state." either :-) I think we should remove this comment; obviously this just hinders it from being used (and tested). The tests for this are in Libxml2's file tree in /test/namespaces/reconcile.
You could also try to declare in on the document element, so it will be in scope for all other nodes as well; this will avoid declaring it on every element.
No, I don't think this would work. Even if it's declared on the document, xmlSearchNsByHref will still return doc->oldNs when asked. That's special cased before any other test. So we would end up redefining the XML namespace over and over. But then, that's not different from what we'd get when we define it on a node... (A question related to that: why doesn't xmlReconciliateNs always declare the namespaces on the document node if no in-between parent declared the respective prefix differently? That would nicely remove any redundant namespace declarations from the document /and/ reduce the overhead involved in moving elements between documents...) Maybe lxml should just get its own _fixNamespaces() function that a) works and b) does what it's supposed to do... Anyway, I'm still trying to get my head around why this is only a problem with the XML namespace and does not regard other namespaces defined on other parts of the old document... Kasimier, I saw that xmlDOMWrapAdoptNode was added in 2.6.20. Do you have an idea since which version it is in a usable state?
Ah, true, the "xml" prefix is special cased. Regarding xmlReconciliateNs(): It never really worked; pity that Daniel didn't wrote it, so we could blame him :-) I hit the issues of that function in: http://mail.gnome.org/archives/xml/2003-April/msg00162.html A good namespace fixup it important - and interesting to implement; I bet lxml would gain much if you would try to write a made-to-measure fixup for lxml's namespace semantics. See xmlDOMWrapReconcileNamespaces() for how I tried to make it work with our Delphi DOM wrapper in conjunction with Libxml2. W3C DOM Level 3 namespace normalization: http://www.w3.org/TR/DOM-Level-3-Core/namespaces-algorithms.html I guess only the XML namespace is producing problems, since it is the only one which is declared implicitely on the xmlDoc, all other ns-decls are in the tree. Although, if you move a part of the tree, which has references to ns-decls outside of the removed branch, then you're in danger of having stale references as well; but I'm sure you're aware of this. Regarding xmlDOMWrapAdoptNode(): I don't know. We don't adopt much in our Delphi apps, so it might be not well tested. IIRC, lxml is "polluted" with ns-decls; this might also slow down the adoption mechanism, since it keeps track (xmlNsMapItem) of all ns-decls in scope for the current node to be processed. But it's optimized for the case where the string dicts of docs are shared, so the strings need not be put in the destination doc. The funcion is also used in the tests at /test/namespaces/reconcile.
I was finally able to re-produce this to a point now when trying to append from one document to another. The problem I run into has nothing to do with xmlReconciliateNs(). After reconciliation, the namespace is properly fixed to use the one from oldNS on the correct document. However, when the old document is destroyed, the attribute name is freed due to the dictionary being freed, but the namespace is still correctly intact. Since you are sharing a dictionary between the 2 documents are you making sure that you are not free'ing it when destroying the old document?
Is the attr->doc set to the new doc when moving the attribute from one doc to another?
Stefan, another reason might be what you give xmlReconcileNs() as the argument: if ((node == NULL) || (node->type != XML_ELEMENT_NODE)) return(-1); So it will do nothing in case you fed it with an attribute. Maybe a xmlSearchNsByHref() for attr->ns on the new owner element will suffice.
We don't feed attributes to Reconcil and we don't adapt doc pointers ourselves. As far as I understand, Reconcil adapts attributes as well as elements, so there should be no reason to handle attributes by hand. And AddChild calls SetTreeDoc, so there shouldn't be any problems from that side either. But I'll look into the dict business. Although we use xmlDictReference everywhere, so I wouldn't know why it should disappear....
I found it. It was a well hidden, ancient bug in lxml (not libxml2). In rare cases, the document was freed by Python's garbage collection /before/ we called xmlReconciliateNs, which then obviously failed to read the namespaces. It is also not related to the XML namespace, it just happened to show there as that namespace sits (or rather: sat) on the document. Sorry for the confusion and thanks for all the help.
Great you found the bug! The good side of this is that you might have found appetite for writing a namespace normalization mechanism.