After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 343302 - appending element with xml:id attribute to new document breaks attribute
appending element with xml:id attribute to new document breaks attribute
Status: RESOLVED INVALID
Product: libxml2
Classification: Platform
Component: general
2.6.x
Other All
: Normal normal
: ---
Assigned To: Daniel Veillard
libxml QA maintainers
Depends on:
Blocks:
 
 
Reported: 2006-05-29 16:23 UTC by Stefan Behnel
Modified: 2006-05-30 19:52 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Stefan Behnel 2006-05-29 16:23:22 UTC
Please describe the problem:
xmlReconciliateNs seems to be broken for the 'xml' namespace
(http://www.w3.org/XML/1998/namespace)


Steps to reproduce:
1. create two element nodes
2. add an xml:id attribute to one of them
3. add it as child of the other element
4. call xmlReconciliateNs on the added child to fix its namespaces


Actual results:
Name and namespace of the xml:id attribute will mutate into unreadable strings.

Expected results:
The attribute should stay unchanged.

Does this happen every time?
yes

Other information:
Comment 1 Rob Richards 2006-05-30 01:00:23 UTC
Can you supply some code that produces the problem? I havent been able to reproduce this.
Comment 2 Stefan Behnel 2006-05-30 05:24:21 UTC
Ok, I'll try to come up with code. I think the problem may be related to the shared dictionary that we are using here. Both documents have a reference to the same dictionary. The document of the appended element is freed after the transition.

Note also that exactly the same order of API calls works nicely for any other namespace I tried, so it may well be a problem in the initial dictionary setup.
Comment 3 kbuchcik 2006-05-30 08:38:49 UTC
Two hints:
1) have you tried to use xmlDOMWrapReconcileNamespaces() instead
  of xmlReconciliateNs()? The latter is not working correctly in every
  case, but don't ask me what it does wrong; I did already look too often
  into that function; it should be in the mailing list archive somewhere.
2) Stefan, if lxml shares just one string dict, then be aware that
  the dict is not threadsafe with respect to adding/lookup of strings.
  So one dict per thread/fiber/whatever is recommended.
Comment 4 kbuchcik 2006-05-30 08:43:41 UTC
Another hint: xmlNs->href and xmlNs->prefix is not put in dicts in
Libxml2; i.e., it is always xmlStrdup()ed.
Comment 5 kbuchcik 2006-05-30 08:52:05 UTC
Ah, and another one: IIRC, lxml is using those temporary fake xmlDoc(s),
in some cases, right? This might be the source of the error you observe,
since the XML namespace is stored (normally) internally in xmlDoc->oldNs
(since it is an implicit namespace declaration); you might end up
referencing an already freed (together with the xmlDoc) xmlNs from your
attribute.
Comment 6 Stefan Behnel 2006-05-30 09:34:36 UTC
Kasimier, thanks for the comments.

Starting with comment #5: Fake documents are not used here, so that's not an issue. They are only used for deep copying documents and for running things like XSLT and XPath on them. And fake documents are created by shallow copying the original document node, so I hope that does all that's needed (except for children etc.).

Regarding comment #3: I tried xmlDOMWrapReconcileNamespaces() and it does not fix the problem. Anyway, I don't like the sentence "This function is in a experimental state." in its documentation. Namespace fixing is a crucial operation in lxml. Also, we try to stay compatible with libxml2 2.6.16, so any newer fixes are of limited help. BTW, threading is currently declared non-working in lxml, mainly because of the GIL.

Comment #4 makes it unlikely that this is a dictionary issue, so I still feel tempted to blame xmlReconciliateNs.

Thanks,
Stefan
Comment 7 Stefan Behnel 2006-05-30 09:58:06 UTC
I think I found the problem, it's in xmlSearchNsByHref (as Kasimier suggested in comment #5). lxml calls this function to find the namespace and libxml2 happily returns xmlDoc->oldNs. We then set this as the namespace of the new attribute and when the element is moved and the original document freed, oldNS is freed with it.

So, a possible work-around would be to check the result of xmlSearchNsByHref if it's identical to doc->oldNs and copy the namespace in that case.

Is this the right thing to do or is there a better work-around?
Comment 8 kbuchcik 2006-05-30 10:17:33 UTC
You mean declaring the XML namespace locally on the element holding
the attribute? Yes, this should work, although it will add to lxmls
ns-decl pollution; but I don't see an easier way to workaround the problem.

Normally I use xmlDOMWrapAdoptNode() if moving trees to other docs.
But this is also a new function. Ah, and its also marked "experimental" -
so danger ahead ;-)

Regarding xmlDOMWrapReconcileNamespaces(): I don't like "This function
is in a experimental state." either :-) I think we should remove this
comment; obviously this just hinders it from being used (and tested).
The tests for this are in Libxml2's file tree in /test/namespaces/reconcile.

Comment 9 kbuchcik 2006-05-30 10:19:43 UTC
You could also try to declare in on the document element, so it will
be in scope for all other nodes as well; this will avoid declaring
it on every element.
Comment 10 Stefan Behnel 2006-05-30 10:51:49 UTC
No, I don't think this would work. Even if it's declared on the document, xmlSearchNsByHref will still return doc->oldNs when asked. That's special cased before any other test. So we would end up redefining the XML namespace over and over. But then, that's not different from what we'd get when we define it on a node...

(A question related to that: why doesn't xmlReconciliateNs always declare the namespaces on the document node if no in-between parent declared the respective prefix differently? That would nicely remove any redundant namespace declarations from the document /and/ reduce the overhead involved in moving elements between documents...)

Maybe lxml should just get its own _fixNamespaces() function that a) works and b) does what it's supposed to do...

Anyway, I'm still trying to get my head around why this is only a problem with the XML namespace and does not regard other namespaces defined on other parts of the old document...

Kasimier, I saw that xmlDOMWrapAdoptNode was added in 2.6.20. Do you have an idea since which version it is in a usable state?
Comment 11 kbuchcik 2006-05-30 11:18:44 UTC
Ah, true, the "xml" prefix is special cased.

Regarding xmlReconciliateNs(): It never really worked; pity that
Daniel didn't wrote it, so we could blame him :-)
I hit the issues of that function in:
http://mail.gnome.org/archives/xml/2003-April/msg00162.html

A good namespace fixup it important - and interesting to implement;
I bet lxml would gain much if you would try to write a made-to-measure
fixup for lxml's namespace semantics.

See xmlDOMWrapReconcileNamespaces() for how I tried to make it
work with our Delphi DOM wrapper in conjunction with Libxml2.

W3C DOM Level 3 namespace normalization:
http://www.w3.org/TR/DOM-Level-3-Core/namespaces-algorithms.html

I guess only the XML namespace is producing problems, since it is the
only one which is declared implicitely on the xmlDoc, all other
ns-decls are in the tree.
Although, if you move a part of the tree, which has references to
ns-decls outside of the removed branch, then you're in danger of
having stale references as well; but I'm sure you're aware of this.

Regarding xmlDOMWrapAdoptNode(): I don't know. We don't adopt much
in our Delphi apps, so it might be not well tested. IIRC, lxml is
"polluted" with ns-decls; this might also slow down the adoption
mechanism, since it keeps track (xmlNsMapItem) of all ns-decls in
scope for the current node to be processed. But it's optimized
for the case where the string dicts of docs are shared, so the
strings need not be put in the destination doc. The funcion is
also used in the tests at /test/namespaces/reconcile.

Comment 12 Rob Richards 2006-05-30 11:54:22 UTC
I was finally able to re-produce this to a point now when trying to append from one document to another. The problem I run into has nothing to do with xmlReconciliateNs(). After reconciliation, the namespace is properly fixed to use the one from oldNS on the correct document. However, when the old document is destroyed, the attribute name is freed due to the dictionary being freed, but the namespace is still correctly intact. Since you are sharing a dictionary between the 2 documents are you making sure that you are not free'ing it when destroying the old document?
Comment 13 kbuchcik 2006-05-30 12:04:43 UTC
Is the attr->doc set to the new doc when moving the attribute
from one doc to another?
Comment 14 kbuchcik 2006-05-30 12:10:57 UTC
Stefan, another reason might be what you give xmlReconcileNs() as the
argument:
if ((node == NULL) || (node->type != XML_ELEMENT_NODE)) return(-1);
So it will do nothing in case you fed it with an attribute.

Maybe a xmlSearchNsByHref() for attr->ns on the new owner element
will suffice.
Comment 15 Stefan Behnel 2006-05-30 14:33:47 UTC
We don't feed attributes to Reconcil and we don't adapt doc pointers ourselves. As far as I understand, Reconcil adapts attributes as well as elements, so there should be no reason to handle attributes by hand. And AddChild calls SetTreeDoc, so there shouldn't be any problems from that side either.

But I'll look into the dict business. Although we use xmlDictReference everywhere, so I wouldn't know why it should disappear....
Comment 16 Stefan Behnel 2006-05-30 19:45:49 UTC
I found it. It was a well hidden, ancient bug in lxml (not libxml2). In rare cases, the document was freed by Python's garbage collection /before/ we called xmlReconciliateNs, which then obviously failed to read the namespaces. It is also not related to the XML namespace, it just happened to show there as that namespace sits (or rather: sat) on the document.

Sorry for the confusion and thanks for all the help.
Comment 17 kbuchcik 2006-05-30 19:52:26 UTC
Great you found the bug! The good side of this is that you might have
found appetite for writing a namespace normalization mechanism.