GNOME Bugzilla – Bug 120981
Would like limited NS normalisation to remove redundant NS declarations
Last modified: 2009-08-15 18:40:50 UTC
[Note: I'm raising this enhancement request as suggested in your response to bug 120971. That bug related to libxslt, but I think this enhancement request applies more logically to libxml2.]

I would like an API in libxml2 to do some simple cases of namespace normalisation. I realise that the problem in general is exceedingly difficult and not well-defined, but some common cases would be feasible and would be a great help.

The specific case that I have is as follows. I have an XML document with a root element like:

    <root xmlns="http://foo.org">

No other element in the document so much as mentions namespaces -- they all simply inherit the root element's namespace -- and I would like to keep it that way for aesthetic and space-saving reasons. However, when I add an element to this document using createElementNS(), specifying the "http://foo.org" namespace, the element created gets that namespace declaration [unnecessarily] added to it, so I end up with this:

    <?xml version="1.0"?>
    <root xmlns="http://foo.org">
      <elt xmlns="http://foo.org"/>
    </root>

I would like to be able to remove the redundant namespace declaration there (or else have createElementNS() not add it in the first place when, as here, it isn't necessary), to leave me with this:

    <?xml version="1.0"?>
    <root xmlns="http://foo.org">
      <elt/>
    </root>

It would be a great help to have an API call to perform that kind of clean-up. At the moment (as explained in libxslt bug 120971), I am using an identity XSLT transformation to perform this clean-up for me, which is highly undesirable because of the way that XSLT affects CDATA sections.

Cheers,
- Steve
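For illustration only, the clean-up requested above can be sketched with Python's standard-library `xml.dom.minidom` rather than libxml2. The helper name `strip_redundant_ns` is hypothetical, and the sketch assumes that a declaration is redundant exactly when an ancestor already binds the same prefix to the same URI. It is not how libxml2 implements anything.

```python
from xml.dom import minidom


def strip_redundant_ns(element, in_scope=None):
    """Drop xmlns declarations that repeat a binding inherited from an ancestor.

    Hypothetical helper for illustration; not part of libxml2 or minidom.
    """
    # Copy so that sibling subtrees do not see each other's declarations.
    in_scope = dict(in_scope or {})
    # items() builds a new list, so removing attributes inside the loop is safe.
    for name, value in element.attributes.items():
        if name == "xmlns":
            prefix = ""
        elif name.startswith("xmlns:"):
            prefix = name[len("xmlns:"):]
        else:
            continue  # ordinary attribute, leave it alone
        if in_scope.get(prefix) == value:
            element.removeAttribute(name)  # same binding is already in scope
        else:
            in_scope[prefix] = value  # new or overriding binding: keep it
    for child in element.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            strip_redundant_ns(child, in_scope)


doc = minidom.parseString(
    '<root xmlns="http://foo.org"><elt xmlns="http://foo.org"/></root>'
)
strip_redundant_ns(doc.documentElement)
print(doc.documentElement.toxml())
```

Only the attribute list of each element is touched, so text and CDATA nodes are left as the parser produced them. A declaration that rebinds a prefix to a different URI is kept, since removing it would change the meaning of the document.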
I've just been looking at "Canonical XML", and I realise that what I want is probably a subset of that. This could be relevant because Canonical XML is already supported by libxml2 via its xmlC14N*() functions. However, it is only the namespaces that I want canonicalised - not the other stuff that XML Canonicalisation does as well. The example given in my original request can actually be achieved with the existing xmlC14N*() functions, but this is no good for me more generally because of the effect that it also has on CDATA sections - namely, it converts them to TEXT. As you may have gathered from the related libxslt bug 120971, I want to keep CDATA sections unchanged. Perhaps what I'm after can be fairly cheaply achieved by providing an interface to part of the XML Canonicalisation process that removes redundant namespace declarations? Just a thought. - Steve
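The trade-off described above can be seen in a small sketch, assuming Python 3.8 or later. It uses the standard library's `xml.etree.ElementTree.canonicalize`, which implements C14N 2.0 and is a different implementation from libxml2's xmlC14N*() functions, so it only illustrates the general behaviour: the repeated namespace declaration is dropped, but the CDATA section comes out as escaped text.

```python
from xml.etree.ElementTree import canonicalize

# Sample document (made up for this sketch): a redundant default-namespace
# declaration on <elt>, plus a CDATA section that we would like to keep.
SAMPLE = (
    '<root xmlns="http://foo.org">'
    '<elt xmlns="http://foo.org"><![CDATA[a < b]]></elt>'
    '</root>'
)

# canonicalize() returns the canonical form as a string when no output
# file is given.
canonical = canonicalize(SAMPLE)
print(canonical)
```

Comparing the printed result with `SAMPLE` shows both effects at once, which is why canonicalisation alone does not meet the requirement of leaving CDATA sections unchanged.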
It was trivial to add as a new parser option in the new XML parser environment of the upcoming 2.6.0 release. I also added it as a new xmllint option:

    paphio:~/XML -> xmllint tst.xml
    <?xml version="1.0"?>
    <root xmlns="http://foo.org">
      <elt xmlns="http://foo.org"/>
    </root>
    paphio:~/XML -> xmllint --nsclean tst.xml
    <?xml version="1.0"?>
    <root xmlns="http://foo.org">
      <elt/>
    </root>
    paphio:~/XML ->

The new option is in CVS,

Daniel
Thanks, Daniel! That'll be a great help to me.

I haven't been able to test it myself yet because I can't build the current CVS version on my Win32 platform. Initially I get complaints about the four '@' characters in include/libxml/xmlversion.h ("error C2018: unknown character '0x40'"); if I delete those four characters (don't know if that's the right thing to do :-s) then it compiles OK but then complains about 9 unresolved externals when it comes to link:

=====
debugXML.obj : error LNK2001: unresolved external symbol _xmlValidateDocument
debugXML.obj : error LNK2001: unresolved external symbol _xmlValidateDtd
debugXML.obj : error LNK2001: unresolved external symbol _xmlParseDTD
debugXML.obj : error LNK2001: unresolved external symbol _xmlMemShow
relaxng.obj : error LNK2001: unresolved external symbol _xmlValidateDocumentFinal
xmlreader.obj : error LNK2001: unresolved external symbol _xmlValidatePushElement
xmlreader.obj : error LNK2001: unresolved external symbol _xmlValidatePushCData
xmlreader.obj : error LNK2001: unresolved external symbol _xmlValidatePopElement
xmlschemastypes.obj : error LNK2001: unresolved external symbol _xmlValidateNotationUse
binaries\libxml2.dll : fatal error LNK1120: 9 unresolved externals
NMAKE : fatal error U1077: 'link.exe' : return code '0x460'
=====

I'm sure that will all be fixed before the next release, so I'll give it a try then.

Cheers,
- Steve
This should be available in release libxml2-2.6.0, thanks, Daniel