GNOME Bugzilla – Bug 570365
Should normalize whitespace in namespace attributes
Last modified: 2009-08-12 09:43:07 UTC
I have the following document:

<?xml version="1.0" encoding="utf-8"?>
<node xmlns=" urn:microsoft-com:wmc-1-0"/>

Which produces parser warnings:

test.xml:2: parser warning : xmlns: urn:microsoft-com:wmc-1-0 not a valid URI

Now I was going to ask if the "recover" parser could handle this and strip leading and trailing whitespace from namespace attributes. But from chatting with Dodji, this may actually be a valid bug in libxml2. Quoting from bits of the XML spec:

"In a namespace declaration, the URI reference is the normalized value of the attribute"

"If the attribute type is not CDATA, then the XML processor MUST further process the normalized attribute value by discarding any leading and trailing space (#x20) characters, and by replacing sequences of space (#x20) characters by a single space (#x20) character."

Now I won't claim to be an XML expert, and generally get a headache when reading the specification, but it looks like libxml2 should be stripping leading/trailing whitespace from the attribute.
BTW, fixing this bug will help us (GUPnP) maintain interoperability with the MS UPnP stack, which seems to be far more ubiquitous than we would like.
Looking at this, that reading is actually wrong:

- The core spec for this is Namespaces in XML 1.0, see section 3 about declaring namespaces: http://www.w3.org/TR/REC-xml-names/#ns-decl

  "The attribute's normalized value MUST be either a URI reference — the namespace name identifying the namespace — or an empty string."

  There, "normalized value" points to the associated part of the XML specification.

- http://www.w3.org/TR/REC-xml/#AVNormalize

  With points 1, 2 and 3 applied, the leading whitespace is still preserved. The part about "If the attribute type is not CDATA" does not apply, because the attribute is CDATA by default unless the DTD specifies it is something else (and for a namespace declaration attribute you really expect CDATA).

See also, back in the first specification, the part about comparisons: http://www.w3.org/TR/REC-xml-names/#NSNameComparison

[Definition: The two URIs are treated as strings, and they are identical if and only if the strings are identical, that is, if they are the same sequence of characters. ] The comparison is case-sensitive, and no %-escaping is done or undone.

So basically the two elements

<node xmlns=" urn:microsoft-com:wmc-1-0"/>

and

<node xmlns="urn:microsoft-com:wmc-1-0"/>

are *not* in the same namespace per the specification. The fact that Microsoft may accept that is actually a serious bug, and should be reported to them (unless the document uses a DTD and that DTD states the attribute is not CDATA, but that sounds like a horrible interop issue with the document then).

So libxml2 does the right thing by warning about the problem IMHO,

Daniel
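
For anyone who wants to see the distinction concretely, here is a rough Python sketch of the argument above. It is not libxml2 code. The helper names (normalize_attribute_value, namespace_of) are made up for illustration, and the normalization is simplified: it only handles literal whitespace and ignores character and entity references. The second helper uses Python's standard xml.etree.ElementTree (expat-based) parser to show that the two elements end up with different namespace names.

```python
import re
import xml.etree.ElementTree as ET


def normalize_attribute_value(raw, is_cdata=True):
    """Simplified attribute-value normalization (hypothetical helper).

    Assumption: raw contains no character or entity references.
    """
    # Each literal tab, line feed or carriage return becomes one space.
    value = re.sub(r"[\t\n\r]", " ", raw)
    if not is_cdata:
        # Only non-CDATA attributes get trimmed and have runs of
        # spaces collapsed to a single space.
        value = re.sub(r" +", " ", value.strip(" "))
    return value


def namespace_of(xml_text):
    """Return the namespace name of the root element (hypothetical helper)."""
    tag = ET.fromstring(xml_text).tag  # ElementTree spells it "{uri}local"
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


RAW = " urn:microsoft-com:wmc-1-0"

# CDATA (the default without a DTD): the leading space survives.
cdata_value = normalize_attribute_value(RAW)
# Non-CDATA (only if a DTD declared it so): the leading space is dropped.
non_cdata_value = normalize_attribute_value(RAW, is_cdata=False)

# Namespace names are compared as plain strings, character by character.
ns_with_space = namespace_of('<node xmlns=" urn:microsoft-com:wmc-1-0"/>')
ns_without_space = namespace_of('<node xmlns="urn:microsoft-com:wmc-1-0"/>')
same_namespace = ns_with_space == ns_without_space
```

Here cdata_value still starts with a space, non_cdata_value does not, and same_namespace comes out False, which matches the reading that the two <node> elements are in different namespaces.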