Bug 570365 - Should normalize whitespace in namespace attributes
Status: RESOLVED NOTABUG
Product: libxml2
Classification: Platform
Component: general
Version: 2.6.x
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: Daniel Veillard
QA Contact: libxml QA maintainers
Depends on:
Blocks:
Reported: 2009-02-03 14:08 UTC by Ross Burton
Modified: 2009-08-12 09:43 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Ross Burton 2009-02-03 14:08:12 UTC
I have the following document:

<?xml version="1.0" encoding="utf-8"?>
<node xmlns=" urn:microsoft-com:wmc-1-0"/>

Which produces parser warnings:

test.xml:2: parser warning : xmlns:  urn:microsoft-com:wmc-1-0 not a valid URI
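For reference, the warning fires because an RFC 3986 URI reference cannot contain a literal space, so a value that starts with one fails URI parsing. The Python sketch below is not libxml2's code (libxml2 runs the value through its own URI parser); the helpers `looks_like_uri_reference` and `check_namespace_decl` are hypothetical, and the check only looks at the character repertoire, not the full RFC 3986 grammar. Note that the value is reported but left unchanged.

```python
from __future__ import annotations

import string

# Characters allowed in an RFC 3986 URI reference: unreserved characters,
# gen-delims, sub-delims, and "%" for percent-encoding. Space is not among them.
_URI_CHARS = frozenset(
    string.ascii_letters + string.digits + "-._~" + ":/?#[]@" + "!$&'()*+,;=" + "%"
)


def looks_like_uri_reference(value: str) -> bool:
    """Rough approximation: checks only the character repertoire,
    not the full RFC 3986 grammar."""
    return all(ch in _URI_CHARS for ch in value)


def check_namespace_decl(value: str) -> list[str]:
    """Warn about a namespace declaration value without rewriting it.

    An empty value is allowed (xmlns="" undeclares the default namespace).
    """
    warnings: list[str] = []
    if value and not looks_like_uri_reference(value):
        warnings.append(f"xmlns: {value} not a valid URI")
    return warnings


print(check_namespace_decl(" urn:microsoft-com:wmc-1-0"))
# ['xmlns:  urn:microsoft-com:wmc-1-0 not a valid URI']
print(check_namespace_decl("urn:microsoft-com:wmc-1-0"))
# []
```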

Now I was going to ask if the "recover" parser could handle this and strip leading and trailing whitespace from namespace attributes.  But from chatting with Dodji, this may actually be a valid bug in libxml2.  Quoting from bits of the XML spec:

"In a namespace declaration, the URI reference is the normalized value of the attribute"
"If the attribute type is not CDATA, then the XML processor MUST further process the normalized attribute value by discarding any leading and trailing space (#x20) characters, and by replacing sequences of space (#x20) characters by a single space (#x20) character."

Now I won't claim to be an XML expert, and generally get a headache when reading the specification, but it looks like libxml2 should be stripping leading/trailing whitespace from the attribute.
Comment 1 Zeeshan Ali 2009-02-03 14:22:30 UTC
BTW, fixing this bug would help us (GUPnP) maintain interoperability with the MS UPnP stack, which is far more ubiquitous than we would like it to be.
Comment 2 Daniel Veillard 2009-08-12 09:43:07 UTC
Having looked at this, the reasoning above is actually wrong:

- the core spec for this is Namespaces in XML 1.0, see section 3 about
  declaring namespaces:
  http://www.w3.org/TR/REC-xml-names/#ns-decl

  "The attribute's normalized value MUST be either a URI reference — the  
   namespace name identifying the namespace — or an empty string."

  then "normalized value" points to the associated part of the XML
  specification

- http://www.w3.org/TR/REC-xml/#AVNormalize
  once steps 1, 2 and 3 are applied, the leading whitespace is still
  preserved.
  The clause "If the attribute type is not CDATA" does not apply here,
  because an attribute is treated as CDATA by default unless the DTD
  declares it as something else (and for a namespace declaration
  attribute you really do expect CDATA).
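A minimal Python sketch of the attribute-value normalization in XML 1.0 section 3.3.3, assuming the raw value contains only literal characters (entity and character references are not handled, so those branches of the algorithm are omitted). The function name `normalize_attribute_value` and the `is_cdata` flag are illustrative, not taken from any parser. It shows the point above: the leading space survives for CDATA and is only stripped when a DTD declares a non-CDATA type.

```python
def normalize_attribute_value(raw: str, is_cdata: bool = True) -> str:
    # Step 1 (end-of-line handling, XML 1.0 section 2.11):
    # "\r\n" and a lone "\r" both become "\n".
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    # Steps 2-3: every literal whitespace character (#x20, #xA, #x9)
    # is replaced by a single #x20; other characters are kept as-is.
    text = "".join(" " if ch in " \t\n" else ch for ch in text)
    if is_cdata:
        # CDATA (also the default for undeclared attributes):
        # nothing more is done, so leading/trailing spaces survive.
        return text
    # Non-CDATA declared types only: drop leading/trailing spaces
    # and collapse runs of spaces into one.
    return " ".join(part for part in text.split(" ") if part)


print(repr(normalize_attribute_value(" urn:microsoft-com:wmc-1-0")))
# ' urn:microsoft-com:wmc-1-0'
print(repr(normalize_attribute_value(" urn:microsoft-com:wmc-1-0", is_cdata=False)))
# 'urn:microsoft-com:wmc-1-0'
```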

See also back in the first specification the part about comparisons:
  http://www.w3.org/TR/REC-xml-names/#NSNameComparison

[Definition: The two URIs are treated as strings, and they are identical if and only if the strings are identical, that is, if they are the same sequence of characters. ] The comparison is case-sensitive, and no %-escaping is done or undone. 

  so basically two elements
<node xmlns=" urn:microsoft-com:wmc-1-0"/>
and
<node xmlns="urn:microsoft-com:wmc-1-0"/>

  are *not* in the same namespace per the specification. The fact that
Microsoft may accept that is actually a serious bug and should be reported
to them (unless the document uses a DTD that declares these attributes with
a non-CDATA type, but that sounds like a horrible interop issue with the
document in itself).
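The comparison rule quoted above amounts to exact string equality. A minimal Python sketch (the `same_namespace` helper is hypothetical, and the example.org URIs are made up for illustration) applied to the two declarations from this bug, plus the case and %-escaping differences the rule mentions:

```python
def same_namespace(ns_a: str, ns_b: str) -> bool:
    # Namespaces in XML 1.0: namespace names are compared as plain strings.
    # Comparison is case-sensitive, no %-escaping is done or undone, and
    # no whitespace is trimmed.
    return ns_a == ns_b


pairs = [
    (" urn:microsoft-com:wmc-1-0", "urn:microsoft-com:wmc-1-0"),  # leading space
    ("urn:microsoft-com:wmc-1-0", "URN:microsoft-com:wmc-1-0"),   # case differs
    ("http://example.org/~foo", "http://example.org/%7Efoo"),     # %-escaping
    ("urn:microsoft-com:wmc-1-0", "urn:microsoft-com:wmc-1-0"),   # identical
]
for a, b in pairs:
    print(repr(a), repr(b), same_namespace(a, b))
```

Only the last pair names the same namespace; in particular, the declaration with the leading space is a different namespace from the one without it.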

   So libxml2 does the right thing by warning about the problem, IMHO.

Daniel