Bug 109564 - XML attribute normalization not done for SAX
Status: VERIFIED FIXED
Product: libxml2
Classification: Platform
Component: general
Version: 2.5.2
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: Daniel Veillard
Daniel Veillard
Depends on:
Blocks:
 
 
Reported: 2003-03-30 22:32 UTC by Dave Beckett
Modified: 2009-08-15 18:40 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Dave Beckett 2003-03-30 22:32:04 UTC
This is with 2.5.4 and with libxml2 in CVS.

According to http://www.w3.org/TR/REC-xml#AVNormalize
all the attribute values must be normalized before returning to
the app.

As far as I can tell, this isn't done for the SAX API
as given by the callback startElement.

To test, build testSAX and do:
./testSAX test/c14n/with-comments/example-4.xml
and you can see that you get:

SAX.startElement(norm, attr=' '    
	   ' ')
SAX.endElement(norm)

where the spaces and newlines aren't normalized after '

(Aside: these C14N tests don't match the ones in the C14N REC,
they look older)
Comment 1 Dave Beckett 2003-03-30 23:08:03 UTC
Later ... this can't be fixed at the application level since
the type of the attribute (CDATA, ID, ...) is unknown once it
passes the SAX API.  Since only certain types get this
normalization, it can't be fixed above the library level.

If this helps to encourage you: expat gets it right :)
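
The type dependence follows from the rules in the section of the XML recommendation linked above. A minimal sketch of those rules, assuming only character references and the five predefined entities need resolving (a real parser also expands declared entities recursively); the function name `normalize_attribute` is made up for illustration and is not part of libxml2 or expat:

```python
import re

WHITESPACE = "\x20\x0d\x0a\x09"
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "apos": "'", "quot": '"'}
# One token per match: hex char ref, decimal char ref, entity ref, or any char.
_TOKEN = re.compile(r"&#x([0-9A-Fa-f]+);|&#([0-9]+);|&(\w+);|(.)", re.DOTALL)


def normalize_attribute(raw, attr_type="CDATA"):
    """Normalize a raw attribute value roughly as XML 1.0 section 3.3.3 describes."""
    # Line breaks are normalized to #xA before attribute normalization.
    raw = raw.replace("\r\n", "\n").replace("\r", "\n")
    out = []
    for hexref, decref, name, ch in _TOKEN.findall(raw):
        if hexref:
            out.append(chr(int(hexref, 16)))   # character reference: kept as is
        elif decref:
            out.append(chr(int(decref)))       # character reference: kept as is
        elif name:
            out.append(PREDEFINED[name])       # assumption: predefined entities only
        elif ch in WHITESPACE:
            out.append(" ")                    # literal white space becomes a space
        else:
            out.append(ch)
    value = "".join(out)
    if attr_type != "CDATA":
        # Tokenized types (ID, NMTOKENS, ...): trim and collapse runs of spaces.
        value = re.sub(" +", " ", value.strip(" "))
    return value


if __name__ == "__main__":
    print(repr(normalize_attribute("  a\n  b ")))              # CDATA
    print(repr(normalize_attribute("  a\n  b ", "NMTOKENS")))  # tokenized type
```

The second step only runs when the declared type is known, which is why an application sitting above a SAX `startElement` callback that does not report the type cannot redo it.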
Comment 2 Daniel Veillard 2003-03-31 09:20:17 UTC
Right, the SAX pseudo-API makes a horrible mess because it
doesn't allow entity references in attributes to be preserved,
which is something I wanted to do for libxml, which is an editing
toolkit. SAX doesn't exist as a reliable C API. It's provided
for "compatibility only" within libxml2 because there is NO SAX
API for C. So honestly I have no interest in fixing problems at
that level.
The XMLReader streaming interface will get those right.
You can also try enabling entity substitution to get behaviour
similar to expat's, but this also means the parser will fetch
the external subset.
The fact that "expat gets it right" also means that no toolkit
based on expat can save back entities from attribute values,
and honestly I don't consider this a feature.
For me SAX is a horribly broken API; I don't claim full conformance
to it because it's not possible.

Daniel
Comment 3 Daniel Veillard 2003-03-31 10:36:57 UTC
Okay looking at it anyway. The output from testSAX can't be trusted
as is. Running xmllint --noent test/c14n/with-comments/example-4.xml
under GDB the following is received:

Breakpoint 1, startElement (ctx=0x8120628, fullname=0x8130420 "norm",
    atts=0x8130478) at SAX.c:1255
1255        xmlParserCtxtPtr ctxt = (xmlParserCtxtPtr) ctx;
(gdb) p atts[0]
$1 = (const xmlChar *) 0x81306d0 "attr"
(gdb) p atts[1]
$2 = (const xmlChar *) 0x8130700 " '    \r\n\t   ' "
(gdb)

  \r\n\t must not be "normalized" because they appeared as
character references in the serialization, and this was done
explicitly to bypass that layer:
   <norm attr=' &apos;   &#x20;&#13;&#xa;&#9;   &apos; '/>

  So what do you mean by "get it right" ?

Daniel
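
The same value can be cross-checked outside libxml2. A small sketch using Python's standard `xml.sax` module, which is backed by expat; this is not testSAX or xmllint, the attribute is undeclared and therefore treated as CDATA, and the handler class and the wrapping `<doc>` element are made up for the example:

```python
import xml.sax


class AttrCollector(xml.sax.ContentHandler):
    """Hypothetical handler that records the attributes passed to startElement."""

    def __init__(self):
        super().__init__()
        self.seen = {}

    def startElement(self, name, attrs):
        # Copy the values out of the attributes object handed to the callback.
        self.seen[name] = {key: attrs.getValue(key) for key in attrs.getNames()}


# The <norm> element quoted above, wrapped in a made-up root element.
DOC = b"<doc><norm attr=' &apos;   &#x20;&#13;&#xa;&#9;   &apos; '/></doc>"


def parse_attrs(document):
    """Parse a bytes document and return {element name: {attribute: value}}."""
    handler = AttrCollector()
    xml.sax.parseString(document, handler)
    return handler.seen


if __name__ == "__main__":
    print(repr(parse_attrs(DOC)["norm"]["attr"]))
```

The carriage return, newline and tab written as character references reach the callback unchanged, while literal white space in an attribute value is turned into spaces.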
Comment 4 Daniel Veillard 2003-09-11 14:52:20 UTC
Okay, this should be fixed in CVS as I'm migrating to SAX2:
the attribute type is now looked at, and normalization is done,
before the callback.
Aleksey, I'm Cc'ing you on that bug report because the change
affects two tests:
test/c14n/with-comments/example-4.xml
test/c14n/without-comments/example-4.xml

  it fixes a normalization problem 
<normId id=' &apos;   &#x20;&#13;&#xa;&#9;   &apos; '/>

  the value was wrongly normalized as
    "' &#13;&#xa;&#9; '"
  instead of
    "'  &#13;&#xa;&#9; '"
i.e. it was removing the space induced by &#x20; and that's just wrong,
so the result of the C14N tests is now slightly different for those
two tests.

  All this should be fixed in CVS now,

Daniel
Comment 5 Aleksey Sanin 2003-09-11 16:17:23 UTC
Thanks, Daniel! This sounds good to me.

Dave, you said that the C14N tests in LibXML2 are not correct. But the
W3C C14N/ExcC14N interop tests have not changed for quite a long time,
and I just checked the web page and the file names seem to be the same as
the ones I used. I had to tweak the tests slightly before putting them in
LibXML2 because the original tests used signatures a lot. But the
original signature tests are part of the xmlsec package anyway. Can you
explain what you meant?
Comment 6 Daniel Veillard 2003-10-21 12:29:46 UTC
This should be fixed in release libxml2-2.6.0,

  thanks,

Daniel