Bug 172638 - "&amp;" in attribute value is returned from SAX as "&#38;"
Status: VERIFIED NOTABUG
Product: libxml2
Classification: Platform
Component: general
Version: 2.6.17
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: Daniel Veillard
QA Contact: libxml QA maintainers
Depends on:
Blocks:
 
 
Reported: 2005-04-04 19:18 UTC by Timothy M. Shead
Modified: 2009-08-15 18:40 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Timothy M. Shead 2005-04-04 19:18:10 UTC
Distribution/Version: Gentoo Linux x86

When parsing a document using the SAX API, an "&amp;" entity in the value of an
attribute is returned to the start element handler as "&#38;".  This is a
valid character entity for an ampersand, but I was expecting the parser to
return "&" at this point.  I've confirmed that "&lt;" and "&gt;" in an
attribute value are returned as "<" and ">" by the start element handler.

Sample input document:

<a>
        <b c="Tim &amp; Bev">Tim &amp; Bev</b>
        <b c="1 &lt; 2">1 &lt; 2</b>
        <b c="4 &gt; 3">4 &gt; 3</b>
        <b c="Don&apos;t!">Don&apos;t!</b>
        <b c="Call me &quot;Ishmael&quot;">Call me &quot;Ishmael&quot;</b>
</a>

Here is the data I'm getting from the parser (output using my own code for 
debugging purposes, this is obviously *not* valid XML):

<a>
        <b c="Tim &#38; Bev">Tim & Bev</b>
        <b c="1 < 2">1 < 2</b>
        <b c="4 > 3">4 > 3</b>
        <b c="Don't!">Don't!</b>
        <b c="Call me "Ishmael"">Call me "Ishmael"</b>
</a>

Note that this only affects &amp; and only if it is an attribute value - an 
ampersand in element text is returned to the caller as "&" as expected.

Tim Shead
tshead@k-3d.com
Comment 1 Daniel Veillard 2005-04-04 21:51:15 UTC
That's normal in libxml2. It's a workaround for the SAX limitation of not being
able to preserve entities. libxml2 was designed for *editing*, i.e. entities
in attribute content had to be preserved. You don't want this, so turn on
entity substitution at the parser level.

Daniel
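
For reference, the output the reporter expected (decoded attribute values in the
start element handler) can be illustrated with a SAX parser that always
substitutes the predefined entities. The sketch below is an assumption-laden
illustration, not libxml2 code: it uses Python's standard-library xml.sax
module, which is backed by expat. The names AttrCollector and collect_attrs are
hypothetical. In libxml2 itself, entity substitution is typically enabled with
the XML_PARSE_NOENT parser option or the older xmlSubstituteEntitiesDefault(1)
call; which one applies depends on how the parser context is created.

```python
import xml.sax

# The sample document from the bug description.
SAMPLE = b"""<a>
        <b c="Tim &amp; Bev">Tim &amp; Bev</b>
        <b c="1 &lt; 2">1 &lt; 2</b>
        <b c="4 &gt; 3">4 &gt; 3</b>
        <b c="Don&apos;t!">Don&apos;t!</b>
        <b c="Call me &quot;Ishmael&quot;">Call me &quot;Ishmael&quot;</b>
</a>"""


class AttrCollector(xml.sax.ContentHandler):
    """Hypothetical handler that records the value of every 'c' attribute."""

    def __init__(self):
        super().__init__()
        self.values = []

    def startElement(self, name, attrs):
        # attrs holds attribute values as delivered by the parser.
        value = attrs.get("c")
        if value is not None:
            self.values.append(value)


def collect_attrs(data):
    """Parse data with the stdlib SAX parser and return the 'c' values seen."""
    handler = AttrCollector()
    xml.sax.parseString(data, handler)
    return handler.values


if __name__ == "__main__":
    for value in collect_attrs(SAMPLE):
        print(value)
```

With entity substitution in effect, the first attribute value arrives as
"Tim & Bev" rather than "Tim &#38; Bev", matching what the element text
already shows in the report.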
Comment 2 Timothy M. Shead 2005-04-05 06:37:13 UTC
Many thanks Daniel, that did the trick.

Cheers,
Tim