GNOME Bugzilla – Bug 172638
"&amp;" in attribute value is returned from SAX as "&#38;"
Last modified: 2009-08-15 18:40:50 UTC
Distribution/Version: Gentoo Linux x86

Parsing a document using the SAX API, an "&amp;" entity in the value of an attribute is returned to the start element handler as "&#38;". This is a valid character entity for an ampersand, but I was expecting the parser to return "&" at this point. I've confirmed that "&lt;" and "&gt;" in an attribute value are returned as "<" and ">" by the start element handler.

Sample input document:

<a>
  <b c="Tim &amp; Bev">Tim &amp; Bev</b>
  <b c="1 &lt; 2">1 &lt; 2</b>
  <b c="4 &gt; 3">4 &gt; 3</b>
  <b c="Don&apos;t!">Don&apos;t!</b>
  <b c="Call me &quot;Ishmael&quot;">Call me &quot;Ishmael&quot;</b>
</a>

Here is the data I'm getting from the parser (output using my own code for debugging purposes, this is obviously *not* valid XML):

<a>
  <b c="Tim &#38; Bev">Tim & Bev</b>
  <b c="1 < 2">1 < 2</b>
  <b c="4 > 3">4 > 3</b>
  <b c="Don't!">Don't!</b>
  <b c="Call me "Ishmael"">Call me "Ishmael"</b>
</a>

Note that this only affects "&amp;", and only when it appears in an attribute value. An ampersand in element text is returned to the caller as "&", as expected.

Tim Shead
tshead@k-3d.com
That's normal in libxml2. It is a workaround for the SAX limitation of not being able to preserve entities. libxml2 was designed for *editing*, i.e. entities in attribute content had to be preserved. You don't want this, so turn on entity substitution at the parser level.

Daniel
Many thanks Daniel, that did the trick. Cheers, Tim