GNOME Bugzilla – Bug 134566
xmlParseChunk and '>' symbol in attributes - can't parse!
Last modified: 2009-08-15 18:40:50 UTC
Try to parse the following XML file with xmllint with option -push:

xmllint -push message.xml

It does not work, though without -push it works fine. If you remove all '>' chars from the attributes, it works even with -push.

Sincerely, Artyom.

---------------- XML file ----------------------------
<?xml version="1.0" encoding="UTF-8"?>
<Invoice xmlns:ccts="urn:oasis:names:tc:ubl:CoreComponentParameters:1.0:0.70"
         xmlns:cct="urn:oasis:names:tc:ubl:CoreComponentTypes:1.0:0.70"
         xmlns:cat="urn:oasis:names:tc:ubl:CommonAggregateTypes:1.0:0.70"
         xmlns="urn:oasis:names:tc:ubl:Invoice:1.0:0.70">
  <cat:ReferencedOrder>
    <cat:SellersOrderID
        schemeID="pvalue->ReferencedOrder.SellersOrderID.schemeID"
        schemeAgencyID="pvalue->ReferencedOrder.SellersOrderID.schemeAgencyID"
        schemeVersionID="pvalue->ReferencedOrder.SellersOrderID.schemeVersionID"
        schemeAgencySchemeID="pvalue->ReferencedOrder.SellersOrderID.schemeAgencySchemeID"
        schemeAgencySchemeAgencyID="pvalue->ReferencedOrder.SellersOrderID.schemeAgencySchemeAgencyID"
        schemeDataURI="pvalue->ReferencedOrder.SellersOrderID.schemeDataURI"
        schemeURI="pvalue->ReferencedOrder.SellersOrderID.schemeURI"
        UID="pvalue->ReferencedOrder.SellersOrderID.UID"
        UIDRef="pvalue->ReferencedOrder.SellersOrderID.UIDRef"
        UIDRefs="pvalue->ReferencedOrder.SellersOrderID.UIDRefs0"
        language="pvalue->ReferencedOrder.SellersOrderID.language">pvalue->ReferencedOrder.SellersOrderID</cat:SellersOrderID>
  </cat:ReferencedOrder>
</Invoice>
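For readers unfamiliar with push mode: the parser is handed the document piece by piece instead of all at once, so any token may be split across two chunks. Below is a minimal sketch of that kind of check using Python's standard xml.parsers.expat module rather than libxml2, so it illustrates the technique and not the reported bug. The sample document is a shortened stand-in for the attachment, and collect_attributes is a hypothetical helper name.

```python
import xml.parsers.expat

# Shortened, hypothetical stand-in for the attached file: one attribute
# whose value contains a literal '>'.
DOC = (b'<?xml version="1.0" encoding="UTF-8"?>\n'
       b'<Invoice><SellersOrderID schemeID="pvalue->ReferencedOrder"/></Invoice>')


def collect_attributes(chunks):
    """Feed the chunks to a single expat parser and return (name, attrs) pairs."""
    seen = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: seen.append((name, dict(attrs)))
    for chunk in chunks:
        parser.Parse(chunk, False)  # more data may still follow
    parser.Parse(b"", True)         # signal the end of the document
    return seen


# One-shot: the whole document as a single chunk.
one_shot = collect_attributes([DOC])

# Push style: one byte per chunk, so every token is split across calls.
pushed = collect_attributes([DOC[i:i + 1] for i in range(len(DOC))])

print(one_shot == pushed)
print(pushed[1])
```

A conformant parser should report the same elements and attributes in both modes. A difference between the two results, or an error in only one of them, points at the chunk-boundary handling.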
Created attachment 24464: XML file
The "following XML"

schemeID="pvalue->ReferencedOrder.SellersOrderID.schemeID"

is not XML: '>' is forbidden in attribute values. Since it is not XML, you cannot parse this with a conformant XML parser. That's perfectly normal, see the XML spec at http://www.w3.org/TR/2004/REC-xml-20040204/#NT-AttValue

Daniel
Hi Daniel!

I do not agree with you, sorry :) From the same spec you reference:

---------------------------------------
The ampersand character (&) and the left angle bracket (<) MUST NOT appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they MUST be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively. The right angle bracket (>) MAY be represented using the string "&gt;", and MUST, for compatibility, be escaped using either "&gt;" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section.

In the content of elements, character data is any string of characters which does not contain the start-delimiter of any markup and does not include the CDATA-section-close delimiter, "]]>". In a CDATA section, character data is any string of characters not including the CDATA-section-close delimiter, "]]>".

To allow attribute values to contain both single and double quotes, the apostrophe or single-quote character (') MAY be represented as "&apos;", and the double-quote character (") as "&quot;".

Character Data
[14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
***

My reading of this is that only the left angle bracket (<) and the ampersand (&) MUST be escaped. While it is good practice to escape the other characters in the set, it is not required, and I have seen examples where they are not escaped.

I've found that EXPAT accepts attributes with '>' in them, and when I tested with XMLSpy, it said the document was well-formed. Even LIBXML works with '>' in attributes, just not with xmlParseChunk. When I put a '<' in the attribute and re-validated with XMLSpy, it declared that to be an error.

Artyom.
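The rule that governs attribute values directly is production [10] of the same spec, AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'". Below is a rough sketch of that production as a regular expression. It assumes the spec's Name production can be approximated with ASCII name characters (the real one allows far more), and is_att_value is a hypothetical helper, not part of any parser.

```python
import re

# Reference ::= EntityRef | CharRef
# Assumption: Name is approximated with ASCII characters only, to keep the
# sketch short; the spec's Name production is much wider.
REFERENCE = r"&(?:[A-Za-z_:][A-Za-z0-9_:.\-]*|#[0-9]+|#x[0-9A-Fa-f]+);"

# AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
ATT_VALUE = re.compile(
    r'"(?:[^<&"]|' + REFERENCE + r')*"'
    + r"|'(?:[^<&']|" + REFERENCE + r")*'"
)


def is_att_value(literal):
    """Return True if the quoted literal matches the AttValue production."""
    return ATT_VALUE.fullmatch(literal) is not None


print(is_att_value('"pvalue->ReferencedOrder.SellersOrderID.schemeID"'))  # literal '>'
print(is_att_value('"a < b"'))     # literal '<'
print(is_att_value('"a & b"'))     # bare '&'
print(is_att_value('"a &lt; b"'))  # escaped '<'
```

Only '<', a bare '&', and the enclosing quote character are excluded by the character classes, which is why a literal '>' passes while a literal '<' does not.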
Okay, my mistake: I confused '>' and '<' :-\

I know where the problem lies and will try to get this fixed soon.

Daniel
Okay, a 2-line patch to parser.c was needed. I just committed it to CVS and also added the test to the regression suite.

Thanks, Daniel
This should be closed by the release of libxml2-2.6.8.

Thanks, Daniel