Bug 134566 - xmlParseChunk and '>' symbol in attributes - can't parse!
Status: VERIFIED FIXED
Product: libxml2
Classification: Platform
Component: general
Version: 2.6.6
Hardware: Other Windows
Importance: Normal critical
Target Milestone: ---
Assigned To: Daniel Veillard
QA Contact: libxml QA maintainers
Depends on:
Blocks:
Reported: 2004-02-16 21:38 UTC by abolgar
Modified: 2009-08-15 18:40 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
XML file (1.15 KB, text/plain)
2004-02-16 21:39 UTC, abolgar

Description abolgar 2004-02-16 21:38:43 UTC
Try to parse the following XML file using xmllint with the -push option:

xmllint -push message.xml

It fails, although without -push it parses fine. If you remove all '>'
characters from the attribute values, it works even with -push.

Sincerely,
   Artyom.
---------------- XML file ----------------------------
<?xml version="1.0" encoding="UTF-8"?>
<Invoice
    xmlns:ccts="urn:oasis:names:tc:ubl:CoreComponentParameters:1.0:0.70"
    xmlns:cct="urn:oasis:names:tc:ubl:CoreComponentTypes:1.0:0.70"
    xmlns:cat="urn:oasis:names:tc:ubl:CommonAggregateTypes:1.0:0.70"
    xmlns="urn:oasis:names:tc:ubl:Invoice:1.0:0.70">
   <cat:ReferencedOrder>
      <cat:SellersOrderID
          schemeID="pvalue->ReferencedOrder.SellersOrderID.schemeID"
          schemeAgencyID="pvalue->ReferencedOrder.SellersOrderID.schemeAgencyID"
          schemeVersionID="pvalue->ReferencedOrder.SellersOrderID.schemeVersionID"
          schemeAgencySchemeID="pvalue->ReferencedOrder.SellersOrderID.schemeAgencySchemeID"
          schemeAgencySchemeAgencyID="pvalue->ReferencedOrder.SellersOrderID.schemeAgencySchemeAgencyID"
          schemeDataURI="pvalue->ReferencedOrder.SellersOrderID.schemeDataURI"
          schemeURI="pvalue->ReferencedOrder.SellersOrderID.schemeURI"
          UID="pvalue->ReferencedOrder.SellersOrderID.UID"
          UIDRef="pvalue->ReferencedOrder.SellersOrderID.UIDRef"
          UIDRefs="pvalue-&gt;ReferencedOrder.SellersOrderID.UIDRefs0"
          language="pvalue->ReferencedOrder.SellersOrderID.language">pvalue-&gt;ReferencedOrder.SellersOrderID</cat:SellersOrderID>
   </cat:ReferencedOrder>
</Invoice>
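The reporter's claim that the document is well-formed, and that a push parser should accept it, can be cross-checked with another incremental parser. A minimal sketch, assuming Python's standard-library Expat binding as a stand-in (this is not libxml2 and not `xmllint -push`): it parses a trimmed copy of the document once in full and once in 7-byte chunks, and a conforming push parser should report identical elements and attribute values either way. The names `DOC`, `parse`, `split_every` and the chunk size are arbitrary choices for the illustration.

```python
import xml.parsers.expat

# Trimmed copy of the reporter's document: one attribute holds a literal '>'
# (allowed in XML) and another uses the escaped form '&gt;'.
DOC = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<Invoice xmlns:cat="urn:oasis:names:tc:ubl:CommonAggregateTypes:1.0:0.70">'
    '<cat:SellersOrderID schemeID="pvalue->ReferencedOrder.SellersOrderID.schemeID" '
    'UIDRefs="pvalue-&gt;ReferencedOrder.SellersOrderID.UIDRefs0">'
    'pvalue-&gt;ReferencedOrder.SellersOrderID</cat:SellersOrderID>'
    '</Invoice>'
).encode("utf-8")


def parse(chunks):
    """Feed byte chunks to a fresh Expat parser; return (name, attrs) pairs."""
    parser = xml.parsers.expat.ParserCreate()  # no namespace processing
    seen = []
    parser.StartElementHandler = lambda name, attrs: seen.append((name, attrs))
    for chunk in chunks:
        parser.Parse(chunk, False)  # isfinal=False: more data may follow
    parser.Parse(b"", True)         # signal the end of the document
    return seen


def split_every(data, n):
    """Cut `data` into consecutive pieces of at most `n` bytes."""
    return [data[i:i + n] for i in range(0, len(data), n)]


whole = parse([DOC])                 # one-shot, like plain `xmllint`
pushed = parse(split_every(DOC, 7))  # push-style, small chunks
assert whole == pushed
print(pushed[1][1]["schemeID"])  # pvalue->ReferencedOrder.SellersOrderID.schemeID
```

Small chunks are deliberate: they force the parser to see start tags and `&gt;` references split across calls, which is the situation an incremental parser has to handle.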
Comment 1 abolgar 2004-02-16 21:39:53 UTC
Created attachment 24464
XML file
Comment 2 Daniel Veillard 2004-02-16 21:46:29 UTC
  The "following XML"

schemeID="pvalue->ReferencedOrder.SellersOrderID.schemeID"

  is not XML.

'>' is forbidden in an attribute value.

It is not XML; you cannot parse this with a conformant XML parser.
That's perfectly normal; see the XML spec at
  
   http://www.w3.org/TR/2004/REC-xml-20040204/#NT-AttValue

Daniel
Comment 3 abolgar 2004-02-17 16:25:09 UTC
Hi Daniel!

I don't agree with you, sorry :)
From the same spec you referenced:
---------------------------------------
The ampersand character (&) and the left angle bracket (<) MUST NOT appear
in their literal form, except when used as markup delimiters, or within a
comment, a processing instruction, or a CDATA section. If they are needed
elsewhere, they MUST be escaped using either numeric character references or
the strings "&amp;" and "&lt;" respectively. The right angle bracket (>) MAY
be represented using the string "&gt;", and MUST, for compatibility, be
escaped using either "&gt;" or a character reference when it appears in the
string "]]>" in content, when that string is not marking the end of a CDATA
section.

In the content of elements, character data is any string of characters which
does not contain the start-delimiter of any markup and does not include the
CDATA-section-close delimiter, "]]>". In a CDATA section, character data is
any string of characters not including the CDATA-section-close delimiter,
"]]>".

To allow attribute values to contain both single and double quotes, the
apostrophe or single-quote character (') MAY be represented as "&apos;", and
the double-quote character (") as "&quot;".

Character Data
      [14]    CharData    ::=    [^<&]* - ([^<&]* ']]>' [^<&]*)
***

My reading of this is that only the left angle bracket (<) and the ampersand
(&) MUST be escaped. While it is good practice to escape the other characters
in the set, it is not required, and I have seen examples where they are not
escaped. I've found that EXPAT accepts attributes with '>' in them, and when I
tested with XMLSpy, it reported the document as well-formed. Even LIBXML
accepts '>' in an attribute, but not through xmlParseChunk. When I put a '<'
in the attribute and re-validated with XMLSpy, it reported that as an error.

Artyom.
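The point at issue can be checked against the grammar itself: XML 1.0 production [10] defines `AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"`, which excludes '<' and a bare '&' but not '>'. A minimal sketch of that production as a Python regular expression follows. It assumes a simplified, ASCII-only `Name` production and does not check that referenced entities are declared, so it only illustrates which characters are excluded; it is not a full well-formedness checker. The names `NAME`, `REFERENCE`, `ATT_VALUE` and `is_att_value` are made up for the sketch.

```python
import re

# XML 1.0 productions used here:
#   [10] AttValue  ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
#   Reference      ::= EntityRef | CharRef
#   EntityRef      ::= '&' Name ';'
#   CharRef        ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
# Assumption: Name is simplified to ASCII letters, digits, '_', ':', '.', '-';
# the real production allows many more characters.
NAME = r"[A-Za-z_:][A-Za-z0-9_:.\-]*"
REFERENCE = rf"(?:&{NAME};|&#[0-9]+;|&#x[0-9a-fA-F]+;)"
ATT_VALUE = re.compile(
    rf'"(?:[^<&"]|{REFERENCE})*"' + "|" + rf"'(?:[^<&']|{REFERENCE})*'"
)


def is_att_value(literal):
    """True if `literal` (quotes included) matches the AttValue production."""
    return ATT_VALUE.fullmatch(literal) is not None


print(is_att_value('"pvalue->ReferencedOrder.SellersOrderID.schemeID"'))    # True
print(is_att_value('"pvalue-&gt;ReferencedOrder.SellersOrderID.UIDRefs0"'))  # True
print(is_att_value('"a < b"'))  # False: '<' is excluded
print(is_att_value('"a & b"'))  # False: a bare '&' is not a Reference
```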
Comment 4 Daniel Veillard 2004-02-17 17:05:12 UTC
Okay, my mistake: I confused '>' and '<' :-\
I know where the problem lies; I will try to get this fixed soon,

Daniel
Comment 5 Daniel Veillard 2004-02-18 14:53:34 UTC
Okay, a 2-line patch to parser.c was needed. I just committed it to CVS
and also added the test to the regression suite,

 thanks,

Daniel
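The thread does not show the two-line patch, so the following is only a hypothetical illustration of this class of bug, not libxml2's code. An incremental parser must decide whether a whole start tag has arrived before parsing it; if it does so by searching for the first '>' in its buffer, a '>' inside a quoted attribute value makes a truncated tag look complete. That would also be consistent with the reporter's observation that removing '>' from the attributes avoids the failure. The functions `naive_tag_end` and `quote_aware_tag_end` are invented for the sketch.

```python
def naive_tag_end(buf, start):
    """Index of the first '>' at or after `start`, or -1 if none is buffered."""
    return buf.find(">", start)


def quote_aware_tag_end(buf, start):
    """Index of the '>' that closes the start tag beginning at `start`,
    skipping '>' characters inside quoted attribute values; -1 means the
    tag is still incomplete and the parser should wait for more data."""
    quote = None
    for i in range(start, len(buf)):
        c = buf[i]
        if quote:
            if c == quote:       # closing quote of the attribute value
                quote = None
        elif c in "\"'":
            quote = c            # opening quote of an attribute value
        elif c == ">":
            return i
    return -1


# Only part of the start tag has been received so far.
partial = '<cat:SellersOrderID schemeID="pvalue->ReferencedOrder.Sell'
full = partial + 'ersOrderID.schemeID">'

print(naive_tag_end(partial, 0))        # 37: a '>' inside the schemeID value
print(quote_aware_tag_end(partial, 0))  # -1: wait for more data
print(quote_aware_tag_end(full, 0) == len(full) - 1)  # True
```

A non-incremental parse never hits this, because the whole start tag is already in memory, which matches the report that xmllint works without -push.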
Comment 6 Daniel Veillard 2004-03-25 11:19:46 UTC
This should be closed by the release of libxml2-2.6.8,

  thanks,

Daniel