GNOME Bugzilla – Bug 542666
Duplicate resolution of CharRefs in regexp (XML Schema)
Last modified: 2008-08-26 07:46:11 UTC
Please describe the problem:
When an XML Schema contains a line like this:

  <xs:pattern value="[56;&amp;#]"/>

the regexp should be understood as:

  [56;&#]

i.e. a pattern that matches any one of "5", "6", ";", "&" and "#". However, libxml2 tries to resolve the ampersand again and complains about an illegal CharRef in the schema:

  &#]

Even worse, when the characters are given in a different order:

  <xs:pattern value="[&amp;#65;]"/>

the pattern is first resolved to:

  [&#65;]

and then again to:

  [A]

In that case libxml2 does not even complain! Instead, it silently checks against a completely different regexp!

The reason for this strange behaviour seems to be grammar rule [19] of an old XML Schema specification:
http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#dt-charrange

In the current specification this misleading grammar rule has been removed:
http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/#dt-charrange

The bug can be fixed by simply removing all code that handles grammar rule [19]. The attached patch does exactly that and adds two regression tests.
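A minimal Python sketch of the two resolution passes described above, for illustration only; it does not use libxml2. The XML parser (here the standard library's `xml.etree.ElementTree`) resolves `&amp;` exactly once, and the hypothetical helper `resolve_charrefs_again` emulates the extra pass implied by the old rule [19]. Python's `re.fullmatch` stands in for the XSD pattern facet, on the assumption that for these simple character classes the two regex dialects agree (XSD patterns are implicitly anchored to the whole value, hence `fullmatch`).

```python
import re
import xml.etree.ElementTree as ET

# Matches "&#" followed by an optional well-formed decimal or hex reference.
# Hypothetical: emulates the second char-ref pass implied by the 2001 rule [19].
_CHARREF = re.compile(r"&#(x[0-9A-Fa-f]+;|[0-9]+;)?")


def resolve_charrefs_again(pattern: str) -> str:
    """Buggy behaviour: resolve &#NN; / &#xHH; a second time inside the regexp."""

    def repl(m: re.Match) -> str:
        ref = m.group(1)
        if ref is None:
            # e.g. "&#]": libxml2 reported "Char ref: expecting [0-9]" here
            raise ValueError("malformed character reference in regexp")
        ref = ref[:-1]  # drop the trailing ';'
        code = int(ref[1:], 16) if ref.startswith("x") else int(ref)
        return chr(code)

    return _CHARREF.sub(repl, pattern)


SCHEMA_SNIPPET = (
    '<xs:pattern xmlns:xs="http://www.w3.org/2001/XMLSchema" value="[&amp;#65;]"/>'
)


def facet_value(snippet: str) -> str:
    """Facet value after XML parsing: "&amp;" is resolved exactly once."""
    return ET.fromstring(snippet).get("value")


def accepts(pattern: str, value: str, buggy: bool) -> bool:
    """Check value against pattern; XSD patterns match the whole value."""
    regexp = resolve_charrefs_again(pattern) if buggy else pattern
    return re.fullmatch(regexp, value) is not None


if __name__ == "__main__":
    pattern = facet_value(SCHEMA_SNIPPET)
    print(pattern)                             # [&#65;]
    print(accepts(pattern, "5", buggy=False))  # True: class of & # 6 5 ;
    print(accepts(pattern, "5", buggy=True))   # False: regexp became [A]
```

With `buggy=False` the pattern is used exactly as the XML parser delivered it, which is the behaviour the 2004 specification calls for; the same helper applied to `[56;&#]` raises instead of compiling, mirroring the first failure.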
Steps to reproduce:
  xmllint --noout --schema schema1.xsd doc.xml
and
  xmllint --noout --schema schema2.xsd doc.xml

Content of "doc.xml":
  <?xml version="1.0"?>
  <test>5</test>

Content of "schema1.xsd":
  <?xml version="1.0"?>
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="test">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:pattern value="[56;&amp;#]"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:element>
  </xs:schema>

Content of "schema2.xsd":
  <?xml version="1.0"?>
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="test">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:pattern value="[&amp;#65;]"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:element>
  </xs:schema>

Actual results:
  regexp error : failed to compile: Char ref: expecting [0-9]
  schema1.xsd:6: element pattern: Schemas parser error : Element '{http://www.w3.org/2001/XMLSchema}pattern': The value '[56;&#]' of the facet 'pattern' is not a valid regular expression.
  WXS schema schema1.xsd failed to compile
and
  doc.xml:2: element test: Schemas validity error : Element 'test': [facet 'pattern'] The value '5' is not accepted by the pattern '[A]'.
  doc.xml:2: element test: Schemas validity error : Element 'test': '5' is not a valid value of the local atomic type.
  doc.xml fails to validate

Expected results:
  doc.xml validates
and
  doc.xml validates

Does this happen every time?
  yes

Other information:
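The reproduction files can also be generated with a short Python script, sketched below under the assumption that `xmllint` may or may not be on the `PATH`. The helper names `write_repro`, `parsed_pattern` and `run_xmllint` are hypothetical. The `&` in each pattern attribute has to be written as `&amp;` for the schema to be well-formed XML at all; after XML parsing, the facet values are `[56;&#]` and `[&#65;]`, which is what the regexp engine should receive unchanged.

```python
import shutil
import subprocess
import xml.etree.ElementTree as ET
from pathlib import Path

XS = "http://www.w3.org/2001/XMLSchema"

DOC = '<?xml version="1.0"?>\n<test>5</test>\n'

SCHEMA_TEMPLATE = """<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="test">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:pattern value="{pattern}"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
"""

# Pattern attributes exactly as written in the files ("&" escaped as "&amp;").
PATTERNS = {
    "schema1.xsd": "[56;&amp;#]",
    "schema2.xsd": "[&amp;#65;]",
}


def write_repro(directory: Path) -> None:
    """Write doc.xml, schema1.xsd and schema2.xsd into directory."""
    (directory / "doc.xml").write_text(DOC)
    for name, pattern in PATTERNS.items():
        (directory / name).write_text(SCHEMA_TEMPLATE.format(pattern=pattern))


def parsed_pattern(schema: Path) -> str:
    """Facet value as seen after one round of XML entity/char-ref resolution."""
    return ET.parse(schema).getroot().find(f".//{{{XS}}}pattern").get("value")


def run_xmllint(directory: Path) -> None:
    """Run the two xmllint commands from the report, if xmllint is installed."""
    if shutil.which("xmllint") is None:
        print("xmllint not found; skipping validation")
        return
    for name in PATTERNS:
        subprocess.run(
            ["xmllint", "--noout", "--schema", name, "doc.xml"], cwd=directory
        )
```

Calling `write_repro(Path("."))` and then `run_xmllint(Path("."))` in an empty directory should reproduce both runs; with a libxml2 that includes the fix, both are expected to report that doc.xml validates.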
Created attachment 114436
Solves the bug and adds two regression tests

In addition to this patch, you need to create some empty files:
  touch result/schemas/regexp-char-ref_0_0.err
  touch result/schemas/regexp-char-ref_1_0.err
Very good patch and explanations, perfect! Applied and committed to SVN, many thanks!
Daniel