GNOME Bugzilla – Bug 674411
Regular expressions: caret handled incorrectly
Last modified: 2019-09-25 12:10:31 UTC
When entering a caret in the middle of a character class in a regular expression, the ranges following the caret in the character class are treated as negations. According to the spec, a caret should only negate the ranges if it is the FIRST character in the character class.

Example:

Schema, schema.xsd:

    <?xml version="1.0" encoding="utf-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="testElement" type="test" nillable="false" />
      <xs:simpleType name="test">
        <xs:restriction base="xs:string">
          <xs:pattern value="[a^b]"></xs:pattern>
        </xs:restriction>
      </xs:simpleType>
    </xs:schema>

XML, to_validate.xml:

    <?xml version="1.0" encoding="utf-8"?>
    <testElement>b</testElement>

    ~$ xmllint --noout --schema schema.xsd to_validate.xml
    to_validate.xml:2: element testElement: Schemas validity error : Element 'testElement': [facet 'pattern'] The value 'b' is not accepted by the pattern '[a^b]'.
    to_validate.xml:2: element testElement: Schemas validity error : Element 'testElement': 'b' is not a valid value of the atomic type 'test'.
    to_validate.xml fails to validate

Changing the "b" to an "a" causes the XML to validate.
As mentioned in https://www.w3.org/TR/xmlschema11-2/#regex:

    Note: For example, the string '[^X]' is ambiguous according to the grammar rules, denoting either a character class consisting of a negative character group with 'X' as a member, or a positive character class with 'X' and '^' as members. The normative prose rule just given requires that the first interpretation be taken.

'^b' means exclude b, so "[a^b]" accepts any character except b. The value 'b' is accepted by the pattern '[a\^b]'.
Sorry about this, it is a bug. As mentioned in https://www.w3.org/TR/xmlschema11-2/#regex:

    If the first character in a charGroup is '^', this is taken as indicating that the charGroup starts with a negCharGroup. A posCharGroup can itself start with '^' but only when it appears within a negCharGroup, that is, when the '^' is preceded by another '^'.

That means the value 'b' should be accepted by the pattern '[a^b]'.
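
For illustration only, the rule quoted above can be sketched in a few lines of Python. The helper `char_group_matches` is hypothetical and handles only literal members (no ranges, escapes, or subtraction); it is not libxml2's implementation. Python's `re` module is not an XSD regex engine, but it happens to treat a non-leading caret the same way, so it is used here as a cross-check.

```python
import re


def char_group_matches(group: str, ch: str) -> bool:
    """Test one character against the body of a simple character class.

    `group` is the text between '[' and ']'. Hypothetical helper: only
    literal members are supported.
    """
    # Only a caret in the FIRST position negates the group.
    negated = group.startswith("^")
    members = group[1:] if negated else group
    # A caret anywhere after the first position is an ordinary member.
    return (ch in members) != negated


for ch in "ab^c":
    expected = char_group_matches("a^b", ch)
    # Cross-check against Python's own handling of the same class.
    assert (re.fullmatch(r"[a^b]", ch) is not None) == expected
    print(repr(ch), expected)
```

Under this reading, '[a^b]' accepts 'a', '^' and 'b', while '[^ab]' rejects 'a' and 'b' and accepts everything else.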
*** This bug has been marked as a duplicate of bug 779751 ***