Bug 674411 - Regular expressions: caret handled incorrectly
Status: RESOLVED DUPLICATE of bug 779751
Product: libxml2
Classification: Platform
Component: regexp
Version: 2.7.8
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: Daniel Veillard
QA Contact: libxml QA maintainers
Depends on:
Blocks:
 
 
Reported: 2012-04-19 16:42 UTC by Johan
Modified: 2019-09-25 12:10 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Johan 2012-04-19 16:42:16 UTC
When a caret appears in the middle of a character class in a regular expression, the characters and ranges that follow it in the class are treated as negated. According to the spec, a caret should negate the class only if it is the FIRST character in the character class; anywhere else it is a literal caret.

Example:
Schema, schema.xsd:
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="testElement" type="test" nillable="false" />
    <xs:simpleType name="test">
        <xs:restriction base="xs:string">
            <xs:pattern value="[a^b]"></xs:pattern>
        </xs:restriction>
    </xs:simpleType>
</xs:schema>

XML, to_validate.xml:
<?xml version="1.0" encoding="utf-8"?>
<testElement>b</testElement>

~$ xmllint --noout --schema schema.xsd to_validate.xml
to_validate.xml:2: element testElement: Schemas validity error : Element 'testElement': [facet 'pattern'] The value 'b' is not accepted by the pattern '[a^b]'.
to_validate.xml:2: element testElement: Schemas validity error : Element 'testElement': 'b' is not a valid value of the atomic type 'test'.
to_validate.xml fails to validate

Changing the value "b" to "a" in to_validate.xml causes the XML to validate.
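
For comparison, the expected behaviour can be illustrated with another regex engine. The sketch below uses Python's `re` module, which is not the XSD regex dialect and is not what libxml2 uses; it is chosen only because it follows the same rule for a caret inside a character class (special only in first position). The helper name `accepts` is made up for this illustration.

```python
import re

# Python's `re` is NOT the XSD regex dialect; it is used here only because it
# treats '^' inside a character class the same way: special only when first.
PATTERN = re.compile(r"[a^b]")


def accepts(value: str) -> bool:
    """Return True if the whole value matches the character class."""
    return PATTERN.fullmatch(value) is not None


# 'a', 'b' and a literal '^' are members of the class; 'c' is not.
results = {value: accepts(value) for value in ("a", "b", "^", "c")}
print(results)  # {'a': True, 'b': True, '^': True, 'c': False}
```

Under this reading, the value 'b' in to_validate.xml should be accepted by the pattern '[a^b]'.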
Comment 1 zhouzhongyuan 2019-08-26 11:58:12 UTC
As mentioned in https://www.w3.org/TR/xmlschema11-2/#regex

Note: For example, the string '[^X]' is ambiguous according to the grammar rules, denoting either a character class consisting of a negative character group with 'X' as a member, or a positive character class with 'X' and '^' as members. The normative prose rule just given requires that the first interpretation be taken.

'^b' means exclude b. "[a^b]" accepts any character except b. The value 'b' is accepted by the pattern '[a\^b]'.
Comment 2 zhouzhongyuan 2019-08-27 12:25:57 UTC
Sorry about that; it is a bug.

As mentioned in https://www.w3.org/TR/xmlschema11-2/#regex

If the first character in a charGroup is '^', this is taken as indicating that the charGroup starts with a negCharGroup.  A posCharGroup can itself start with '^' but only when it appears within a negCharGroup, that is, when the '^' is preceded by another '^'.

That means the value 'b' should be accepted by the pattern '[a^b]'.
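
The rule quoted above can be sketched as a tiny character-class matcher. This is a minimal illustration only, not libxml2's implementation: it assumes a simplified syntax with single characters and `x-y` ranges, and ignores escapes and character class subtraction. The names `parse_char_class` and `matches` are hypothetical.

```python
def parse_char_class(expr: str):
    """Parse a simplified '[...]' expression into (negated, ranges).

    Assumption: only single characters and 'x-y' ranges are supported;
    escapes and class subtraction are out of scope for this sketch.
    """
    if len(expr) < 3 or expr[0] != "[" or expr[-1] != "]":
        raise ValueError("expected a non-empty bracketed character class")
    body = expr[1:-1]

    # A caret negates the group only when it is the FIRST character.
    negated = body.startswith("^")
    if negated:
        body = body[1:]
    if not body:
        raise ValueError("character class has no members")

    ranges = []
    i = 0
    while i < len(body):
        lo = body[i]
        if i + 2 < len(body) and body[i + 1] == "-":
            hi = body[i + 2]
            if hi < lo:
                raise ValueError("invalid range")
            ranges.append((lo, hi))
            i += 3
        else:
            # Any other character, including a later '^', is a literal.
            ranges.append((lo, lo))
            i += 1
    return negated, ranges


def matches(expr: str, ch: str) -> bool:
    """Return True if the single character ch is accepted by expr."""
    negated, ranges = parse_char_class(expr)
    hit = any(lo <= ch <= hi for lo, hi in ranges)
    return hit != negated


print(matches("[a^b]", "b"))  # True: '^' in the middle is a literal
print(matches("[^b]", "b"))   # False: leading '^' negates the group
print(matches("[^^]", "^"))   # False: negated group whose member is '^'
```

The only place the caret gets special treatment is the `startswith("^")` check; everything after that point is parsed as positive members.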
Comment 3 Nick Wellnhofer 2019-09-25 12:10:31 UTC

*** This bug has been marked as a duplicate of bug 779751 ***