GNOME Bugzilla – Bug 342591
Pattern (Regular Expression) Validation Not Working Properly
Last modified: 2021-07-05 13:25:33 UTC
Please describe the problem:
I have a schema with datatype defined as a string that matches a certain pattern. Some of the patterns are validating, others are not when they should.

Steps to reproduce:
1. load the xmlDoc
2. validate it against the specified schema
3.

Actual results:
Validation fails on some, but not all, of the patterns specified.

Expected results:
All of these patterns match the regular expression in the schema, and they should all validate.

Does this happen every time?
Yes

Other information:
where d is any digit 0-9, it should accept any comma separated list of d+, d+:d+, or d+:d+:d+ mixed in any which way, yet for some reason the 2 colon ("a:b:c") variety doesn't validate if either the second or third numbers have more than 1 digit. a:b:c and ab:c:d validates but a:bc:d and a:b:cd will not validate even though they should

=================== pattern.xsd Schema file =======================
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified" xml:lang="en">
  <xs:simpleType name="mypattern">
    <xs:restriction base="xs:string">
      <xs:pattern value="\d+(:\d+){0,2}(,\d+(:\d+){0,2})*"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="test">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="pattern" type="mypattern" maxOccurs="unbounded">
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
===================== End of Schema ======================

======================= Instance Document ========================
<?xml version="1.0" encoding="UTF-8"?>
<test xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="pattern.xsd">
  <!-- All should work -->
  <!-- Works -->
  <pattern>100</pattern>
  <!-- Works -->
  <pattern>10:20</pattern>
  <!-- Works -->
  <pattern>1:2:3</pattern>
  <!-- Works -->
  <pattern>10:2:3</pattern>
  <!-- Doesn't Work -->
  <pattern>1:20:3</pattern>
  <!-- Doesn't Work -->
  <pattern>1:2:30</pattern>
  <!-- Doesn't Work -->
  <pattern>10:20:30</pattern>
</test>
==================== End of Instance ==================
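As a cross-check of the expectation above, here is a minimal sketch that runs the same pattern through Python's `re` module. This assumes Python's engine is an acceptable stand-in for XSD regex semantics for this particular expression (it uses only `\d`, groups, and bounded repetition); it says nothing about how xmlregexp.c handles it internally. The names `PATTERN`, `VALUES`, `accepted`, and `results` are made up for the illustration.

```python
import re

# Pattern copied verbatim from pattern.xsd. An XSD pattern facet must match
# the whole value, so re.fullmatch is used rather than re.search.
PATTERN = re.compile(r"\d+(:\d+){0,2}(,\d+(:\d+){0,2})*")

# The seven values from the instance document, in order.
VALUES = ["100", "10:20", "1:2:3", "10:2:3", "1:20:3", "1:2:30", "10:20:30"]


def accepted(value):
    """Return True if the whole value matches the pattern."""
    return PATTERN.fullmatch(value) is not None


results = {value: accepted(value) for value in VALUES}
for value, ok in results.items():
    print(f"{value!r}: {'accepted' if ok else 'rejected'}")
```

Under that assumption, all seven values are accepted, including the three that libxml2 rejects, while a value with three colons such as 1:2:3:4 is rejected.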
ERROR MESSAGES ----------------------------------
pi.xml:15: element pattern: Schemas validity error : Element 'pattern': [facet 'pattern'] The value '1:20:3' is not accepted by the pattern '\d+(:\d+){0,2}(,\d+(:\d+){0,2})*'.
pi.xml:15: element pattern: Schemas validity error : Element 'pattern': '1:20:3' is not a valid value of the atomic type 'mypattern'.
pi.xml:17: element pattern: Schemas validity error : Element 'pattern': [facet 'pattern'] The value '1:2:30' is not accepted by the pattern '\d+(:\d+){0,2}(,\d+(:\d+){0,2})*'.
pi.xml:17: element pattern: Schemas validity error : Element 'pattern': '1:2:30' is not a valid value of the atomic type 'mypattern'.
pi.xml:19: element pattern: Schemas validity error : Element 'pattern': [facet 'pattern'] The value '10:20:30' is not accepted by the pattern '\d+(:\d+){0,2}(,\d+(:\d+){0,2})*'.
pi.xml:19: element pattern: Schemas validity error : Element 'pattern': '10:20:30' is not a valid value of the atomic type 'mypattern'.
I can reproduce this with the CVS HEAD. This should be a problem in the xmlregexp.c module.
Why wasn't this changed to CONFIRMED if the problem was reproduced?
My bad. I'm apparently not familiar with Bugzilla's policy and wasn't aware that I needed to change the status; I thought it only needed to be changed to NEW once someone is actually _working_ on the issue. So thanks for the lesson :-)
It also seems to have a problem with 'or' (alternation) in pattern values. The following pattern causes an error for any value supplied.

<xs:restriction base="xs:string">
  <xs:pattern value="([A-Za-z0-9]{2}|[A-Za-z0-9]{4})"/>
</xs:restriction>

element subject: Schemas validity error : Element 'subject': [facet 'pattern'] The value '90' is not accepted by the pattern '([A-Za-z0-9]{2}|[A-Za-z0-9]{4})'
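For comparison, a minimal sketch of what this alternation accepts under Python's `re` module, used here only as an assumed reference engine for this simple expression; `ALT_PATTERN` and `alt_accepted` are names invented for the example.

```python
import re

# Pattern copied from the xs:pattern facet above. The whole value has to
# match, which is why fullmatch is used.
ALT_PATTERN = re.compile(r"([A-Za-z0-9]{2}|[A-Za-z0-9]{4})")


def alt_accepted(value):
    """Return True if the whole value is exactly 2 or 4 alphanumerics."""
    return ALT_PATTERN.fullmatch(value) is not None


for value in ["90", "ab12", "abc", "abcde"]:
    print(f"{value!r}: {'accepted' if alt_accepted(value) else 'rejected'}")
```

With this engine, the two-character and four-character values are accepted and the three-character and five-character values are rejected, so '90' would be expected to validate.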
(I posted here rather than creating a new bug report....)

I have a problem with regex in the RELAX NG validator: it seems it does not treat (any) escape sequence correctly inside a character class expression.

bug.xml:
<?xml version="1.0" encoding="UTF-8" ?>
<bug>
  <reg1>A Z</reg1>
  <reg2>A Z</reg2>
  <reg3>AtZ</reg3>
</bug>

bug.rng:
<?xml version="1.0" encoding="UTF-8" ?>
<element name="bug" xmlns="http://relaxng.org/ns/structure/1.0" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <element name="reg1">
    <data type="string"><param name="pattern">A\tZ</param></data>
  </element>
  <element name="reg2">
    <data type="string"><param name="pattern">A[\t]Z</param></data>
  </element>
  <element name="reg3">
    <data type="string"><param name="pattern">A[\t]Z</param></data>
  </element>
</element>

command:
xmllint.exe bug.xml --relaxng bug.rng

output:
bug.xml:4: element reg2: Relax-NG validity error : Error validating datatype string
bug.xml:4: element reg2: Relax-NG validity error : Element reg2 failed to validate content
bug.xml fails to validate

note: reg1 is correct, reg2 fails and it should not, reg3 passes and it should fail
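The expected behaviour in the note can be sketched with Python's `re` module, which, like XSD regular expressions, treats `\t` as a tab both outside and inside a character class. This assumes the reg1 and reg2 values contain a literal tab between A and Z, which the pattern implies but the text above does not show; `TAB_VALUE`, `LETTER_VALUE`, and `matches` are illustrative names only.

```python
import re

# Assumed test values: "A", a literal tab, "Z" and the plain string "AtZ".
TAB_VALUE = "A\tZ"
LETTER_VALUE = "AtZ"


def matches(pattern, value):
    """Return True if the whole value matches the pattern."""
    return re.fullmatch(pattern, value) is not None


# Raw strings keep the backslash so the regex engine sees \t itself.
print("reg1:", matches(r"A\tZ", TAB_VALUE))
print("reg2:", matches(r"A[\t]Z", TAB_VALUE))
print("reg3:", matches(r"A[\t]Z", LETTER_VALUE))
```

Under these assumptions reg1 and reg2 both match and reg3 does not, which is the outcome the note asks for.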
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new ticket at https://gitlab.gnome.org/GNOME/libxml2/-/issues/ Thank you for your understanding and your help.