GNOME Bugzilla – Bug 747567
Plus quantifier (+) in patterns (regular expressions) sometimes matches 0 times
Last modified: 2021-07-05 13:24:31 UTC
Overview: In patterns, the + quantifier matches 0 or more times instead of 1 or more times when it follows a closing parenthesis. Example: "(a)+" Steps to Reproduce: Use the following xsd: <?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:simpleType name="fooType"> <xs:restriction base="xs:string"> <xs:pattern value="(a)+"/> </xs:restriction> </xs:simpleType> <xs:element name="foo" type="fooType"/> </xs:schema> Validate the following xml: <?xml version="1.0" encoding="UTF-8"?> <foo/> Actual Results: libxml validation accepts this xml Expected Results: libxml validation should reject the xml with the error: test.xml:1.7: Element 'foo': [facet 'pattern'] The value '' is not accepted by the pattern '(a)+'. test.xml:1.7: Element 'foo': '' is not a valid value of the atomic type 'fooType'. Build Date & Hardware: 2.9.1+dfsg1-3ubuntu4.4 Additional Information: Replacing the restriction pattern with any of the following works as expected, and rejects the xml: "a+" "(a){1,}" This is a highly simplified example of course, but it looks like if + follows a closing parenthesis, the + works as if it was a *, matching 0 or more times. I performed the same validation using MSXML, it properly recognizes that the empty <foo/> is invalid.
Created attachment 302457 [details] [review] Patch to forbid epsilon-reduction of final states When building the internal representation of a regexp, it is possible that a lot of empty transitions are created. Therefore there is a step to reduce them in the function xmlFAEliminateSimpleEpsilonTransitions. There is an error there for this case: * State 1 has a transition with an atom (in this case "a") to state 2. * State 2 is final and has an epsilon transition to state 1. After reduction it looked like: * State 1 has a transition with an atom (in this case "a") to itself and is final. In other words, the empty string is accepted when it shouldn't be. The attached patch skips the reduction step for final states. An alternative would be to insert or increment counters when reducing a final state, but this seemed error prone and unnecessary, since there aren't that many final states. I'm not affiliated with the libxml2 team, so use this at your own risk :-)
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new ticket at https://gitlab.gnome.org/GNOME/libxml2/-/issues/ Thank you for your understanding and your help.