Bug 690846 - regex issue ([a-z]+ interpreted correctly, ([a-z])+ incorrectly)
Status: RESOLVED OBSOLETE
Product: libxml2
Classification: Platform
Component: regexp
Version: 2.7.3
Hardware: Other Mac OS
Importance: Normal normal
Target Milestone: ---
Assigned To: Daniel Veillard
QA Contact: libxml QA maintainers
Depends on:
Blocks:
 
 
Reported: 2012-12-29 02:01 UTC by C. M. Sperberg-McQueen
Modified: 2021-07-05 13:26 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
a sample schema defining types with patterns [a-z]+, ([a-z])+, and ([a-z]+) (630 bytes, application/octet-stream)
2012-12-29 02:01 UTC, C. M. Sperberg-McQueen

Description C. M. Sperberg-McQueen 2012-12-29 02:01:23 UTC
Created attachment 232360 [details]
a sample schema defining types with patterns [a-z]+, ([a-z])+, and ([a-z]+)

See http://stackoverflow.com/questions/14060308/xmllint-validation-succeeds-on-invalid-input
for full account.

An XSD simple type using a pattern of [a-z]+ correctly rejects the empty string; a pattern of ([a-z])+ incorrectly accepts the empty string.

I attach a schema document in case it's helpful.  A simple test of the following form suggests that the problem is visible only on the empty string, not on the other tests. 

  for string in "" "test" "test2012" "2012"; 
      do for gi in bare parens parens2; 
             do echo "................................................................"; 
             echo "<$gi>$string</$gi>"; 
             echo "<$gi>$string</$gi>" | xmllint --schema user5112.xsd -; 
             echo ; 
      done; 
  done
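
For comparison, the verdicts a conforming validator should give for the loop above can be cross-checked with an independent regex engine. This is only a sketch: it uses Python's `re` module, which is not an XSD regex implementation but should agree with XSD on these three simple patterns. The mapping of the element names `bare`, `parens`, and `parens2` to the patterns is an assumption based on the attachment description, not read from the schema.

```python
import re

# Assumed mapping of element names to patterns (taken from the attachment
# description; the schema file itself is not reproduced here).
PATTERNS = {
    "bare": r"[a-z]+",
    "parens": r"([a-z])+",
    "parens2": r"([a-z]+)",
}


def expected_valid(pattern: str, value: str) -> bool:
    """XSD patterns must match the whole value, so use fullmatch."""
    return re.fullmatch(pattern, value) is not None


if __name__ == "__main__":
    for value in ["", "test", "test2012", "2012"]:
        for name, pattern in PATTERNS.items():
            verdict = "valid" if expected_valid(pattern, value) else "invalid"
            print(f"<{name}>{value}</{name}>: expected {verdict}")
```

Under these assumptions only the value "test" is expected to validate, for all three patterns. The empty string should be rejected in every case.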
Comment 1 zhouzhongyuan 2019-09-02 12:36:06 UTC
The function xmlFAEliminateSimpleEpsilonTransitions (line 1865 of xmlregexp.c) contains a step that reduces the internal representation of a regexp.
This step can produce a wrong automaton in the following case:

State X has a transition on an atom to state Y. State Y is a final state and has an epsilon transition back to state X.

After the internal representation of the regexp has been reduced:

State X has a transition on the atom to itself and is final.

In this case, the pattern accepts the empty string, which it should not.
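
The case described above can be illustrated with a small hand-written model of the two automata. This is a sketch, not libxml2 code: the state names X and Y come from the description, treating X as the start state is an assumption made for simplicity, and the single atom is taken to be the class [a-z].

```python
from typing import Dict, List, Optional, Set, Tuple

# A transition is (label, target). The label None marks an epsilon
# transition; the label "atom" stands for the character class [a-z].
Transition = Tuple[Optional[str], str]
Automaton = Dict[str, List[Transition]]


def closure(states: Set[str], trans: Automaton) -> Set[str]:
    """Return the given states plus everything reachable via epsilon moves."""
    seen = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for label, target in trans.get(state, []):
            if label is None and target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def accepts(trans: Automaton, start: str, finals: Set[str], text: str) -> bool:
    """Simulate the automaton on text and report whether it ends in a final state."""
    current = closure({start}, trans)
    for ch in text:
        moved = {
            target
            for state in current
            for label, target in trans.get(state, [])
            if label == "atom" and "a" <= ch <= "z"
        }
        current = closure(moved, trans)
    return bool(current & finals)


# Before the reduction: X --atom--> Y, Y is final, Y --epsilon--> X.
BEFORE: Automaton = {"X": [("atom", "Y")], "Y": [(None, "X")]}
BEFORE_FINALS = {"Y"}

# After the faulty reduction: X --atom--> X, and X itself is final.
AFTER: Automaton = {"X": [("atom", "X")]}
AFTER_FINALS = {"X"}

if __name__ == "__main__":
    for text in ["", "a", "test"]:
        print(repr(text),
              "before:", accepts(BEFORE, "X", BEFORE_FINALS, text),
              "after:", accepts(AFTER, "X", AFTER_FINALS, text))
```

In this model both automata accept "a" and "test", but only the reduced one accepts the empty string, because its start state has become final.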

So the solution is to fix the reduction step.

In line 1875, change

  if (state->type == XML_REGEXP_UNREACH_STATE)

to

  if (state->type == XML_REGEXP_UNREACH_STATE || state->type == XML_REGEXP_FINAL_STATE)
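
The effect of the guard proposed above can be sketched with a simplified reduction pass. This is a toy model written for illustration, not a transcription of xmlFAEliminateSimpleEpsilonTransitions: the function name `eliminate_simple_epsilon`, the `skip_final` flag, and the dictionary representation are all hypothetical, and X is assumed to be the start state. Whether the guard is the right fix for libxml2 itself is not established by this sketch.

```python
from typing import Dict, List, Optional, Set, Tuple

Transition = Tuple[Optional[str], str]  # (label, target); None means epsilon
Automaton = Dict[str, List[Transition]]

# X --atom--> Y, Y is final, Y --epsilon--> X (the case from the comment).
NFA: Automaton = {"X": [("atom", "Y")], "Y": [(None, "X")]}
FINALS = {"Y"}


def eliminate_simple_epsilon(trans: Automaton, finals: Set[str], start: str,
                             skip_final: bool) -> Tuple[Automaton, Set[str]]:
    """Bypass states whose only outgoing transition is an epsilon move.

    With skip_final=True, final states are left alone, which models the
    extra XML_REGEXP_FINAL_STATE check proposed in the comment.
    """
    trans = {state: list(ts) for state, ts in trans.items()}
    finals = set(finals)
    for state in list(trans):
        outgoing = trans[state]
        if len(outgoing) != 1:
            continue
        label, target = outgoing[0]
        if label is not None or target == state or state == start:
            continue
        if skip_final and state in finals:
            continue  # the proposed guard
        # Redirect every transition that pointed at `state` to `target`.
        for src in trans:
            trans[src] = [(lbl, target if dst == state else dst)
                          for lbl, dst in trans[src]]
        if state in finals:
            # Finality is passed on to the target: this is the problematic step.
            finals.discard(state)
            finals.add(target)
        trans[state] = []
    return trans, finals


def accepts(trans: Automaton, finals: Set[str], start: str, text: str) -> bool:
    """Simulate the automaton; the label "atom" stands for [a-z]."""
    def closure(states: Set[str]) -> Set[str]:
        seen, stack = set(states), list(states)
        while stack:
            for label, target in trans.get(stack.pop(), []):
                if label is None and target not in seen:
                    seen.add(target)
                    stack.append(target)
        return seen

    current = closure({start})
    for ch in text:
        current = closure({dst for s in current for lbl, dst in trans.get(s, [])
                           if lbl == "atom" and "a" <= ch <= "z"})
    return bool(current & finals)


if __name__ == "__main__":
    for flag in (False, True):
        reduced, reduced_finals = eliminate_simple_epsilon(NFA, FINALS, "X", flag)
        print("skip_final =", flag,
              "accepts empty string:", accepts(reduced, reduced_finals, "X", ""))
```

In this model the unguarded pass makes X final and so accepts the empty string, while the guarded pass leaves Y in place and keeps rejecting it.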

With this change, the test result is as follows:

root@oss-0017:~/libxml2-fix/libxml2-v2.9.9# ./testRegexp "([a-z])+" ""
Testing ([a-z])+:
: Fail

The result shows that ([a-z])+ now correctly rejects the empty string.

Combined with issue 57, the test from https://gitlab.gnome.org/GNOME/libxml2/issues/57 also gives the expected result:

root@oss-0017:~/libxml2-fix/libxml2-v2.9.9# ./testRegexp --debug "(([a-zA-Z0-9_]+)(;[a-zA-Z0-9_]+))|" "a1;a2"
Testing (([a-zA-Z0-9_]+)(;[a-zA-Z0-9_]+))|:
 regexp: '(([a-zA-Z0-9_]+)(;[a-zA-Z0-9_]+))|'
6 atoms:
 00  atom: ranges once 4 entries
  range: charval a - z
  range: charval A - Z
  range: charval 0 - 9
  range: charval _ - _
 01  atom: subexpr once start 4 end 5
 02  atom: charval once char ;
 03  atom: ranges once 4 entries
  range: charval a - z
  range: charval A - Z
  range: charval 0 - 9
  range: charval _ - _
 04  atom: subexpr once start 0 end 10
 05  atom: subexpr once start 2 end 10
12 states:
 state: FINAL 0, 5 transitions:
  trans: removed
  trans: removed
  trans: removed
  trans: removed
  trans: atom 0, to 6
 state: NULL
 state: NULL
 state: NULL
 state: NULL
 state: NULL
 state: 6, 5 transitions:
  trans: removed
  trans: atom 0, to 6
  trans: removed
  trans: removed
  trans: char ; atom 2, to 9
 state: NULL
 state: NULL
 state: 9, 1 transitions:
  trans: atom 3, to 11
 state: NULL
 state: FINAL 11, 2 transitions:
  trans: removed
  trans: atom 3, to 11
0 counters:
a1;a2: Ok
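
The debug dump above can be read as a small state machine. The following sketch transcribes only the transitions that are not marked "removed" and simulates them. It assumes that state 0 is the start state and that atoms 0 and 3 both denote the class [a-zA-Z0-9_], as the listed ranges suggest. It is a reading aid, not part of testRegexp.

```python
import string

# Atoms 0 and 3 in the dump: ranges a-z, A-Z, 0-9 and "_".
WORD = set(string.ascii_letters + string.digits + "_")

# state -> list of (predicate on one character, target state),
# transcribed from the remaining transitions in the dump.
TRANSITIONS = {
    0: [(lambda c: c in WORD, 6)],
    6: [(lambda c: c in WORD, 6), (lambda c: c == ";", 9)],
    9: [(lambda c: c in WORD, 11)],
    11: [(lambda c: c in WORD, 11)],
}
FINAL = {0, 11}   # states printed as FINAL
START = 0         # assumption: the first state is the start state


def accepts(text: str) -> bool:
    """Run the transcribed automaton over text."""
    current = {START}
    for ch in text:
        current = {target
                   for state in current
                   for predicate, target in TRANSITIONS.get(state, [])
                   if predicate(ch)}
    return bool(current & FINAL)


if __name__ == "__main__":
    for sample in ["a1;a2", "", "a1", ";a2"]:
        print(repr(sample), "Ok" if accepts(sample) else "Fail")
```

Under these assumptions the model accepts "a1;a2", in line with the Ok reported above. It also accepts the empty string, which is expected here because the pattern ends in an empty alternative.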
Comment 2 GNOME Infrastructure Team 2021-07-05 13:26:17 UTC
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org.
As part of that, we are mass-closing older open tickets in bugzilla.gnome.org
which have not seen updates for a longer time (resources are unfortunately
quite limited so not every ticket can get handled).

If you can still reproduce the situation described in this ticket in a recent
and supported software version, then please follow
  https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines
and create a new ticket at
  https://gitlab.gnome.org/GNOME/libxml2/-/issues/

Thank you for your understanding and your help.