Bug 577676 - XPath 2.0-style regular expressions (also used in EXSLT, XSLT 2.0, XQuery 1.0)
Status: RESOLVED OBSOLETE
Product: libxml2
Classification: Platform
Component: xpath
Version: git master
Hardware: Other All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Daniel Veillard
QA Contact: libxml QA maintainers
Depends on:
Blocks:
 
 
Reported: 2009-04-02 01:25 UTC by naesten
Modified: 2021-07-05 13:20 UTC
See Also:
GNOME target: ---
GNOME version: Unversioned Enhancement



Description naesten 2009-04-02 01:25:34 UTC
I know that libxml already provides code for XML Schema Datatypes-style regular expressions [XSD], but it would really be nice if it could also support the slightly different forms of regular expressions used in JavaScript ([ECMA-262], pages 129-145), EXSLT (which just references JavaScript's RegExp syntax -- see [EXSLT-regexp]), and XSLT 2.0/XPath 2.0/XQuery 1.0 [xpath-functions].

Both families appear to be closely based on Perl's regular expression syntax (big surprise!), the JavaScript/XPath forms more closely than the XSD one.

As far as the regular expressions themselves go, the XSD regexps [XSD] are missing:

 * "^" and "$" (regexps are implicitly anchored at the beginning and end of the string in XSD). These should be fairly trivial to support with the existing code.
 
 * "reluctant" versions of quantifiers (??, *?, +?, etc.) (Perl calls these non-"greedy"). These don't change the result when just checking whether a string matches, but are quite important when searching or capturing sub-expression matches.

 * Sub-expression (group) capture of the text matched by parenthesized portions of the regular expression

 * Back references to captured text (to match it again)
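
For illustration only, here is a minimal sketch of the four features above using Python's re module. Python is my assumption here and is not mentioned elsewhere in this report; its dialect is Perl-like and is not identical to XSD, JavaScript, or [xpath-functions]. I use re.fullmatch as a rough stand-in for XSD's implicit anchoring.

```python
import re

text = "<a><b>"

# Implicit anchoring (XSD-like): the whole string must match.
implicit = re.fullmatch(r"<.+>", text) is not None
# Explicit anchors give the same answer when searching.
explicit = re.search(r"^<.+>$", text) is not None
# Without anchors, a search may match only part of the string.
partial = re.search(r"b", text) is not None
whole = re.fullmatch(r"b", text) is not None

# Greedy vs. reluctant: both patterns match, but capture different text.
greedy = re.search(r"<(.+)>", text).group(1)      # "a><b"
reluctant = re.search(r"<(.+?)>", text).group(1)  # "a"

# Back reference: \1 must match exactly what group 1 captured.
doubled_word = re.search(r"(\w+) \1", "hello hello world").group(1)  # "hello"
```

Note that the greedy and reluctant patterns agree on whether the string matches at all; they only differ in what the group captures.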

The following JavaScript regexp features seem to be missing even from [xpath-functions]:

 * "\b" and "\B" assertions (for "at word boundary" and "not at word boundary", respectively, based on \w and \W)

 * "(?: ... )", "(?= ... )", and "(?! ... )" atoms.
   "(?: ... )" is just a capture-free version of "( ... )".
   "(?= ... )" and "(?! ... )" assert that the nested regexp does or does not (respectively) match the next portion of the input, but do not consume it.
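
A small sketch of these assertions and atoms, again using Python's re module as a stand-in (an assumption on my part; Python happens to accept the same syntax for these constructs, but it is not one of the dialects discussed here):

```python
import re

words = "cat concat cats"

# \b requires a word boundary; \B requires that there is none.
whole_word = re.findall(r"\bcat\b", words)  # only the standalone "cat"
inside_word = re.search(r"\Bcat", words)    # the "cat" inside "concat"

# (?: ... ) groups without capturing, so only "(c)" becomes group 1.
groups = re.match(r"(?:ab)+(c)", "ababc").groups()

# (?= ... ) asserts that the nested regexp matches next, without consuming it.
followed = re.findall(r"\d+(?=px)", "10px 20em 30px")
consumed_up_to = re.match(r"foo(?=bar)", "foobar").end()

# (?! ... ) asserts that the nested regexp does NOT match next.
not_followed = re.search(r"foo(?!bar)", "foobar foobaz")
```

The match for "foo(?=bar)" ends right after "foo", which shows that the lookahead text is checked but left in the input.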

Both [xpath-functions] and JavaScript have "flag characters" that affect the semantics of the match:

 * "i": ignore case in matching

 * "m": multi-line; ^ and $ also match immediately after/before a newline, not just at beginning/end of string.

 * "g" (JavaScript only): complicated stateful behavior, probably not intended to apply to the EXSLT case.

 * "s" ([xpath-functions] only): single-line; allow "." to match newlines
 
 * "x" ([xpath-functions] only): remove (most) whitespace from the regexp before matching. (Unlike Perl, it doesn't treat # as a comment metacharacter, though ...)
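
To make the flag semantics concrete, here is a rough sketch using Python's re flags (re.I, re.M, re.S, re.X) as approximate equivalents of "i", "m", "s", and "x". This mapping is my assumption, not something taken from either spec. In particular, Python's re.X also honors # comments the way Perl does, which [xpath-functions] does not, and there is no Python flag corresponding to JavaScript's "g".

```python
import re

text = "First line\nsecond LINE"

# "i": ignore case in matching.
i_hits = re.findall(r"line", text, re.I)

# "m": ^ (and $) also match right after (before) a newline.
without_m = re.findall(r"^\w+", text)
with_m = re.findall(r"^\w+", text, re.M)

# "s": "." is allowed to match newlines.
without_s = re.search(r"First.*LINE", text)
with_s = re.search(r"First.*LINE", text, re.S)

# "x": whitespace in the regexp is dropped before matching.
without_x = re.fullmatch(r"\d+ \s* px", "10 px")
with_x = re.fullmatch(r"\d+ \s* px", "10 px", re.X)
```

Without "x", the spaces in the pattern are literal and the last match fails; with "x", the pattern behaves like "\d+\s*px".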
 

References

[ECMA-262]
    Standard ECMA-262: ECMAScript Language Specification. 
    http://www.ecma-international.org/publications/standards/Ecma-262.htm

[EXSLT-regexp]
    EXSLT - regexp:match
    http://www.exslt.org/regexp/functions/match/index.html

[xpath-functions]
    XQuery 1.0 and XPath 2.0 Functions and Operators.
    http://www.w3.org/TR/2007/REC-xpath-functions-20070123/#regex-syntax

[XSD]
    XML Schema Part 2: Datatypes Second Edition.
    http://www.w3.org/TR/xmlschema-2/#regexs
Comment 1 GNOME Infrastructure Team 2021-07-05 13:20:40 UTC
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org.
As part of that, we are mass-closing older open tickets in bugzilla.gnome.org
which have not seen updates for a longer time (resources are unfortunately
quite limited so not every ticket can get handled).

If you can still reproduce the situation described in this ticket in a recent
and supported software version, then please follow
  https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines
and create a new ticket at
  https://gitlab.gnome.org/GNOME/libxml2/-/issues/

Thank you for your understanding and your help.