GNOME Bugzilla – Bug 779751
Incorrect interpretation of caret (^) in regexp character group
Last modified: 2020-10-25 20:09:00 UTC
Created attachment 347482: Patch for the described bug.

The regexp '[ab^cd]' is effectively equivalent to '[^cd]', since libxml2 treats the caret as a negation wherever it appears in a Character Group. But according to both https://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#regexs and https://www.w3.org/TR/xmlschema11-2/#regex, the caret is only special when it appears as the first character of a Character Group, and should not need escaping elsewhere. That is, '[ab^cd]' should match any of the 5 characters in the group. (The regexp '[ab\^cd]' does work as expected, but the escape should not be needed.)

Tested in 2.9.2 and 2.9.4, but neither is available in the version selector. Attaching a patch for 2.9.4 (it includes some simplification of xmlregexp.c:xmlFAParseCharGroup(); a more minimal patch can be provided if desired).
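To make the expected rule concrete, here is a minimal Python sketch of how the caret in a Character Group should be interpreted under the XSD grammar. It is not libxml2's code and not the committed fix. The helper names `parse_char_group` and `group_matches` are hypothetical, and the sketch deliberately handles only single characters and single-character escapes (no ranges, subtractions, or multi-character escapes).

```python
def parse_char_group(body: str) -> tuple[bool, set[str]]:
    """Parse the text between '[' and ']' of a simplified XSD Character Group.

    Hypothetical helper. Assumption: only single characters and
    single-character backslash escapes are supported; ranges, group
    subtraction and multi-character escapes are not handled.
    Returns (negated, set_of_characters).
    """
    # The caret is special only when it is the very first character.
    negated = body.startswith("^")
    if negated:
        body = body[1:]

    chars: set[str] = set()
    i = 0
    while i < len(body):
        c = body[i]
        if c == "\\" and i + 1 < len(body):
            # Single-character escape such as \^ or \]: take the next char literally.
            chars.add(body[i + 1])
            i += 2
        else:
            # Any other character, including a non-leading '^', is literal.
            chars.add(c)
            i += 1
    return negated, chars


def group_matches(body: str, ch: str) -> bool:
    """Return True if the single character ch matches the group [body]."""
    negated, chars = parse_char_group(body)
    # Membership is inverted when the group is negated.
    return (ch in chars) != negated
```

With this rule, `group_matches("ab^cd", "^")` and `group_matches("ab^cd", "a")` are both True while `group_matches("ab^cd", "e")` is False; only a leading caret negates, so `group_matches("^cd", "a")` is True and `group_matches("^cd", "c")` is False. The escaped form '[ab\^cd]' yields exactly the same character set as '[ab^cd]'.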
*** Bug 674411 has been marked as a duplicate of this bug. ***
Fixed here: https://gitlab.gnome.org/GNOME/libxml2/-/commit/7d6837ba0e282e94eb8630ad791f427e44a57491 Thanks for the patch!