Bug 689794 - support incremental matching
Status: RESOLVED OBSOLETE
Product: glib
Classification: Platform
Component: gregex
Version: unspecified
Hardware: Other Linux
Importance: Normal enhancement
Target Milestone: ---
Assigned To: gtkdev
QA Contact: gtkdev
Depends on:
Blocks: 134674
 
 
Reported: 2012-12-06 18:24 UTC by Christian Persch
Modified: 2018-05-24 14:53 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
GRegex: add g_regex_get_max_lookbehind() (3.60 KB, patch)
2013-07-15 12:22 UTC, Sébastien Wilmet
committed

Description Christian Persch 2012-12-06 18:24:16 UTC
Suppose you want to match a regex against a piece of data too large to fit into memory all at once, so you load it in slices and match each slice in turn. The problem is that a match may be only partial at the end of the available data. In that case you retrieve the position of the partial match (the first inspected character) and use the max lookbehind length to work out how much of the current data slice must be kept so that the match can possibly be completed with the next data slice.

For this to work, we need API to a) retrieve the max lookbehind of a pattern (PCRE_INFO_MAXLOOKBEHIND), and b) retrieve the position of the partial match, which the current GMatchInfo doesn't allow.

See man:pcrepartial(3) for more information on incremental (there called 'multi-segmented') matching.
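
The carry-over arithmetic described above can be sketched in plain C. This is only an illustration: carry_over_offset() is a hypothetical helper, not part of GLib or PCRE, and it assumes both values are expressed in bytes (PCRE reports the max lookbehind in characters, so UTF-8 data would need a conversion first).

```c
#include <stddef.h>

/* Hypothetical helper, not a GLib or PCRE function.
 *
 * partial_start:  byte offset, within the current slice, at which the
 *                 partial match begins.
 * max_lookbehind: longest lookbehind of the pattern, assumed here to be
 *                 in bytes.
 *
 * Returns the offset from which the current slice must be kept so that
 * the match can be retried once the next slice has been appended. */
static size_t
carry_over_offset (size_t partial_start, size_t max_lookbehind)
{
  /* Clamp at 0: there is nothing before the start of the slice to keep. */
  if (partial_start > max_lookbehind)
    return partial_start - max_lookbehind;

  return 0;
}
```

Everything before the returned offset can be discarded; the rest is prepended to the next slice.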
Comment 1 Sébastien Wilmet 2013-07-14 15:38:31 UTC
With some patterns, incremental matching can give different results than full matching. There are some possible workarounds, but the API in PCRE is not easy to use for incremental matching. I filed a feature request about that:

http://bugs.exim.org/show_bug.cgi?id=1368
Comment 2 Sébastien Wilmet 2013-07-15 12:20:52 UTC
The safer way is to call g_regex_match() with G_REGEX_MATCH_PARTIAL_HARD and, whenever a partial match is returned, retry with a larger segment, until either a complete match is found or there is no match.

The PCRE documentation was not really clear, but this should give the same result as matching on the full subject string.

A patch is coming for retrieving the max lookbehind of a pattern.
Comment 3 Sébastien Wilmet 2013-07-15 12:22:02 UTC
Created attachment 249191
GRegex: add g_regex_get_max_lookbehind()

It is useful for multi-segment regex matching.

A unit test is included.
Comment 4 Allison Karlitskaya (desrt) 2013-07-23 13:38:16 UTC
If the two of you are sure that this is correct and if it makes your life easier then I have no problem with the addition.

I'd ask that you actually test this 'read a bunch of data in a stream, doing partial matching' idea before committing, though, just in case there is another catch that you didn't think of yet. The testcase here doesn't seem to actually be doing that...
Comment 5 Sébastien Wilmet 2013-07-23 13:51:13 UTC
Comment on attachment 249191
GRegex: add g_regex_get_max_lookbehind()

The new function is currently used in a branch in GtkSourceView, and it works fine.

The second point of this bug would only slightly improve performance, as partial matching is done line by line in GtkSourceView, so at most one line is scanned needlessly.
Comment 6 GNOME Infrastructure Team 2018-05-24 14:53:31 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new issue through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/643.