GNOME Bugzilla – Bug 689794
support incremental matching
Last modified: 2018-05-24 14:53:31 UTC
Suppose you have a piece of data you want to match with a regex, and it is so large that you can't fit it into memory all at once. So you want to load slices of it and match each slice. The problem there is that a match may be only partial at the end of the available data. In that case you retrieve the position of the partial match (the first inspected character), and use the pattern's max lookbehind length to know how much data before that position you need to keep from this data slice to possibly complete the match with the next data slice. For this to work, we need API to a) retrieve the max lookbehind of a pattern (PCRE_INFO_MAXLOOKBEHIND), and b) retrieve the position of the partial match, which the current GMatchInfo doesn't allow. See man:pcrepartial(3) for more information on incremental (there called 'multi-segmented') matching.
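The bookkeeping described above is simple arithmetic: if a partial match starts at offset p in the current slice and the pattern's max lookbehind is L, everything from max(p - L, 0) onward must be kept, and the next match attempt should resume at p - keep_from in the combined buffer. In GLib, passing that offset as start_position to g_regex_match_full() lets lookbehind assertions still see the retained bytes before it. A minimal sketch follows; retain_offset() and carry_tail() are hypothetical helpers, p is assumed to be known somehow (GMatchInfo can't report it, which is point b above), and one byte per character is assumed because PCRE reports the lookbehind in characters, not bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: offset in the current slice from which data must be
 * retained after a partial match that begins at partial_start, given the
 * pattern's maximum lookbehind. Assumes one byte per character, since PCRE
 * reports the lookbehind in characters, not bytes. */
static size_t
retain_offset (size_t partial_start, size_t max_lookbehind)
{
  return partial_start > max_lookbehind ? partial_start - max_lookbehind : 0;
}

/* Hypothetical helper: move the retained tail buf[keep_from..len) to the
 * front of buf so the next slice can be appended after it. Returns the
 * length of the retained tail. */
static size_t
carry_tail (char *buf, size_t len, size_t keep_from)
{
  assert (keep_from <= len);
  /* Source and destination may overlap, hence memmove() rather than memcpy(). */
  memmove (buf, buf + keep_from, len - keep_from);
  return len - keep_from;
}
```

For example, with a partial match at offset 10 and a max lookbehind of 3, the slice is kept from offset 7, and matching resumes at offset 3 of the new buffer.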
With some patterns, incremental matching can give different results than full matching. There are some possible workarounds, but the API in PCRE is not easy to use for incremental matching. I filed a feature request about that: http://bugs.exim.org/show_bug.cgi?id=1368
The safer way is to use g_regex_match() with G_REGEX_MATCH_PARTIAL_HARD and, whenever a partial match is returned, retry with a larger segment, until either a complete match is found or there is no match. The PCRE documentation is not really clear on this point, but this should give the same result as matching on the full subject string. A patch is coming for retrieving the max lookbehind of a pattern.
Created attachment 249191: GRegex: add g_regex_get_max_lookbehind(). It is useful for multi-segment regex matching. A unit test is included.
If the two of you are sure that this is correct and if it makes your life easier, then I have no problem with the addition. I'd ask that you actually test this 'read a bunch of data in a stream, doing partial matching' idea before committing, though, just in case there is another catch that you didn't think of yet. The test case here doesn't seem to actually be doing that...
Comment on attachment 249191: GRegex: add g_regex_get_max_lookbehind(). The new function is currently used in a branch of GtkSourceView, and it works fine. The second point of this bug (retrieving the position of the partial match) would only slightly improve performance, since partial matching is done line by line in GtkSourceView, so at most one line is scanned needlessly.
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/643.