After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 638709 - utf8_strrcasestr in GtkTextIter makes an invalid read
utf8_strrcasestr in GtkTextIter makes an invalid read
Status: RESOLVED FIXED
Product: gtk+
Classification: Platform
Component: Widget: GtkTextView
unspecified
Other All
: Normal minor
: ---
Assigned To: gtk-bugs
gtk-bugs
Depends on:
Blocks:
 
 
Reported: 2011-01-04 23:04 UTC by morshed.nader
Modified: 2014-07-31 16:03 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Patch to fix the issue in gtktextiter.c (663 bytes, patch)
2011-01-04 23:04 UTC, morshed.nader
none Details | Review
textiter: use g_utf8_find_prev_char() (safer) (1.19 KB, patch)
2014-07-20 13:28 UTC, Sébastien Wilmet
none Details | Review
textiter: don't call g_utf8_prev_char() on start of string (1.12 KB, patch)
2014-07-31 14:01 UTC, Sébastien Wilmet
committed Details | Review

Description morshed.nader 2011-01-04 23:04:17 UTC
In utf8_strrcasestr, the search loop iterates backwards using:

p = g_utf8_prev_char(p);

However, when this is called at the start of a string, in the last loop of any search, it reads at least 1 byte before the allocated range.

I've attached a simple fix.
Comment 1 morshed.nader 2011-01-04 23:04:39 UTC
Created attachment 177528 [details] [review]
Patch to fix the issue in gtktextiter.c
Comment 2 Sébastien Wilmet 2014-07-20 13:28:51 UTC
Created attachment 281235 [details] [review]
textiter: use g_utf8_find_prev_char() (safer)

g_utf8_prev_char() should not be used at the start of the string.
g_utf8_find_prev_char() is safer (NULL is returned when no previous char
is found).

Thanks to morshed.nader for the bug report and an initial patch.
Comment 3 Paolo Borelli 2014-07-31 10:08:19 UTC
Review of attachment 281235 [details] [review]:

What are the performance implications of this change? Can't we just special case the start of the string instead of using the safe version within the loop?

The safe version is meant to handle a pointer to an offset in the middle of a utf8 sequence, but in this case we are sure we always move to the start of each sequence, so we just need to handle the p == haystack special case.

pseudocode:


while (p > haystack) {
  if cmp() goto

  prev()
}

assert (p == haystack)
if cmp() goto
Comment 4 Sébastien Wilmet 2014-07-31 10:38:32 UTC
Well, g_utf8_find_prev_char() does the work for us, it is meant to be used for the use case here, as the documentation of g_utf8_prev_char() says.

With micro-optimization the code is less clear. Algorithms, data structures and the code architecture have generally a greater impact on performances. Doing micro-optimizations should be done in the rare cases when it is really needed.
Comment 5 Paolo Borelli 2014-07-31 13:06:11 UTC
I do not agree:

1) g_utf8_prev_char is meant to be safe against strings that are not validated, which is not the case here. Here we just need a proper check on the pointer arithmetic

2) Properly handling the exit condition actually makes the code more clear, even if slightly longer

3) using the safe version makes it inconsistent with the variant that moves forward

4) while it is true that this is a micro-optimization, it is something happeing inside a tight loop that iterates the string, so I think that not beeing intentionally inefficient is a good thing
Comment 6 Sébastien Wilmet 2014-07-31 14:01:26 UTC
Created attachment 282141 [details] [review]
textiter: don't call g_utf8_prev_char() on start of string

Changes also the "goto finally" with a break. A break is more common.

Another way is to use g_utf8_find_prev_char().
Comment 7 Sébastien Wilmet 2014-07-31 14:03:58 UTC
The above patch is a second version. For me either way is fine.
Comment 8 Paolo Borelli 2014-07-31 14:29:06 UTC
Review of attachment 282141 [details] [review]:

looks good
Comment 9 Sébastien Wilmet 2014-07-31 16:03:12 UTC
Comment on attachment 282141 [details] [review]
textiter: don't call g_utf8_prev_char() on start of string

Pushed:
https://git.gnome.org/browse/gtk+/commit/?id=6e4e7c22a028171c9c2df7041b036878b5bcbf42