Bug 638709 – utf8_strrcasestr in GtkTextIter makes an invalid read

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 638709 - utf8_strrcasestr in GtkTextIter makes an invalid read


Summary:	utf8_strrcasestr in GtkTextIter makes an invalid read


Status:	RESOLVED FIXED

Product:	gtk+
Classification:	Platform
Component:	Widget: GtkTextView
Version:	unspecified
Hardware:	Other All

Importance:	Normal minor
Target Milestone:	---
Assigned To:	gtk-bugs
QA Contact:	gtk-bugs

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2011-01-04 23:04 UTC by morshed.nader
Modified:	2014-07-31 16:03 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Patch to fix the issue in gtktextiter.c (663 bytes, patch) 2011-01-04 23:04 UTC, morshed.nader	none	Details \| Review
textiter: use g_utf8_find_prev_char() (safer) (1.19 KB, patch) 2014-07-20 13:28 UTC, Sébastien Wilmet	none	Details \| Review
textiter: don't call g_utf8_prev_char() on start of string (1.12 KB, patch) 2014-07-31 14:01 UTC, Sébastien Wilmet	committed	Details \| Review

Description morshed.nader 2011-01-04 23:04:17 UTC

In utf8_strrcasestr, the search loop iterates backwards using:

p = g_utf8_prev_char(p);

However, when this is called at the start of a string, in the last loop of any search, it reads at least 1 byte before the allocated range.

I've attached a simple fix.

Comment 1 morshed.nader 2011-01-04 23:04:39 UTC

Created attachment 177528 [details] [review]
Patch to fix the issue in gtktextiter.c

Comment 2 Sébastien Wilmet 2014-07-20 13:28:51 UTC

Created attachment 281235 [details] [review]
textiter: use g_utf8_find_prev_char() (safer)

g_utf8_prev_char() should not be used at the start of the string.
g_utf8_find_prev_char() is safer (NULL is returned when no previous char
is found).

Thanks to morshed.nader for the bug report and an initial patch.

Comment 3 Paolo Borelli 2014-07-31 10:08:19 UTC

Review of attachment 281235 [details] [review]:

What are the performance implications of this change? Can't we just special case the start of the string instead of using the safe version within the loop?

The safe version is meant to handle a pointer to an offset in the middle of a utf8 sequence, but in this case we are sure we always move to the start of each sequence, so we just need to handle the p == haystack special case.

pseudocode:


while (p > haystack) {
  if cmp() goto

  prev()
}

assert (p == haystack)
if cmp() goto

Comment 4 Sébastien Wilmet 2014-07-31 10:38:32 UTC

Well, g_utf8_find_prev_char() does the work for us, it is meant to be used for the use case here, as the documentation of g_utf8_prev_char() says.

With micro-optimization the code is less clear. Algorithms, data structures and the code architecture have generally a greater impact on performances. Doing micro-optimizations should be done in the rare cases when it is really needed.

Comment 5 Paolo Borelli 2014-07-31 13:06:11 UTC

I do not agree:

1) g_utf8_prev_char is meant to be safe against strings that are not validated, which is not the case here. Here we just need a proper check on the pointer arithmetic

2) Properly handling the exit condition actually makes the code more clear, even if slightly longer

3) using the safe version makes it inconsistent with the variant that moves forward

4) while it is true that this is a micro-optimization, it is something happeing inside a tight loop that iterates the string, so I think that not beeing intentionally inefficient is a good thing

Comment 6 Sébastien Wilmet 2014-07-31 14:01:26 UTC

Created attachment 282141 [details] [review]
textiter: don't call g_utf8_prev_char() on start of string

Changes also the "goto finally" with a break. A break is more common.

Another way is to use g_utf8_find_prev_char().

Comment 7 Sébastien Wilmet 2014-07-31 14:03:58 UTC

The above patch is a second version. For me either way is fine.

Comment 8 Paolo Borelli 2014-07-31 14:29:06 UTC

Review of attachment 282141 [details] [review]:

looks good

Comment 9 Sébastien Wilmet 2014-07-31 16:03:12 UTC

Comment on attachment 282141 [details] [review]
textiter: don't call g_utf8_prev_char() on start of string

Pushed:
https://git.gnome.org/browse/gtk+/commit/?id=6e4e7c22a028171c9c2df7041b036878b5bcbf42