GNOME Bugzilla – Bug 638709
utf8_strrcasestr in GtkTextIter makes an invalid read
Last modified: 2014-07-31 16:03:29 UTC
In utf8_strrcasestr, the search loop iterates backwards using: p = g_utf8_prev_char(p); However, when this is called at the start of a string, in the last loop of any search, it reads at least 1 byte before the allocated range. I've attached a simple fix.
Created attachment 177528 [details] [review] Patch to fix the issue in gtktextiter.c
Created attachment 281235 [details] [review] textiter: use g_utf8_find_prev_char() (safer) g_utf8_prev_char() should not be used at the start of the string. g_utf8_find_prev_char() is safer (NULL is returned when no previous char is found). Thanks to morshed.nader for the bug report and an initial patch.
Review of attachment 281235 [details] [review]: What are the performance implications of this change? Can't we just special case the start of the string instead of using the safe version within the loop? The safe version is meant to handle a pointer to an offset in the middle of a utf8 sequence, but in this case we are sure we always move to the start of each sequence, so we just need to handle the p == haystack special case. pseudocode: while (p > haystack) { if cmp() goto prev() } assert (p == haystack) if cmp() goto
Well, g_utf8_find_prev_char() does the work for us, it is meant to be used for the use case here, as the documentation of g_utf8_prev_char() says. With micro-optimization the code is less clear. Algorithms, data structures and the code architecture have generally a greater impact on performances. Doing micro-optimizations should be done in the rare cases when it is really needed.
I do not agree: 1) g_utf8_prev_char is meant to be safe against strings that are not validated, which is not the case here. Here we just need a proper check on the pointer arithmetic 2) Properly handling the exit condition actually makes the code more clear, even if slightly longer 3) using the safe version makes it inconsistent with the variant that moves forward 4) while it is true that this is a micro-optimization, it is something happeing inside a tight loop that iterates the string, so I think that not beeing intentionally inefficient is a good thing
Created attachment 282141 [details] [review] textiter: don't call g_utf8_prev_char() on start of string Changes also the "goto finally" with a break. A break is more common. Another way is to use g_utf8_find_prev_char().
The above patch is a second version. For me either way is fine.
Review of attachment 282141 [details] [review]: looks good
Comment on attachment 282141 [details] [review] textiter: don't call g_utf8_prev_char() on start of string Pushed: https://git.gnome.org/browse/gtk+/commit/?id=6e4e7c22a028171c9c2df7041b036878b5bcbf42