GNOME Bugzilla – Bug 320638
patch to speed up gtk_text_iter_backwards_chars
Last modified: 2005-12-12 19:17:06 UTC
this patch improves the backwards_char algorithm by walking backwards in the segment when the distance to walk backwards is shorter than the length forwards. Other information:
Created attachment 54288 [details] patch to optimize walking backward in long segments right now the weighs the cost of walking forward the same as walking backwards, this isn't really the case so the if (count < real->segment_char_offset) condition should probably be weighted to prefer walking forwards (count * some multiple < real->segement_char_offset).
It is also important to note that gtk_text_iter_backward_char calls gtk_text_iter_backward_chars with count 1 and is itself called internally in gtk_text_iter in many places
We need a benchmark for this. Maybe it's easy to pull out the search/replace code from Gedit, which was using this function a lot.
The factors I get when testing g_utf8_next/prev_char() on the data in pango-profiles vary between 13 times faster and 300 times faster for moving forwards through a string. 300 seems a little high, but it is definitively a lot faster.
Clearly, but on a 4k paragaph 300 is still a win
Created attachment 54480 [details] speed up the backwards walking too This is a variation on the original approach. We change offset_to_pointer to allow negative offsets (before it would just run off into wildly into memory) and then call offset_to_pointer from text_iter_backwards_chars
Created attachment 54485 [details] [review] proof of concept that walking backwards can be fast too new patch to speed up offset_to_pointer with negative values we step backwards offset chars then walk to the beginning of the character we landed in. next we count how many actual characters we stepped back then repeat
Created attachment 54486 [details] [review] proof of concept that walking backwards can be fast (the proper patch) new patch to speed up offset_to_pointer with negative values we step backwards offset chars then walk to the beginning of the character we landed in. next we count how many actual characters we stepped back then repeat
Created attachment 54488 [details] [review] use the correctly signed values for offset. this is the same as the previous two patches but with using negative offsets correctly
Created attachment 54491 [details] [review] full patch with stutter stepping this is the full patch, hopefully it compiles ;)
Created attachment 54492 [details] [review] really the full patch
Created attachment 54493 [details] [review] the I am undeniably stupid patch real patch take 3
Created attachment 54494 [details] my hacked up pango-language-profile.c to measure utf8 movement
Created attachment 54517 [details] [review] benchmark for the stutter step patch this should benchmark the patch fairly well.
using the attached build.c but bumping the iterations to 1000000 without the patch I get real 0m7.364s user 0m7.332s sys 0m0.020s with the patch I get real 0m4.708s user 0m4.668s sys 0m0.016s
Created attachment 54606 [details] [review] stutter better
Created attachment 54608 [details] another hacked benchmark
Created attachment 54614 [details] unit tests for g_utf8_offset_to_pointer Here are some unit tests for g_utf8_offset_to_pointer and g_utf8_pointer_to_offset, which my version passes. Not that you will have to add if (pos < str) return - g_utf8_pointer_to_offset (pos, str); to g_utf8_pointer_to_offset to make it pass the tests.
Created attachment 54874 [details] [review] pango patch to use negative pointer_to_offset Attaching Pango patch to use negative-enabled g_utf8_pointer_to_offset (as per Matthias' last comment), to make cursor movement linear instead of quadratic.
I have committed stutter stepping to glib HEAD, so we can do the gtktextiter changes in gtk+ HEAD now.
Created attachment 54884 [details] small bench to figure out the weight function I used this small bench to figure out when it is worth to go backward (the bench uses a string of 64 four-byte chars which should be the worse case): here I got that going forward is almost 3 times faster, thus going backward is worth if offset falls in the last fourth of the string.
Created attachment 54888 [details] [review] doc patch I propose to add this information to the docs so that walking backwards is not abused. Obviously provided that others see similar numbers.
I don't like the doc patch that much. The goal of handling negative offsets is that it makes many operations constant-time instead of O(N). That is, when you want to move locally. I don't see it used with an if statement that much.
Created attachment 54891 [details] [review] updated textiter patch updated version of lewing's textiter patch to use the weight function measured above + same stuff in gtktextbtree bench of comment #14 with 1000000 iterations goes from 10s (unpatched) to 6s (textiter patch) to 2s (textiter + btree)
(as noted on irc, writing the weight function as '4 *' instead of '/ 4' may be safer... I assumed compiler was smart enough to figure that out these days. I doubt it matter anyways)
When stepping backwards through the testtext example with the patch applied, it segfaults for me when hitting the arabic text.
Pango patch committed after fixing.
Ok, couldn't reproduce my earlier problems. Committed to GTK+ HEAD. 2005-12-12 Matthias Clasen <mclasen@redhat.com> * gtk/gtktextbtree.c (_gtk_text_line_char_to_byte_offsets): * gtk/gtktextiter.c (gtk_text_iter_backward_chars): Speed up stepping backwards. (#320638, Larry Ewing, Paolo Borelli)