GNOME Bugzilla – Bug 530405
Use is_word_boundary PangoLogAttr attribute for word selection
Last modified: 2018-04-15 00:05:32 UTC
I recently added a new member to PangoLogAttr, called is_word_boundary. It follows the Unicode recommendation and is more useful for select-by-word than the current is_word_start/end bits are.
I'm doing some experiments with this bug, see the attached patch below. For word selection on double click, using is_word_boundary doesn't work well outside a natural-language word, for example to select several spaces: there is a word boundary between each space. The current code has a special case for when the iter is outside a natural-language word, and this special case needs to use is_word_start/end. So is it really worth the effort to use is_word_boundary? Also, I don't know if is_word_boundary can be used for word movements with Ctrl+Left, Ctrl+Right, etc. The behavior is different. Currently Ctrl+Right moves to the next word end. With is_word_boundary, Ctrl+Right moves to the next word boundary, which can be a word start, a word end, a space, etc.
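To make the two problems above concrete, here is a minimal sketch in plain Python. It does not call Pango: word_flags() is a hypothetical stand-in that only mimics the behavior described in this comment (a boundary at every position that is not strictly inside a word), and it is far simpler than the real Unicode rules. The function names are made up for illustration.

```python
def word_flags(text):
    """Toy per-position flags, loosely modelled on PangoLogAttr.

    Assumption: a "word" is a run of alphanumeric characters. This is
    NOT the Unicode algorithm, only an illustration.
    """
    n = len(text)
    is_word = [ch.isalnum() for ch in text]
    starts, ends, boundaries = [], [], []
    for i in range(n + 1):
        before = i > 0 and is_word[i - 1]
        after = i < n and is_word[i]
        starts.append(after and not before)
        ends.append(before and not after)
        # A boundary everywhere except strictly inside a word, so there
        # is one between each pair of adjacent spaces.
        boundaries.append(not (before and after))
    return starts, ends, boundaries


def select_with_boundaries(text, offset):
    """Double click on the character at `offset` (0 <= offset < len(text)):
    extend the selection to the nearest boundary on each side."""
    _, _, boundaries = word_flags(text)
    start = offset
    while not boundaries[start]:
        start -= 1
    end = offset + 1
    while not boundaries[end]:
        end += 1
    return start, end


def next_stop(flags, offset):
    """Ctrl+Right: first flagged position strictly after `offset`."""
    for i in range(offset + 1, len(flags)):
        if flags[i]:
            return i
    return len(flags) - 1


text = "foo   bar"
starts, ends, boundaries = word_flags(text)

# Double click on the middle space selects a single space only.
middle_space = select_with_boundaries(text, 4)

# Ctrl+Right from the end of "foo" (offset 3):
stop_with_ends = next_stop(ends, 3)              # end of "bar"
stop_with_boundaries = next_stop(boundaries, 3)  # the very next space
```

In this toy model, double clicking inside the run of spaces selects one space, and Ctrl+Right based on the single boundary flag stops after every space instead of jumping to the end of the next word.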
Created attachment 273930
textview: use is_word_boundary attribute for word selection

The new functions in GtkTextIter are private for now. The implementation is almost the same as for the word start/end. The only difference is the use of the is_word_boundary PangoLogAttr attribute instead of is_word_start and is_word_end.

Those private functions are used in GtkTextView for:
- moving by words with Ctrl+Left, Ctrl+Right, etc.
- word selection with double click
An idea: make Ctrl+Left and Ctrl+Right behave the same as 'b' and 'e' in Vim. A word in Vim is not only a natural-language word, but also a group of contiguous "special" characters like punctuation. These word boundary rules would apply to selecting words with double click too, of course. The implementation in GtkTextView could use a combination of is_word_start, is_word_end, is_word_boundary, and maybe other custom functions for detecting spaces, etc.
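A rough sketch of what Vim-like word movement could look like, in plain Python rather than with GtkTextIter. It assumes three character classes (whitespace, word characters, everything else) and treats the underscore as a word character. All names here are hypothetical, and positions are cursor positions between characters, so next_word_end() returns the offset just past the word, unlike Vim's 'e' which lands on the last character.

```python
def char_class(ch):
    """Classify a character roughly the way Vim does for word motions."""
    if ch.isspace():
        return "space"
    if ch.isalnum() or ch == "_":
        return "word"
    return "punct"


def next_word_end(text, offset):
    """Ctrl+Right: skip whitespace, then skip one run of same-class chars."""
    n = len(text)
    i = offset
    while i < n and char_class(text[i]) == "space":
        i += 1
    if i == n:
        return n
    cls = char_class(text[i])
    while i < n and char_class(text[i]) == cls:
        i += 1
    return i


def prev_word_start(text, offset):
    """Ctrl+Left: mirror image of next_word_end(), scanning backwards."""
    i = offset
    while i > 0 and char_class(text[i - 1]) == "space":
        i -= 1
    if i == 0:
        return 0
    cls = char_class(text[i - 1])
    while i > 0 and char_class(text[i - 1]) == cls:
        i -= 1
    return i


text = "foo->bar_baz  (x)"
stops = []
pos = 0
while pos < len(text):
    pos = next_word_end(text, pos)
    stops.append(pos)
```

With this classification "->" and "(" each count as a word of their own, and "bar_baz" is a single word because the underscore is in the "word" class.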
See bug #727972 for the latter idea.
Just a few remarks: currently pango doesn't use is_word_boundary. It uses the is_word_start/is_word_end attributes. See bug 700103 and bug 97545. In short:

1. We have two different definitions of "word".
2. "Words" defined by is_word_start/is_word_end are simpler than "words" defined by is_word_boundary, and there are issues with spell-checking in some languages (e.g. Catalan).

I prefer using the is_word_boundary definition for mouse/cursor selection and for the spell-checking engine. AFAIK this is the behaviour in Qt and on other OSes (e.g. OS X), but a huge refactoring of pango-break is needed.
The problem is that the bugs in pango are quite big ("a huge refactoring" as you said). The solution that I propose for GtkTextView is simpler. It would use is_word_start/is_word_end from pango to delimit natural-language words, and it would use another custom (and simple) algorithm to tune the word boundaries so that it behaves like Vim. If such a simple algorithm in GtkTextView is feasible, would it be accepted? I haven't tried to sketch the algorithm yet, but I think it should be feasible. And if the GTK+ developers don't want such behavior for GtkTextView (or prefer to wait for the big patches in pango), in the meantime it would be nice to be able to tune the word boundaries in a subclass of GtkTextView; for example, GtkSourceView is interested in that (see bug #354587).
Created attachment 274760
textview: word boundaries like in Vim

Ugly implementation, only for Ctrl+Right. It doesn't take into account invisible regions in the text buffer. The underscore is not treated as part of a natural-language word, so the behavior differs from Vim for e.g. variable names like a_variable_with_underscores. In Vim this is taken as one word. Here the word is split at each underscore.
So with the above patch you get an idea. For the underscores a solution is to use the is_word_boundary PangoLogAttr attribute too. But the algorithm would be more complicated. What do you think?
(In reply to comment #8)
> What do you think?

I don't think we should work around pango problems in gtk. They should be fixed in pango.
*** Bug 727972 has been marked as a duplicate of this bug. ***
A complete implementation of my idea to improve the behavior of word movements (Ctrl+arrow) and word selection (double click) is available in bug #562767, with the custom word boundaries implemented in GtkSourceView: https://git.gnome.org/browse/gtksourceview/log/?h=wip/custom-word-boundaries

It would make sense to have these custom word boundaries (only for word movements and selection, not for a spell checker) directly in GtkTextView. This has the advantage of not adding any API. It would be less flexible, of course, but the generic implementation is enough in my opinion.

As I said in bug #562767, for the custom word boundaries I think it is simpler to keep two word boundary types (word start and word end). This is needed for word movements: Ctrl+Right goes to the next word end, Ctrl+Left goes to the previous word start. With a single word boundary type (like the is_word_boundary PangoLogAttr attribute), it would be more difficult to know whether a word boundary is a word start or a word end: the contents must be analyzed in that case (for example, if the previous char is a space, the iter is probably at a word start). With two word boundary types, it is simple to merge them into a single word boundary type for the double click.

The implementation of the custom word boundaries is generic (see the branch in gtksourceview, the code is well documented). It uses only the is_word_start and is_word_end PangoLogAttr attributes (through the GtkTextIter API) and g_unichar_isspace().
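To illustrate the point about keeping two boundary types, here is a small plain-Python sketch. It is not the code from the gtksourceview branch. The flag lists are written by hand for the text "ab  cd", str.isspace() stands in for g_unichar_isspace(), and the function names are hypothetical.

```python
def merge_boundaries(starts, ends):
    """Two boundary types collapse trivially into one."""
    return [s or e for s, e in zip(starts, ends)]


def select_word(starts, ends, offset):
    """Double click on the character at `offset`: extend to the nearest
    merged boundary on each side."""
    boundaries = merge_boundaries(starts, ends)
    last = len(boundaries) - 1
    start = offset
    while start > 0 and not boundaries[start]:
        start -= 1
    end = offset + 1
    while end < last and not boundaries[end]:
        end += 1
    return start, end


def guess_kind(text, pos):
    """The reverse direction: with a single boundary type, the kind of
    boundary has to be guessed from the neighbouring characters."""
    prev_is_space = pos == 0 or text[pos - 1].isspace()
    next_is_space = pos == len(text) or text[pos].isspace()
    if prev_is_space and not next_is_space:
        return "start"
    if next_is_space and not prev_is_space:
        return "end"
    return "ambiguous"


text = "ab  cd"
# Hand-written flags for cursor positions 0..6 (assumed, not from Pango).
starts = [True, False, False, False, True, False, False]
ends = [False, False, True, False, False, False, True]

word = select_word(starts, ends, 0)    # covers "ab"
spaces = select_word(starts, ends, 3)  # covers both spaces
```

Merging is a one-liner, while guess_kind() cannot decide in cases such as the position between 'a' and '-' in "a-b", where neither neighbour is a space.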
We're moving to gitlab! As part of this move, we are moving bugs to NEEDINFO if they haven't seen activity in more than a year. If this issue is still important to you and still relevant with GTK+ 3.22 or master, please reopen it and we will migrate it to gitlab.
As announced a while ago, we are migrating to gitlab, and bugs that haven't seen activity in the last year or so will not be migrated, but closed out in bugzilla. If this bug is still relevant to you, you can open a new issue describing the symptoms and how to reproduce it with gtk 3.22.x or master in gitlab: https://gitlab.gnome.org/GNOME/gtk/issues/new