Bug 530405 - Use is_word_boundary PangoLogAttr attribute for word selection
Status: RESOLVED OBSOLETE
Product: gtk+
Classification: Platform
Component: Widget: GtkTextView
Version: unspecified
Hardware: Other Linux
Importance: Normal enhancement
Target Milestone: ---
Assigned To: gtk-bugs
QA Contact: gtk-bugs
Duplicates: 727972
Depends on:
Blocks:
Reported: 2008-04-28 17:29 UTC by Behdad Esfahbod
Modified: 2018-04-15 00:05 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
textview: use is_word_boundary attribute for word selection (7.13 KB, patch)
2014-04-09 19:37 UTC, Sébastien Wilmet
Status: none
textview: word boundaries like in Vim (3.93 KB, patch)
2014-04-20 14:05 UTC, Sébastien Wilmet
Status: rejected

Description Behdad Esfahbod 2008-04-28 17:29:14 UTC
I recently added a new member to PangoLogAttr, called is_word_boundary.  It follows the Unicode recommendation and is more useful for select-by-word than the current is_word_start/end bits are.
Comment 1 Sébastien Wilmet 2014-04-09 19:36:30 UTC
I'm doing some experiments with this bug, see the attached patch below.

For word selection on double click, using is_word_boundary doesn't work well outside a natural-language word, for example when selecting several spaces: there is a word boundary between each space. The current code has a special case for when the iter is outside a natural-language word, and this special case needs to use is_word_start/end. So is it really worth the effort of using is_word_boundary?

Also, I don't know whether is_word_boundary can be used for word movements with Ctrl+Left, Ctrl+Right, etc. The behavior is different. Currently Ctrl+Right moves to the next word end. With is_word_boundary, Ctrl+Right moves to the next word boundary, which can be a word start, a word end, a space, etc.
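
A rough sketch of the two problems described above, using a toy model in Python rather than Pango itself. The helper names (`word_starts`, `word_ends`, `word_boundaries`, `next_after`, `select_at`) are hypothetical, and the boundary rule is a deliberately crude assumption: every offset is a boundary unless it sits between two alphanumeric characters.

```python
def word_starts(text):
    """Offsets where an alphanumeric run begins (toy stand-in for is_word_start)."""
    return [i for i in range(len(text))
            if text[i].isalnum() and (i == 0 or not text[i - 1].isalnum())]


def word_ends(text):
    """Offsets just past an alphanumeric run (toy stand-in for is_word_end)."""
    return [i for i in range(1, len(text) + 1)
            if text[i - 1].isalnum() and (i == len(text) or not text[i].isalnum())]


def word_boundaries(text):
    """Toy stand-in for is_word_boundary: every offset not inside an alphanumeric run."""
    return [i for i in range(len(text) + 1)
            if i == 0 or i == len(text)
            or not (text[i - 1].isalnum() and text[i].isalnum())]


def next_after(offsets, pos):
    """First offset strictly greater than pos, or None if there is none."""
    return next((o for o in offsets if o > pos), None)


def select_at(offsets, pos):
    """Double-click model: span from the boundary at or before pos to the next one."""
    start = max(o for o in offsets if o <= pos)
    return start, next_after(offsets, pos)


text = "foo  bar"  # two spaces between the words
# Ctrl+Right from just after "foo" (offset 3):
print(next_after(word_ends(text), 3))        # next word end
print(next_after(word_boundaries(text), 3))  # next boundary, inside the spaces
# Double click on the first space:
print(select_at(word_boundaries(text), 3))   # covers a single space only
```

In this model the single boundary list stops at every space, which is why the existing special case for runs outside a word still needs the start/end attributes.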
Comment 2 Sébastien Wilmet 2014-04-09 19:37:05 UTC
Created attachment 273930
textview: use is_word_boundary attribute for word selection

The new functions in GtkTextIter are private for now. It is almost the
same implementation as for the word start/end. The only difference is
the use of the is_word_boundary PangoLogAttr attribute instead of
is_word_start and is_word_end.

And those private functions are used in GtkTextView for:
- move by words with Ctrl+Left, Ctrl+Right, etc.
- word selection with double click
Comment 3 Sébastien Wilmet 2014-04-09 21:43:33 UTC
An idea: make Ctrl+Left and Ctrl+Right behave the same as 'b' and 'e' in Vim. A word in Vim covers not only natural-language words, but also groups of contiguous "special" characters such as punctuation.

These word boundaries rules would apply for selecting words with double click too, of course.

The implementation in GtkTextView could use a combination of is_word_start, is_word_end, is_word_boundary, and maybe other custom functions for detecting spaces, etc.
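
To make the idea concrete, here is a minimal Python sketch of Vim-like words, not GtkTextView code. It assumes three character classes (whitespace, word characters including the underscore, and everything else), and the names `char_class` and `vim_like_words` are made up for illustration.

```python
from itertools import groupby


def char_class(ch):
    """Classify a character roughly the way Vim's default word motions do."""
    if ch.isspace():
        return "space"
    if ch.isalnum() or ch == "_":
        return "word"
    return "punct"


def vim_like_words(text):
    """Return (start, end) offsets of each contiguous non-space run of one class."""
    spans, pos = [], 0
    for cls, run in groupby(text, key=char_class):
        length = len(list(run))
        if cls != "space":
            spans.append((pos, pos + length))
        pos += length
    return spans


# "+=" and "();" each count as one word, just like "x" and "foo".
print(vim_like_words("x += foo();"))
```

Under this model, Ctrl+Right would stop at the `end` of the next span and Ctrl+Left at the `start` of the previous one.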
Comment 4 Sébastien Wilmet 2014-04-10 15:27:25 UTC
See bug #727972 for the latter idea.
Comment 5 jmontane 2014-04-14 19:23:37 UTC
Just a few remarks:

Currently pango doesn't use is_word_boundary. It uses the is_word_start/is_word_end attributes. See bug 700103 and bug 97545. In short:
1.- we have two different definitions of "word"
2.- "words" defined by is_word_start/is_word_end are simpler than "words" defined by is_word_boundary, and there are issues with spell-checking in some languages (read Catalan).

I prefer using the is_word_boundary definition for mouse/cursor selection and for the spell-checking engine. AFAIK that is the behaviour in Qt and on other OSes (read OSX), but a huge refactoring of pango-break is needed.
Comment 6 Sébastien Wilmet 2014-04-18 13:20:45 UTC
The problem is that the bugs in pango are quite big ("a huge refactoring" as you said).

The solution that I propose for GtkTextView is simpler. It would use is_word_start/is_word_end from pango to delimit natural-language words. And it would use another custom (and simple) algorithm to tune the word boundaries so it behaves like in Vim.

If such a simple algorithm in GtkTextView is feasible, would it be accepted? I haven't tried to sketch the algorithm yet, but I think it should be feasible.

And if the GTK+ developers don't want such behavior for GtkTextView (or prefer to wait for the big patches in pango), in the meantime it would be nice to be able to tune the word boundaries in a subclass of GtkTextView. For example, GtkSourceView is interested in that (see bug #354587).
Comment 7 Sébastien Wilmet 2014-04-20 14:05:57 UTC
Created attachment 274760
textview: word boundaries like in Vim

Ugly implementation, only for Ctrl+Right.
Doesn't take into account invisible regions in the text buffer.

The underscore is not taken as part of a natural-language word, so the
behavior is different than in Vim for e.g. variable names like
a_variable_with_underscores. In Vim it is taken as one word. Here the
word is split at each underscore.
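
The underscore difference can be illustrated with two regular expressions in Python. This is only an analogy for the behavior described above, not the patch's logic: `[^\W_]+` treats the underscore as a separator, while `\w+` treats it as a word character, as Vim does by default.

```python
import re

name = "a_variable_with_underscores"

# Underscore is not a word character: the name is split at each underscore.
split_words = re.findall(r"[^\W_]+", name)

# Underscore is a word character: the name is one word.
vim_words = re.findall(r"\w+", name)

print(split_words)
print(vim_words)
```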
Comment 8 Sébastien Wilmet 2014-04-20 14:17:10 UTC
So with the above patch you get an idea. For the underscores a solution is to use the is_word_boundary PangoLogAttr attribute too. But the algorithm would be more complicated.

What do you think?
Comment 9 Matthias Clasen 2014-04-21 12:51:31 UTC
(In reply to comment #8)

> What do you think?

I don't think we should work around pango problems in gtk. They should be fixed in pango.
Comment 10 Sébastien Wilmet 2014-04-23 11:08:28 UTC
*** Bug 727972 has been marked as a duplicate of this bug. ***
Comment 11 Sébastien Wilmet 2014-05-06 20:32:31 UTC
A complete implementation of my idea to improve the behavior for word movements (ctrl+arrow) and word selection (double click) is available in bug #562767, with the custom word boundaries implemented in GtkSourceView:

https://git.gnome.org/browse/gtksourceview/log/?h=wip/custom-word-boundaries

And it makes sense to have these custom word boundaries (only for word movements and selection, not for a spell checker) directly in GtkTextView. It has the advantage of not adding an API. It would be less flexible, of course, but the generic implementation is enough in my opinion.

As I said in bug #562767, for the custom word boundaries I think it is simpler to keep two word boundary types (word start and word end). This is needed for word movements: Ctrl+Right goes to the next word end, Ctrl+Left goes to the previous word start. With a single word boundary type (like the is_word_boundary PangoLogAttr attribute), it would be more difficult to know whether a word boundary is a word start or a word end (the contents must be analyzed in that case; for example, if the previous char is a space, the iter is probably at a word start).

With two word boundary types, it is simple to merge them to have only one word boundary type for the double click.
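
A small Python sketch of that merge, under the assumption that word starts and word ends are already available as sorted offset lists. The function names are hypothetical, and the offsets below are hard-coded for the text "foo  bar" with two spaces.

```python
def merge_boundaries(starts, ends):
    """Collapse the two boundary types into one sorted list for double click."""
    return sorted(set(starts) | set(ends))


def select_word(starts, ends, pos):
    """Double click: span between the merged boundaries surrounding pos."""
    bounds = merge_boundaries(starts, ends)
    start = max((b for b in bounds if b <= pos), default=None)
    end = min((b for b in bounds if b > pos), default=None)
    return start, end


def ctrl_right(ends, pos):
    """Move to the next word end, or stay put if there is none."""
    return min((e for e in ends if e > pos), default=pos)


def ctrl_left(starts, pos):
    """Move to the previous word start, or stay put if there is none."""
    return max((s for s in starts if s < pos), default=pos)


starts, ends = [0, 5], [3, 8]      # "foo  bar": words at 0..3 and 5..8
print(select_word(starts, ends, 1))  # inside "foo"
print(select_word(starts, ends, 3))  # on the spaces: the whole run is selected
print(ctrl_right(ends, 3), ctrl_left(starts, 8))
```

Keeping the two lists separate means the movement functions never have to inspect the text to guess what kind of boundary they landed on.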

The implementation for the custom word boundaries is generic (see the branch in gtksourceview, the code is well documented). It uses only the is_word_start and is_word_end PangoLogAttr attributes (through the GtkTextIter API), and g_unichar_isspace().
Comment 12 Matthias Clasen 2018-02-10 05:15:13 UTC
We're moving to gitlab! As part of this move, we are moving bugs to NEEDINFO if they haven't seen activity in more than a year. If this issue is still important to you and still relevant with GTK+ 3.22 or master, please reopen it and we will migrate it to gitlab.
Comment 13 Matthias Clasen 2018-04-15 00:05:32 UTC
As announced a while ago, we are migrating to gitlab, and bugs that haven't seen activity in the last year or so will not be migrated, but closed out in bugzilla.

If this bug is still relevant to you, you can open a new issue describing the symptoms and how to reproduce it with gtk 3.22.x or master in gitlab:

https://gitlab.gnome.org/GNOME/gtk/issues/new