GNOME Bugzilla – Bug 131625
GTK_WRAP_CHAR with long sequence of spaces
Last modified: 2018-05-22 12:04:56 UTC
When a line of input contains a long sequence of space characters (compared to the width of the window), character wrapping behaviour seems quite unnatural.
1) On a blank line, entering a long sequence of spaces causes a horizontal scrollbar to appear after the cursor has moved to the right edge of the GtkTextView.
2) On a line that already contains some non-space characters, adding a long continuous sequence of space characters causes the character just before the first of the spaces to be wrapped onto a new line once more spaces have been added than fit within the width of the current line. Then, when more characters are added to fill up the line, it continues to behave like case (1).
I.e. it changes from
----------------------------------
|word_______________________     |    `_' == space char
|                                |
|                                |
|                                |
----------------------------------
to
----------------------------------
|wor                             |
|d_____________________________  |
|                                |
|                                |
|                                |
----------------------------------
and then this
----------------------------------
|or                              |
|________________________________|
|                                |
|                                |
|********************************|    <- scrollbar
----------------------------------
In GTK_WRAP_CHAR mode, I wouldn't expect the horizontal scrollbar ever to appear.
Actually I can kind of reproduce this with GTK_WRAP_WORD too:
----------------------------------
|one_two_______________________x |    `_' == space char
|                                |    `x' == cursor
|                                |
|                                |
----------------------------------
to
----------------------------------
|one                             |
|two___________________________ x|
|                                |
|                                |
----------------------------------
and then this
----------------------------------
|ne                              |
|wo_____________________________x|
|                                |
|********************************|    <- scrollbar
----------------------------------
So it seems the spaces after "two" are treated as an extension of the word as far as wrapping is concerned.
Still seems to happen with gtk2-2.3.2.
I can reproduce this. If anything, it is a Pango bug, since the text view relies on Pango for line breaking. The question is what the expected behaviour is. For WRAP_CHAR, the answer should be relatively clear: it should be possible to break the space sequence anywhere. Is the same true for WRAP_WORD ?
We generally are trying to stay as close as possible to http://www.unicode.org/reports/tr14/. Relevant quote describing SPACE (U+0020):
  The space characters are explicit break opportunities, but spaces at the end of a line are not measured for fit. If there is a sequence of space characters, and breaking after any of the space characters would result in the same visible line, the line breaking position after the last space character in the sequence is the locally most optimal one. In other words, since the last character measured for fit is before the space character, any number of space characters are kept together invisibly on the previous line and the first non-space character starts the next line.
The approach described there only makes sense for non-interactive text layout ... you can't make the spacebar do nothing at the end of a line. Maybe PangoLayout needs an optional mode like that.
But simply adding break opportunities between all pairs of spaces isn't close to right.. you want to avoid breaking blahblahblah blah As: blabblahblah blah
I'm not sure that there is a reasonable behavior for WRAP_WORD. We could make the textview use PANGO_WRAP_WORD_CHAR instead, but it's pretty much an odd corner case, and I'd be inclined to leave it as is.
For WRAP_CHAR the current behavior is documented in the code comment as:
  /* Unicode doesn't specify char wrap; we wrap around all chars
   * except where a line break is prohibited, which means we
   * effectively break everywhere except inside runs of spaces.
   */
Which is pretty inaccurate - line breaks are prohibited in a lot more places than in runs of spaces. If you look at Table 2 in the Unicode annex referenced above there are a lot of '^' signs. And you can see the prohibited breaks there easily enough in wrap-char mode... try lines full of (((( or )))) or !!!! or ....
It's possible that the right fix is simply to remove the code:
  /* can't break here */
  attrs[i].is_char_break = FALSE;
The question is whether anybody is counting on the slightly-better-than-break-anywhere behavior currently. (The current behavior could be described as treating Latin letters the same as ideographs... we'll break
  blah blah bla
  h!
But not
  blah blah blah
  !
)
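To make the UAX #14 rule quoted above concrete, here is a minimal, self-contained C sketch. It is not Pango code: `measured_width` and `fits_without_break` are hypothetical names, and the sketch assumes one column per ASCII character. It only illustrates why, under that rule, a run of trailing spaces never forces a wrap however long it gets, which may be what conflicts with an editable view that has to show the cursor after those spaces.

```c
#include <stddef.h>
#include <string.h>

/* Width of text[0..len) as UAX #14 measures it for fit: trailing
 * U+0020 characters are ignored.
 * Assumption: ASCII input, one column per byte. */
size_t measured_width(const char *text, size_t len)
{
    while (len > 0 && text[len - 1] == ' ')
        len--;
    return len;
}

/* Nonzero if the whole string counts as fitting in `columns` under
 * that rule, i.e. no line break would be requested for it. */
int fits_without_break(const char *text, size_t columns)
{
    return measured_width(text, strlen(text)) <= columns;
}
```

With these definitions, "word" followed by twenty spaces still counts as fitting into 8 columns, because only the four letters are measured.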
*** Bug 307650 has been marked as a duplicate of this bug. ***
*** Bug 308126 has been marked as a duplicate of this bug. ***
----------------------------------
|one_two.......................x |    `.' == dot
|                                |    `x' == cursor
|                                |
|                                |
----------------------------------
to
----------------------------------
|one                             |
|two............................x|
|                                |
|                                |
----------------------------------
I mean, the space character is not the only one. It happens with the dot character too! Other characters that create this effect: ",./;[]{}()!" Can anybody fix this? !@#$% It's a nearly 2-year-old BUG!
What is gonna break with the following "patch"? As far as I understand most people understand WRAP_CHAR as it's described in GtkWrapMode docs: "wrap text, breaking lines anywhere the cursor can appear", i.e. wrap text as if it was a terminal - stupid, on the right boundary. For what it's worth, this change does make textview work as expected - break sequences of spaces or dots without resizing.
Index: pango/break.c
===================================================================
--- pango/break.c (revision 2218)
+++ pango/break.c (working copy)
@@ -902,8 +902,7 @@ pango_default_break (const gchar *text
           switch (break_op)
             {
             case BREAK_PROHIBITED:
-              /* can't break here */
-              attrs[i].is_char_break = FALSE;
+              /* char break is fine here */
               break;
 
             case BREAK_IF_SPACES:
Any feedback on the above mentioned change? It would be nice to either apply it upstream or have some pointers regarding an alternative solution.
That change is against the Unicode spec we are trying to implement. So, that can't change in Pango. There are three solutions:
- Do nothing, declare this bug a feature.
- In GTK+, use is_cursor_position if failing to break the line with is_char_break. This has the same effect as the patch in comment 8, but hopefully doesn't break in between really prohibited positions as often.
- Introduce yet another boolean in Pango that is like is_char_break but more relaxed about spaces. I don't like this solution.
Fourth solution:
- Figure out why the spec prohibits the correct, intuitive behavior. Is it a spec bug, a spec misunderstanding, something else? Either the Pango feature we are talking about here is intended to work for our use case (i.e. that Unicode mumbo-jumbo is actually about word wrapping in a text widget and the like), or it's not, and then we shouldn't care about the spec for this particular use case in the first place.
Though the second solution seems the best, TextView really uses too much Pango, which bites your ass whenever you want to customize anything. Doing the thing which is right for a text widget is the best thing to do in a text widget. Oh well, sweet theory.
Whatever... Pango exposes all the information you may want. It's up to GTK+ to use it however it likes. Pango doesn't force anyone to use a specific bit and not the other.
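The GTK+-side option mentioned earlier in the thread (fall back to is_cursor_position when no is_char_break position fits) could look roughly like the sketch below. This is a self-contained model under stated assumptions, not GTK+ or Pango code: `SketchAttr` is a hypothetical stand-in for the two PangoLogAttr flags discussed here, `pick_break` is a hypothetical helper, and widths are counted in characters rather than pixels.

```c
#include <stddef.h>

/* Hypothetical stand-in for the two PangoLogAttr flags discussed in
 * this bug; the real struct has many more fields. */
typedef struct {
    unsigned is_char_break : 1;
    unsigned is_cursor_position : 1;
} SketchAttr;

/* attrs has n_chars + 1 entries; attrs[i] describes the boundary just
 * before character i. Returns the largest break offset in
 * 1..max_chars, preferring character breaks and falling back to
 * cursor positions only when no character break fits.
 * Returns 0 if no usable position exists. */
size_t pick_break(const SketchAttr *attrs, size_t n_chars, size_t max_chars)
{
    size_t limit = max_chars < n_chars ? max_chars : n_chars;
    size_t i;

    /* First pass: honour the breaks the line-breaking rules allow. */
    for (i = limit; i > 0; i--)
        if (attrs[i].is_char_break)
            return i;

    /* Second pass: nothing fits, so break wherever the cursor can go. */
    for (i = limit; i > 0; i--)
        if (attrs[i].is_cursor_position)
            return i;

    return 0;
}
```

For a line such as "d" followed by a long run of spaces, where no character break is allowed inside the run, the first pass finds nothing and the second pass breaks at the right edge instead of letting the line overflow.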
I just tried again with gedit-2.24.0 and gtk2-2.14.4 and the behaviour now seems better. Perhaps this can be closed?
I can still reproduce the original steps with master, ie 2.17, of today, so I don't think this was fixed.
I've reported this as bug 591364 in the past. This issue, as you are probably aware, extends to several other characters including punctuation, which understandably you would not expect to see repeated in an unbroken line. When it comes to text editing, this is not a big deal. However, the problem arises when TextView is used in more unpredictable situations, such as a buffer for chat clients. Pidgin in particular comes to mind and if I'm not mistaken, Empathy may suffer from this as well. If someone discovers this, although it won't pose any true security threat, it can still prove to be a real nuisance and a potential blow to readability and accessibility by forcing the user to clear the buffer or scroll to read subsequent long lines. I confess, both to test this out and raise awareness of this issue, I have personally in the past gone to acquaintances using affected programs and sent them long streams of open brackets. Surely there are more malicious people out there who may be using this to pester their contacts and drive people away from otherwise fine GTK+-based applications.
*** Bug 591364 has been marked as a duplicate of this bug. ***
Created attachment 177313
Bug occurring in gedit 2.30.3
I have this bug on gedit 2.30.3 and GNOME 2.32.0. See the attached screenshot.
I suggest rethinking this "feature". The referenced Unicode Standard Annex #14, "Unicode Line Breaking Algorithm", in section "1 Overview and Scoping" says:
  Line breaking, also known as word wrapping...
If "line breaking" is also known as "word wrapping", why are you applying the algorithm in character breaking mode? I'm not sure the current behavior in "character breaking" mode is desired by anyone. The expected behavior would rather be what is seen in the Vim text editor or in the xterm terminal emulator. Please open the mentioned editor or run the emulator and compare its behavior with your control. For example, fill half of a line with "a" letters and then start typing spaces. A normal editor behaves as follows:
----------------------------------
|aaaaaaaaaaaaaaaa                |
|                                |
|                                |
|                                |
|                                |
----------------------------------
The GtkTextView control behaves as follows:
----------------------------------
|aaaaaaaaaaaaaaa                 |
|a                               |
|                                |
|                                |
|                                |
----------------------------------
Which is completely non-intuitive and annoying.
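For comparison, the terminal-style behaviour described above can be modelled in a few lines of C. This is a simplified sketch, not code from Vim, xterm, or GTK+: `Cell` and `hard_wrap_cell` are hypothetical names, every character (including a space) is assumed to occupy exactly one cell, and `columns` must be greater than zero.

```c
#include <stddef.h>

/* Row and column of a position on a fixed-width grid. */
typedef struct {
    size_t row;
    size_t col;
} Cell;

/* Terminal-style hard wrap: the character at `offset` (counted from
 * the start of the logical line) lands in a cell that depends only on
 * the offset and the grid width, never on what the character is.
 * Assumption: columns > 0. */
Cell hard_wrap_cell(size_t offset, size_t columns)
{
    Cell cell;
    cell.row = offset / columns;
    cell.col = offset % columns;
    return cell;
}
```

In this model, typing 16 letters and then 4 spaces on a 32-column grid leaves the cursor at row 0, column 20, and none of the letters move. A wrap happens only once the offset reaches 32, at which point the cursor is at row 1, column 0.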
Is there anything that can be done about this bug? Because we're now in 2017 (13 years after the bug was reported) and apps are STILL getting hit by this! See https://github.com/baedert/corebird/issues/657#issuecomment-287559624, where the app grows horizontally because Pango won't break in punctuation but will break within a word. @Rafal's comment seems the most relevant here - if you're using WRAP_CHAR or WRAP_WORD_CHAR then you're (eventually) character wrapping, not word wrapping, so why worry about the Unicode standard and cause this mismatch of behaviour between letters and punctuation?
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/pango/issues/13.