Bug 131625 – GTK_WRAP_CHAR with long sequence of spaces



Summary:	GTK_WRAP_CHAR with long sequence of spaces


Status:	RESOLVED OBSOLETE

Product:	pango
Classification:	Platform
Component:	general
Version:	1.4.x
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	Small fix
Assigned To:	Behdad Esfahbod
QA Contact:	gtk-bugs

URL:
Whiteboard:

Duplicates:	307650 591364
Depends on:	97545
Blocks:

Reported:	2004-01-16 01:04 UTC by Jens Petersen
Modified:	2018-05-22 12:04 UTC

See Also:
GNOME target:	---
GNOME version:	2.9/2.10

Attachments
Bug occurring in gedit 2.30.3 (22.34 KB, image/png) 2010-12-31 19:12 UTC, Jean-Philippe Fleury

Description Jens Petersen 2004-01-16 01:04:02 UTC

When a line of input contains a long sequence of space
characters (compared to the width of the window),
character wrapping behaviour seems quite unnatural.

1) On a blank line, entering a long sequence of spaces causes
   a horizontal scrollbar to appear after the cursor has moved
   to the right edge of the GtkTextView.

2) On a line already containing some non-space characters,
adding a long continuous sequence of space characters causes
the character just before the first of the spaces to be wrapped
onto a new line once more spaces have been added than fit within
the width of the current line.  Then when more characters
are added to fill up the line, it continues to behave like
case (1).

I.e. it changes from
 ----------------------------------
|word_______________________       |  `_' == space char
|                                  |
|                                  |
|                                  |
 ----------------------------------
to
 ----------------------------------
|wor                               |
|d_____________________________    |
|                                  |
|                                  |
|                                  |
 ----------------------------------
and then this
 ----------------------------------
|or                                |
|__________________________________|
|                                  |
|                                  |
| *********************************|  <- scrollbar
 ----------------------------------

In GTK_WRAP_CHAR mode, I wouldn't expect the horizontal
scrollbar ever to appear.

Comment 1 Jens Petersen 2004-01-21 07:13:20 UTC

Actually I can kind of reproduce this with GTK_WRAP_WORD too:
 ----------------------------------
|one_two_______________________x   |  `_' == space char
|                                  |  `x' == cursor
|                                  |
|                                  |
 ----------------------------------
to
 ----------------------------------
|one                               |
|two_____________________________ x|
|                                  |
|                                  |
 ----------------------------------
and then this
 ----------------------------------
|ne                                |
|wo________________________________x
|                                  |
| *********************************|  <- scrollbar
 ----------------------------------

So it seems the spaces after "two" are treated as
an extension of the word as far as wrapping is
concerned.

Comment 2 Jens Petersen 2004-02-10 11:40:23 UTC

Still seems to happen with gtk2-2.3.2.

Comment 3 Matthias Clasen 2004-04-21 20:10:33 UTC

I can reproduce this. If anything, it is a Pango bug, since the text view relies
on Pango for line breaking. The question is what the expected behaviour is. For
WRAP_CHAR, the answer should be relatively clear: it should be possible to break
the space sequence anywhere. Is the same true for WRAP_WORD ?

Comment 4 Owen Taylor 2004-12-16 04:10:45 UTC

We generally are trying to stay as close as possible to 
http://www.unicode.org/reports/tr14/.  Relevant quote describing SPACE
(U+0020)

 The space characters are explicit break opportunities, but spaces at the 
 end of a line are not measured for fit. If there is a sequence of space 
 characters, and breaking after any of the space characters would result in 
 the same visible line, the line breaking position after the last space
 character in the sequence is the locally most optimal one. In other words,
 since the last character measured for fit is before the space character, any 
 number of space characters are kept together invisibly on the previous line 
 and the first non-space character starts the next line.

The approach described there only makes sense for non-interactive text
layout ... you can't make the spacebar do nothing at the end of a line.
Maybe PangoLayout needs an optional mode like that.
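
To make the quoted rule concrete, here is a minimal sketch in plain C. It is
not Pango code: next_line_start() is a hypothetical helper, and it assumes a
monospace layout where every character is one column wide and SPACE is the
only break opportunity. It shows that a run of spaces is never measured for
fit, so the whole run stays (invisibly) on the previous line however long it is.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a Pango API): returns the index at which the
 * next line starts when `text` is laid out in `width` columns, assuming
 * one column per character.  Only the visible part of each word is
 * measured; the spaces that follow it hang past the right edge. */
size_t next_line_start(const char *text, size_t width)
{
    size_t len = strlen(text);
    size_t best = 0;   /* best break position found so far */
    size_t i = 0;

    assert(width > 0);

    while (i < len) {
        size_t word_end = i;
        size_t space_end;

        /* Measure up to the end of the next run of non-spaces. */
        while (word_end < len && text[word_end] != ' ')
            word_end++;
        if (word_end > width)
            break;             /* visible text no longer fits */

        /* Trailing spaces are not measured for fit: absorb them all. */
        space_end = word_end;
        while (space_end < len && text[space_end] == ' ')
            space_end++;

        best = space_end;
        i = space_end;
    }

    /* Assumption of this sketch: if even the first word is too long,
     * break it at the right edge. */
    if (best == 0)
        best = width < len ? width : len;
    return best;
}
```

With a width of 4 columns, "ab" followed by six spaces and then "cd" yields
index 8: the six spaces hang past the edge and "cd" starts the next line. In
an interactive widget this is the "spacebar does nothing" effect described
above.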

But simply adding break opportunities between all pairs of spaces isn't 
close to right.. you want to avoid breaking 

blahblahblah  blah

As:


blahblahblah 
 blah

I'm not sure that there is a reasonable behavior for WRAP_WORD. We could make
the textview use PANGO_WRAP_WORD_CHAR instead, but it's pretty much an odd
corner case, and I'd be inclined to leave it as is.

For WRAP_CHAR the current behavior is documented in the code comment
as:

          /* Unicode doesn't specify char wrap; we wrap around all chars
           * except where a line break is prohibited, which means we
           * effectively break everywhere except inside runs of spaces.
           */

Which is pretty inaccurate - line breaks are prohibited in a lot
more places than in runs of spaces. If you look at Table 2 in the Unicode
annex referenced above there are a lot of '^' signs. And you can see
the prohibited breaks there easily enough in wrap-char mode... try
lines full of (((( or )))) or !!!! or ....

It's possible that the right fix is simply to remove the code:

                   /* can't break here */
                  attrs[i].is_char_break = FALSE;

The question is whether anybody is counting on the current
slightly-better-than-break-anywhere behavior. (The current behavior could be
described as treating Latin letters the same as ideographs.) We'll break

 blah blah bla
 h!

But not 
 
 blah blah blah
 !

Comment 5 Owen Taylor 2005-06-14 17:28:24 UTC

*** Bug 307650 has been marked as a duplicate of this bug. ***

Comment 6 Behdad Esfahbod 2006-04-26 02:58:31 UTC

*** Bug 308126 has been marked as a duplicate of this bug. ***

Comment 7 Death Knight 2006-10-05 23:20:24 UTC

 ----------------------------------
|one_two.......................x   |  `.' == dot
|                                  |  `x' == cursor
|                                  |
|                                  |
 ----------------------------------
to
 ----------------------------------
|one                               |
|two..............................x|
|                                  |
|                                  |
 ----------------------------------

I mean, the space character is not the only one. It happens with the dot character too!
Other characters that create this effect: ",./;[]{}()!"

Can anybody fix this? !@#$%
This bug is nearly 2 years old!

Comment 8 Yevgen Muntyan 2007-03-31 21:11:16 UTC

What is gonna break with the following "patch"? As far as I understand most people understand WRAP_CHAR as it's described in GtkWrapMode docs: "wrap text, breaking lines anywhere the cursor can appear", i.e. wrap text as if it was a terminal - stupid, on the right boundary. For what it's worth, this change does make textview work as expected - break sequences of spaces or dots without resizing.

Index: pango/break.c
===================================================================
--- pango/break.c       (revision 2218)
+++ pango/break.c       (working copy)
@@ -902,8 +902,7 @@ pango_default_break (const gchar   *text
              switch (break_op)
                {
                case BREAK_PROHIBITED:
-                 /* can't break here */
-                 attrs[i].is_char_break = FALSE;
+                 /* char break is fine here */
                  break;

                case BREAK_IF_SPACES:
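
The effect of this change can be illustrated with a small simulation. This is
a sketch in plain C, not Pango code: can_char_break() and widest_line() are
hypothetical names, each character is assumed to be one column wide, and the
only prohibited break modelled is the one directly before a space (the real
table in pango/break.c prohibits many more). It wraps greedily and reports the
widest resulting line, which is what makes the text view grow a horizontal
scrollbar.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for is_char_break at the position before text[i].
 * relaxed == false models the current behaviour (no break before a space);
 * relaxed == true models the patched behaviour (break anywhere). */
static bool can_char_break(const char *text, size_t i, bool relaxed)
{
    if (relaxed)
        return true;
    return text[i] != ' ';
}

/* Greedy char-wrap of `text` into `width` columns; returns the length
 * of the widest line produced. */
size_t widest_line(const char *text, size_t width, bool relaxed)
{
    size_t len = strlen(text);
    size_t start = 0;
    size_t widest = 0;

    assert(width > 0);

    while (start < len) {
        size_t end = len;   /* default: the rest of the text is one line */
        size_t i;

        if (len - start > width) {
            bool found = false;

            /* Last permitted break that keeps the line within width. */
            for (i = start + width; i > start; i--) {
                if (can_char_break(text, i, relaxed)) {
                    end = i;
                    found = true;
                    break;
                }
            }
            /* None fits: the line overflows up to the next permitted
             * break, or to the end of the text. */
            if (!found) {
                for (i = start + width + 1; i < len; i++) {
                    if (can_char_break(text, i, relaxed)) {
                        end = i;
                        break;
                    }
                }
            }
        }
        if (end - start > widest)
            widest = end - start;
        start = end;
    }
    return widest;
}
```

For "word" followed by ten spaces at a width of 6, the current-behaviour model
gives the lines "wor" and "d" plus ten spaces (11 columns, wider than the
view), while allowing breaks inside the run of spaces keeps every line within
6 columns.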

Comment 9 Christian Dywan 2008-10-23 11:52:07 UTC

Any feedback on the above mentioned change? It would be nice to either apply it upstream or have some pointers regarding an alternative solution.

Comment 10 Behdad Esfahbod 2008-10-23 17:58:55 UTC

That change is against the Unicode spec we are trying to implement.  So, that can't change in Pango.  There are three solutions:

  - Do nothing, declare this bug as feature.

  - In GTK+, use is_cursor_position if failing to break line with is_char_break.  This has the same effect as the patch in comment 8, but hopefully doesn't break in between really prohibited positions as often.

  - Introduce yet another boolean in Pango that is like is_char_break but more relaxed about spaces.  I don't like this solution.
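
The second option could look roughly like the following. This is only a sketch
under assumptions: BreakAttr is a stand-in struct carrying the two PangoLogAttr
bits named above (is_char_break and is_cursor_position), pick_break() is a
hypothetical helper rather than anything that exists in GTK+ or Pango, and
attrs[i] is taken to describe the position just before character i.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the two PangoLogAttr bits discussed above. */
typedef struct {
    bool is_char_break;
    bool is_cursor_position;
} BreakAttr;

/* Hypothetical helper: choose where to end a line that can hold at most
 * `limit` characters.  Returns the chosen break index in 1..limit, or 0
 * if neither kind of position is available. */
size_t pick_break(const BreakAttr *attrs, size_t n_attrs, size_t limit)
{
    size_t i;

    assert(attrs != NULL && n_attrs > 0);

    if (limit > n_attrs - 1)
        limit = n_attrs - 1;

    /* First choice: the last regular char break that still fits. */
    for (i = limit; i > 0; i--)
        if (attrs[i].is_char_break)
            return i;

    /* Fallback: the last cursor position that still fits. */
    for (i = limit; i > 0; i--)
        if (attrs[i].is_cursor_position)
            return i;

    return 0;   /* nothing usable; the caller must let the line overflow */
}
```

Inside a run of spaces, where is_char_break is unset everywhere, the fallback
returns the last cursor position that fits, so the line is cut at the right
edge instead of overflowing. Wherever a regular char break is available it
still wins.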

Comment 11 Yevgen Muntyan 2008-10-23 18:52:16 UTC

Fourth solution:
 
 - Figure out why the spec prohibits the correct, intuitive behavior. Is it a spec bug, a spec misunderstanding, something else? Either the Pango thing we are talking about here is intended to work for our use case (i.e. that Unicode mumbo-jumbo is actually about word wrapping in a text widget and the like), or it's not, and then we shouldn't care about the spec for this particular use case in the first place.

Though the second solution seems the, TextView really uses too much Pango, which bites your ass whenever you want to customize anything. Doing the thing which is right for a text widget is the best thing to do in a text widget. Oh well, sweet theory.

Comment 12 Behdad Esfahbod 2008-10-23 19:06:41 UTC

Whatever...

Pango exposes all the information you may want.  It's up to GTK+ to use it however it likes.  Pango doesn't force anyone to use a specific bit and not the other.

Comment 13 Jens Petersen 2008-10-24 00:47:04 UTC

I just tried again with gedit-2.24.0 and gtk2-2.14.4 and the behaviour now seems better.

Perhaps this can be closed?

Comment 14 Christian Dywan 2009-09-21 14:43:58 UTC

I can still reproduce the original steps with master, ie 2.17, of today, so I don't think this was fixed.

Comment 15 Erik Xian 2010-04-19 01:06:43 UTC

I reported this as bug 591364 in the past. This issue, as you are probably aware, extends to several other characters including punctuation, which understandably you would not expect to see repeated in an unbroken line. When it comes to text editing, this is not a big deal. However, the problem arises when TextView is used in more unpredictable situations, such as a buffer for chat clients. Pidgin in particular comes to mind and, if I'm not mistaken, Empathy may suffer from this as well. If someone discovers this, although it won't pose a true security threat of any sort, it can still prove to be a real nuisance and a potential blow to readability and accessibility by forcing the user to clear the buffer or scroll to read subsequent long lines.

I confess, both to test this out and raise awareness of this issue, I have personally in the past gone to acquaintances using affected programs and sent them long streams of open brackets. Surely there are more malicious people out there who may be using this to pester their contacts and drive people away from otherwise fine GTK+-based applications.

Comment 16 Erik Xian 2010-04-19 01:09:28 UTC

*** Bug 591364 has been marked as a duplicate of this bug. ***

Comment 17 Jean-Philippe Fleury 2010-12-31 19:12:38 UTC

Created attachment 177313
Bug occurring in gedit 2.30.3

I have this bug on gedit 2.30.3 and Gnome 2.32.0. See the attached screen shot.

Comment 18 Rafal 2012-03-03 08:35:35 UTC

I suggest rethinking this "feature". The referenced Unicode Standard Annex #14, "Unicode Line Breaking Algorithm", says in section "1 Overview and Scoping":

    Line breaking, also known as word wrapping...

If the "line breaking" is known as "word wrapping", why are you applying the algorithm in character breaking mode?

I'm not sure the current behavior in "character breaking" mode is desired by anyone. The expected behavior would rather be what is seen in the Vim text editor or in the xterm terminal emulator. Please open the mentioned editor or run the emulator and compare its behavior with your control. For example, fill half of a line with "a" letters and then start typing spaces. A normal editor behaves as follows:
 ----------------------------------
|aaaaaaaaaaaaaaaa                  |
|     |                            |
|                                  |
|                                  |
 ----------------------------------

The GtkTextView control behaves as follows:
 ----------------------------------
|aaaaaaaaaaaaaaa                   |
|a                     |           |
|                                  |
|                                  |
 ----------------------------------

Which is completely non-intuitive and annoying.

Comment 19 IBBoard 2017-03-19 19:41:48 UTC

Is there anything that can be done about this bug? Because we're now in 2017 (13 years after the bug was reported) and apps are STILL getting hit by this!

See https://github.com/baedert/corebird/issues/657#issuecomment-287559624, where the app grows horizontally because Pango won't break in punctuation but will break within a word.

@Rafal's comment seems the most relevant here - if you're using WRAP_CHAR or WRAP_WORD_CHAR then you're (eventually) character wrapping, not word wrapping, so why worry about the Unicode standard and cause this mismatch of behaviour between letters and punctuation?

Comment 20 GNOME Infrastructure Team 2018-05-22 12:04:56 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/pango/issues/13.