Bug 154896 - Serious display corruption in UTF-8 mode
Status: RESOLVED FIXED
Product: vte
Classification: Core
Component: general
Version: 0.11.x
Hardware: Other Linux
Importance: High critical
Target Milestone: ---
Assigned To: VTE Maintainers
Nalin Dahyabhai
Duplicates: 133211
Depends on:
Blocks:
 
 
Reported: 2004-10-08 13:46 UTC by Egmont Koblinger
Modified: 2005-08-29 11:48 UTC
See Also:
GNOME target: ---
GNOME version: 2.7/2.8


Attachments
the typescript file which triggers vte bug when cat'ed to the terminal (2.73 KB, application/octet-stream)
2004-10-08 13:47 UTC, Egmont Koblinger
the "source" I used to create the typescript (333 bytes, application/octet-stream)
2004-10-08 13:48 UTC, Egmont Koblinger
slowcat source (502 bytes, text/plain)
2004-10-08 13:49 UTC, Egmont Koblinger
screenshot: this time gnome-terminal correctly executed "cat typescript" (28.23 KB, image/png)
2004-10-08 13:54 UTC, Egmont Koblinger
screenshot: and this time it didn't (28.21 KB, image/png)
2004-10-08 13:55 UTC, Egmont Koblinger
foo.txt used in the much simpler test case (82 bytes, application/octet-stream)
2004-10-09 20:30 UTC, Egmont Koblinger
screenshot for the much simpler test case: dd'ing foo.txt to the terminal (35.30 KB, image/png)
2004-10-09 20:31 UTC, Egmont Koblinger
proposed works-for-me patch (484 bytes, patch)
2004-10-10 13:32 UTC, Egmont Koblinger

Description Egmont Koblinger 2004-10-08 13:46:03 UTC
In UTF-8 mode, under heavy output, vte very often misbehaves and the terminal's
content goes completely crazy.

Some historical background, just for fun:
I first noticed that while using the 'joe' editor to edit UTF-8 encoded
Hungarian text, the title bar often gets corrupted, but only if I hold the
down arrow or pagedown for a longer time, not if I just press them once.
Later I noticed other strange things, e.g. editing an at most 80 columns wide
text file in a much wider terminal caused some words to flash up in
columns beyond 80 where they shouldn't have. All this yet again only when
intensively scrolling inside joe in UTF-8 mode.

After some hours of testing the phenomenon it's clear that, although the
behavior is inconsistent, it's a very easily reproducible vte bug, absolutely
unrelated to ncurses, joe or anything else.

In all cases I use a fully UTF-8 system, with all the terminals mentioned and
the locale settings configured accordingly. The bug I describe here does not
occur with 8-bit legacy charsets.

Download the file 'typescript'.

(If you're interested: I created typescript by running
  script -c 'joe hungarian_alphabet'
then pressing the down arrow until the cursor reached the end of the file,
exiting with ^C, and removing the irrelevant first and last few lines of
typescript. All this was done inside an 80x24 gnome-terminal with TERM=gnome
(from the ncurses 5.4 database) and joe 3.1.)

Once you have typescript, gunzip it (I gzipped the attachments only to make sure
no-one tries to perform a character set conversion) and cat it to some terminals.

All the following tests work perfectly under xterm and konsole (in UTF-8 mode,
of course) but are buggy in gnome-terminal and also in standalone vte.

A simple 'cat' in an 80x24 window sometimes gives the correct result, but more
often you only see the correct status bar (topmost line) for a short time before
it becomes damaged. Also note that some characters appear in the rightmost
column, though this column is not used by joe.

Trying the same test in a wider terminal window shows that while xterm and
konsole (and rarely gnome-terminal) correctly still use only 79 columns,
gnome-terminal often produces much longer lines.

When the text is output in smaller units, even a single byte at a time, the
behavior gets much worse and is easier to observe in gnome-terminal. Simply
try this:
  dd if=typescript bs=1 2>/dev/null

I also quickly wrote a slowcat program (attached to this bug entry) which
is similar to dd (the value of bs can be given by defining N at compile time)
but it is intentionally a little bit slower. Try this command:
  ./slowcat typescript
and you can easily interrupt it before the first screenful of letters is
printed and see some duplicated parts of text (e.g. "f g gy h i g gy h i í j")
even in the 2nd line of the terminal (the 1st line of the text inside the
editor), which cause the lines to overflow.
Comment 1 Egmont Koblinger 2004-10-08 13:47:34 UTC
Created attachment 32392
the typescript file which triggers vte bug when cat'ed to the terminal
Comment 2 Egmont Koblinger 2004-10-08 13:48:17 UTC
Created attachment 32393
the "source" I used to create the typescript
Comment 3 Egmont Koblinger 2004-10-08 13:49:17 UTC
Created attachment 32394
slowcat source
Comment 4 Egmont Koblinger 2004-10-08 13:54:44 UTC
Created attachment 32395
screenshot: this time gnome-terminal correctly executed "cat typescript"
Comment 5 Egmont Koblinger 2004-10-08 13:55:19 UTC
Created attachment 32396
screenshot: and this time it didn't
Comment 6 Egmont Koblinger 2004-10-09 20:29:10 UTC
Ohh, the whole story is much much simpler. You can safely forget my original
long and complicated report and try this very simple test case.

Download foo.txt.gz and gunzip it. It's absolutely nothing special, just several
Hungarian and Italian accented letters appended to each other in UTF-8 encoding.
Each of these letters takes 2 bytes in UTF-8.

Try something like
  dd if=foo.txt bs=2 2>/dev/null
where bs is an even number (2, 4, 6...). The result is always perfect.

Now try this with an odd blocksize (1, 3, 5...), or prepend a space to foo.txt
and try with an even blocksize. E.g.
  dd if=foo.txt bs=1 2>/dev/null
The result is very often buggy.

Most likely the bug occurs if the UTF-8 bytes of a non-ascii character are
split amongst consecutive write() calls.
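
To illustrate what a terminal has to do in this situation, here is a minimal
sketch of a decoder that keeps the bytes of an unfinished sequence between
calls. It is not vte's code: utf8_state and utf8_feed are hypothetical names,
and validation is deliberately minimal (malformed input yields U+FFFD; overlong
forms and surrogates are not rejected).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder state. It survives between calls, so a character
 * whose bytes arrive in two separate chunks is still decoded exactly once. */
typedef struct {
    uint32_t codepoint;  /* bits accumulated so far */
    int remaining;       /* continuation bytes still expected */
} utf8_state;

/* Feed one chunk of input. Decoded code points are stored in out, which
 * must have room for at least len + 1 entries. Returns how many code
 * points were produced by this chunk. */
size_t utf8_feed(utf8_state *st, const unsigned char *buf, size_t len,
                 uint32_t *out)
{
    size_t n = 0;
    assert(st != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        unsigned char b = buf[i];
        if (st->remaining > 0) {
            if ((b & 0xC0) == 0x80) {
                /* Continuation byte: append its 6 payload bits. */
                st->codepoint = (st->codepoint << 6) | (b & 0x3F);
                if (--st->remaining == 0)
                    out[n++] = st->codepoint;
                continue;
            }
            /* Sequence cut short: emit U+FFFD, then handle b normally. */
            st->remaining = 0;
            out[n++] = 0xFFFD;
        }
        if (b < 0x80) {
            out[n++] = b;                      /* ASCII */
        } else if ((b & 0xE0) == 0xC0) {
            st->codepoint = b & 0x1F;          /* lead byte, 2-byte form */
            st->remaining = 1;
        } else if ((b & 0xF0) == 0xE0) {
            st->codepoint = b & 0x0F;          /* lead byte, 3-byte form */
            st->remaining = 2;
        } else if ((b & 0xF8) == 0xF0) {
            st->codepoint = b & 0x07;          /* lead byte, 4-byte form */
            st->remaining = 3;
        } else {
            out[n++] = 0xFFFD;                 /* stray or invalid byte */
        }
    }
    return n;
}
```

Starting from a zeroed utf8_state, feeding one byte of a 2-byte letter produces
nothing, and feeding the second byte in a later call produces the letter once.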

Screenshot follows.
Comment 7 Egmont Koblinger 2004-10-09 20:30:13 UTC
Created attachment 32433
foo.txt used in the much simpler test case
Comment 8 Egmont Koblinger 2004-10-09 20:31:04 UTC
Created attachment 32434
screenshot for the much simpler test case: dd'ing foo.txt to the terminal
Comment 9 Egmont Koblinger 2004-10-09 21:05:26 UTC
Oh, I've got some more details.

I've found a slowcat shipped by vte, I shouldn't have written my own version :-)

Now I execute this command:
  while true; do /usr/lib/vte/slowcat foo.txt; done
(foo.txt is the same file attached previously)

If I leave this command running and do not do anything special, then the
output is perfect.

If I press the 1st mouse button inside vte (or gnome-terminal's terminal area)
for several seconds, then the bug immediately arises. While holding the mouse
button, the application keeps on running but the terminal does not update.
When releasing the mouse button, updating the terminal's content goes wrong.

So it seems to me that the bug only occurs if all of the following conditions
are met:
- we're in UTF-8 mode
- the application splits a UTF-8 byte sequence across separate write() calls
- the terminal itself is not fast enough to display all the received data
  before the application sends new data.
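
The second condition can be forced deliberately. Below is a minimal sketch,
assuming a POSIX system; write_in_chunks is a hypothetical helper, not the
slowcat attached to this bug nor the one shipped with vte. It does not retry
on EINTR.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Write buf to fd in pieces of at most `chunk` bytes, pausing `pause_ms`
 * milliseconds between pieces. With an odd chunk size, 2-byte UTF-8
 * letters end up split across write() calls.
 * Returns 0 on success, -1 on a write error. */
int write_in_chunks(int fd, const unsigned char *buf, size_t len,
                    size_t chunk, long pause_ms)
{
    struct timespec pause;
    size_t off = 0;

    assert(chunk > 0 && pause_ms >= 0);
    pause.tv_sec = pause_ms / 1000;
    pause.tv_nsec = (pause_ms % 1000) * 1000000L;

    while (off < len) {
        size_t want = (len - off < chunk) ? len - off : chunk;
        ssize_t done = write(fd, buf + off, want);
        if (done < 0)
            return -1;
        off += (size_t)done;
        /* Give the terminal time to process the partial sequence. */
        if (off < len && pause_ms > 0)
            nanosleep(&pause, NULL);
    }
    return 0;
}
```

For example, passing the five bytes 0xC3 0xA1 0xC3 0xA9 0x0A with chunk 3 and
pause_ms 1000 to STDOUT_FILENO writes the first letter plus half of the second,
waits a second, and then writes the rest.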
Comment 10 Egmont Koblinger 2004-10-10 12:32:09 UTC
And yet again, a much much simpler test case.

$ echo -ne '\303\241\303'; sleep 1; echo -e '\251'

The output should be, and without the sleep it is, "áé".
However, with the sleep as above, á is duplicated, the output is "ááé".
Comment 11 Egmont Koblinger 2004-10-10 13:31:49 UTC
A one-line patch follows. This is just a works-for-me kind of patch, I'm not
sure if it's perfect or free of side effects.

Line numbers are according to the 0.11.11 tarball. The only file involved is
src/iso2022.c.

1725: note that 'block' is a standalone struct, not a pointer.
1734: the 'block = ...' statement therefore copies a whole structure rather than
 just setting a pointer.
1774: block.start is correctly incremented by 2 (in my latest test case).
1808: this 'block = ...' statement again does not set a pointer (if it did, it
 would have no effect) but copies the whole structure, overwriting the earlier
 changes and setting block.start back to 0 instead of 2.
1814: the first block.start bytes are dropped from the buffer; this should be
 2 bytes, but it is 0.

My patch simply removes the buggy and superfluous line 1808.
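
The effect of the second copy can be shown in isolation. This is only a sketch
with hypothetical names (struct block here is a stand-in, not the real type from
src/iso2022.c); the comments map each statement to the line numbers listed
above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the block structure. */
struct block {
    size_t start;  /* bytes of the buffer already consumed */
    size_t end;    /* end of the valid data in the buffer */
};

/* Mimics the buggy flow: the progress made on the local copy is lost
 * when the source structure is copied over it a second time. */
size_t consumed_buggy(const struct block *src, size_t processed)
{
    struct block block;
    assert(src != NULL);
    block = *src;              /* struct copy, like line 1734 */
    block.start += processed;  /* progress recorded, like line 1774 */
    block = *src;              /* second copy discards it, like line 1808 */
    return block.start;        /* bytes to drop from the buffer, line 1814 */
}

/* Same flow with the second copy removed, as in the proposed patch. */
size_t consumed_fixed(const struct block *src, size_t processed)
{
    struct block block;
    assert(src != NULL);
    block = *src;
    block.start += processed;
    return block.start;
}
```

With start at 0 and 2 bytes processed, consumed_buggy returns 0 and
consumed_fixed returns 2. When 0 bytes are dropped, the 2 bytes of the
already displayed letter stay in the buffer and are processed again once more
data arrives, which would explain the duplicated letter.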
Comment 12 Egmont Koblinger 2004-10-10 13:32:52 UTC
Created attachment 32443
proposed works-for-me patch
Comment 13 Kjartan Maraas 2004-10-18 10:24:29 UTC
Please review this patch.
Comment 14 Egmont Koblinger 2004-10-29 10:46:36 UTC
I've been using this patch for nearly three weeks now and have found absolutely
no side effects, while all the misbehaviours described above have disappeared.
Comment 15 Olav Vitters 2004-10-29 15:21:36 UTC
There are various bugs about display corruption and UTF-8 (the bugs with irssi
in the summary). However, since this one has a simple test case and a patch,
marking as NEW.
Comment 16 Kjartan Maraas 2005-02-15 10:21:32 UTC
This works for me with the Fedora Core 3 packages, but is broken with CVS
sources. I can't see this exact patch in the Fedora SRPM, so maybe they fixed it
in a different manner there? We *really* need to get the fedora patches into CVS
ASAP.
Comment 17 Egmont Koblinger 2005-02-16 10:07:37 UTC
AFAIK it's not yet fixed in Fedora Core 3, but was fixed shortly after Core 3,
on Nov 9 2004. I don't know whether there was a backport to Core 3 or whether it
only appeared in Rawhide. They created a different fix for the problem based on
this bug report; their patch within the SRPM is called "dont-copy-blocks".
I also recommend applying the Fedora patch to CVS, as it is a more proper
solution leading to nicer source code; mine is just a quick fix.
Comment 18 Kjartan Maraas 2005-03-03 11:37:13 UTC
The patch was committed to CVS. Closing this.
Comment 19 Kjartan Maraas 2005-08-29 11:48:17 UTC
*** Bug 133211 has been marked as a duplicate of this bug. ***