Bug 133211 - Malformed output around multibyte characters
Status: RESOLVED DUPLICATE of bug 154896
Product: vte
Classification: Core
Component: general
Version: 0.11.x
Hardware: Other Linux
Importance: High normal
Target Milestone: ---
Assigned To: VTE Maintainers
Nalin Dahyabhai
Depends on:
Blocks:
Reported: 2004-02-02 13:06 UTC by mjrauhal
Modified: 2005-08-29 11:48 UTC
See Also:
GNOME target: ---
GNOME version: 2.9/2.10


Attachments
Typescript output from when the problem was reproduced. (4.24 KB, application/octet-stream)
2004-04-20 06:37 UTC, mjrauhal
A test program for reproducing the problem. (159 bytes, text/plain)
2004-07-01 18:32 UTC, Mikko Rauhala

Description mjrauhal 2004-02-02 13:06:56 UTC
VTE (as used in gnome-terminal 2.4.2) sometimes displays a line that contains
multibyte UTF-8 characters incorrectly. Here's an example:

14:39 < suo> oleellista on kuitenkin huomata, että ieee-flotareissa kaikki
luvut ovat etumerkillisi14:39 < suo> oleellista on kuitenkin huomata, että
ieee-flotareissa kaikki luvut ovat etumerkillisiä. myös 0.0 on
etumerkillinen, se on

So the line starts displaying again from the beginning when it reaches a
multibyte character (above, right where the "ä" of "etumerkillisiä" should
appear). I have managed to get this kind of output in both irssi and slrn,
using UTF-8 locales (though as slrn doesn't support UTF-8, I use screen's
translation for that).

This is over an ssh connection, but I seem to recall it sometimes happening
locally as well under similar conditions; I'm not sure about that, though.
screen-4.0.2 is used on the remote machine, but I doubt that it is the
culprit; I haven't been able to reproduce this with xterm. I even tried
displaying the same screen session in both gnome-terminal and xterm and
gnome-terminal sometimes corrupted the output while xterm simultaneously
displayed the same thing properly.
Comment 1 Nalin Dahyabhai 2004-04-20 05:46:31 UTC
Multibyte strings being broken after being partially output suggests that
somewhere, some piece of software is using iconv without performing any error
recovery.  If this is happening before the terminal gets the text, there's
nothing to be done in the terminal to fix it.  Can you re-run the program which
triggers this behavior under "script" to capture its output, and attach it to
this report along with information on the locale you were in (both inside and
outside of screen)?  That'll help pinpoint where things are going wonky, and if
it's the terminal, what input causes the terminal to misbehave.
Comment 2 mjrauhal 2004-04-20 06:37:16 UTC
Created attachment 26849 [details]
Typescript output from when the problem was reproduced.

Here's the requested script output. This was done by running script and then
attaching to my irssi screen from inside.

Note that I had to hit Ctrl-L once to get the bug to reproduce. After that, the
8th similar line read (on the terminal):

09:23 <mjr> Ääliö älä lyö, ö09:23 <mjr> Ääliö älä lyö, ööliä läikkyy

And the 14th line read:

09:23 <mjr> Ääli09:23 <mjr> Ääliö älä lyö, ööliä läikkyy

However, there's nothing special on those lines in the log, so I still suspect
gnome-terminal is the culprit.

Locales are as follows both inside and out of the screen, on the machine
running screen:

LANG=en_US.UTF-8
LC_CTYPE=fi_FI.UTF-8
LC_NUMERIC=en_US.UTF-8
LC_TIME=en_US.UTF-8
LC_COLLATE=fi_FI.UTF-8
LC_MONETARY=fi_FI.UTF-8
LC_MESSAGES=en_US.UTF-8
LC_PAPER=fi_FI.UTF-8
LC_NAME=fi_FI.UTF-8
LC_ADDRESS=fi_FI.UTF-8
LC_TELEPHONE=fi_FI.UTF-8
LC_MEASUREMENT=fi_FI.UTF-8
LC_IDENTIFICATION=fi_FI.UTF-8
LC_ALL=

And here are the locales on the computer I'm connecting to my irssi screen from
(and running gnome-terminal on):

LANG=en_US.UTF-8
LC_CTYPE=fi_FI.UTF-8
LC_NUMERIC=en_GB.UTF-8
LC_TIME=en_GB.UTF-8
LC_COLLATE=fi_FI.UTF-8
LC_MONETARY=fi_FI.UTF-8
LC_MESSAGES=en_GB.UTF-8
LC_PAPER=fi_FI.UTF-8
LC_NAME=fi_FI.UTF-8
LC_ADDRESS=fi_FI.UTF-8
LC_TELEPHONE=fi_FI.UTF-8
LC_MEASUREMENT=fi_FI.UTF-8
LC_IDENTIFICATION=fi_FI.UTF-8
LC_ALL=

(Basically just the GB/US difference. Both are Debian GNU/Linux unstable
boxes.)

Were I to hazard an uneducated guess, maybe gnome-terminal misinterprets UTF-8
characters when it doesn't receive all of a character's bytes in one read
operation, which would show up more often in remote terminal sessions. Or maybe
not.

Oh, incidentally, I upgraded to Gnome 2.6 from Debian's experimental, and this
still happens with gnome-terminal 2.6.0.
Comment 3 mjrauhal 2004-04-20 08:01:31 UTC
Oh right, VTE is currently libvte4 0.11.10-8 as found in Debian.
Comment 4 Mikko Rauhala 2004-07-01 18:26:25 UTC
The original reporter here; this bug still lives on rather persistently in
gnome-terminal 2.6.1-4 and libvte4 0.11.11-4 as packaged by Debian.

Some additional testing has revealed that when I set ISO-8859-1 as the terminal
encoding and tell screen to convert my UTF-8 terminal sessions to that charset
for the terminal, everything seems to work fine; this further points to errors
in multibyte character handling, as suggested (as opposed to, e.g., problems
with all non-ASCII characters).

Also, I coded a short test program that writes "foobar foobar " and the first
byte of the two-byte UTF-8 encoding of U+00E4 LATIN SMALL LETTER A WITH
DIAERESIS, sleeps for a second, and then prints the second byte followed by
" xyzzy xyzzy". The program output reliably displays "foobar foobar foobar
foobar ä xyzzy xyzzy", that is, twice as many foobars as there should be. On
xterm the program properly displays only two foobars. (Attaching program source
after this note.)

That said, I'm going to put aside a few hours tonight to see if this is
something I can fix myself without too much effort, considering my total
unfamiliarity with the code; I'll put a note here when I fix it or give up.

In the latter case, I'd appreciate some attention to this, as it is quite an
obnoxious bug for those of us using GNOME's "native" charset (and not just an
ASCII subset of it).
Comment 5 Mikko Rauhala 2004-07-01 18:32:39 UTC
Created attachment 29141 [details]
A test program for reproducing the problem.

The test program referred to in my previous comment.
Comment 6 Mikko Rauhala 2004-07-01 19:06:14 UTC
Righto, I readily give up, since Sid's build-dependencies for vte are broken at
the moment, which precludes building a debug version without manual labor;
I'm not feeling up to going blind into unfamiliar C code right now.

I hope the test program is helpful, anyway.
Comment 7 Mikko Rauhala 2004-08-12 20:09:44 UTC
Ok, I've had a dive into the source, and have managed to dig up likely places
where the problem occurs. I haven't managed to fix it yet, but I'll share at
this point for the record.

When running the test program above, libvte debug output includes the following
interesting tidbits: when "foobar foobar Ã" is printed (Ã being the lone 0xC3
lead byte shown as Latin-1), it quite correctly says: "Handler processing 15
bytes." Then comes the interesting part: "0 chars and 15 bytes left to process."

This output comes from the vte_terminal_process_incoming function, and indicates
that terminal->pvt->incoming retains all of the input. Obviously, it should only
retain the last byte in this case. This data is then reprocessed.

I could've fixed this with an ugly, ugly hack, but instead I tried to find the
actual problem. Thus far I've come to the conclusion that the bug is _probably_
within _vte_iso2022_process, which is called from vte_terminal_process_incoming
and given the terminal->pvt->incoming buffer as a parameter so that it can
update it properly.

More specifically, I currently suspect that at the end of _vte_iso2022_process,
_vte_buffer_consume isn't being called in some cases where it should be. My head
hurts too much for me to poke more at it right now.
Comment 8 mjrauhal 2004-11-21 14:49:29 UTC
Checked the priority guidelines and therefore updated the priority, as this does
match the following, methinks:

"Seriously broken, but not as high impact. Should be fixed before next major
release. Frequently includes cosmetic bugs of particularly high visibility,"
Comment 9 Egmont Koblinger 2005-02-06 22:20:46 UTC
Hmmm, isn't it the same as bug 154896?
Comment 10 mjrauhal 2005-02-07 08:16:44 UTC
Seems to be the same.
Comment 11 Kjartan Maraas 2005-08-29 11:48:16 UTC
OK, marking as duplicate then.

*** This bug has been marked as a duplicate of 154896 ***