GNOME Bugzilla – Bug 133211
Malformed output around multibyte characters
Last modified: 2005-08-29 11:48:16 UTC
VTE (as used in gnome-terminal 2.4.2) sometimes displays a line containing multibyte UTF-8 characters incorrectly. Here's an example:
14:39 < suo> oleellista on kuitenkin huomata, että ieee-flotareissa kaikki luvut ovat etumerkillisi14:39 < suo> oleellista on kuitenkin huomata, että ieee-flotareissa kaikki luvut ovat etumerkillisiä. myös 0.0 on etumerkillinen, se on
So on encountering a multibyte character, the line starts to display again from the beginning.
I have managed to get this kind of output in both irssi and slrn, using UTF-8 locales (though as slrn doesn't support UTF-8, I use screen's translation for that). This is over an ssh connection, but I seem to recall that it sometimes happened locally as well under similar conditions - not sure about that, though. screen-4.0.2 is used on the remote machine, but I doubt that it is the culprit; I haven't been able to reproduce this with xterm. I even tried displaying the same screen session in both gnome-terminal and xterm, and gnome-terminal sometimes corrupted the output while xterm simultaneously displayed the same thing properly.
Multibyte strings being broken after being partially output suggests that somewhere, some piece of software is using iconv without performing any error recovery. If this is happening before the terminal gets the text, there's nothing to be done in the terminal to fix it. Can you re-run the program which triggers this behavior under "script" to capture its output, and attach it to this report along with information on the locale you were in (both inside and outside of screen)? That'll help pinpoint where things are going wonky, and if it's the terminal, what input causes the terminal to misbehave.
Created attachment 26849
Typescript output from when the problem was reproduced.
Here's the requested script output. This was done by running script and then attaching to my irssi screen from inside it. Note that I had to hit Ctrl-L once to get the bug to reproduce. After that, the 8th similar line read (on the terminal):
09:23 <mjr> Ääliö älä lyö, ö09:23 <mjr> Ääliö älä lyö, ööliä läikkyy
And the 14th line read:
09:23 <mjr> Ääli09:23 <mjr> Ääliö älä lyö, ööliä läikkyy
However, in the log there's nothing special on those lines, so I still lean towards gnome-terminal being the culprit.
Locales are as follows, both inside and outside of screen, on the machine running screen:
LANG=en_US.UTF-8
LC_CTYPE=fi_FI.UTF-8
LC_NUMERIC=en_US.UTF-8
LC_TIME=en_US.UTF-8
LC_COLLATE=fi_FI.UTF-8
LC_MONETARY=fi_FI.UTF-8
LC_MESSAGES=en_US.UTF-8
LC_PAPER=fi_FI.UTF-8
LC_NAME=fi_FI.UTF-8
LC_ADDRESS=fi_FI.UTF-8
LC_TELEPHONE=fi_FI.UTF-8
LC_MEASUREMENT=fi_FI.UTF-8
LC_IDENTIFICATION=fi_FI.UTF-8
LC_ALL=
And here are the locales on the computer from which I connect to my irssi screen (and on which gnome-terminal runs):
LANG=en_US.UTF-8
LC_CTYPE=fi_FI.UTF-8
LC_NUMERIC=en_GB.UTF-8
LC_TIME=en_GB.UTF-8
LC_COLLATE=fi_FI.UTF-8
LC_MONETARY=fi_FI.UTF-8
LC_MESSAGES=en_GB.UTF-8
LC_PAPER=fi_FI.UTF-8
LC_NAME=fi_FI.UTF-8
LC_ADDRESS=fi_FI.UTF-8
LC_TELEPHONE=fi_FI.UTF-8
LC_MEASUREMENT=fi_FI.UTF-8
LC_IDENTIFICATION=fi_FI.UTF-8
LC_ALL=
(Basically just the GB/US difference. Both are Debian GNU/Linux unstable boxes.)
Were I to hazard an uneducated guess, maybe gnome-terminal misinterprets UTF-8 characters when it doesn't get the whole character in one read operation, and this becomes more evident when using remote terminal sessions. Or maybe not.
Oh, incidentally, I upgraded to Gnome 2.6 from Debian's experimental, and this still happens with gnome-terminal 2.6.0.
Oh right, VTE is currently libvte4 0.11.10-8, as found in Debian.
The original reporter here; this bug still lives on rather persistently in gnome-terminal 2.6.1-4 and libvte4 0.11.11-4 as packaged by Debian.
Some additional testing has revealed that when I set ISO-8859-1 as the terminal encoding and tell screen to convert my UTF-8 terminal sessions to that charset for the terminal, everything seems to work fine; this further points to errors in multibyte character handling, as suggested (and as opposed to e.g. problems with all non-ASCII characters).
Also, I coded a short test program that writes "foobar foobar " and the first half (one of the two bytes) of U+00E4 LATIN SMALL LETTER A WITH DIAERESIS encoded in UTF-8, sleeps for a second, and then prints the other half followed by " xyzzy xyzzy". In gnome-terminal, the program output reliably displays "foobar foobar foobar foobar ä xyzzy xyzzy", that is, double the foobars there should be. On xterm the program displays only two foobars, as is proper. (Attaching program source after this note.)
That said, I'm going to put aside a few hours tonight to see if this is something that I can fix myself without too much effort, considering my total unfamiliarity with the code; I'll put a note here when I fix it or give up. In the latter case, I'd appreciate some attention to this, as it is quite an obnoxious bug for those of us using Gnome's "native" charset (and not just an ASCII subset of it).
Created attachment 29141
A test program for reproducing the problem. The test program referred to in my previous comment.
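The attachment itself is not reproduced in this thread. As a rough sketch of what such a test program could look like, based only on the description in the previous comment: the names `write_split_utf8` and `write_all`, the delay parameter, and the trailing newline are illustrative choices, not taken from the attachment.

```c
#include <sys/types.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes.
 * Returns 0 on success, -1 on error. (Hypothetical helper.) */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Emit "foobar foobar ä xyzzy xyzzy" with the two UTF-8 bytes of
 * U+00E4 (0xC3 0xA4) split across two separate writes, pausing in
 * between so the reader sees the first byte on its own.
 * Returns 0 on success, -1 on error. */
int write_split_utf8(int fd, unsigned int delay_seconds)
{
    static const char first[]  = "foobar foobar \xc3"; /* 14 ASCII bytes + lead byte */
    static const char second[] = "\xa4 xyzzy xyzzy\n"; /* continuation byte + rest */

    if (write_all(fd, first, sizeof first - 1) != 0)
        return -1;
    sleep(delay_seconds);
    return write_all(fd, second, sizeof second - 1);
}
```

Calling `write_split_utf8(STDOUT_FILENO, 1)` from `main` in a UTF-8 terminal should match the scenario described: a correct terminal shows "foobar foobar" once, followed by "ä xyzzy xyzzy".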
Righto, I readily give up since Sid's build-dependencies are broken for vte at the moment, therefore precluding building a debug-version without manual labor; I'm not feeling up to going blind into unfamiliar C code right now. I hope the test program is helpful, anyway.
Ok, I've had a dive into the source, and have managed to dig up likely places where the problem occurs. I haven't managed to fix it yet, but I'll share at this point for the record.
When running the test program above, libvte debug output includes the following interesting tidbits. When the "foobar foobar Ã" is printed, it says quite correctly: "Handler processing 15 bytes." After this comes the interesting part: "0 chars and 15 bytes left to process." This output comes from the vte_terminal_process_incoming function, and indicates that terminal->pvt->incoming retains all of the input. Obviously, it should only retain the last byte in this case. This data is then reprocessed.
I could've fixed this with an ugly, ugly hack, but instead I tried to find the actual problem. Thus far I've come to the conclusion that the bug is _probably_ within _vte_iso2022_process, which is called from vte_terminal_process_incoming and given the terminal->pvt->incoming buffer as a parameter so that it can update it properly. More specifically, I currently suspect that at the end of _vte_iso2022_process, _vte_buffer_consume isn't being called in some cases where it should be. My head hurts too much for me to poke more at it right now.
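To illustrate the expected behaviour described above (consume everything up to the trailing incomplete character and retain only that), here is a minimal sketch in C. It is not VTE code and does not use VTE's buffer API; the function name `utf8_complete_prefix` is hypothetical, and the sketch only looks at the tail of the buffer, leaving validation of malformed input to whatever decoder runs afterwards.

```c
#include <stddef.h>

/* Return how many leading bytes of buf[0..len) can be consumed
 * without cutting a UTF-8 sequence in half at the end of the buffer.
 * Any bytes past the returned count should stay in the buffer until
 * more input arrives. */
size_t utf8_complete_prefix(const unsigned char *buf, size_t len)
{
    size_t i = len;
    size_t back = 0;

    /* Step back over at most 3 trailing continuation bytes (10xxxxxx). */
    while (i > 0 && back < 3 && (buf[i - 1] & 0xC0) == 0x80) {
        i--;
        back++;
    }
    if (i == 0)
        return len; /* empty, or only continuation bytes: nothing to hold back */

    unsigned char lead = buf[i - 1];
    size_t need;
    if ((lead & 0x80) == 0x00)
        need = 1;               /* ASCII */
    else if ((lead & 0xE0) == 0xC0)
        need = 2;               /* 110xxxxx */
    else if ((lead & 0xF0) == 0xE0)
        need = 3;               /* 1110xxxx */
    else if ((lead & 0xF8) == 0xF0)
        need = 4;               /* 11110xxx */
    else
        return len;             /* not a valid lead byte: let the decoder report it */

    size_t have = len - (i - 1); /* bytes present for the last sequence */
    return (have < need) ? (i - 1) : len;
}
```

For the 15 bytes of "foobar foobar " plus the lone 0xC3 from the test program, this returns 14, so a caller would process 14 bytes and retain one, rather than retaining and later reprocessing all 15.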
Checked the priority guidelines and therefore updated the priority, as this does match the following, methinks: "Seriously broken, but not as high impact. Should be fixed before next major release. Frequently includes cosmetic bugs of particularly high visibility,"
Hmmm, isn't it the same as bug 154896?
Seems to be the same.
Ok, marking as duplicate then. *** This bug has been marked as a duplicate of 154896 ***