After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 732586 - Alt charset should be ignored within OSC
Alt charset should be ignored within OSC
Status: RESOLVED FIXED
Product: vte
Classification: Core
Component: general
0.37.x
Other Linux
: Normal normal
: ---
Assigned To: VTE Maintainers
VTE Maintainers
Depends on:
Blocks: 731205
 
 
Reported: 2014-07-01 21:10 UTC by Egmont Koblinger
Modified: 2014-11-22 18:24 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
First draft patch: remove iso2022 support and fix all the bugs mentioned here (63.88 KB, patch)
2014-07-24 22:12 UTC, Egmont Koblinger
none Details | Review
Second draft patch: remove iso2022 support and fix all the bugs mentioned here (67.90 KB, patch)
2014-07-25 00:01 UTC, Egmont Koblinger
none Details | Review
Third draft patch: remove iso2022 support and fix all the bugs mentioned here (68.13 KB, patch)
2014-07-26 09:54 UTC, Egmont Koblinger
none Details | Review
Remove iso2022 support and fix bugs, v4 (69.17 KB, patch)
2014-07-27 14:15 UTC, Egmont Koblinger
none Details | Review
Remove iso2022 support and fix bugs, v5 (69.20 KB, patch)
2014-08-26 22:22 UTC, Egmont Koblinger
accepted-commit_now Details | Review
Remove iso2022 support and fix bugs, v6 (69.72 KB, patch)
2014-11-22 17:52 UTC, Egmont Koblinger
committed Details | Review

Description Egmont Koblinger 2014-07-01 21:10:03 UTC
echo -ne '\e(0\e]0;foobar\a'

xterm => window title becomes "foobar"
vte   => window title becomes "°⎺⎺␉▒⎼"

This easily causes garbled title in Midnight Commander, when reflecting the current directory in the title is enabled (F9/Options/Layout/Xterm) and running with a non-UTF-8 locale.  In this case slang/ncurses (my mc is using slang) needs to switch to alt charset mode to draw the box characters.  Setting the window title can't be done via slang/ncurses methods, so mc needs to use a plain printf/write call, and it happens to get printed when slang/ncurses is internally in alt charset (box drawing) mode.  I guess mc could work around this by properly flushing slang/ncurses, but still, vte is broken.

The same goes for OSC 6/7 (current file/directory), OSC 4/10-19 (set colors) and probably a few more, e.g.

echo -e '\e(0\e]4;0;purple\a\e(B\e[30mis this purple?'

xterm => purple
vte   => black
Comment 1 Egmont Koblinger 2014-07-22 16:14:46 UTC
There are two more discrepancies from xterm:

echo -ne '\e(0áű' =>
xterm: áű (string remains unchanged)
vte: áű (undergoes a latin1->utf8 conversion)

echo -ne '\e(0\ec' =>
xterm: full terminal reset (because of <ESC><c>)
vte: nothing (because it's converted to the non-existing escape sequence <ESC><CRsymbol> first)

Looks like the iso2022 decoder is not the right place to perform charset substitution.  It should happen instead after the data has undergone the caps matching and stuff and we're about to literally emit it into the terminal.
Comment 2 Egmont Koblinger 2014-07-22 17:35:23 UTC
I'm trying to figure out this whole charset/iso2022 business...

- In _vte_iso2022_map_A we only override the dollar sign to pounds, whereas xterm changes a whole lot more:

echo -ne '\e(A!@#$%\e(B'
xterm => ¡À£¤¥
vte   => !@#£%

- There are many more similar simple charsets defined in iso2022.c with the comment that they are for VT220 (the corresponding fkey mode has just been dropped from vte). These work in vte but I can't get them to work in xterm (not even if I enable VT220 fkey mode):

echo -ne '\e(4!@#$%\e(B'
xterm => !@#$%
vte   => !¾£$%

- There is CP437 plus are all these CJK charsets, I can get them to work in vte but not in any other terminals, and I can't seem to find any relevant entry in xterm's source:

echo -ne '\e$@\x21\x23\e(B'
xterm => !#
vte   => 。 (U+3002)

Am I doing something wrong with these legacy CJK charsets, or does xterm really not support them?

Could we maybe get rid of all these and only keep support for the box drawing alternate charset?  This would probably simplify the fix for the initial bug.  (Although I guess we'd still need the SI/SO/SS0-SS3/G0-G3/etc stuff which is a complete mystery to me.)
Comment 3 Egmont Koblinger 2014-07-23 12:45:17 UTC
(In reply to comment #2)

> - There are many more similar simple charsets defined in iso2022.c with the
> comment that they are for VT220 (the corresponding fkey mode has just been
> dropped from vte). These work in vte but I can't get them to work in xterm (not
> even if I enable VT220 fkey mode):
> 
> echo -ne '\e(4!@#$%\e(B'
> xterm => !@#$%
> vte   => !¾£$%

In xterm, the feature first needs to be enabled by DECSET 42.  Vte also recognizes this escape and sets a variable, but never uses that value.

> - There is CP437 plus are all these CJK charsets, I can get them to work in vte
> but not in any other terminals, and I can't seem to find any relevant entry in
> xterm's source:
> 
> echo -ne '\e$@\x21\x23\e(B'
> xterm => !#
> vte   => 。 (U+3002)

Following rfc1554's example, the command
echo -ne '\e$(C\e.A\eNA\e(B'
indeed produces an Á in xterm, but not in vte.  I'm yet to find a CJK thing that works the same in the two terminals.

This whole code was added to VTE in 2002, before widespread adoption of Unicode.  Actually, the pure existence of such complicated specifications indicate that Unicode was not well known back then.  Apparently these features are not supported by other modern terminal emulators (konsole, putty, st, terminology, urxvt) and I can hardly imagine applications relying on them.  If someone still needs these, he can use xterm, or luit which claims to support ISO 2022 (I haven't verified but I believe this; its manpage correctly says "request switching [...] using ISO 2022 [...] is discouraged [...] [use] UTF-8 instead").

The world is moving, and so should we, leaving this legacy nightmare behind for good.  I'm hoping that we could totally ditch iso2022.c, move the handling of a few remaining escaping sequences (SS2, SS3...) to caps.c, and move _vte_iso2022_map_0 to vte.c.
Comment 4 Egmont Koblinger 2014-07-23 12:51:08 UTC
Note: upon doing this, we should remember to remove the "9" from the response to "\e[c", not to advertise national replacement charsets.

Also we should check whether advertising us as vt220 on an "\e[c" or "\e[>c" is correct or maybe we should relax to vt100 only.
Comment 5 Egmont Koblinger 2014-07-24 17:00:30 UTC
In VTE, using the British charset "\e(A", the $ sign is converted to £. According to xterm, konsole and terminology, this is incorrect, it's the # sign that should be converted to £.  Another data point suggesting that probably nobody's using this feature.
Comment 6 Egmont Koblinger 2014-07-24 19:13:14 UTC
I guess this is the bare minimum we have to support because all graphical emulators seem to support these and probably many apps rely on them:

^O (SI, shift in): switch to the G0 character set
^N (SO, shift out): switch to the G1 character set

^[ ( B: make the G0 charset a pass-thru (no mapping)
^[ ( 0: make the G0 charset map codepoints 96-126 to line drawing and such

^[ ) B: make the G1 charset a pass-thru (no mapping)
^[ ) 0: make the G1 charset map codepoints 96-126 to line drawing and such

For VT100-completeness only, but no practical use, we might add ^[ ( A and ^[ ) A that switch G0 or G1 respectively to British keytable where # is replaced by £.

All other things, like the G2 and G3 charsets, single-shift escapes, two-byte CJK encodings can probably safely be dropped.

Here's a good doc:
http://en.wikipedia.org/wiki/ISO/IEC_2022

This is also relevant to some extent:
https://www.midnight-commander.org/ticket/2339
http://www.chiark.greenend.org.uk/~sgtatham/putty/wishlist/utf8-plus-vt100.html

Note that currently vte partially handles these in iso2022 and partially in caps/vteseq (the ^[(B and ^[(0 bits, but not their G1 counterparts). This results in a bug that G1 can't be set to the line drawing charset in VTE. Also, VTE falsely believes the state belongs to the screen, whereas it should belong to the terminal itself (i.e. kept across a change between normal and alternate screen).
Comment 7 Egmont Koblinger 2014-07-24 22:12:04 UTC
Created attachment 281642 [details] [review]
First draft patch: remove iso2022 support and fix all the bugs mentioned here

Here's a first draft. It already works and can be tested if any apps break.

- All the bugs mentioned in this bugreport are fixed.

- Terribly complicated legacy iso2022 code removed, yay!

- I've also removed a workaround for a glibc bug - it's been fixed in glibc for 5+ years.

- iso2022.c is now a very thin wrapper over iconv.

TODO:

- Should rename things so we don't refer to "iso2022" anywhere.

- iso2022.c can probably be further simplified (_vte_iso2022_process => process_block => process_cdata callchain should be reduced to 1 or 2 methods).

- Minor regression: some of the previously used escape sequences are no longer recognized and are being printed literally to the terminal, they should ideally be ignored.

- The test case in main() prints iso2022 codes, it no longer makes sense.

- Needs to reindent the code, I've left it unindented so the patch is smaller and easier to review.

- Shouldn't forget to "git rm *unitables*", again intentionally skipped for now so the patch is smaller.

- Needs to revise the response for \e[c and \e[>c.
Comment 8 Egmont Koblinger 2014-07-25 00:01:34 UTC
Created attachment 281649 [details] [review]
Second draft patch: remove iso2022 support and fix all the bugs mentioned here

Further cleanups.

Also notice that the patch brings a 3-5% speed improvement when cating a large file :)
Comment 9 Egmont Koblinger 2014-07-26 09:54:33 UTC
Created attachment 281758 [details] [review]
Third draft patch: remove iso2022 support and fix all the bugs mentioned here

Remove advertising national replacement charset, plus one minor cleanup.
Comment 10 Egmont Koblinger 2014-07-26 20:58:36 UTC
One more bug I've found that is fixed by the patch :)

./src/vte-2.91 --cjk-width=wide
cat doc/boxes.txt

Line drawing characters that are printed by direct Unicode codepoints are double wide as expected.  However, line drawing characters that are printed by alt charset escape sequences remain single wide.  According to "xterm -cjk_width", they should be double wide too (and obviously this is what makes sense).

On a related note, this test file should be fixed to contain spaces that become double wide instead of normal spaces, so it looks good with double width too.
Comment 11 Egmont Koblinger 2014-07-26 22:40:05 UTC
Public API doc of vte_terminal_set_cjk_ambiguous_width() says:

 * This setting only takes effect the next time the terminal is reset, either
 * via escape sequence or with vte_terminal_reset().

With these changes luckily it's no longer the case.
Comment 12 Egmont Koblinger 2014-07-27 14:15:59 UTC
Created attachment 281817 [details] [review]
Remove iso2022 support and fix bugs, v4

(minor cleanups)
Comment 13 Behdad Esfahbod 2014-07-27 17:26:31 UTC
Yes please!
Comment 14 Egmont Koblinger 2014-07-27 23:36:13 UTC
See also bug 731208 comment 3 onwards (problems with \e[@ and \e[G toggling back-n-forth between native charset and utf-8; keeping that feature may require to keep some of iso2022).
Comment 15 Egmont Koblinger 2014-08-03 18:41:05 UTC
(In reply to comment #13)
> Yes please!

It's quite a big change, I'm waiting for ChPe's approval too.
Comment 16 Egmont Koblinger 2014-08-26 22:22:21 UTC
Created attachment 284562 [details] [review]
Remove iso2022 support and fix bugs, v5

(just a git merge)
Comment 17 Egmont Koblinger 2014-11-22 17:52:46 UTC
Created attachment 291247 [details] [review]
Remove iso2022 support and fix bugs, v6

(just a git merge and a new comment)
Comment 18 Egmont Koblinger 2014-11-22 18:24:36 UTC
Committed.

A future cleanup comment might completely remove the leftover weak layer provided by iso2022.c (it does input decoding only right now).