GNOME Bugzilla – Bug 730154
Support C1 CSI (0x9B)
Last modified: 2019-12-16 09:08:02 UTC
Found at https://bugs.launchpad.net/ubuntu/+source/vte3/+bug/1297051
In at least xterm, konsole and putty, when running with ISO-8859-x charset, the 0x9B byte is equivalent to ESC [. E.g. these are the same:
  echo -e '\x1B[31mred\x1B[0m'
  echo -e '\x9B31mred\x9B0m'
(As far as I understand the docs, in UTF-8 mode the UTF-8 encoded version of U+009B should have the same effect, but this doesn't work for me in any of these terminals:
  echo -e '\xC2\x9B31mred\xC2\x9B0m'
)
VTE doesn't support the C1 CSI character.
http://en.wikipedia.org/wiki/C0_and_C1_control_codes
http://en.wikipedia.org/wiki/Control_Sequence_Introducer#Sequence_elements
(Sorry, I was wrong: The UTF-8 version works in konsole.)
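The equivalence described above can be sketched in a few lines. This is only an illustration of the ECMA-48 rule that an 8-bit C1 control 0x80+n corresponds to ESC followed by 0x40+n; the helper names `c1_to_7bit` and `expand_c1` are made up for this example and are not part of VTE or any terminal.

```python
ESC = 0x1B


def c1_to_7bit(byte: int) -> bytes:
    """Return the 7-bit 'ESC Fe' equivalent of an 8-bit C1 control byte."""
    if not 0x80 <= byte <= 0x9F:
        raise ValueError("not a C1 control byte")
    # C1 control 0x80+n is equivalent to ESC followed by 0x40+n,
    # so 0x9B (CSI) becomes ESC 0x5B, i.e. ESC [.
    return bytes([ESC, byte - 0x40])


def expand_c1(data: bytes) -> bytes:
    """Rewrite every C1 byte of an 8-bit (ISO-8859-x) stream as ESC Fe."""
    out = bytearray()
    for b in data:
        out += c1_to_7bit(b) if 0x80 <= b <= 0x9F else bytes([b])
    return bytes(out)


# The three spellings from the report.
seven_bit = b"\x1b[31mred\x1b[0m"
eight_bit = b"\x9b31mred\x9b0m"
utf8_form = "\u009b31mred\u009b0m".encode("utf-8")  # U+009B becomes C2 9B
```

Here `expand_c1(eight_bit)` yields the same bytes as `seven_bit`, and `utf8_form` is the `\xC2\x9B...` string from the third echo command.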
Created attachment 282475: Fix
Created attachment 282533: Fix v2. Addressing issues in the downstream bugreport
The patch introduces bug 737792.
It's the U+0090 and U+0098 characters (in whichever locale) that cause a hang when printed.
  $ echo -ne '\u0090'
  $ echo -ne '\u0098'
behdad:~ 0$ echo -ne '\u0090'
Лbehdad:~ 0$ echo -ne '\u0098'
؛behdad:~ 0$
Okay, it doesn't really "hang"; it's just waiting for the terminating ST of "device-control-string" or "start-of-string / end-of-string", and in the current design, if the escape sequence was opened in C1 notation then the terminating character also has to be a C1 ST (U+009C). Blindly typing «echo -ne '\u9c'» solves the problem.
So it boils down to these issues:
- These two escapes take a string parameter which can be arbitrarily long, and (as per xterm) not even a newline terminates them. This can lead to an apparent hang both in xterm and in vte, both with C0 and C1, e.g. «echo -ne '\eP'» hangs xterm, assuming your PS1 and PROMPT_COMMAND don't issue any escapes.
- xterm terminates these sequences upon seeing a plain old-fashioned escape character. vte does not terminate on these: it requires either \e\\ (0x1B 0x5C) or U+009C, and moreover the terminator has to match the notation that opened the sequence.
- Coincidentally, vte happens to set PROMPT_COMMAND to emit OSC 0/7, which in turn contains a \e\\, so printing the prompt happens to terminate the \eP or \eX sequence (but only the C0 representation).
- To be more robust, vte should terminate these sequences upon seeing a lone escape character, or upon seeing either of the C0/C1 terminators regardless of whether C0 or C1 was used to start the sequence.
- We could see if it's safe to terminate these sequences on a newline, or after a certain number of bytes.
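The more robust behaviour proposed in the list above can be sketched as a toy scanner. This is a hedged illustration only, not VTE's table-driven parser: the function `strip_control_strings` and its `lone_esc_terminates` flag are hypothetical names, and the sketch only handles DCS and SOS openers on already-decoded text.

```python
ESC = "\x1b"
C1_ST = "\u009c"                    # 8-bit String Terminator
C1_OPENERS = {"\u0090", "\u0098"}   # DCS and SOS in C1 notation
C0_OPENER_FINALS = {"P", "X"}       # ESC P (DCS) and ESC X (SOS)


def strip_control_strings(text: str, lone_esc_terminates: bool = True) -> str:
    """Drop DCS/SOS strings, accepting either ST form for either opener."""
    out = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch in C1_OPENERS:
            i += 1
        elif ch == ESC and i + 1 < n and text[i + 1] in C0_OPENER_FINALS:
            i += 2
        else:
            out.append(ch)
            i += 1
            continue
        # Inside the string: skip payload until some terminator shows up.
        while i < n:
            if text[i] == C1_ST:
                i += 1
                break
            if text[i] == ESC:
                if i + 1 < n and text[i + 1] == "\\":
                    i += 2          # 7-bit ST, ESC backslash
                    break
                if lone_esc_terminates:
                    break           # leave the ESC for the outer loop
            i += 1                  # note: a newline does not terminate
    return "".join(out)
```

With the default flag, `strip_control_strings("a\x1bPxx\x1b[0mb")` keeps the trailing `\x1b[0mb`, mirroring the xterm behaviour of ending the string at a lone escape. With `lone_esc_terminates=False` the rest of the input is swallowed, which is the apparent hang described in this comment.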
(In reply to comment #6)
> behdad:~ 0$ echo -ne '\u0090'
> Лbehdad:~ 0$ echo -ne '\u0098'
> ؛behdad:~ 0$
With the patch from comment #3?
(In reply to comment #8)
> (In reply to comment #6)
> > behdad:~ 0$ echo -ne '\u0090'
> > Лbehdad:~ 0$ echo -ne '\u0098'
> > ؛behdad:~ 0$
>
> With the patch from comment #3?
No no no no. My system vte.
Created attachment 291239: Fix v3. To address comments 4 & 7, here's a new version that adds all possible mixed usages of C0 and C1 to the table. (In the meantime, increasing VTE_TABLE_MAX_LITERAL no longer seems to be necessary, probably due to some other upstream change.)
Committed to master (future 0.40).
xterm-319 just added a section to ctlseqs explaining why it deliberately doesn't support C1 in UTF-8. It clearly states that in UTF-8 you cannot have single-byte C1 values because that would conflict with valid bytes of UTF-8. Clear story. Then it gives "Each byte sent to the terminal can be unambiguously determined to fall into one of a few categories (C0, C1 and graphic characters)" as the reason why a double-byte sequence can't form a C1. Okay, the specs may say so, but konsole and vte managed to implement this without any problem, so maybe the problem is that the spec is older than UTF-8 and should've been updated. Anyway, now that we have it, I think it's fine to keep it.
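The byte-level conflict mentioned above is easy to check. The following is a small sketch using only Python's built-in UTF-8 codec; the helper names are illustrative, and it shows why a raw 0x9B cannot be treated as CSI in a UTF-8 stream while the two-byte encoding of U+009B remains distinguishable.

```python
def is_utf8_continuation(byte: int) -> bool:
    """True if the byte has the 10xxxxxx pattern of a UTF-8 continuation byte."""
    return (byte & 0xC0) == 0x80


def decodes_alone(byte: int) -> bool:
    """True if the single byte is a complete, valid UTF-8 sequence."""
    try:
        bytes([byte]).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Every single-byte C1 value (0x80..0x9F) doubles as a continuation byte.
all_c1_are_continuations = all(is_utf8_continuation(b) for b in range(0x80, 0xA0))

# Example: U+011B (e with caron) is encoded as C4 9B, so it contains a raw 0x9B.
e_caron = "\u011b".encode("utf-8")

# U+009B itself is the two-byte sequence C2 9B.
csi_utf8 = "\u009b".encode("utf-8")
```

A decoder that works on code points rather than bytes sees U+009B only for the `C2 9B` pair, which is presumably how konsole and vte avoid the ambiguity.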
Continued in https://gitlab.gnome.org/GNOME/vte/issues/209.