GNOME Bugzilla – Bug 730154
Support C1 CSI (0x9B)
Last modified: 2019-12-16 09:08:02 UTC
Found at https://bugs.launchpad.net/ubuntu/+source/vte3/+bug/1297051
In at least xterm, konsole and putty, when running with an ISO-8859-x charset, the 0x9B byte is equivalent to ESC [. E.g. these are the same:
echo -e '\x1B[31mred\x1B[0m'
echo -e '\x9B31mred\x9B0m'
As far as I understand the docs, in UTF-8 mode the UTF-8 encoded version of U+009B should have the same effect, but this doesn't work for me in any of these terminals:
echo -e '\xC2\x9B31mred\xC2\x9B0m'
VTE doesn't support the C1 CSI character.
(Sorry, I was wrong: The UTF-8 version works in konsole.)
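For reference, the three forms above differ only in how the CSI introducer is encoded. A minimal sketch that dumps the raw bytes instead of relying on how a given terminal renders them (the helper names are made up for illustration; octal escapes are used because POSIX printf does not guarantee \x):

```shell
# Each helper prints "red" wrapped in SGR 31 / SGR 0, using a different CSI encoding.
csi_c0()      { printf '\033[31mred\033[0m'; }          # ESC [  (7-bit form)
csi_c1()      { printf '\23331mred\2330m'; }            # 0x9B   (single-byte C1 CSI)
csi_c1_utf8() { printf '\302\23331mred\302\2330m'; }    # U+009B encoded as UTF-8 (C2 9B)

# Show the bytes of each variant as hex, one variant per line.
for f in csi_c0 csi_c1 csi_c1_utf8; do
  printf '%-12s' "$f"
  "$f" | od -An -tx1
done
```

Calling one of the helpers directly (e.g. csi_c1 in an ISO-8859-x terminal) shows whether that terminal honours the form in question.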
Created attachment 282475
Created attachment 282533
Addressing issues in the downstream bugreport
The patch introduces bug 737792.
It's the U+0090 and U+0098 characters (in whichever locale) that cause a hang when printed.
$ echo -ne '\u0090'
$ echo -ne '\u0098'
behdad:~ 0$ echo -ne '\u0090'
Лbehdad:~ 0$ echo -ne '\u0098'
؛behdad:~ 0$
Okay, it doesn't really "hang", it's just waiting for the terminating ST of "device-control-string" or "start-of-string / end-of-string", and in the current design if the escape sequence was opened by C1 notation then the terminating character also has to be a C1 ST (U+009C). Blindly typing «echo -ne '\u9c'» solves the problem.
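If a terminal is stuck waiting like this, one way out is to emit the terminator in both notations. A small sketch (send_st is a hypothetical helper, and the C1 form assumes the terminal is in UTF-8 mode):

```shell
# Emit ST in both notations so that whichever one the terminal is waiting for arrives.
send_st() {
  printf '\302\234'   # U+009C (C1 ST), encoded as UTF-8
  printf '\033\\'     # ESC \  (C0 representation of ST)
}

# Show what would be sent, as hex, without disturbing the terminal.
send_st | od -An -tx1
```

Running send_st on its own (typed blindly if the prompt is not visible) should close a pending device-control-string or start-of-string, whichever notation opened it.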
So it boils down to these issues:
- These two escapes take a string parameter which can be arbitrarily long, and (as per xterm) not even a newline terminates them. This can lead to an apparent hang both in xterm and in vte, both with C0 and C1, e.g. «echo -ne '\eP'» hangs xterm, assuming your PS1 and PROMPT_COMMAND don't issue any escapes.
- xterm terminates these sequences upon seeing a plain old escape character. vte does not; it requires either \e\\ (0x1B 0x5C) or U+009C, and the terminator has to match the notation that opened the sequence.
- Coincidentally, vte happens to set PROMPT_COMMAND to emit OSC 0/7 which in turn contains a \e\\, so printing the prompt happens to terminate the \eP or \eX sequence (but only the C0 representation).
- To be more robust, vte should terminate these sequences upon seeing a lone escape character, or upon seeing either of the C0/C1 terminators regardless of whether C0 or C1 was used to start the sequence.
- We could see if it's safe to terminate these sequences on a newline, or after a certain number of bytes.
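The more robust rule from the list above can be sketched roughly as follows. This is not vte's parser, just an illustration: scan_string is a hypothetical function that takes the bytes following the opener as hex words, and it assumes an 8-bit charset where the C1 ST is the single byte 0x9C (in UTF-8 mode it would be C2 9C instead).

```shell
# scan_string HEX...: report where a DCS/SOS string body would end if either
# terminator, or a lone ESC, is accepted no matter how the string was opened.
scan_string() {
  i=0
  prev=
  for b in "$@"; do
    i=$((i + 1))
    if [ "$prev" = 1b ]; then
      # The previous byte was ESC: ESC \ is the C0 form of ST, anything else is a lone ESC.
      if [ "$b" = 5c ]; then
        echo "ST(C0) at byte $i"
      else
        echo "lone ESC at byte $((i - 1))"
      fi
      return 0
    fi
    if [ "$b" = 9c ]; then
      echo "ST(C1) at byte $i"
      return 0
    fi
    prev=$b
  done
  # No terminator seen (a trailing ESC is still undecided); newlines do not end the string.
  echo "unterminated"
}

scan_string $(printf 'abc\033\\' | od -An -tx1)   # prints: ST(C0) at byte 5
scan_string $(printf 'abc\nmore' | od -An -tx1)   # prints: unterminated
```

Terminating on a newline or after a byte limit, as floated in the last item, would be an extra branch in the loop; it is left out here because the thread does not settle whether that is safe.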
(In reply to comment #6)
> behdad:~ 0$ echo -ne '\u0090'
> Лbehdad:~ 0$ echo -ne '\u0098'
> ؛behdad:~ 0$
With the patch from comment #3?
(In reply to comment #8)
> (In reply to comment #6)
> > behdad:~ 0$ echo -ne '\u0090'
> > Лbehdad:~ 0$ echo -ne '\u0098'
> > ؛behdad:~ 0$
> With the patch from comment #3?
No no no no. My system vte.
Created attachment 291239
To address comments 4 and 7, here's a new version that adds all possible mixed usages of C0 and C1 to the table.
(In the meantime, increasing VTE_TABLE_MAX_LITERAL no longer seems to be necessary, probably due to some other upstream change.)
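To exercise the mixed usages by hand, something like the following could generate one test input per opener/terminator pairing. It is only a sketch: dcs_combos and the "payload" body are made up for illustration, the C1 characters are written in their UTF-8 encoding, and the table from the attachment is not reproduced here.

```shell
# Print, as hex, a DCS-style string for every opener/terminator pairing.
dcs_combos() {
  for open in '\033P' '\302\220'; do        # ESC P, then U+0090 as UTF-8
    for close in '\033\\' '\302\234'; do    # ESC \, then U+009C as UTF-8
      printf "${open}payload${close}" | od -An -tx1
    done
  done
}

dcs_combos
```

Dropping the od pipe sends the sequences to the terminal instead; with the fix applied, none of the four should leave the prompt waiting. The same pairs can be tried for start-of-string by swapping the openers for \eX and U+0098.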
Committed to master (future 0.40).
xterm-319 just added a section to ctlseqs explaining why it deliberately doesn't support C1 in UTF-8.
It clearly states that in UTF-8 you cannot have single-byte C1 values because that'd conflict with valid bytes of UTF-8. Clear story.
Then it says "Each byte sent to the terminal can be unambiguously determined to fall into one of a few categories (C0, C1 and graphic characters)" as the reason why a double-byte sequence can't form a C1 character. Okay, the specs may say so, but konsole and vte managed to implement this without any problem, so maybe the problem is that the spec is older than UTF-8 and should've been updated.
Anyway, now that we have it, I think it's fine to keep it.
Continued in https://gitlab.gnome.org/GNOME/vte/issues/209.