Bug 730154 - Support C1 CSI (0x9B)
Status: RESOLVED FIXED
Product: vte
Classification: Core
Component: general
Version: 0.37.x
Hardware: Other Linux
Importance: Normal minor
Target Milestone: ---
Assigned To: VTE Maintainers
VTE Maintainers
Depends on:
Blocks:
 
 
Reported: 2014-05-14 23:54 UTC by Egmont Koblinger
Modified: 2019-12-16 09:08 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Fix (1.25 KB, patch)
2014-08-04 19:29 UTC, Egmont Koblinger
none
Fix v2 (2.25 KB, patch)
2014-08-05 09:39 UTC, Egmont Koblinger
none
Fix v3 (2.53 KB, patch)
2014-11-22 15:24 UTC, Egmont Koblinger
committed

Description Egmont Koblinger 2014-05-14 23:54:51 UTC
Found at https://bugs.launchpad.net/ubuntu/+source/vte3/+bug/1297051

In at least xterm, konsole and putty, when running with an ISO-8859-x charset, the 0x9B byte is equivalent to ESC [. E.g. these two commands produce the same output:
echo -e '\x1B[31mred\x1B[0m'
echo -e '\x9B31mred\x9B0m'

(As far as I understand the docs, in UTF-8 mode the UTF-8 encoded version of U+009B should have the same effect, but this doesn't work for me in any of these terminals:
echo -e '\xC2\x9B31mred\xC2\x9B0m'
)
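
As an illustration of the byte-level equivalence described above, here is a minimal Python sketch. It is not part of the original report, the names (`CSI_C0`, `red`, and so on) are made up for the example, and it only inspects encodings without querying any terminal.

```python
# C0 (7-bit) and C1 (8-bit) spellings of the Control Sequence Introducer.
CSI_C0 = b"\x1b["                             # ESC [
CSI_C1_LATIN1 = "\u009b".encode("latin-1")    # single byte 0x9B in ISO-8859-1
CSI_C1_UTF8 = "\u009b".encode("utf-8")        # two bytes 0xC2 0x9B in UTF-8


def red(csi: bytes) -> bytes:
    """Wrap the word 'red' in SGR 31 / SGR 0 using the given CSI spelling."""
    return csi + b"31mred" + csi + b"0m"


# Print the raw bytes of each variant as hex so they can be compared.
for name, csi in [("C0", CSI_C0),
                  ("C1 latin-1", CSI_C1_LATIN1),
                  ("C1 UTF-8", CSI_C1_UTF8)]:
    print(f"{name:11} {red(csi).hex(' ')}")
```

The three byte strings correspond to the three echo commands above. Whether a given terminal treats them identically is exactly what this bug is about.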

VTE doesn't support the C1 CSI character.

http://en.wikipedia.org/wiki/C0_and_C1_control_codes
http://en.wikipedia.org/wiki/Control_Sequence_Introducer#Sequence_elements
Comment 1 Egmont Koblinger 2014-05-15 00:02:02 UTC
(Sorry, I was wrong: The UTF-8 version works in konsole.)
Comment 2 Egmont Koblinger 2014-08-04 19:29:14 UTC
Created attachment 282475
Fix
Comment 3 Egmont Koblinger 2014-08-05 09:39:49 UTC
Created attachment 282533
Fix v2

Addressing issues in the downstream bugreport
Comment 4 Egmont Koblinger 2014-10-03 11:35:19 UTC
The patch introduces bug 737792.
Comment 5 Egmont Koblinger 2014-10-03 18:15:54 UTC
It's the U+0090 and U+0098 characters (in whichever locale) that cause a hang when printed.
$ echo -ne '\u0090'
$ echo -ne '\u0098'
Comment 6 Behdad Esfahbod 2014-10-03 18:19:14 UTC
behdad:~ 0$ echo -ne '\u0090'
Лbehdad:~ 0$ echo -ne '\u0098'
؛behdad:~ 0$
Comment 7 Egmont Koblinger 2014-10-03 18:48:37 UTC
Okay, it doesn't really "hang", it's just waiting for the terminating ST of "device-control-string" or "start-of-string / end-of-string", and in the current design if the escape sequence was opened by C1 notation then the terminating character also has to be a C1 ST (U+009C). Blindly typing «echo -ne '\u9c'» solves the problem.

So it boils down to these issues:

- These two escapes take a string parameter which can be arbitrarily long, and (as per xterm) not even a newline terminates them. This can lead to an apparent hang in both xterm and vte, with both C0 and C1, e.g. «echo -ne '\eP'» hangs xterm, assuming your PS1 and PROMPT_COMMAND don't issue any escapes.

- xterm terminates these sequences upon seeing a plain old escape character. vte does not: it requires either \e\\ (0x1B 0x5C) or U+009C, and the terminator has to match the notation (C0 or C1) that opened the sequence.

- Coincidentally, vte happens to set PROMPT_COMMAND to emit OSC 0/7 which in turn contains a \e\\, so printing the prompt happens to terminate the \eP or \eX sequence (but only the C0 representation).

- To be more robust, vte should terminate these sequences upon seeing a lone escape character, or upon seeing either of the C0/C1 terminators regardless of whether C0 or C1 was used to start the sequence.

- We could see if it's safe to terminate these sequences on a newline, or after a certain number of bytes.
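
The more robust termination rule suggested in the list above (a lone ESC, or either form of ST regardless of the opener) can be sketched roughly as follows. This is a hypothetical Python illustration, not vte's table-driven matcher: `read_control_string` and its return shape are invented for the example, and it works on decoded code points, so each C1 control is a single character.

```python
ESC = "\x1b"
OPENERS_C0 = {"P": "DCS", "X": "SOS"}              # ESC P, ESC X
OPENERS_C1 = {"\u0090": "DCS", "\u0098": "SOS"}    # U+0090, U+0098
ST_C1 = "\u009c"                                   # String Terminator, C1 form
# The C0 form of ST is the two characters ESC \


def read_control_string(text, pos=0):
    """Parse a DCS/SOS string that starts at text[pos].

    Returns (kind, payload, next_pos), or None if text[pos] does not open
    such a string or no terminator has arrived yet.
    """
    if text.startswith(ESC, pos) and text[pos + 1:pos + 2] in OPENERS_C0:
        kind, i = OPENERS_C0[text[pos + 1]], pos + 2
    elif text[pos:pos + 1] in OPENERS_C1:
        kind, i = OPENERS_C1[text[pos]], pos + 1
    else:
        return None

    start = i
    while i < len(text):
        ch = text[i]
        if ch == ST_C1:
            # C1 terminator: accepted whichever notation opened the string.
            return kind, text[start:i], i + 1
        if ch == ESC:
            if text[i + 1:i + 2] == "\\":
                # C0 terminator (ESC \): consumed together with the string.
                return kind, text[start:i], i + 2
            # Any other ESC also ends the string, but is left in place so
            # that it can start the next escape sequence.
            return kind, text[start:i], i
        i += 1
    return None  # still waiting for a terminator
```

For instance, `read_control_string("\u0090abc\x1b\\")` accepts a C1 opener closed by a C0 terminator, while `read_control_string("\u0090never ends\n")` returns None, which is the "apparent hang" case. A real streaming parser would also have to decide what to do when the input ends right after an ESC. The sketch simply treats that ESC as a terminator.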
Comment 8 Egmont Koblinger 2014-10-03 18:49:50 UTC
(In reply to comment #6)
> behdad:~ 0$ echo -ne '\u0090'
> Лbehdad:~ 0$ echo -ne '\u0098'
> ؛behdad:~ 0$

With the patch from comment #3?
Comment 9 Behdad Esfahbod 2014-10-03 18:54:22 UTC
(In reply to comment #8)
> (In reply to comment #6)
> > behdad:~ 0$ echo -ne '\u0090'
> > Лbehdad:~ 0$ echo -ne '\u0098'
> > ؛behdad:~ 0$
> 
> With the patch from comment #3?

No no no no.  My system vte.
Comment 10 Egmont Koblinger 2014-11-22 15:24:22 UTC
Created attachment 291239
Fix v3

To address comment 4 & 7, here's a new version that adds all possible mixed usages of C0 and C1 to the table.

(In the meantime, increasing VTE_TABLE_MAX_LITERAL no longer seems to be necessary, probably due to some other upstream change.)
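
To give an idea of what "all possible mixed usages of C0 and C1" amounts to, here is a small Python sketch that enumerates opener/terminator pairs. It is purely illustrative: the actual table entries in the patch are not shown in this report, and `OPENERS`, `TERMINATORS` and `mixed_entries` are hypothetical names.

```python
from itertools import product

ESC = "\x1b"
# Each string-type sequence has a C0 (ESC + letter) and a C1 opener.
OPENERS = {
    "DCS": (ESC + "P", "\u0090"),
    "SOS": (ESC + "X", "\u0098"),
}
# String Terminator in C0 form (ESC \) and C1 form (U+009C).
TERMINATORS = (ESC + "\\", "\u009c")


def mixed_entries():
    """Yield (name, opener, terminator) for every C0/C1 combination."""
    for name, openers in OPENERS.items():
        for opener, terminator in product(openers, TERMINATORS):
            yield name, opener, terminator


entries = list(mixed_entries())
for name, opener, terminator in entries:
    print(name, repr(opener), repr(terminator))
```

Each sequence gets four entries instead of the two matched ones, so a string opened in one notation can be closed in the other.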
Comment 11 Egmont Koblinger 2014-11-22 15:28:58 UTC
Committed to master (future 0.40).
Comment 12 Egmont Koblinger 2015-08-20 09:32:24 UTC
xterm-319 just added a section to ctlseqs explaining why it deliberately doesn't support C1 in UTF-8.

It clearly states that in UTF-8 you cannot have single-byte C1 values because that'd conflict with valid bytes of UTF-8. Clear story.
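
The conflict can be checked directly. The following Python sketch (an illustration, with `is_utf8_continuation` as a made-up helper name) shows that every single-byte C1 value is a UTF-8 continuation byte, and that 0x9B in particular occurs inside the encoding of an ordinary letter.

```python
def is_utf8_continuation(byte: int) -> bool:
    """True for bytes of the form 10xxxxxx, used only inside multi-byte UTF-8."""
    return (byte & 0xC0) == 0x80


# The single-byte C1 range is 0x80..0x9F.
c1_bytes = range(0x80, 0xA0)
all_c1_are_continuations = all(is_utf8_continuation(b) for b in c1_bytes)

# U+041B CYRILLIC CAPITAL LETTER EL encodes to two bytes, the second being 0x9B.
encoded = "\u041b".encode("utf-8")
print(all_c1_are_continuations, encoded.hex(" "))
```

A terminal that treated a raw 0x9B as CSI while in UTF-8 mode would therefore misinterpret the second byte of such characters.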

Then it says "Each byte sent to the terminal can be unambiguously determined to fall into one of a few categories (C0, C1 and graphic characters)" as the reason why a double-byte sequence can't form a C1. Okay, the specs may say so, but konsole and vte managed to implement this without any problem, so maybe the problem is that the spec is older than UTF-8 and should have been updated.

Anyway, not that we have it, I think it's fine to keep it.
Comment 13 Egmont Koblinger 2015-08-20 09:33:23 UTC
Anyway, now* that we have it, I think it's fine to keep it.
Comment 14 Egmont Koblinger 2019-12-16 09:08:02 UTC
Continued in https://gitlab.gnome.org/GNOME/vte/issues/209.