Bug 730154 – Support C1 CSI (0x9B)

Summary:	Support C1 CSI (0x9B)


Status:	RESOLVED FIXED

Product:	vte
Classification:	Core
Component:	general
Version:	0.37.x
Hardware:	Other Linux

Importance:	Normal minor
Target Milestone:	---
Assigned To:	VTE Maintainers
QA Contact:	VTE Maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2014-05-14 23:54 UTC by Egmont Koblinger
Modified:	2019-12-16 09:08 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Fix (1.25 KB, patch) 2014-08-04 19:29 UTC, Egmont Koblinger	none
Fix v2 (2.25 KB, patch) 2014-08-05 09:39 UTC, Egmont Koblinger	none
Fix v3 (2.53 KB, patch) 2014-11-22 15:24 UTC, Egmont Koblinger	committed

Description Egmont Koblinger 2014-05-14 23:54:51 UTC

Found at https://bugs.launchpad.net/ubuntu/+source/vte3/+bug/1297051

In at least xterm, konsole and putty, when running with ISO-8859-x charset, the 0x9B byte is equivalent to ESC [. E.g. these are the same:
echo -e '\x1B[31mred\x1B[0m'
echo -e '\x9B31mred\x9B0m'

(As far as I understand the docs, in UTF-8 mode the UTF-8 encoded version of U+009B should have the same effect, but this doesn't work for me in any of these terminals:
echo -e '\xC2\x9B31mred\xC2\x9B0m'
)

VTE doesn't support the C1 CSI character.
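
To make the three spellings above concrete, here is a small Python sketch that prints the bytes a terminal would receive for "CSI 31 m" in each form. The helper name `hexdump` is made up for this illustration; nothing here is taken from VTE.

```python
# Three byte-level spellings of "CSI 31 m" (set the foreground colour to red).
ESC = "\x1b"     # C0 escape character
CSI_C1 = "\x9b"  # C1 Control Sequence Introducer, U+009B

seven_bit = (ESC + "[31m").encode("latin-1")    # ESC [ form
eight_bit = (CSI_C1 + "31m").encode("latin-1")  # single-byte C1 form (ISO-8859-x)
utf8_c1 = (CSI_C1 + "31m").encode("utf-8")      # U+009B encoded as UTF-8


def hexdump(data: bytes) -> str:
    """Hypothetical helper: render bytes as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in data)


print(hexdump(seven_bit))  # 1b 5b 33 31 6d
print(hexdump(eight_bit))  # 9b 33 31 6d
print(hexdump(utf8_c1))    # c2 9b 33 31 6d
```

The second line is what the `\x9B` echo sends, and the third is what the `\xC2\x9B` echo sends.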

http://en.wikipedia.org/wiki/C0_and_C1_control_codes
http://en.wikipedia.org/wiki/Control_Sequence_Introducer#Sequence_elements

Comment 1 Egmont Koblinger 2014-05-15 00:02:02 UTC

(Sorry, I was wrong: The UTF-8 version works in konsole.)

Comment 2 Egmont Koblinger 2014-08-04 19:29:14 UTC

Created attachment 282475
Fix

Comment 3 Egmont Koblinger 2014-08-05 09:39:49 UTC

Created attachment 282533
Fix v2

Addressing issues in the downstream bugreport

Comment 4 Egmont Koblinger 2014-10-03 11:35:19 UTC

The patch introduces bug 737792.

Comment 5 Egmont Koblinger 2014-10-03 18:15:54 UTC

It's the U+0090 and U+0098 characters (in whichever locale) that cause a hang when printed.
$ echo -ne '\u0090'
$ echo -ne '\u0098'

Comment 6 Behdad Esfahbod 2014-10-03 18:19:14 UTC

behdad:~ 0$ echo -ne '\u0090'
Лbehdad:~ 0$ echo -ne '\u0098'
؛behdad:~ 0$

Comment 7 Egmont Koblinger 2014-10-03 18:48:37 UTC

Okay, it doesn't really "hang", it's just waiting for the terminating ST of "device-control-string" or "start-of-string / end-of-string", and in the current design if the escape sequence was opened by C1 notation then the terminating character also has to be a C1 ST (U+009C). Blindly typing «echo -ne '\u9c'» solves the problem.

So it boils down to these issues:

- These two escapes take a string parameter which can be arbitrarily long, and (as per xterm) not even a newline terminates them. This can lead to an apparent hang both in xterm and in vte, both with C0 and C1, e.g. «echo -ne '\eP'» hangs xterm, assuming your PS1 and PROMPT_COMMAND don't issue any escapes.

- xterm terminates these sequences upon seeing a plain old escape character. vte does not; it requires either \e\\ (0x1B 0x5C) or U+009C, and the terminator has to match the notation (C0 or C1) that opened the sequence.

- Coincidentally, vte happens to set PROMPT_COMMAND to emit OSC 0/7 which in turn contains a \e\\, so printing the prompt happens to terminate the \eP or \eX sequence (but only the C0 representation).

- To be more robust, vte should terminate these sequences upon seeing a lone escape character, or upon seeing either of the C0/C1 terminators, regardless of whether C0 or C1 was used to start the sequence.

- We could see if it's safe to terminate these sequences on a newline, or after a certain number of bytes.
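
The more tolerant behaviour proposed in the list above can be sketched in Python as follows. This is only an illustration and not VTE's implementation (VTE matches sequences against a table); the function name `split_string_sequence` is hypothetical, and the sketch works on a complete string rather than a byte stream, so it does not handle a trailing ESC whose following character has not arrived yet.

```python
ESC = "\x1b"
C1_DCS, C1_SOS, C1_ST = "\x90", "\x98", "\x9c"


def split_string_sequence(text: str):
    """Return (payload, rest) if text starts with a terminated DCS or SOS
    string sequence, else None (not such a sequence, or still unterminated)."""
    # Accept either the C0 opener (ESC P / ESC X) or the C1 one (U+0090 / U+0098).
    if text[:2] in (ESC + "P", ESC + "X"):
        start = 2
    elif text[:1] in (C1_DCS, C1_SOS):
        start = 1
    else:
        return None

    for i in range(start, len(text)):
        ch = text[i]
        if ch == C1_ST:
            # C1 terminator, accepted no matter how the sequence was opened.
            return text[start:i], text[i + 1:]
        if ch == ESC:
            if text[i + 1:i + 2] == "\\":
                # C0 terminator ESC \, also accepted for either opener.
                return text[start:i], text[i + 2:]
            # Lone ESC: end the string and leave ESC to start the next sequence.
            return text[start:i], text[i:]
    return None  # a newline alone does not terminate; keep waiting


print(split_string_sequence("\x90abc\x1b\\rest"))  # ('abc', 'rest')
```

With this logic a C1-opened sequence followed by the prompt's ESC \ would be closed, which is the case that looked like a hang.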

Comment 8 Egmont Koblinger 2014-10-03 18:49:50 UTC

(In reply to comment #6)
> behdad:~ 0$ echo -ne '\u0090'
> Лbehdad:~ 0$ echo -ne '\u0098'
> ؛behdad:~ 0$

With the patch from comment #3?

Comment 9 Behdad Esfahbod 2014-10-03 18:54:22 UTC

(In reply to comment #8)
> (In reply to comment #6)
> > behdad:~ 0$ echo -ne '\u0090'
> > Лbehdad:~ 0$ echo -ne '\u0098'
> > ؛behdad:~ 0$
> 
> With the patch from comment #3?

No no no no.  My system vte.

Comment 10 Egmont Koblinger 2014-11-22 15:24:22 UTC

Created attachment 291239
Fix v3

To address comment 4 & 7, here's a new version that adds all possible mixed usages of C0 and C1 to the table.

(In the meantime, increasing VTE_TABLE_MAX_LITERAL no longer seems to be necessary, probably due to some other upstream change.)

Comment 11 Egmont Koblinger 2014-11-22 15:28:58 UTC

Committed to master (future 0.40).

Comment 12 Egmont Koblinger 2015-08-20 09:32:24 UTC

xterm-319 just added a section to ctlseqs explaining why it deliberately doesn't support C1 in UTF-8.

It clearly states that in UTF-8 you cannot have single-byte C1 values because that'd conflict with valid bytes of UTF-8. Clear story.

Then it says "Each byte sent to the terminal can be unambiguously determined to fall into one of a few categories (C0, C1 and graphic characters)" as the reason why a double-byte sequence can't form a C1. Okay, the specs may say so, but konsole and vte managed to implement this without any problem, so maybe the problem is that the spec is older than UTF-8 and should've been updated.
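
The single-byte conflict, and the reason the two-byte form is still distinguishable, can be checked with a few lines of Python. This is just an illustration of the encoding facts, using a hypothetical helper `is_valid_utf8`; it says nothing about how any terminal is implemented.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The raw byte 0x9B is a continuation byte of ordinary characters, e.g. U+00DB.
print("\u00db".encode("utf-8"))        # b'\xc3\x9b'

# U+009B itself is encoded as the two bytes C2 9B.
print("\u009b".encode("utf-8"))        # b'\xc2\x9b'

# A bare 0x9B is not valid UTF-8, while the two-byte form decodes to U+009B.
print(is_valid_utf8(b"\x9b31m"))       # False
print(is_valid_utf8(b"\xc2\x9b31m"))   # True
```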

Anyway, not that we have it, I think it's fine to keep it.

Comment 13 Egmont Koblinger 2015-08-20 09:33:23 UTC

Anyway, now* that we have it, I think it's fine to keep it.

Comment 14 Egmont Koblinger 2019-12-16 09:08:02 UTC

Continued in https://gitlab.gnome.org/GNOME/vte/issues/209.