Bug 731208 – default-character-set is useless


Summary:	default-character-set is useless


Status:	RESOLVED FIXED

Product:	vte
Classification:	Core
Component:	general
Version:	0.37.x
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	VTE Maintainers
QA Contact:	VTE Maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2014-06-04 13:50 UTC by Egmont Koblinger
Modified:	2017-01-25 14:50 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
v1 (2.12 KB, patch) 2014-11-22 18:15 UTC, Egmont Koblinger	committed

Description Egmont Koblinger 2014-06-04 13:50:35 UTC

The set-default-charset escape sequence (\e%@) is quite useless. It sets the charset according to the locale, but new g-t forces the locale to be UTF-8. Hence it's the same as its supposedly opposite counterpart utf-8-character-set (\e%G).

The charset can be specified in g-t's profile prefs, but it's set only once when a terminal is created. If you emit either of the two sequences above, you get stuck in UTF-8 and no escape sequence takes you out from there.

What would IMO make sense:

The process's locale should be irrelevant. vte_terminal_set_encoding() would set the default encoding, the one that \e%@ (or a more complete terminal reset) resets to. It would change the encoding immediately only if the UTF-8 override (via \e%G) is not in effect at that moment.
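
A minimal sketch of the semantics proposed above, written in Python rather than VTE's C. The class and method names are hypothetical and only model the idea: set_encoding() stands in for vte_terminal_set_encoding(), and the two escapes are default-character-set and utf-8-character-set (ESC % @ and ESC % G, as spelled in the echo example of comment 3).

```python
class EncodingState:
    """Hypothetical model of the proposed behaviour; this is not VTE code."""

    def __init__(self, default_encoding="UTF-8"):
        # Default encoding, as an API call would set it.
        self.default_encoding = default_encoding
        # True while the utf-8-character-set override is in effect.
        self.utf8_override = False

    @property
    def active_encoding(self):
        # The override wins over whatever the API configured.
        return "UTF-8" if self.utf8_override else self.default_encoding

    def set_encoding(self, encoding):
        # Models the API call: it only changes the default. The active
        # encoding changes right away only when no override is in effect.
        self.default_encoding = encoding

    def handle_escape(self, seq):
        if seq == "\x1b%G":      # utf-8-character-set
            self.utf8_override = True
        elif seq == "\x1b%@":    # default-character-set
            self.utf8_override = False


state = EncodingState("ISO-8859-1")
state.handle_escape("\x1b%G")     # an app switches to UTF-8
state.set_encoding("ISO-8859-2")  # user picks another charset in the menu
print(state.active_encoding)      # still the override
state.handle_escape("\x1b%@")
print(state.active_encoding)      # now the newly set default
```

In this model the locale never appears: the default comes only from the API, and the override is private terminal state.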

Now some people might complain that after an app switches to UTF-8 mode the charset is not reflected in the menu entries, and changing it has no effect either. But it's the same as with the color palette, where the conclusion was the same: if the terminal overrides with escapes, that's stronger than the value set via API and is not reflected in the prefs UI. Switching back-n-forth between default and utf8 charset is the terminal's private internal state. So I think this solution is defensible.

Any opinions?

(Note: This is pretty much the same as 359961, simplified now that the app locale is forced to UTF-8.)

Comment 1 Egmont Koblinger 2014-06-04 13:53:30 UTC

> The set-default-charset escape sequence (\e%@)

Sorry, it is called default-character-set.

Comment 2 Christian Persch 2014-06-10 10:37:22 UTC

Yes, I think this is the right approach.

Comment 3 Egmont Koblinger 2014-07-27 01:59:59 UTC

There's a bit of a problem with the interaction between this bug and bug 732586 (dropping iso2022); I began working on this bug on top of that one.

Expected behavior with Latin-1 terminal (the behavior of xterm, and vte < 0.35):
$ echo -e '\xc5\xb1 \e%G \xc5\xb1 \e%@ \xc5\xb1'
Å± ű Å±
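
To spell out why that is the expected output, here is a small Python sketch (not VTE code; the function name is made up for illustration) of a decoder that switches charset at the exact byte position of each escape. It assumes a Latin-1 default, as in the example above.

```python
import re

ESC_UTF8 = b"\x1b%G"      # utf-8-character-set
ESC_DEFAULT = b"\x1b%@"   # default-character-set


def decode_immediate(data, default_encoding="latin-1"):
    """Decode bytes, switching charset exactly where an escape occurs."""
    encoding = default_encoding
    out = []
    # The capturing group keeps the escapes in the split result, so they
    # can be consumed in order between the text segments.
    for part in re.split(rb"(\x1b%[G@])", data):
        if part == ESC_UTF8:
            encoding = "utf-8"
        elif part == ESC_DEFAULT:
            encoding = default_encoding
        else:
            out.append(part.decode(encoding, errors="replace"))
    return "".join(out)


stream = b"\xc5\xb1 \x1b%G \xc5\xb1 \x1b%@ \xc5\xb1"
print(decode_immediate(stream).split())   # ['Å±', 'ű', 'Å±']
```

The bytes C5 B1 are two Latin-1 characters (Å and ±) but a single UTF-8 character (ű, U+0171), which is what makes this byte pair a convenient test.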

So far the switching between the two charsets by \e%@ or \e%G was handled in the iso2022 layer: we automatically received the byte streams properly decoded to Unicode, and these escape sequences took effect immediately. (Well, this actually broke somewhere in vte 0.36, don't know where.)

With the removal of that layer, this would no longer be the case. We might receive a complete chunk of data with the old charset, and switch to the new charset later than we should. Introducing special workarounds brings back a small slowdown and a few ugly lines of code (explicitly looking for these escapes in _vte_iso2022_process and stopping processing after we find one, so that we call the handlers before continuing).
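
A rough Python illustration of that late switch (a model only; the function name and the chunking are assumptions, not how VTE reads its input): each chunk is decoded in full with the charset that was active when it arrived, and the escapes found in it take effect only for the next chunk.

```python
import re

SWITCH = re.compile(rb"\x1b%([G@])")  # ESC % G (UTF-8) or ESC % @ (default)


def decode_delayed(chunks, default_encoding="latin-1"):
    """Decode each chunk whole; charset switches apply to later chunks only."""
    encoding = default_encoding
    out = []
    for chunk in chunks:
        # The entire chunk is decoded with the charset in effect on arrival.
        text = SWITCH.sub(b"", chunk).decode(encoding, errors="replace")
        out.append(text)
        # Only afterwards do the escapes seen in this chunk take effect.
        for match in SWITCH.finditer(chunk):
            encoding = "utf-8" if match.group(1) == b"G" else default_encoding
    return "".join(out)


one_chunk = [b"\xc5\xb1 \x1b%G \xc5\xb1 \x1b%@ \xc5\xb1"]
two_chunks = [b"\xc5\xb1 \x1b%G", b" \xc5\xb1 \x1b%@ \xc5\xb1"]
print(decode_delayed(one_chunk).split())    # ['Å±', 'Å±', 'Å±']
print(decode_delayed(two_chunks).split())   # ['Å±', 'ű', 'ű']
```

Neither result matches the expected Å± ű Å±, and which wrong result you get depends on where the chunk boundary happens to fall.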

I can see three possibilities:
1. Accept the small slowdown and a bit of hack to behave correctly.
2. Live with the race condition and hence possible screen corruption (konsole does this).
3. Entirely drop support for these escapes (charset would be only settable via API).

I tend towards #3. They broke in vte 0.36 and nobody complained. I wonder why anyone would create a non-UTF8 profile and then switch encodings back-n-forth. These escapes made sense way back when 8-bit charsets were the default and you temporarily switched to UTF-8 for an advanced app, but I can't see the point in the reverse direction.

Comment 4 Egmont Koblinger 2014-07-27 16:04:40 UTC

Sorry, I was wrong saying it broke in vte-0.36.  It still works perfectly.

Comment 5 Behdad Esfahbod 2014-07-27 17:24:30 UTC

I'm fine with removing it.

Comment 6 Egmont Koblinger 2014-07-27 23:32:12 UTC

So, to be absolutely clear: the feature still works in plain VTE toggling between the locale's encoding and UTF-8, but it's useless via gnome-terminal which forces UTF-8.

xterm, pterm/putty => support these escapes and switch immediately
konsole => supports these escapes but switches with a delay
urxvt => doesn't support these escapes but supports another one to switch locale (from "man 7 urxvt": \e]701;newlocale\a), switches immediately
st, terminology => only support UTF-8

The hack I had in mind wouldn't work reliably, as in vte_terminal_process_incoming() we might make multiple calls to _vte_iso2022_process() before the resulting Unicode char sequence is handled.

Although according to bug 732586 the right place to handle the line drawing charset is indeed outside iso2022, at a higher level, it seems that toggling UTF-8 back and forth should remain at that low level, in the iso2022 handler.  We'd still be able to get rid of most of iso2022, just not all of it.

Let me still think about it...

Comment 7 Egmont Koblinger 2014-08-03 18:41:23 UTC

(In reply to comment #5)
> I'm fine with removing it.

I'm waiting for ChPe's opinion too before going ahead.

Comment 8 Egmont Koblinger 2014-11-22 18:15:55 UTC

Created attachment 291250
v1

Drop handling \e%G and \e%@, they didn't work anyways.

Patch goes on top of bug 732586 comment 17.

Comment 9 Egmont Koblinger 2014-11-22 18:31:38 UTC

Committed.

In the unlikely event that someone really misses this feature and wants to bring it back (and he's not okay with "luit" or something similar), we should consider two approaches:

- Implement in the iso2022 layer, practically being the only thing we do with the input prior to decoding from its character set. Has a runtime cost and makes the code more complex (brings back the iso2022 layer).

- Implement along with the other caps. Has the drawback that it doesn't immediately have an effect, some parts of the input following these escapes will still be decoded according to the old charset.

So far, vte had a mixture of these two (well, both of them I think).

But anyways, it's 2014 now, everyone should be using UTF-8. Some new terminal emulators (as far as I remember: terminology (Enlightenment) and st (suckless)) only support UTF-8. In g-t you can choose from the menus, have a separate profile, and of course you can use luit/screen/tmux for charset conversion.

Comment 10 Christian Persch 2014-11-24 21:38:43 UTC

Just FTR, I concur with comment 5 :-)

Comment 11 Egmont Koblinger 2017-01-25 14:50:36 UTC

See bug 777747 for followup.