Bug 673981 - Various UTF-8 interpretation issues
Status: RESOLVED OBSOLETE
Product: vte
Classification: Core
Component: general
Version: unspecified
Hardware: Other Windows
Importance: Normal normal
Target Milestone: ---
Assigned To: VTE Maintainers
QA Contact: VTE Maintainers
Depends on:
Blocks:
Reported: 2012-04-12 11:31 UTC by Jasper St. Pierre (not reading bugmail)
Modified: 2021-06-10 14:35 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Jasper St. Pierre (not reading bugmail) 2012-04-12 11:31:52 UTC
On the Mosh website, there are a number of UTF-8 interpretation issues listed for popular terminal emulators, including VTE:

  http://mosh.mit.edu/#techinfo

We should fix some of these.
Comment 1 Egmont Koblinger 2013-10-24 09:41:14 UTC
I see three bugs mentioned there, illustrated with screenshots.

1. Mosh claims that a line beginning with combining accents should place these accents in the first cell. I'm not against this at all, but it would be nice to see some official Unicode documentation that this is indeed the desired behavior.

2. Mosh claims that it shouldn't be possible to switch to line drawing character mode. I personally agree with this; however, vte aims to mimic xterm, and xterm does allow the switch, so I'm not sure if this is a practical change. We should ask xterm's developer for his opinion. I see a chance that such a change would break some old (non-Unicode-aware) applications that use English + line drawing characters only and currently happily run under a UTF-8 terminal. (If someone's worried about a prompt full of hieroglyphs, they can put "\e(B" in their PS1.)

3. Mosh claims that printing xyz, then positioning the cursor over y, then printing a combining accent should put the accent over y. Note that if you go back to the beginning of the line and print that x again (the cursor is now over y) and then print a combining char, it should most definitely be put over x. The same goes if you backspace over z and y. Mosh's preferred behavior would mean that moving the cursor from (2,2) to (2,2) is no longer a no-op. Until I see counter-evidence (docs or such), I disagree with mosh and agree with vte's current behavior.
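For anyone who wants to try the three cases by hand, here is a rough Python sketch of strings that should exercise them. They follow the descriptions in the list above and are not the exact tests from the Mosh page; the function names are made up for illustration. ESC ( 0 and ESC ( B are the standard VT100 sequences that select DEC Special Graphics and US-ASCII for G0, and CSI 2 D moves the cursor two columns to the left.

```python
ESC = "\x1b"
COMBINING_ACUTE = "\u0301"  # U+0301 COMBINING ACUTE ACCENT


def leading_combining_mark():
    # Case 1: a line that starts with a combining accent, then ordinary text.
    return COMBINING_ACUTE + "abc\n"


def line_drawing_mode():
    # Case 2: select DEC Special Graphics for G0, print "lqk" (two box
    # corners joined by a horizontal line in that set), then go back to ASCII.
    return ESC + "(0" + "lqk" + ESC + "(B" + "\n"


def combining_after_cursor_move():
    # Case 3: print xyz, move two columns left so the cursor is over y,
    # then send a combining accent.
    return "xyz" + ESC + "[2D" + COMBINING_ACUTE + "\n"
```

Writing each returned string to the terminal (for example with sys.stdout.write) shows how a given emulator handles the case.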
Comment 2 Egmont Koblinger 2014-08-01 13:54:10 UTC
Unfortunately, xterm has changed 1 & 3 to match mosh's expectations.  I don't like it, because 1 makes a character of wcwidth==0 advance the cursor, which probably goes against screen libraries' expectations; and 3 means a cursor movement is no longer a no-op.

The change to 3 reminds me of bug 731155; we can probably reuse some code from there.  Common to both bugs is that we need to remember whether printing a character or moving the cursor was the more recent operation.
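To make the "which operation was more recent" idea concrete, here is a toy Python model of a single terminal row. It is only a sketch under stated assumptions, not vte's or xterm's code: ToyLine, its flag name, and run_case are hypothetical, the Mn/Me general categories are used as a rough stand-in for wcwidth==0, and the default rule (attach the mark to the cell left of the cursor) is one reading of the behavior described in comment 1.

```python
import unicodedata


class ToyLine:
    """One terminal row. A toy model, not vte's actual data structure."""

    def __init__(self, width=8, attach_under_cursor_after_move=False):
        self.cells = [""] * width      # each cell: base char plus any marks
        self.col = 0                   # cursor column, 0-based
        self.last_op_was_move = True   # nothing has been printed yet
        self.attach_under_cursor_after_move = attach_under_cursor_after_move

    def move_to(self, col):
        self.col = col
        self.last_op_was_move = True   # remembered even if col did not change

    def put(self, ch):
        # Rough proxy for wcwidth(ch) == 0 (assumption).
        if unicodedata.category(ch) in ("Mn", "Me"):
            target = self.col - 1      # default: cell left of the cursor
            if self.attach_under_cursor_after_move and self.last_op_was_move:
                target = self.col      # alternative: cell under the cursor
            if 0 <= target < len(self.cells) and self.cells[target]:
                self.cells[target] += ch   # zero width: cursor stays put
            return
        if self.col >= len(self.cells):
            return                     # toy model: ignore overflow
        self.cells[self.col] = ch
        self.col += 1
        self.last_op_was_move = False


def run_case(attach_under_cursor_after_move, extra_move):
    # Print xyz, go home, reprint x (cursor is now over y), optionally
    # "move" the cursor to where it already is, then send an accent.
    line = ToyLine(attach_under_cursor_after_move=attach_under_cursor_after_move)
    for ch in "xyz":
        line.put(ch)
    line.move_to(0)
    line.put("x")
    if extra_move:
        line.move_to(1)
    line.put("\u0301")
    return line.cells[:3]
```

With the default rule, run_case returns the accent on x whether or not the extra move happens. With attach_under_cursor_after_move=True, the extra move to the column the cursor is already in shifts the accent onto y, which is the "no longer a no-op" effect.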
Comment 3 Behdad Esfahbod 2014-08-01 20:36:52 UTC
Well, Unicode says that if a combining mark comes at the beginning of text, show it on a dotted circle.  But that's a recommendation, not a requirement.  I don't think we have to follow xterm or mosh here.  I agree that breaking wcwidth() assumptions is bad.
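A minimal Python sketch of that dotted-circle recommendation, assuming "combining mark" means any character whose general category starts with M. The helper name with_dotted_circle is hypothetical, and this shows only the plain-text idea, not anything vte does.

```python
import unicodedata

DOTTED_CIRCLE = "\u25cc"  # U+25CC DOTTED CIRCLE


def with_dotted_circle(text):
    # If the text starts with a combining mark (general category M*),
    # prepend a visible base so the mark has something to sit on.
    if text and unicodedata.category(text[0]).startswith("M"):
        return DOTTED_CIRCLE + text
    return text
```

Note that the prepended base character occupies a cell of its own, so a terminal doing this would still have to decide whether the cursor advances.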
Comment 4 Egmont Koblinger 2014-08-01 20:48:45 UTC
What is Unicode's recommendation if the combining mark comes after a cursor movement operation? (Does Unicode care about terminals at all?) Shouldn't the base character then also be replaced by a dotted circle?

What's the recommendation if the combining mark comes after a markup change? E.g. in a terminal, a base letter followed by a color change followed by the accent: what color should the cell have?  Or, in html, something like <b>e</b>&#x0301; => should it be a bold é, a plain é, or a bold e followed by an accent over a dotted circle?

On the 1st point, mosh's (and new xterm's) behavior is better in the sense that it preserves data for copy-pasting.  Well, I don't know how many combining characters xterm remembers; in vte there's a limit, so we should say it only preserves a couple of them.

I'm inclined to say that we shouldn't care about these at all.  I guess it's okay to say that the terminal's behavior is undefined.  Anyone who cares about what's visible in the terminal should never output either 1 or 3.
Comment 5 Behdad Esfahbod 2014-08-01 20:54:35 UTC
Unicode only makes recommendations about plain text.

If there's color markup on the base but not the mark (or the other way around), it is desirable that they be rendered with different colors.  But even in our browsers and desktops we currently don't do that.
Comment 6 Mike Frysinger 2017-09-03 23:06:31 UTC
(2) wrt graphics/line drawing mode, i've filed bug 787228 with, imo, a way forward that should include legacy/xterm compatibility while allowing people to move forward into a stable UTF-8 world.
Comment 7 GNOME Infrastructure Team 2021-06-10 14:35:39 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/vte/-/issues/1935.