Bug 772890 - Use wcwidth instead of g_unichar_iswide
Status: RESOLVED OBSOLETE
Product: vte
Classification: Core
Component: general
Version: unspecified
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: VTE Maintainers
Depends on:
Blocks:
Reported: 2016-10-13 20:27 UTC by Egmont Koblinger
Modified: 2021-06-10 15:17 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Egmont Koblinger 2016-10-13 20:27:25 UTC
Inspired by bug 762052 and bug 772812:

Every once in a while wcwidth() and g_unichar_iswide() disagree whether a character is wide or not. This can be caused by changes in the Unicode standard (and lack of versioning in these methods), or by implementation bugs.

In order for apps running inside the terminal emulator not to fall apart, it's important that the app has the same belief about width as the terminal emulator itself.

Apps are way more likely to use the generic wcwidth() (most of them indirectly, e.g. via ncurses) than the glib-specific g_unichar_iswide().

So if we also used the generic wcwidth(), such breakages would be less frequent.

I guess the same should go for zero-width/combining chars too.
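
To illustrate the failure mode described above, here is a minimal sketch of an app and a terminal disagreeing about a single character. The two width functions are hypothetical stand-ins for the app's wcwidth() and the terminal's g_unichar_iswide(); the ranges are a small illustrative excerpt, not real library data.

```python
# Illustrative only: two hand-written width tables standing in for an
# application's wcwidth() and the terminal's g_unichar_iswide().
# The ranges are assumptions for this sketch, not real library data.

def app_width(ch: str) -> int:
    """Hypothetical older table: CJK ideographs are wide, emoticons are not."""
    return 2 if 0x4E00 <= ord(ch) <= 0x9FFF else 1


def terminal_width(ch: str) -> int:
    """Hypothetical newer table: emoticons U+1F600..U+1F64F are wide too."""
    cp = ord(ch)
    return 2 if 0x4E00 <= cp <= 0x9FFF or 0x1F600 <= cp <= 0x1F64F else 1


def columns(text: str, width) -> int:
    """Column the cursor ends up in after printing text from column 0."""
    return sum(width(ch) for ch in text)


line = "ok \U0001F600 \u4E2D"
# The app believes the cursor is one column to the left of where the
# terminal actually put it, so everything it draws afterwards is misplaced.
drift = columns(line, terminal_width) - columns(line, app_width)
print(drift)  # prints 1
```

A single disagreeing character is enough to shift the rest of the line, which is why such mismatches tend to show up as corrupted full-screen apps.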
Comment 1 Christian Persch 2016-10-13 22:15:04 UTC
From the wcwidth docs: "The behavior of wcwidth() depends on the LC_CTYPE category of the current locale."

Since that locale would be the g-t-server's locale, not the one of the programme running inside the terminal, I don't think this is the correct solution.
Comment 2 Egmont Koblinger 2016-10-13 22:27:36 UTC
I think it theoretically might depend on the locale but actually in glibc it does not (at least as long as UTF-8 locales are concerned). Even the proposal to have ambiguous-is-wide locale definitions (which would differ here) was refused (or is at least put aside for now).

There is no "correct solution", e.g. there's no way to guarantee full consistency across an ssh session. But most of the time I think this approach would result in better behavior than the current one. (Also let's not forget that almost all users use a single locale only, that is, g-t-s runs with the same one as the apps inside.)
Comment 3 Egmont Koblinger 2016-11-17 09:17:59 UTC
Just for convenience: Here are the widths for Unicode 8.0 and 9.0, there are many differences:

ftp://ftp.unicode.org/Public/8.0.0/ucd/EastAsianWidth.txt
ftp://ftp.unicode.org/Public/9.0.0/ucd/EastAsianWidth.txt

We keep getting reports here as well as on stackoverflow+friends...
Comment 4 Christian Persch 2017-04-30 20:56:38 UTC
Even if we did switch to using wcwidth, that still wouldn't guarantee the right results: unless the terminal application is very, very Unicode-savvy, it probably takes wcwidth per character and doesn't consider that (e.g. for emoji) a later character may change the width of the preceding character.
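
A rough sketch of the point about later characters: summing widths per character misses a variation selector that widens the character before it. The helper names and the one-entry width function are hypothetical, and the rule that VS16 turns a narrow character into a two-cell one is an assumption about how a sequence-aware terminal might behave, not a statement about what vte does.

```python
VS16 = "\uFE0F"  # VARIATION SELECTOR-16, requests emoji presentation


def simple_width(ch: str) -> int:
    """Hypothetical per-character width: VS16 is zero-width, the rest narrow."""
    return 0 if ch == VS16 else 1


def per_char_width(text: str) -> int:
    """What an app does when it just sums a wcwidth()-style result."""
    return sum(simple_width(ch) for ch in text)


def sequence_aware_width(text: str) -> int:
    """Assumed behavior: VS16 after a narrow character widens it to 2 cells."""
    total = 0
    prev = 0  # width of the last character that occupied cells
    for ch in text:
        if ch == VS16 and prev == 1:
            total += 1  # retroactively widen the preceding character
            prev = 2
            continue
        w = simple_width(ch)
        total += w
        if w > 0:
            prev = w
    return total


heart = "\u2764" + VS16
print(per_char_width(heart), sequence_aware_width(heart))  # prints: 1 2
```

Under these assumptions the two sides disagree by one column even though both use the same per-character data.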
Comment 5 Johannes Löthberg 2017-05-29 16:21:42 UTC
Buggy programs being broken doesn't exactly seem as bad as properly written programs being very broken, as is currently the case. Having to reset my terminal relatively often because of this bug is getting rather old.
Comment 6 Egmont Koblinger 2017-05-29 19:33:40 UTC
I really don't get what you're trying to say. What makes a program "buggy" vs "properly written" according to your definition?
Comment 7 Christian Persch 2017-05-29 20:03:24 UTC
@commenter 5: If your distribution's glibc is still using unicode <= 8 data, maybe they should also keep glib at a version that uses the same unicode data.

---

Another problem with just using the host's wcwidth is that when you e.g. ssh to another host, that one's wcwidth may still be using a different unicode version.

We could stop using glib and glibc for this, include the wcwidth data for multiple unicode versions (<= 8 and >= 9, at least), and have a sequence to switch between them (like iterm2 has). The vte.sh integration could then probe the host's wcwidth and emit the sequence to switch to a matching wcwidth.

Still, that seems overkill to me...
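
For reference, the multi-version idea could look roughly like the sketch below. The `WidthTable` class, its `select` method and the range data are all hypothetical; the ranges are a tiny excerpt chosen to show one Unicode 8 vs 9 difference (the Emoticons block), and the escape sequence that would trigger the switch is deliberately left out.

```python
# Illustrative excerpt of wide ranges per Unicode version; a real table
# would be generated from the corresponding EastAsianWidth.txt files.
WIDE_RANGES = {
    8: [(0x4E00, 0x9FFF)],
    9: [(0x4E00, 0x9FFF), (0x1F600, 0x1F64F)],
}


class WidthTable:
    """Hypothetical terminal-side width lookup with a selectable version."""

    def __init__(self, version: int = 9) -> None:
        self.select(version)

    def select(self, version: int) -> None:
        """Switch tables, e.g. in response to a (not shown) control sequence."""
        if version not in WIDE_RANGES:
            raise ValueError(f"no width data for Unicode {version}")
        self.version = version

    def width(self, ch: str) -> int:
        cp = ord(ch)
        wide = any(lo <= cp <= hi for lo, hi in WIDE_RANGES[self.version])
        return 2 if wide else 1


table = WidthTable()           # defaults to the newer data
print(table.width("\U0001F600"))  # prints 2
table.select(8)                # host reported older wcwidth data
print(table.width("\U0001F600"))  # prints 1
```

The shell integration would be responsible for deciding which version to select; how it probes the host is not covered here.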
Comment 8 Egmont Koblinger 2017-05-29 20:09:13 UTC
> seems overkill to me...

To me too. Hopefully it won't be frequent that the widths change. By now, I think all major Linux distros have fully switched to Unicode 9.0 (both glibc and glib), or are going to switch real soon, making this bug kinda obsolete.

(That being said, I'd still prefer vte relying on wcwidth rather than g_whatever, but it's quickly losing its practical importance.)

> The vte.sh integration could then probe the host's wcwidth

which is tricky 'cause VTE_VERSION is not forwarded by ssh.
Comment 9 Egmont Koblinger 2017-08-16 15:35:16 UTC
glibc's wcwidth() is about to deviate from the Unicode standard... let's hope we can stop this craziness!

See comments 5-7 of https://sourceware.org/bugzilla/show_bug.cgi?id=21750 for the time being.
Comment 10 GNOME Infrastructure Team 2021-06-10 15:17:07 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/vte/-/issues/2350.