Bug 109587 – vte doesn't recognize certain double width characters as double width


Summary:	vte doesn't recognize certain double width characters as double width


Status:	RESOLVED FIXED

Product:	vte
Classification:	Core
Component:	general
Version:	0.10.x
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	VTE Maintainers
QA Contact:	VTE Maintainers

URL:
Whiteboard:

Duplicates:	118939 339984 430565
Depends on:	338305
Blocks:

Reported:	2003-03-31 10:17 UTC by Ken Deeter (Kentarou Shinohara)
Modified:	2007-04-17 06:44 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
shot of half covered full circle character (9.24 KB, image/png) 2003-03-31 10:21 UTC, Ken Deeter (Kentarou Shinohara)
shot of same effect in preedit buffer (6.24 KB, image/png) 2003-03-31 10:21 UTC, Ken Deeter (Kentarou Shinohara)
screen grab of spaces (5.18 KB, image/png) 2003-04-22 06:16 UTC, Ken Deeter (Kentarou Shinohara)
underlining (last two lines) (17.35 KB, image/png) 2003-04-22 06:19 UTC, Ken Deeter (Kentarou Shinohara)
Double-width char (3.61 KB, image/png) 2005-05-24 16:45 UTC, christophe belle
patch from RHEL-4.5 (463 bytes, patch) 2007-04-10 16:10 UTC, Behdad Esfahbod (committed)

Description Ken Deeter (Kentarou Shinohara) 2003-03-31 10:17:03 UTC

Running vte 0.10.26 on Gentoo.

This problem appears when I attempt to type in certain symbol characters
using a Japanese input method.

The terminal widget does not appear to recognize characters such as ◯ (U+25EF)
and ▽ (U+25BD) (open full-width circle and open downward triangle) as
double-width characters. As a result, once one is typed in, the cursor covers
the latter half of the character, and when I press the left arrow key, the
whole thing shifts over by one full character.

I will attach a screenshot to show the effect.

The problem also happens in the preedit buffer when inputting Japanese (not
sure about other languages).

Comment 1 Ken Deeter (Kentarou Shinohara) 2003-03-31 10:21:10 UTC

Created attachment 15328 [details]
shot of half covered full circle character

Comment 2 Ken Deeter (Kentarou Shinohara) 2003-03-31 10:21:42 UTC

Created attachment 15329 [details]
shot of same effect in preedit buffer

Comment 3 Nalin Dahyabhai 2003-04-22 04:53:39 UTC

Please check if this is still a problem in the CVS tree (it shouldn't be).

Comment 4 Ken Deeter (Kentarou Shinohara) 2003-04-22 05:37:37 UTC

Just checked out CVS; looks like the bug is fixed. The only thing is that
now I get a weird space inserted when I hit backspace, maybe
unrelated. I'm running it straight from the src/ directory of the CVS module.

Comment 5 Nalin Dahyabhai 2003-04-22 05:49:26 UTC

What exactly does this weird space look like?  Can you attach a
screengrab?

Comment 6 Ken Deeter (Kentarou Shinohara) 2003-04-22 06:16:32 UTC

Created attachment 15896 [details]
screen grab of spaces

Comment 7 Ken Deeter (Kentarou Shinohara) 2003-04-22 06:18:28 UTC

Attached a screen grab of the spaces; that's after typing "hello" and
3 backspaces.

Also, the thick underlining did not happen before. If I hit Enter a
few more times, some of the underlining is thickened and some is not;
will attach another screenie.

TERM=xterm
LANG=ja_JP.eucJP

if that matters.

Comment 8 Ken Deeter (Kentarou Shinohara) 2003-04-22 06:19:26 UTC

Created attachment 15897 [details]
underlining (last two lines)

Comment 9 Ken Deeter (Kentarou Shinohara) 2003-04-22 06:21:11 UTC

The weird underlining doesn't happen with all fonts. I tried "Kochi Gothic
12" and "MS Gothic 12" and both looked OK. Backspace still resulted in the
same behaviour, though.

Comment 10 Kjartan Maraas 2003-10-30 13:46:08 UTC

Reopening.

Comment 11 Kjartan Maraas 2005-02-14 20:48:17 UTC

Is this problem still there btw?

Comment 12 christophe belle 2005-05-24 16:29:47 UTC

Still exists (seen on GNOME AIX)

Problem is in vte/src/vteglyph.c

in _vte_glyph_draw() and _vte_glyph_draw(),

the x position (in pixels) of a character is computed as "row * width",
where width is the average character width of the font and row is the
number of characters from the beginning of the line. However, row is
always incremented by 1, even when the character is a double-width
character.

A proper solution needs major changes to the algorithm.
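A minimal sketch of the bookkeeping the fix would need, assuming each character's cell width (1 or 2 columns) is already known: the x offset comes from a running column counter that advances by the character's width, rather than from the character index. The function name compute_x_offsets is hypothetical and is not part of vte.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not vte's API: fill x_out[i] with the pixel offset
 * of the i-th character on a line.  columns[i] is the number of terminal
 * cells that character occupies (1 = narrow, 2 = double-width).
 * Returns the total number of columns used by the line. */
static int compute_x_offsets(const int *columns, size_t n, int cell_width,
                             int *x_out)
{
    int col = 0; /* running column, not the character index */

    for (size_t i = 0; i < n; i++) {
        assert(columns[i] == 1 || columns[i] == 2);
        x_out[i] = col * cell_width;
        col += columns[i]; /* a double-width char advances by two cells */
    }
    return col;
}
```

For a line "a◯b" with 8-pixel cells (widths 1, 2, 1), this gives offsets 0, 8 and 24; multiplying the character index by the cell width would instead place "b" at 16, on top of the circle's right half.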

Comment 13 christophe belle 2005-05-24 16:45:41 UTC

Created attachment 46843 [details]
Double-width char

This picture shows the good display of characters in an aixterm and the bad in
gnome-terminal.

Comment 14 Behdad Esfahbod 2006-04-12 09:26:42 UTC

Closing as WONTFIX.  Vte uses g_unichar_iswide, which in turn uses data from the Unicode Character Database to determine which characters are "wide", and the characters you are showing are not wide characters according to the latest UCD.

Comment 15 Ken Deeter (Kentarou Shinohara) 2006-04-12 15:53:17 UTC

I think these characters are part of the set that Unicode defines as 'ambiguous'. Does g_unichar_iswide account for this (i.e., is there some global setting or environment variable that can toggle whether ambiguous characters are interpreted as wide or not)?

The Apple terminal deals with this with a setting in its preferences that controls whether certain characters are interpreted as wide.

Hiding behind the Unicode standard is not the right answer here, especially when it leads to broken behavior. For Japanese characters at least, you have to match what the outputting program thinks is the column width of the character.

For example, if a process running in ja_JP.eucJP outputs a character that maps to one of the ambiguous characters in question, the process will always treat it as double-width (because such characters are wide in EUC-JP), and that affects how it lines things up on screen and how it handles backspaces. So looking only at what Unicode says isn't technically correct, because a process running in the terminal doesn't even know that its output is being translated to Unicode.

Maintaining a mapping of which Unicode code points are considered double-width in which encodings is somewhat impractical, and the only contentious ones are in the 'ambiguous' set.

Here's the relevant unicode info: http://www.unicode.org/reports/tr11/tr11-14.html

Specifically, it says that for ambiguous characters, the width needs to be determined from context; in a terminal's case, that context can be the source encoding. At minimum, if it's EUC-JP or SJIS, then the ambiguous characters should be treated as full-width, not half-width.
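To make the distinction concrete, here is a minimal sketch that assumes a tiny hard-coded subset of the East Asian Width data instead of the full Unicode Character Database. Wide characters always take two columns and narrow ones take one, while ambiguous ones (such as U+25EF and U+25BD from the original report) take whatever width the caller derives from context. The names ea_width_of and cell_columns are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* East Asian Width classes relevant here (UAX #11 defines more). */
enum ea_width { EA_NARROW, EA_WIDE, EA_AMBIGUOUS };

/* Hypothetical lookup over a tiny illustrative subset of the UCD;
 * everything not listed is treated as narrow. */
static enum ea_width ea_width_of(uint32_t cp)
{
    if ((cp >= 0x3041 && cp <= 0x3096) ||  /* Hiragana */
        (cp >= 0x4E00 && cp <= 0x9FFF))    /* CJK Unified Ideographs */
        return EA_WIDE;
    if (cp == 0x25BD || cp == 0x25EF)      /* down-pointing triangle, large circle */
        return EA_AMBIGUOUS;
    return EA_NARROW;
}

/* Number of terminal columns for cp.  ambiguous_width (1 or 2) is the
 * context-dependent choice, e.g. derived from the source encoding. */
static int cell_columns(uint32_t cp, int ambiguous_width)
{
    assert(ambiguous_width == 1 || ambiguous_width == 2);
    switch (ea_width_of(cp)) {
    case EA_WIDE:
        return 2;
    case EA_AMBIGUOUS:
        return ambiguous_width;
    case EA_NARROW:
    default:
        return 1;
    }
}
```

The extra parameter is the point: the table alone cannot settle the width of U+25EF, so the caller has to supply it.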

It's not even clear that when a process is running in utf-8 or some other unicode encoding that you'll still get the right behavior. That's because the actual width of a glyph at these code points depends on the font.

If you look at the full circle glyph in a Japanese font, it's always full-width. If you look at it in a Western font, it's usually half-width. So a Japanese translator may use that code point in a string expecting it to be full-width, because that's how it shows up on his display, but Western users may see something else depending on their font settings. But this is more of a display problem. As long as the process running in the terminal and the terminal itself agree on the width of the character in question, you won't run into problems.

But the problem that this bug originally describes is when the process and the terminal don't agree. Things like readline (which internally store how long a line is in terms of terminal columns) start to get out of sync with what the terminal displays, and you end up with this bug.

Comment 16 Behdad Esfahbod 2006-04-13 06:05:12 UTC

Note that we are not hiding behind Unicode, no.  We comply with Unicode *exactly* because we are supposed to implement the same thing that the clients expect, and Unicode is by far the safest way to achieve that.

As for the circled chars, you are right, they are ambiguous.  So how should the terminal resolve them to wide?  Based on adjacent characters on the screen?  In the stream?  What?  What if you remove the adjacent wide chars?  Should they change width?  That's exactly why I believe terminals should not try to be intelligent.  As you said, readline keeps track of character widths itself, as do all editors.  So, does readline ever resolve the ambiguous chars to wide?  When?  Without answers to these questions we cannot implement anything further in vte.

Comment 17 Ken Deeter (Kentarou Shinohara) 2006-04-13 06:22:48 UTC

> We comply with Unicode
> *exactly* because we are supposed to implement the same thing that the clients
> expect, and Unicode is by far the safest way to achieve that.

I'm not sure what you mean by 'clients' here, but a terminal should behave the way that the program running inside it expects it to behave. As I mentioned, if you're running in the ja_JP.eucJP locale and you have gnome-terminal set to interpret the output of a program as ja_JP.eucJP, then it should disambiguate the width of ambiguous characters based on how those characters would be treated in ja_JP.eucJP.

Users (especially users who don't use UTF-8 as their locale encoding) don't expect gnome-terminal to act only according to the Unicode standard. They expect it to act just like their locale encoding does.

In other words, to answer your question, I think the correct answer is to use gnome-terminal's current encoding setting to disambiguate. For encodings that cover CJK languages, it's safe to assume that these ambiguous width characters should be treated as full-width.

It's reasonable to assume that the user will set gnome-terminal's encoding to match the locale encoding of the program he's running in the terminal. If it were mismatched, it would be useless. So we should use this information that the user is telling us and act accordingly.
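Ken's proposal can be sketched as a small lookup from the terminal's configured encoding to the width of ambiguous characters. This is a minimal sketch, not vte's code: the function name and the list of codeset names are assumptions for illustration, and Comment 20 below notes that the complete list of appropriate encodings is not known.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: columns to use for East Asian "ambiguous" characters,
 * chosen from the terminal's encoding setting.  The codeset names are
 * illustrative, not an exhaustive or authoritative list. */
static int ambiguous_width_for_codeset(const char *codeset)
{
    static const char *const cjk_codesets[] = {
        "EUC-JP", "SHIFT_JIS", "EUC-KR", "GB18030", "BIG5", "EUC-TW",
    };

    assert(codeset != NULL);
    for (size_t i = 0; i < sizeof cjk_codesets / sizeof cjk_codesets[0]; i++) {
        if (strcmp(codeset, cjk_codesets[i]) == 0)
            return 2; /* legacy CJK encodings: ambiguous chars are wide */
    }
    return 1; /* otherwise follow the UAX #11 default: treat them as narrow */
}
```

Exact, case-sensitive matching is a simplification; real code would also have to normalize encoding aliases such as "eucJP" versus "EUC-JP".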

Comment 18 Behdad Esfahbod 2006-04-13 06:43:22 UTC

By clients I meant the programs running inside the terminal.

Your proposal sounds very reasonable.  Can you provide a list of charsets that should default to wide ambiguous?

Comment 19 Behdad Esfahbod 2006-04-13 06:43:58 UTC

I opened bug 338305 to get the support for ambiguous width into glib first.

Comment 20 Ken Deeter (Kentarou Shinohara) 2006-04-13 07:16:21 UTC

Unfortunately, I don't know what all the appropriate encodings would be. How about adding an option that says 'treat ambiguous-width characters as full width'? This is what xterm, Terminal.app, and mlterm do. That's usually good enough.

Comment 21 Behdad Esfahbod 2006-04-28 02:35:30 UTC

Make it searchable :)

Comment 22 Behdad Esfahbod 2006-04-29 04:26:00 UTC

*** Bug 118939 has been marked as a duplicate of this bug. ***

Comment 23 Behdad Esfahbod 2006-04-29 04:27:43 UTC

*** Bug 339984 has been marked as a duplicate of this bug. ***

Comment 24 Behdad Esfahbod 2006-04-29 04:51:04 UTC

Ok, seems like vte already has all the machinery for this in place, and it should be working for East Asian encodings AND locales already.  Can somebody confirm in what encodings/locales it's supposed to work that it is not?

Bug 339984 has a patch to always turn the ambiguous characters wide unconditionally under UTF-8.  I don't think that's going to happen.

Comment 25 Ken Deeter (Kentarou Shinohara) 2006-04-29 17:21:13 UTC

That's good to know most of the code is already there. Instead of hardcoding utf-8 => always full width, can it just be a gui preference? This is what other terminals, notably Terminal.app do.

Just like that patch except:

+	if (ASSUME_AMBIGUOUS_WIDTH_AS_FULL_PREFERENCE_SET == 0)
+	  return _vte_iso2022_ambiguous_width_guess ();

Where the condition is just determined by a gui check box.

Comment 26 Behdad Esfahbod 2006-04-29 17:24:45 UTC

That's doable, but needs new API for vte_terminal, and support from gnome-terminal.  Waiting for patches :)

Comment 27 Takashi Matsuo 2007-02-14 15:53:30 UTC

I wrote a patch against libvte4-1:0.12.2-4 debian package. 
Please see
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=395133
and
http://bugs.debian.org/cgi-bin/bugreport.cgi/vte-width_cjk.patch?bug=395133;msg=10;att=1

With this patch, gnome-terminal treats ambiguous-width
chars as FULLWIDTH only when the 'VTE_WIDTH_CJK' environment variable is set.  It
only affects people who set this environment variable.

Sorry that this patch is specific to the Debian package. But it is quite a simple one, so I think it's not hard to apply it against other versions of libvte.

Please consider applying this patch.

Comment 28 Mariano Suárez-Alvarez 2007-02-14 17:54:12 UTC

Caching the result of the test in a gboolean would avoid calling getenv so often...

I'm not sure using an environment variable is the way to go here, especially if there is going to be a UI for this.

Is there going to be a UI for this?

Comment 29 Behdad Esfahbod 2007-02-14 20:34:27 UTC

Actually for an upcoming redhat release, I picked the patch from bug 339984.

Comment 30 Behdad Esfahbod 2007-04-10 16:10:48 UTC

Created attachment 86117 [details] [review]
patch from RHEL-4.5

This uses the env var VTE_CJK_WIDTH.  It should cache getenv results...

I suggest we commit this even if we are going to do an API and UI later.  This can determine the default later.

Comment 31 Chris Wilson 2007-04-10 16:24:53 UTC

The result of _vte_iso2022_ambiguous_width() is cached inside the _vte_iso2022_state for the lifetime of the state i.e. until the terminal is reset, so we no longer need to worry about caching one additional g_getenv().

Comment 32 Behdad Esfahbod 2007-04-10 16:54:55 UTC

Then this can go in.

Comment 33 Behdad Esfahbod 2007-04-17 06:42:41 UTC

*** Bug 430565 has been marked as a duplicate of this bug. ***

Comment 34 Behdad Esfahbod 2007-04-17 06:44:49 UTC

2007-04-17  Behdad Esfahbod  <behdad@gnome.org>

        * src/iso2022.c (_vte_iso2022_ambiguous_width): Consider
        ambiguous-width chars if VTE_CJK_WIDTH env var is set and we are
        under a CJK locale.
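Reading the ChangeLog entry as "treat ambiguous-width chars as wide when VTE_CJK_WIDTH is set and the locale is CJK", and combining it with the per-state caching from Comment 31, the rule might be sketched as follows. This is an assumption-laden illustration, not the committed code in _vte_iso2022_ambiguous_width(): the struct and function names are hypothetical, the variable counts as set whenever getenv() returns non-NULL, and the CJK-locale test is passed in by the caller as a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-terminal state; width == 0 means "not computed yet". */
struct ambiguous_width_cache {
    int width;
};

/* Ambiguous-width chars take 2 columns only if VTE_CJK_WIDTH is set and
 * the locale is CJK; otherwise 1.  The answer is computed once and then
 * reused until the state is reset, so getenv() is not called per char. */
static int cached_ambiguous_width(struct ambiguous_width_cache *cache,
                                  bool cjk_locale)
{
    assert(cache != NULL);
    if (cache->width == 0) {
        const char *env = getenv("VTE_CJK_WIDTH");
        cache->width = (env != NULL && cjk_locale) ? 2 : 1;
    }
    return cache->width;
}

/* A terminal reset clears the cache; the next lookup re-reads the env. */
static void reset_ambiguous_width(struct ambiguous_width_cache *cache)
{
    assert(cache != NULL);
    cache->width = 0;
}
```

Keeping the cached value in the state, rather than in a global, matches Comment 31's observation that the result lives for the lifetime of the state and is recomputed only after a reset.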