GNOME Bugzilla – Bug 109587
vte doesn't recognize certain double width characters as double width
Last modified: 2007-04-17 06:44:49 UTC
Running vte 0.10.26 on Gentoo. This problem appears when I type certain symbol characters using a Japanese input method. The terminal widget does not seem to recognize characters such as ◯ and ▽ (open full-width circle and open downwards triangle) as double-width characters. As a result, once one is typed in, the cursor covers the latter half of the character, and when I press the left arrow, the whole thing shifts over by one full character. I will attach a screenshot to show the effect. The problem also happens in the preedit buffer when inputting Japanese (not sure about other languages).
Created attachment 15328 [details] shot of half covered full circle character
Created attachment 15329 [details] shot of same effect in preedit buffer
Please check if this is still a problem in the CVS tree (it shouldn't be).
Just checked out CVS; it looks like the bug is fixed. The only thing is that I now get a weird space inserted when I hit backspace. It may be unrelated. I'm running it directly from the src/ directory of the CVS module.
What exactly does this weird space look like? Can you attach a screengrab?
Created attachment 15896 [details] screen grab of spaces
Attached a screen grab of the spaces. That's after typing "hello" and 3 backspaces. Also, the thick underlining did not happen before. If I hit enter a few more times, some of the underlining is thickened and some is not; I will attach another screenshot. TERM=xterm LANG=ja_JP.eucJP, if that matters.
Created attachment 15897 [details] underlining (last two lines)
Weird underlining doesn't happen on all fonts.. I tried "kochi gothic 12" and "ms gothic 12" and both looked ok. backspace still resulted in same behaviour though.
Reopening.
Is this problem still there btw?
Still exists (seen on GNOME AIX). The problem is in vte/src/vteglyph.c, in _vte_glyph_draw(): the position "x" in pixels of the character is computed as "row * width", where width is the average width of the characters of this font and row is the number of characters from the beginning of the line. But row is always incremented by 1, even if the character is a double-width character. The solution needs major changes in the algorithm...
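A minimal sketch of the difference described in the comment above. This is not the actual vteglyph.c code: `cell_columns`, `glyph_x_naive`, and `glyph_x_by_columns` are hypothetical names, and the wide ranges listed are only an illustrative subset. It contrasts "index * width" with a position derived from the summed column widths of the preceding characters.

```c
#include <stddef.h>

/* Hypothetical helper: number of terminal columns a code point occupies.
 * Only a few well-known wide ranges are listed, for illustration. */
static int cell_columns(unsigned int cp)
{
    if (cp >= 0x3040 && cp <= 0x30FF) return 2; /* Hiragana, Katakana */
    if (cp >= 0x4E00 && cp <= 0x9FFF) return 2; /* CJK Unified Ideographs */
    if (cp >= 0xFF01 && cp <= 0xFF60) return 2; /* Fullwidth forms */
    return 1;
}

/* The computation described in the comment: character index times cell width. */
static int glyph_x_naive(size_t index, int cell_width)
{
    return (int)index * cell_width;
}

/* Alternative: sum the columns of every character before `index`. */
static int glyph_x_by_columns(const unsigned int *line, size_t index,
                              int cell_width)
{
    int columns = 0;
    for (size_t i = 0; i < index; i++)
        columns += cell_columns(line[i]);
    return columns * cell_width;
}
```

With two wide characters before an ASCII letter, the two functions disagree about where that letter should be drawn, which would produce the overlap seen in the screenshots.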
Created attachment 46843 [details] Double-width char This picture shows the good display of characters in an aixterm and the bad in gnome-terminal.
Closing as WONTFIX. Vte uses g_unichar_iswide, which in turn uses data from Unicode Character Database to determine which characters are "wide", and the characters you are exhibiting are not wide characters according to the latest UCD.
I think these characters are part of the set that Unicode defines as 'ambiguous'. Does g_unichar_iswide account for this (i.e., is there some global setting or environment variable that can toggle whether ambiguous characters are interpreted as wide or not)? The Apple terminal deals with this with a setting in its preferences that controls whether certain characters are interpreted as wide. Hiding behind the Unicode standard is not the right answer here, especially when it leads to broken behavior. For Japanese characters at least, you have to match what the outputting program thinks is the column width of the character. For example, if a process running in ja_JP.eucJP outputs a character that maps to one of the ambiguous characters in question, the process will always treat it as double width (because such characters are considered wide in eucJP), which affects how it lines things up on screen and how it handles backspaces. So looking at what Unicode says isn't technically correct, because a process running in the terminal doesn't even know that its output is getting translated to Unicode. Maintaining a mapping of which Unicode code points are considered double width in which encodings is somewhat impractical, and the only contentious ones are in the 'ambiguous' set. Here's the relevant Unicode info: http://www.unicode.org/reports/tr11/tr11-14.html Specifically, it says that for ambiguous characters, the width needs to be figured out from context; in a terminal's case, that context can be the source encoding. At minimum, if it's EUC-JP or SJIS, then the ambiguous characters should be treated as full-width, not half-width. It's not even clear that when a process is running in UTF-8 or some other Unicode encoding you'll still get the right behavior. That's because the actual width of a glyph at these code points depends on the font. If you look at the full circle glyph in a Japanese font, it's always full-width. If you look at it in a Western font, it's usually half-width.
So a Japanese translator may try to use that code point in a string expecting it to be full width, because that's how it shows up on his display, but then Western users may see something else depending on their font setting. But this is more of a display problem: as long as the process running in the terminal and the terminal itself agree on the width of the character in question, you won't run into problems. The problem that this bug originally describes arises when the process and the terminal don't agree. Things like readline (which internally stores how long a line is in terms of terminal columns) start to get out of sync with what the terminal displays, and you end up with this bug.
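To illustrate the disagreement described in the comment above, here is a small sketch (not code from readline or vte). `is_ambiguous` and `line_columns` are hypothetical names, the list of ambiguous code points is a tiny illustrative subset, and every other character is assumed to be one column wide. It counts the columns of the same text under the two policies.

```c
#include <stddef.h>

/* Illustrative subset of East Asian Ambiguous code points. */
static int is_ambiguous(unsigned int cp)
{
    return cp == 0x25CB    /* WHITE CIRCLE */
        || cp == 0x25BD    /* WHITE DOWN-POINTING TRIANGLE */
        || cp == 0x25EF;   /* LARGE CIRCLE */
}

/* Column count of a line when ambiguous characters are given
 * `ambiguous_width` columns (1 or 2). Simplification: all other
 * characters count as one column. */
static int line_columns(const unsigned int *text, size_t len,
                        int ambiguous_width)
{
    int columns = 0;
    for (size_t i = 0; i < len; i++)
        columns += is_ambiguous(text[i]) ? ambiguous_width : 1;
    return columns;
}
```

If the program inside the terminal counts with `ambiguous_width = 2` while the terminal counts with `1`, their idea of the cursor column differs by one for each ambiguous character on the line.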
Note that we are not hiding behind Unicode, no. We comply with Unicode *exactly* because we are supposed to implement the same thing that the clients expect, and Unicode is by far the safest way to achieve that. As for the circled chars, you are right, they are ambiguous. So how should the terminal resolve them to wide? By adjacent characters on the screen? In the stream? What? What if you remove the adjacent wide chars? Should the width change? That's exactly why I believe terminals should not try to be intelligent. As you said, readline keeps track of character widths itself, as do all editors. So, does readline ever resolve the ambiguous chars to wide? When? Without answers to these questions we cannot implement anything further in vte.
> We comply with Unicode *exactly* because we are supposed to implement the same thing that the clients expect, and Unicode is by far the safest way to achieve that.
I'm not sure what you mean by 'clients' here, but a terminal should behave in the way that the program running inside the terminal expects it to behave. As I mentioned, if you're running in the ja_JP.eucJP locale, and you have gnome-terminal set to interpret the output of a program as ja_JP.eucJP, then it should disambiguate the wideness of ambiguous characters based on how those characters would be treated in ja_JP.eucJP. Users (especially users who don't use UTF-8 as their locale) don't expect gnome-terminal to act only according to the Unicode standard. They expect it to act just like their locale encoding acts. In other words, to answer your question, I think the correct answer is to use gnome-terminal's current encoding setting to disambiguate. For encodings that cover CJK languages, it's safe to assume that these ambiguous-width characters should be treated as full-width. It's reasonable to assume that the user will set gnome-terminal's encoding to match the locale encoding of the program he's running in the terminal. If it were mismatched, it would be useless. So we should use this information that the user is giving us and act accordingly.
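A minimal sketch of the proposal above, under stated assumptions. `ambiguous_width_for_encoding` and `same_name` are hypothetical names, and the encoding list contains only the two encodings named in this thread (EUC-JP and SJIS) under common spellings. It is not a complete or authoritative list of CJK charsets.

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive comparison of two ASCII encoding names. */
static int same_name(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Returns 2 if ambiguous-width characters should be treated as wide
 * under the terminal's configured encoding, else 1. */
static int ambiguous_width_for_encoding(const char *encoding)
{
    /* Assumption: illustrative spellings only, not an exhaustive list. */
    static const char *const wide_encodings[] = {
        "EUC-JP", "eucJP", "Shift_JIS", "SJIS"
    };
    size_t count = sizeof wide_encodings / sizeof wide_encodings[0];

    for (size_t i = 0; i < count; i++)
        if (same_name(encoding, wide_encodings[i]))
            return 2;
    return 1;
}
```

The open question in the thread, namely which charsets belong in that table, is exactly what this sketch leaves unanswered.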
By clients I meant the programs running inside the terminal. Your proposal sounds very reasonable. Can you provide a list of charsets that should default to wide ambiguous?
I opened bug 338305 to get the support for ambiguous width into glib first.
Unfortunately, I don't know what all the appropriate encodings would be. How about adding an option that says 'treat ambiguous width characters as full width'? This is what xterm, Terminal.app, and mlterm do. That's usually good enough.
Make it searchable :)
*** Bug 118939 has been marked as a duplicate of this bug. ***
*** Bug 339984 has been marked as a duplicate of this bug. ***
Ok, it seems vte already has all the machinery for this in place, and it should already be working for East Asian encodings AND locales. Can somebody confirm in which encodings/locales it's supposed to work but does not? Bug 339984 has a patch to turn the ambiguous characters wide unconditionally under UTF-8. I don't think that's going to happen.
That's good to know most of the code is already there. Instead of hardcoding utf-8 => always full width, can it just be a GUI preference? This is what other terminals, notably Terminal.app, do. Just like that patch, except:
+ if (ASSUME_AMBIGUOUS_WIDTH_AS_FULL_PREFERENCE_SET == 0)
+     return _vte_iso2022_ambiguous_width_guess ();
where the condition is just determined by a GUI check box.
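The idea in the comment above could be sketched as follows. This is a hypothetical illustration, not vte API: `ambiguous_width_with_pref` is an invented name, and the `guessed_width` argument stands in for whatever the existing encoding/locale guess would return.

```c
#include <stdbool.h>

/* If the user's "treat ambiguous characters as full width" preference
 * is off, fall back to the existing guess; otherwise force wide. */
static int ambiguous_width_with_pref(bool assume_full_width,
                                     int guessed_width)
{
    if (!assume_full_width)
        return guessed_width;   /* preference unset: keep the guess */
    return 2;                   /* preference set: always two columns */
}
```

A check box in the terminal's preferences would then only need to supply the boolean.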
That's doable, but needs new API for vte_terminal, and support from gnome-terminal. Waiting for patches :)
I wrote a patch against the libvte4-1:0.12.2-4 Debian package. Please see http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=395133 and http://bugs.debian.org/cgi-bin/bugreport.cgi/vte-width_cjk.patch?bug=395133;msg=10;att=1 With this patch, gnome-terminal treats ambiguous-width chars as FULLWIDTH only when the 'VTE_WIDTH_CJK' environment variable is set. It only affects people who set this environment variable. Sorry that this patch is specific to the Debian package, but it is quite simple, so I think it should not be hard to apply it to another version of libvte. Please consider applying this patch.
Caching the result of the test in a gboolean would avoid running getenv that often... I'm not sure using an environment variable is the way to go here, especially if there is going to be a UI for this. Is there going to be a UI for this?
Actually for an upcoming redhat release, I picked the patch from bug 339984.
Created attachment 86117 [details] [review] patch from RHEL-4.5 This uses the env var VTE_CJK_WIDTH. It should cache getenv results... I suggest we commit this even if we are going to do an API and UI later. This can determine the default later.
The result of _vte_iso2022_ambiguous_width() is cached inside the _vte_iso2022_state for the lifetime of the state i.e. until the terminal is reset, so we no longer need to worry about caching one additional g_getenv().
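A rough sketch of the caching behaviour described above, using plain C rather than the real `_vte_iso2022_state`. The struct and function names are hypothetical. As a simplifying assumption, the variable merely being set is taken to mean "wide", and the locale check is left out. The environment is consulted once, and the answer is kept until the state is reset.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the per-terminal state. */
struct width_state {
    int ambiguous_width;   /* 0 means "not computed yet" */
};

/* Forget the cached value, as a terminal reset would. */
static void width_state_reset(struct width_state *state)
{
    state->ambiguous_width = 0;
}

/* Return the cached width, consulting the environment only on first use. */
static int width_state_get(struct width_state *state)
{
    if (state->ambiguous_width == 0) {
        /* Assumption: a set variable means wide; the real patch may
         * inspect the value and the locale as well. */
        const char *env = getenv("VTE_CJK_WIDTH");
        state->ambiguous_width = (env != NULL) ? 2 : 1;
    }
    return state->ambiguous_width;
}
```

Changing the environment variable after the first lookup has no effect until `width_state_reset` is called, which mirrors the "until the terminal is reset" lifetime mentioned above.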
Then this can go in.
*** Bug 430565 has been marked as a duplicate of this bug. ***
2007-04-17  Behdad Esfahbod  <behdad@gnome.org>
	* src/iso2022.c (_vte_iso2022_ambiguous_width): Consider ambiguous-width chars if VTE_CJK_WIDTH env var is set and we are under a CJK locale.