GNOME Bugzilla – Bug 762052
ncurses text gets shifted with unicode Japanese characters
Last modified: 2016-10-13 20:28:41 UTC
Scrolling over lines in ncurses interfaces that contain specific Japanese characters or character combinations causes text and colors to shift. Some of the text on the same line as the troublesome Japanese characters is shifted left, and its color is swapped with the background color.

Version of vte: 0.42.4

Steps to reproduce:
- Open an ncurses program such as ranger or ncmpcpp
- Find files with Japanese characters in their names
- Scroll over them

Actual results: Text gets shifted and colors get inverted.

Expected results: Text, background, and colors render normally, like other non-Japanese text.

Does this happen every time? It occurs with specific Japanese characters and/or specific character combinations. It can be reproduced reliably with the following Japanese file names:
'BLUNTSIDE - 若いジー - 01 イントロ.flac'
'BLUNTSIDE - 若いジー - 02 幸せの施設.flac'
'BLUNTSIDE - 若いジー - 03 ビーチ.flac'
'BLUNTSIDE - 若いジー - 04 魂.flac'
'BLUNTSIDE - 若いジー - 05 若いジー (REDUX).flac'
'BLUNTSIDE - 若いジー - 06 ドライブ.flac'
'BLUNTSIDE - 若いジー - cover.png'
The problem might not show up with other Japanese text.
Created attachment 321163: viewing ncmpcpp in vte
Created attachment 321164: viewing ranger in vte
*** Bug 762051 has been marked as a duplicate of this bug. ***
What's the behavior in other terminal emulators (e.g. xterm)? This looks to me like a bug either in ncurses or in the apps themselves.
The problem doesn't show up in urxvt.
Could you please attach screenshots highlighting the difference? For me, it looks broken in urxvt too. What happens in xterm? xterm is our reference; urxvt feels to me like the black sheep of terminal emulators (it does most things differently from all other terminals).
Created attachment 321173: viewing ncmpcpp in urxvt
Created attachment 321174: viewing ranger in urxvt
Created attachment 321175: viewing ncmpcpp in xterm
Created attachment 321176: viewing ranger in xterm
The problem doesn't show up in xterm either.
For me it's equally buggy in all three emulators. What's the output of the "locale" command in each?
The output is identical in urxvt, vte, and xterm:
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
Isn't this simply due to these programmes using the standard ncurses library instead of ncursesw library?
That's a possibility. Having no issues with urxvt makes me a bit skeptical, though; it may be because urxvt handles text spacing more rigidly than other terminals. It's odd, since I haven't noticed this issue with any other Japanese or Unicode text.
Both of these apps use ncursesw on my system. derverrucktefuchs: Which OS and version are you using?
What's the value of $TERM in xterm and in vte?

Could you please do the following:
- Start vteapp or gnome-terminal
- Note the window size
- Change to the appropriate directory
- Start the "script" utility
- Start "ranger"
- Press as few keys as possible to trigger the bug (you might not need to press any if the screen is corrupted straight away), and note the exact keys you press
- Press 'q' to quit "ranger"
- Press Ctrl-D to quit "script"
- Rename the resulting "typescript" file to "typescript-vte"
- Repeat the exact same steps in xterm: make sure the window has exactly the same size (in columns x rows), press exactly the same keys (which do not trigger the bug this time), and at the end rename "typescript" to "typescript-xterm".

Please attach both files and let us know the window size you used. (Apart from the timestamps, the two files may or may not be identical; we'll see.)
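If you want to answer the "identical apart from the timestamps" question mechanically rather than by eye, a small script can compare the two typescripts while ignoring the timestamped lines. This is a minimal Python sketch, not part of any existing tool; it assumes util-linux "script" brackets the session with lines beginning "Script started on" and "Script done on", and the helper names are our own.

```python
import pathlib
import sys
from typing import List, Optional

# Assumption: `script` brackets the session with lines beginning
# "Script started on ..." and "Script done on ..."; only those carry timestamps.
TIMESTAMP_PREFIXES = (b"Script started on", b"Script done on")


def strip_script_timestamps(data: bytes) -> List[bytes]:
    """Split a typescript into lines and drop the timestamped header/trailer."""
    return [
        line
        for line in data.splitlines(keepends=True)
        if not line.startswith(TIMESTAMP_PREFIXES)
    ]


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Index (into the filtered line lists) of the first differing line,
    or None if the typescripts match apart from the timestamp lines."""
    la, lb = strip_script_timestamps(a), strip_script_timestamps(b)
    for i, (x, y) in enumerate(zip(la, lb)):
        if x != y:
            return i
    if len(la) != len(lb):
        # One file is a prefix of the other: they diverge where the shorter ends.
        return min(len(la), len(lb))
    return None


if __name__ == "__main__" and len(sys.argv) == 3:
    vte = pathlib.Path(sys.argv[1]).read_bytes()
    xterm = pathlib.Path(sys.argv[2]).read_bytes()
    diff = first_difference(vte, xterm)
    if diff is None:
        print("identical apart from timestamps")
    else:
        print(f"first difference at filtered line {diff}")
```

Run it as "python3 compare_typescripts.py typescript-vte typescript-xterm" (the script's file name is hypothetical). Comparing raw bytes is deliberate: a typescript records the escape sequences the application emitted, and those are exactly what should match when both terminals use the same TERM and the same logical window size.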
Created attachment 321323: typescript for vte
Created attachment 321324: typescript for xterm
xterm $TERM: xterm
vte $TERM: xterm-256color
Both terminal windows are maximized in i3 on a 1680x1050 monitor. I'm running a recently updated Arch system.
It would be important that the two windows have the exact same number of character rows and columns, otherwise I can't compare the outputs. The pixel size doesn't matter, since apparently you have different fonts in the two terminals. Please don't maximize the windows, instead manually resize them to the same size (and no larger than 200x50-ish because my monitor is not as good as yours and I'd like to replay these typescripts :-)). You can use the command "stty size" to check the size. Also please execute "export TERM=xterm-256color" in xterm (or the other way around) to make sure that this setting is the same in the two terminals. Do they still behave differently? I'd like to see the typescripts from this setup: xterm and vte running with the same TERM and the same logical window size. Please also let me know this exact logical window size (character rows x columns) that you chose for both terminals, so that I can replay your typescripts.
I think I've got it. The file names contain lots of zero-width (combining) U+3099 characters: before every U+30FC (ー) and at other positions as well. This is where ncurses (or these particular apps) get the width computation wrong.

As your screenshots show, your xterm and urxvt don't seem to support combining characters: these symbols occupy their own cells. In vte, they don't take up extra space; instead, they modify the look of the previous glyph. I'm not sure whether your xterm and urxvt were compiled without the necessary feature, or whether you have an older libc or Unicode database. What does this command print?

echo -e '\u3099' | wc -L
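To see exactly where the zero-width characters sit in a name, you can list every nonspacing mark it contains. The following is a minimal Python sketch using only the standard unicodedata module. It assumes the names on disk are in decomposed (NFD) form and rebuilds that form with unicodedata.normalize, because the text pasted into this report may have been recomposed along the way.

```python
import unicodedata
from typing import List, Tuple

# One of the reported file names. Assumption: on disk the voiced kana are
# stored decomposed (base kana followed by U+3099); NFD reproduces that
# regardless of how the name was pasted here.
title = "BLUNTSIDE - 若いジー - 01 イントロ.flac"
decomposed = unicodedata.normalize("NFD", title)


def combining_marks(s: str) -> List[Tuple[int, str]]:
    """(index, 'U+XXXX NAME') for every nonspacing mark (General_Category Mn)."""
    return [
        (i, f"U+{ord(ch):04X} {unicodedata.name(ch)}")
        for i, ch in enumerate(s)
        if unicodedata.category(ch) == "Mn"
    ]


for index, label in combining_marks(decomposed):
    print(index, label)
```

In this name, the single mark appears between シ and ー, matching the "before every U+30FC" observation.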
urxvt, xterm, and vte all output '2'.
http://unicode.org/reports/tr11/
"ED4. East Asian Wide (W): All other characters that are always wide."
"6.2 Combining Marks [...] nonspacing marks used only with wide characters are given a W"

http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt
"3099..309A;W # Mn [2] COMBINING [...]"

I guess someone (ncursesw? ranger/ncmpcpp?) interprets U+3099's "W" as wide (2 cells), whereas its general category "Mn" makes it a combining (0-cell) character. Or did this by any chance change only recently?

As far as I understand:
- Vte correctly does not advance the cursor on U+3099, but modifies the look of the previous glyph. So do my xterm and urxvt.
- Your xterm and urxvt draw it as a separate, standalone glyph, advancing the cursor by 2 cells. Indeed, the screenshots show the text rendered differently. I can't tell for sure (I can't read Japanese), but I assume vte's rendering is the correct one.
- I've no idea whether the difference between your xterm/urxvt and mine comes from different compile flags, a different ncurses version, a different glibc, or something else.
- Ranger/ncmpcpp incorrectly assume that the cursor will advance by 2 cells.
- For you, xterm/urxvt's bug and ranger/ncmpcpp's bug cancel each other out, resulting in a correct overall layout (but an incorrectly rendered file name).

Any real Unicode guru around here to confirm? Behdad?
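The conflict between the two properties can be checked directly with Python's unicodedata module, which exposes both of them. The sketch below also estimates how many columns an application over-counts on one line. It assumes, as suggested in the comment above, that the application sums per-character widths from a table that gives U+3099 two cells, while the terminal draws the mark in zero cells. Under that assumption, each such mark pushes the rest of the line 2 columns out of step. extra_columns() is a hypothetical helper, not an existing API.

```python
import unicodedata

MARK = "\u3099"  # COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK

# The two properties that disagree about U+3099.
eaw = unicodedata.east_asian_width(MARK)  # East_Asian_Width (EastAsianWidth.txt)
cat = unicodedata.category(MARK)          # General_Category (UnicodeData.txt)
print(f"East_Asian_Width={eaw} General_Category={cat}")


def extra_columns(s: str) -> int:
    """Columns over-counted by a width rule that maps every 'W' character to
    2 cells, for a terminal that draws nonspacing ('Mn') marks in 0 cells."""
    return sum(
        2
        for ch in s
        if unicodedata.east_asian_width(ch) == "W" and unicodedata.category(ch) == "Mn"
    )


# Decomposed form assumed, as in the file names discussed in this bug.
name = unicodedata.normalize("NFD", "BLUNTSIDE - 若いジー - 03 ビーチ.flac")
print(extra_columns(name))
```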
For me, the command "echo -e '\u3099' | wc -L" outputs 0. At this point I'm really unsure which behavior is correct, or which one is newer. What's your glibc version? For me (Ubuntu Xenial) it's 2.21. (Look for a /lib/libc-2.xx.so, /lib/x86_64-linux-gnu/libc-2.xx.so, or similar file.)
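Both facts (the glibc version and the width glibc assigns to U+3099) can also be queried from inside a process, without relying on the shell's echo. The following is a hedged Python sketch that uses ctypes. os.confstr("CS_GNU_LIBC_VERSION") is specific to glibc, and the result of wcwidth() depends on LC_CTYPE. As far as we know, GNU wc -L measures line width with wcwidth() in multibyte locales, so in a UTF-8 locale this should match the wc -L numbers reported in this thread. The function names are our own.

```python
import ctypes
import ctypes.util
import locale
import os
from typing import Optional


def glibc_version() -> Optional[str]:
    """Ask the running process which glibc it uses."""
    try:
        return os.confstr("CS_GNU_LIBC_VERSION")  # e.g. "glibc 2.22"
    except (AttributeError, ValueError, OSError):
        return None  # not glibc, or the confstr name is unsupported here


def libc_wcwidth(codepoint: int) -> int:
    """Call the C library's wcwidth() for one code point.

    wcwidth() depends on LC_CTYPE, so adopt the user's locale first (this
    changes process-wide state). In a non-UTF-8 locale, non-ASCII code
    points typically come back as -1."""
    try:
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        pass  # keep whatever locale is already active
    libc = ctypes.CDLL(ctypes.util.find_library("c"))
    libc.wcwidth.argtypes = [ctypes.c_wchar]
    libc.wcwidth.restype = ctypes.c_int
    return libc.wcwidth(chr(codepoint))


print(glibc_version(), libc_wcwidth(0x3099))
```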
Markus Kuhn's wcwidth (https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c) returns 0.
I also don't understand why vte does not advance by 2 cells if wcwidth(0x3099) indeed returns 2 on your system. After all, vte should respect what wcwidth says. Maybe vte first checks whether it's a combining character, and only if not does it consult wcwidth. If so, I think the order should be swapped so that wcwidth takes absolute precedence, and only if it returns zero should we check whether the character is a combining one.
"ldd --version" is a simpler and better way to figure out libc's version. I assume you'll get 2.22 (that's what Arch contains currently). If so then it's either a bug (regression) in libc, or a change that I don't understand.
"ldd --version" confirms that libc 2.22 is installed.
U+3099 should definitely take zero cells. vte uses g_unichar_iszerowidth() and g_unichar_iswide() instead of libc's wcwidth(). I imagine this is (or originally was) done because we cannot rely on wchar_t being Unicode. I'm not sure what the best resolution to this bug is.
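The ordering question raised earlier in the thread (which check runs first) is what decides U+3099's width. The character is both "wide" and "zero width", depending on which property you ask about. Below is a minimal Python sketch. is_zero_width() and is_wide() are hypothetical, simplified stand-ins built on unicodedata, and they do not reproduce GLib's exact rules for g_unichar_iszerowidth() and g_unichar_iswide().

```python
import unicodedata


def is_zero_width(ch: str) -> bool:
    """Rough, hypothetical stand-in for GLib's g_unichar_iszerowidth():
    nonspacing/enclosing marks and format characters only."""
    return unicodedata.category(ch) in ("Mn", "Me", "Cf")


def is_wide(ch: str) -> bool:
    """Rough, hypothetical stand-in for g_unichar_iswide(): EAW W or F."""
    return unicodedata.east_asian_width(ch) in ("W", "F")


def width_zero_first(ch: str) -> int:
    """Zero-width test wins: U+3099 takes 0 cells."""
    if is_zero_width(ch):
        return 0
    return 2 if is_wide(ch) else 1


def width_wide_first(ch: str) -> int:
    """Wide test wins: U+3099 takes 2 cells, the value glibc 2.22 reports."""
    if is_wide(ch):
        return 2
    return 0 if is_zero_width(ch) else 1


def line_width(s: str, width=width_zero_first) -> int:
    """Sum of per-character cell widths under the chosen rule."""
    return sum(width(ch) for ch in s)
```

With the zero-width test first, シ + U+3099 + ー measures 4 columns. With the wide test first, it measures 6, which is the 2-column-per-mark discrepancy described in this bug.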
If it's indeed a change in glibc, I guess we should file a bug against it. But first it would be good to test on other distros to make sure it's not a downstream issue.
glibc-2.23 is being released nowish. I'd be curious to hear if it brings a change.
So glibc-2.23 is available in Arch now. I manually compiled vte with the Arch Build System, and the issue seems to persist. I'll wait for the package maintainer to do an official rebuild of vte against glibc-2.23 and see whether that makes any difference.
Recompiling shouldn't make any difference. If you just upgrade glibc and then restart vte (or all windows of gnome-terminal at once), it fully uses the new one. So I take this as no change from 2.22 to 2.23.
I rebooted after the upgrade, so that eliminates that and shows no change between versions.
Ubuntu Xenial beta has just upgraded glibc from 2.21 to 2.23. The output of "echo -e '\u3099' | wc -L" has changed from 0 to 2 for me too. Also I can see the screen corruption in ranger now. So we're indeed facing a changed behavior in mainstream glibc 2.22 (not some downstream patch).
Filed glibc bug: https://sourceware.org/bugzilla/show_bug.cgi?id=19852
The bug was introduced in glib commit 4a4839c94a4c93ffc0d5b95c69a08b02a57007f2. It's due to a bug in the unicode generation scripts, see https://sourceware.org/bugzilla/show_bug.cgi?id=14094#c18 where the problem was mentioned but the wrong choice made; the script needs to be smarter.
...glibC commit..., of course
-> NOTGNOME.
See also bug 772890.