Bug 762052 – ncurses text gets shifted with unicode Japanese characters

Summary:	ncurses text gets shifted with unicode Japanese characters


Status:	RESOLVED NOTGNOME

Product:	vte
Classification:	Core
Component:	general
Version:	0.42.x
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	VTE Maintainers
QA Contact:	VTE Maintainers

URL:
Whiteboard:

Duplicates:	762051
Depends on:
Blocks:

Reported:	2016-02-14 22:48 UTC by derverrucktefuchs
Modified:	2016-10-13 20:28 UTC

See Also:	https://sourceware.org/bugzilla/show_bug.cgi?id=19852
GNOME target:	---
GNOME version:	---

Attachments
viewing ncmpcpp in vte (65.12 KB, image/png) 2016-02-14 22:50 UTC, derverrucktefuchs
viewing ranger in vte (83.09 KB, image/png) 2016-02-14 22:50 UTC, derverrucktefuchs
viewing ncmpcpp in urxvt (64.59 KB, image/png) 2016-02-15 01:06 UTC, derverrucktefuchs
viewing ranger in urxvt (73.76 KB, image/png) 2016-02-15 01:06 UTC, derverrucktefuchs
viewing ncmpcpp in xterm (32.07 KB, image/png) 2016-02-15 01:07 UTC, derverrucktefuchs
viewing ranger in xterm (34.51 KB, image/png) 2016-02-15 01:07 UTC, derverrucktefuchs
typescript for vte (6.42 KB, text/plain) 2016-02-15 22:47 UTC, derverrucktefuchs
typescript for xterm (6.44 KB, text/plain) 2016-02-15 22:47 UTC, derverrucktefuchs

Description derverrucktefuchs 2016-02-14 22:48:42 UTC

Scrolling over lines in ncurses interfaces that contain specific Japanese characters or character combinations causes text and colors to shift. Some text on the same line as the troublesome Japanese characters is shifted left, and the shifted text's colors are inverted (foreground swapped with background). The vte version is 0.42.4.

Steps to reproduce:
- Open an ncurses program like ranger or ncmpcpp
- Find files with Japanese characters
- Scroll over them

Actual results:
Text gets shifted and colors get inverted


Expected results:
Normal rendering of text, background, and colors like other non-Japanese text


Does this happen every time?
It occurs with specific Japanese characters and/or specific character combinations. It can be reliably reproduced with the following Japanese text:

'BLUNTSIDE - 若いジー - 01 イントロ.flac'   
'BLUNTSIDE - 若いジー - 02 幸せの施設.flac' 
'BLUNTSIDE - 若いジー - 03 ビーチ.flac'  
'BLUNTSIDE - 若いジー - 04 魂.flac'	
'BLUNTSIDE - 若いジー - 05 若いジー (REDUX).flac'  
'BLUNTSIDE - 若いジー - 06 ドライブ.flac'
'BLUNTSIDE - 若いジー - cover.png'

The problem might not show up with other Japanese text.

Comment 1 derverrucktefuchs 2016-02-14 22:50:13 UTC

Created attachment 321163
viewing ncmpcpp in vte

Comment 2 derverrucktefuchs 2016-02-14 22:50:46 UTC

Created attachment 321164
viewing ranger in vte

Comment 3 Egmont Koblinger 2016-02-14 23:58:34 UTC

*** Bug 762051 has been marked as a duplicate of this bug. ***

Comment 4 Egmont Koblinger 2016-02-15 00:04:56 UTC

What's the behavior in other terminal emulators (e.g. xterm)?

This looks to me like a bug either in ncurses or in these apps.

Comment 5 derverrucktefuchs 2016-02-15 00:08:35 UTC

The problem doesn't show up in urxvt.

Comment 6 Egmont Koblinger 2016-02-15 00:15:53 UTC

Could you please attach screenshots highlighting the difference? For me, it looks broken in urxvt too.

What happens in xterm? xterm is our reference; urxvt feels to me like the black sheep of terminal emulators (it does most things differently from all other terminals).

Comment 7 derverrucktefuchs 2016-02-15 01:06:06 UTC

Created attachment 321173
viewing ncmpcpp in urxvt

Comment 8 derverrucktefuchs 2016-02-15 01:06:42 UTC

Created attachment 321174
viewing ranger in urxvt

Comment 9 derverrucktefuchs 2016-02-15 01:07:11 UTC

Created attachment 321175
viewing ncmpcpp in xterm

Comment 10 derverrucktefuchs 2016-02-15 01:07:42 UTC

Created attachment 321176
viewing ranger in xterm

Comment 11 derverrucktefuchs 2016-02-15 01:08:23 UTC

The problem doesn't show up in xterm either.

Comment 12 Egmont Koblinger 2016-02-15 07:35:58 UTC

For me it's equally buggy in all three emulators.

What's the output of the "locale" command in each?

Comment 13 derverrucktefuchs 2016-02-15 08:55:16 UTC

In urxvt:

LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

vte:

LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

xterm:

LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Comment 14 Christian Persch 2016-02-15 09:51:38 UTC

Isn't this simply due to these programmes using the standard ncurses library instead of ncursesw library?

Comment 15 derverrucktefuchs 2016-02-15 10:59:32 UTC

That may be a possibility. Having no issues with urxvt makes me a bit skeptical, though that may be because urxvt handles text spacing more rigidly than other terminals. It's weird, since I haven't noticed this issue with any other Japanese or Unicode text.

Comment 16 Egmont Koblinger 2016-02-15 15:15:45 UTC

Both of these apps use ncursesw on my system.

derverrucktefuchs: What OS and version are you using?

Comment 17 Egmont Koblinger 2016-02-15 15:50:44 UTC

What's the value of $TERM in xterm and in vte?

Could you please do the following steps:

- Start vteapp or gnome-terminal
- Take note of the window size
- Change to the appropriate directory
- Start the "script" utility
- Start "ranger"
- Press as few keys as possible to trigger the bug (you might not need to press any if the screen is corrupted straight away), take note of the exact keys you press
- Press 'q' to quit from "ranger"
- Press Ctrl-D to quit from "script"
- Rename the resulting "typescript" file to "typescript-vte"

- Repeat the exact same steps in xterm: make sure to have the exact same window size (in columns x rows), press exactly the same keys (which do not trigger the bug this time), and at the end rename "typescript" to "typescript-xterm".

Please attach both files, and let us know the window size you had. (Apart from the timestamp, these two files may or may not be the same, we'll see.)
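A rough way to run that comparison locally is sketched below. It assumes the util-linux `script` format, where the first line starts with "Script started" and the last with "Script done"; the exact header text varies between versions, so treat that as an assumption. The helper names are hypothetical.

```python
from pathlib import Path


def typescript_body(data: bytes) -> bytes:
    """Drop the timestamped header/footer lines that `script` adds (assumed format)."""
    lines = data.splitlines(keepends=True)
    if lines and lines[0].startswith(b"Script started"):
        lines = lines[1:]
    if lines and lines[-1].startswith(b"Script done"):
        lines = lines[:-1]
    return b"".join(lines)


def same_session(path_a: str, path_b: str) -> bool:
    """True if two typescripts differ only in their timestamp lines."""
    a = typescript_body(Path(path_a).read_bytes())
    b = typescript_body(Path(path_b).read_bytes())
    return a == b
```

Usage: `same_session("typescript-vte", "typescript-xterm")`. Comparing raw bytes is deliberate, since the escape sequences the app emits are exactly what matters here.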

Comment 18 derverrucktefuchs 2016-02-15 22:47:27 UTC

Created attachment 321323
typescript for vte

Comment 19 derverrucktefuchs 2016-02-15 22:47:51 UTC

Created attachment 321324
typescript for xterm

Comment 20 derverrucktefuchs 2016-02-15 22:52:09 UTC

xterm $TERM:

xterm

vte $TERM:

xterm-256color

Both terminal windows are maximized in i3 on a 1680x1050 monitor. I'm using a recently updated version of Arch.

Comment 21 Egmont Koblinger 2016-02-15 23:37:39 UTC

It would be important that the two windows have the exact same number of character rows and columns, otherwise I can't compare the outputs. The pixel size doesn't matter, since apparently you have different fonts in the two terminals.

Please don't maximize the windows, instead manually resize them to the same size (and no larger than 200x50-ish because my monitor is not as good as yours and I'd like to replay these typescripts :-)). You can use the command "stty size" to check the size.

Also please execute "export TERM=xterm-256color" in xterm (or the other way around) to make sure that this setting is the same in the two terminals.

Do they still behave differently? I'd like to see the typescripts from this setup: xterm and vte running with the same TERM and the same logical window size.

Please also let me know this exact logical window size (character rows x columns) that you chose for both terminals, so that I can replay your typescripts.

Comment 22 Egmont Koblinger 2016-02-16 00:34:14 UTC

I think I got it.

The filenames have tons of zero-width (combining) U+3099 characters: before every U+30FC (ー) and at other positions as well. This is where ncurses (or these particular apps) gets the width computation wrong.
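Such invisible combining marks are easiest to spot by listing a name's code points. A small sketch using Python's standard `unicodedata` module; the sample string is a hypothetical decomposed spelling, not copied from the reporter's files.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """Return one 'U+XXXX CATEGORY NAME' entry per code point in text."""
    return [
        f"U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]


# Hypothetical decomposed spelling: KATAKANA SI + COMBINING VOICED SOUND MARK + PROLONGED SOUND MARK.
sample = "\u30b7\u3099\u30fc"
for entry in describe(sample):
    print(entry)
```

Run on a real filename (for example one taken from `os.listdir()`), any `Mn` entries show where the combining marks sit.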

As seen in your screenshots, your xterm and urxvt don't seem to support combining chars; these symbols occupy their own cells. In vte, they don't take up extra space, they modify the look of the previous glyph.

Not sure if your xterm and urxvt were compiled without the necessary feature, or you have an older libc or Unicode database.

What does this command say?

echo -e '\u3099' | wc -L
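The same probe can be run against libc's `wcwidth()` directly. This is a minimal sketch via Python's `ctypes`, assuming a Linux system and a UTF-8 `LC_CTYPE` in the environment. The result is whatever the installed C library reports: the thread sees 0 with glibc 2.21 and 2 with 2.22 and 2.23.

```python
import ctypes
import ctypes.util
import locale


def libc_wcwidth(ch: str) -> int:
    """Ask the C library's wcwidth() for the cell width of a single code point."""
    # wcwidth() depends on LC_CTYPE; adopt the user's locale (e.g. en_US.UTF-8) if it exists.
    try:
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        pass  # locale not installed; libc keeps its current (possibly "C") locale
    libc = ctypes.CDLL(ctypes.util.find_library("c"))
    libc.wcwidth.argtypes = [ctypes.c_wchar]
    libc.wcwidth.restype = ctypes.c_int
    return libc.wcwidth(ch)


print(libc_wcwidth("\u3099"))
```

Under a plain "C" locale glibc returns -1 for non-ASCII input, so a -1 usually means the UTF-8 locale was not picked up.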

Comment 23 derverrucktefuchs 2016-02-16 01:00:46 UTC

urxvt, xterm, and vte all output '2'.

Comment 24 Egmont Koblinger 2016-02-16 01:04:28 UTC

http://unicode.org/reports/tr11/

"ED4. East Asian Wide (W): All other characters that are always wide."

"6.2 Combining Marks [...] nonspacing marks used only with wide characters are given a W"

http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt

"3099..309A;W     # Mn     [2] COMBINING [...]"

I guess someone (ncursesw? ranger/ncmpcpp?) misinterprets U+3099's "W" as wide (2 cells) whereas it's actually a combining (0 cell) character because of "Mn".

Or did it by any chance change only recently?
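The two properties quoted above can be checked side by side with Python's `unicodedata`, whose tables follow the Unicode version bundled with the interpreter. The `cell_width` function is a simplified sketch of the rule argued for here, where a nonspacing or enclosing mark gets zero cells regardless of its East Asian Width. It is not ncurses', glibc's, or vte's actual code, and it ignores control characters and other special cases.

```python
import unicodedata


def cell_width(ch: str) -> int:
    """Simplified cell width: combining marks first, then East Asian Width."""
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0  # nonspacing/enclosing marks attach to the previous glyph
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide and fullwidth characters take two cells
    return 1


mark = "\u3099"
print(unicodedata.east_asian_width(mark), unicodedata.category(mark))  # W Mn
print(cell_width(mark), cell_width("\u30b7"))  # 0 2
```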

As far as I understand:

- Vte correctly does not advance the cursor on U+3099, but modifies the previous glyph's look. So do my xterm and urxvt.

- Your xterm and urxvt draw it as a separate standalone glyph, advancing the cursor by 2 cells. Indeed, in the screenshots you can see that the text is rendered differently. I can't tell for sure (I can't read Japanese), but I assume vte's look is the correct one.

- I've no clue whether the difference between our xterm/urxvt setups comes from different compile flags, a different ncurses version, a different glibc, or something else.

- Ranger/ncmpcpp incorrectly assume that the cursor will advance by 2 cells.

- For you, the xterm/urxvt bug and the ranger/ncmpcpp bug cancel each other out, resulting in a correct overall layout (yet an incorrectly rendered filename).

Any real Unicode guru around here to confirm? Behdad?
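The mismatch described in the list above can be put into numbers. This sketch assumes an app that sizes text by East Asian Width alone, so U+3099 counts as two cells, as the affected glibc reported. It also assumes a terminal that advances zero cells for nonspacing marks. Both rules are simplified, hypothetical stand-ins.

```python
import unicodedata


def app_width(ch: str) -> int:
    # Hypothetical app-side rule: East Asian Width alone, so U+3099 ("W") counts as 2.
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def terminal_width(ch: str) -> int:
    # Hypothetical terminal-side rule: nonspacing marks ("Mn") advance zero cells.
    return 0 if unicodedata.category(ch) == "Mn" else app_width(ch)


def line_width(text: str, width_of) -> int:
    """Total cells that text occupies under a given per-character width rule."""
    return sum(width_of(ch) for ch in text)


name = "\u30b7\u3099\u30fc"  # SI + COMBINING VOICED SOUND MARK + PROLONGED SOUND MARK
drift = line_width(name, app_width) - line_width(name, terminal_width)
print(drift)  # 2: later text on the line lands two cells left of where the app expects
```

Each U+3099 adds two cells of drift. In the reporter's xterm and urxvt the terminal also advanced two cells, so both sides agreed and the drift was zero.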

Comment 25 Egmont Koblinger 2016-02-16 01:06:47 UTC

For me, the command "echo -e '\u3099' | wc -L" outputs 0.

At this point I'm really uncertain which one is correct, or which one is newer.

What's your glibc version? For me (Ubuntu Xenial) it's 2.21. (Look for a /lib/libc-2.xx.so or /lib/x86_64-linux-gnu/libc-2.xx.so or similar file.)

Comment 26 Egmont Koblinger 2016-02-16 01:12:41 UTC

Markus Kuhn's wcwidth (https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c) returns 0.

Comment 27 Egmont Koblinger 2016-02-16 01:35:07 UTC

I also don't get why vte does not advance by 2 cells if wcwidth(0x3099) indeed returns 2 on your system. After all, vte should respect what wcwidth says.

Maybe vte first checks whether it's a combining character, and only if not then it goes for wcwidth. If so, then I think the order should be swapped so that wcwidth has the absolute precedence, and only if it returns zero then we should check if it's a combining one.
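The two orderings contrasted above can be sketched as follows. `libc_width` stands in for the value `wcwidth()` returned, and `is_combining` stands in for a Unicode-database check. All names are hypothetical, this is not vte's actual code, and negative `wcwidth()` results for non-printable characters are simply passed through.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    cells: int      # how far the cursor advances
    combines: bool  # whether the glyph is merged into the previous cell


def combining_first(libc_width: int, is_combining: bool) -> Placement:
    # Ordering suspected for vte: the combining check wins over wcwidth().
    if is_combining:
        return Placement(0, True)
    return Placement(libc_width, False)


def wcwidth_first(libc_width: int, is_combining: bool) -> Placement:
    # Proposed ordering: wcwidth() has absolute precedence; the combining check
    # only matters when wcwidth() already says zero cells.
    if libc_width != 0:
        return Placement(libc_width, False)
    return Placement(0, is_combining)


# U+3099 on the affected glibc: wcwidth() == 2, yet it is a combining mark.
print(combining_first(2, True))  # Placement(cells=0, combines=True)
print(wcwidth_first(2, True))    # Placement(cells=2, combines=False)
```

With `wcwidth_first`, vte would advance two cells like the reporter's xterm and urxvt. That would match the apps' arithmetic, but the mark would be drawn as a standalone glyph.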

Comment 28 Egmont Koblinger 2016-02-16 01:41:08 UTC

"ldd --version" is a simpler and better way to figure out libc's version. I assume you'll get 2.22 (that's what Arch contains currently).

If so then it's either a bug (regression) in libc, or a change that I don't understand.
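Besides `ldd --version`, glibc reports its runtime version through `gnu_get_libc_version()`. This is a sketch via `ctypes`, assuming a glibc-based system; it returns None elsewhere. Because it reports the library actually loaded by the process, it reflects a glibc upgrade as soon as the process is restarted, with no recompilation.

```python
import ctypes
import ctypes.util
from typing import Optional


def runtime_glibc_version() -> Optional[str]:
    """Version string of the glibc loaded by this process, or None if unavailable."""
    path = ctypes.util.find_library("c")
    if path is None:
        return None
    libc = ctypes.CDLL(path)
    try:
        func = libc.gnu_get_libc_version
    except AttributeError:
        return None  # not glibc (e.g. musl)
    func.restype = ctypes.c_char_p
    return func().decode("ascii")


print(runtime_glibc_version())
```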

Comment 29 derverrucktefuchs 2016-02-16 01:59:42 UTC

"ldd --version" confirms that libc 2.22 is installed.

Comment 30 Behdad Esfahbod 2016-02-17 12:05:45 UTC

U+3099 should definitely take zero cells.  vte uses g_unichar_iszerowidth() and g_unichar_iswide() instead of libc wcwidth.

I imagine this is done (or originally was done) because we cannot rely on wchar being Unicode. Not sure what the best resolution to this bug is.

Comment 31 Egmont Koblinger 2016-02-17 13:42:00 UTC

If it's indeed a change in glibc, I guess we should file a bug against them. But first it would be good to test on other distros to make sure it's not a downstream issue.

Comment 32 Egmont Koblinger 2016-02-18 18:27:03 UTC

glibc-2.23 is being released nowish. I'd be curious to hear if it brings a change.

Comment 33 derverrucktefuchs 2016-02-22 18:24:42 UTC

So glibc-2.23 is available in Arch now. I tried a manual compile of vte with the Arch Build System and the issue seemed to stick around. I'm going to wait for the package maintainer to do a more official recompile of vte against glibc-2.23 and see if that will be any different.

Comment 34 Egmont Koblinger 2016-02-22 18:47:54 UTC

Recompiling shouldn't make any difference. If you just upgrade glibc and then restart vte (or all windows of gnome-terminal at once), it fully uses the new one.

So I take this as no change from 2.22 to 2.23.

Comment 35 derverrucktefuchs 2016-02-22 20:21:12 UTC

I rebooted after the upgrade, which rules that out; there is no change between the versions.

Comment 36 Egmont Koblinger 2016-03-22 05:23:19 UTC

Ubuntu Xenial beta has just upgraded glibc from 2.21 to 2.23.

The output of "echo -e '\u3099' | wc -L" has changed from 0 to 2 for me too. Also I can see the screen corruption in ranger now.

So we're indeed facing a changed behavior in mainstream glibc 2.22 (not some downstream patch).

Comment 37 Egmont Koblinger 2016-03-22 09:14:49 UTC

Filed glibc bug: https://sourceware.org/bugzilla/show_bug.cgi?id=19852

Comment 38 Christian Persch 2016-03-22 09:25:07 UTC

The bug was introduced in glib commit 4a4839c94a4c93ffc0d5b95c69a08b02a57007f2. It's due to a bug in the unicode generation scripts, see https://sourceware.org/bugzilla/show_bug.cgi?id=14094#c18 where the problem was mentioned but the wrong choice made; the script needs to be smarter.
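Here is one reading of "the script needs to be smarter", under stated assumptions. When a generator converts an `EastAsianWidth.txt` data line into width entries, it could also consult each code point's general category and give `Mn`/`Me` marks zero cells. Then the `W` on `3099..309A` would no longer win. The function is a hypothetical illustration using Python's bundled Unicode database, not glibc's actual generator. It ignores other special cases, such as control and format characters.

```python
import unicodedata


def widths_from_eaw_line(line: str) -> dict[int, int]:
    """Map each code point in one EastAsianWidth.txt data line to a cell width.

    Hypothetical rule: the general category overrides East Asian Width for combining marks.
    """
    data = line.split("#", 1)[0].strip()
    if not data:
        return {}  # comment or blank line
    span, eaw = (field.strip() for field in data.split(";"))
    first, _, last = span.partition("..")
    start, end = int(first, 16), int(last or first, 16)
    widths = {}
    for cp in range(start, end + 1):
        if unicodedata.category(chr(cp)) in ("Mn", "Me"):
            widths[cp] = 0  # combining marks never take a cell of their own
        elif eaw in ("W", "F"):
            widths[cp] = 2
        else:
            widths[cp] = 1
    return widths


print(widths_from_eaw_line("3099..309A;W     # Mn     [2] COMBINING ..."))
```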

Comment 39 Christian Persch 2016-03-22 09:25:34 UTC

...glibC commit..., of course

Comment 40 Christian Persch 2016-05-08 08:27:28 UTC

-> NOTGNOME.

Comment 41 Egmont Koblinger 2016-10-13 20:28:41 UTC

See also bug 772890.