GNOME Bugzilla – Bug 535896
Some unicode characters cause incorrect cursor position
Last modified: 2014-06-12 17:06:40 UTC
Please describe the problem: Try to edit http://darkk.net.ru/garbage/xfce-vte-bug with vim and move the cursor to the end of the first line with `$`; the cursor is displayed in the wrong place. It looks like a vte bug, since it works without issues in xterm, and I have a similar bug with weechat when it displays some Unicode characters. Screenshots of the weechat-related issue: vte-based Terminal: http://darkk.net.ru/weechat/unicode-bug.png (I see the same bug with plain `vte`), and xterm working correctly: http://darkk.net.ru/weechat/unicode-bug-xterm.png. I assume these are related.
Does this happen every time? Yes; editing the file is an easy way to reproduce the problem.
Other information: I have x11-libs/vte-0.16.13 built with the -debug, +doc, +opengl, +python USE flags.
Created attachment 275245: the testcase from the URL in comment 0. Confirmed.
I get the same incorrect cursor position in xterm and in vte. The file contains this:

0924  E0 A4 A4  त  DEVANAGARI LETTER TA         (width: 1)
0941  E0 A5 81  ु  DEVANAGARI VOWEL SIGN U      (width: 0)
092E  E0 A4 AE  म  DEVANAGARI LETTER MA         (width: 1)
094D  E0 A5 8D  ्  DEVANAGARI SIGN VIRAMA       (width: 0)
0020  20           SPACE
0915  E0 A4 95  क  DEVANAGARI LETTER KA         (width: 1)
0948  E0 A5 88  ै  DEVANAGARI VOWEL SIGN AI     (width: 0)
0938  E0 A4 B8  स  DEVANAGARI LETTER SA         (width: 1)
0947  E0 A5 87  े  DEVANAGARI VOWEL SIGN E      (width: 0)
0020  20           SPACE
0939  E0 A4 B9  ह  DEVANAGARI LETTER HA         (width: 1)
094B  E0 A5 8B  ो  DEVANAGARI VOWEL SIGN O      (width: 1 - ?!?!?)
003F  3F        ?  QUESTION MARK
000A  0A           LINE FEED (LF)

According to glibc's wcwidth() and http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c, U+094B has a width of 1.

Separately, the way it is rendered in vte is buggy. The dotted circle shouldn't be rendered at all (it's the placeholder that shows how the accent is aligned to the base letter). Moreover, the glyph overflows into the next cell (where the question mark is), and the cursor also becomes wider when it is over this character. Is this character perhaps a combining accent that widens the base character? Is there such a thing at all???

On the other hand, when you walk the cells one by one, vim's column and byte offset indicator shows that the byte offset jumps by 6, exactly as with the preceding (base char + vowel sign) pairs. This looks like a bug in vim: it should respect wcwidth() rather than come up with its own assumption.
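The width column above can be reproduced with a category-based approximation of wcwidth(). This is a minimal Python sketch, assuming simplified rules loosely modeled on Markus Kuhn's wcwidth.c: marks of category Mn/Me and format characters get width 0, East Asian Wide/Fullwidth characters get 2, everything else gets 1. It is not glibc's exact table (for example, it ignores the Hangul Jamo special cases and does not return -1 for control characters), and `simple_wcwidth` and `TESTCASE` are hypothetical names.

```python
import unicodedata

def simple_wcwidth(ch: str) -> int:
    """Approximate wcwidth() for one code point (simplified, not glibc's table).

    Control characters are not handled; real wcwidth() returns -1 for them.
    """
    cat = unicodedata.category(ch)
    # Non-spacing (Mn) and enclosing (Me) marks, and format characters other
    # than SOFT HYPHEN, occupy no cell of their own.
    if cat in ("Mn", "Me") or (cat == "Cf" and ch != "\u00ad"):
        return 0
    # Wide and fullwidth East Asian characters occupy two cells.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    # Everything else, including spacing marks (Mc) like U+094B, gets one cell.
    return 1

# The first line of the testcase, without the trailing LF.
TESTCASE = "\u0924\u0941\u092e\u094d \u0915\u0948\u0938\u0947 \u0939\u094b?"

if __name__ == "__main__":
    for ch in TESTCASE:
        print(f"U+{ord(ch):04X} {unicodedata.category(ch)} "
              f"width={simple_wcwidth(ch)} {unicodedata.name(ch)}")
    print("total columns:", sum(simple_wcwidth(c) for c in TESTCASE))
```

Under these rules the line occupies 9 columns, and U+094B is the only vowel sign that claims a cell of its own.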
(In reply to comment #2)
> Is this character perhaps a combining accent that widens the base character?
> Is there such a thing at all???

Correct. It's a spacing mark, as opposed to a non-spacing mark, i.e. it has Unicode General_Category Mc as opposed to Mn.
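The Mc/Mn distinction can be checked with Python's standard `unicodedata` module. This hedged sketch splits the marks of a code point range by General_Category; the exact list it prints depends on the Unicode version bundled with the Python interpreter. `marks_in_range` is a hypothetical helper name.

```python
import unicodedata

def marks_in_range(start: int, end: int) -> dict:
    """Split the combining marks in [start, end] into Mn and Mc lists."""
    result = {"Mn": [], "Mc": []}
    for cp in range(start, end + 1):
        cat = unicodedata.category(chr(cp))
        if cat in result:
            result[cat].append(cp)
    return result

if __name__ == "__main__":
    # U+0900..U+097F is the Devanagari block.
    marks = marks_in_range(0x0900, 0x097F)
    for cp in marks["Mc"]:
        print(f"Mc U+{cp:04X} {unicodedata.name(chr(cp))}")
```

U+094B (VOWEL SIGN O) shows up among the spacing marks, while U+0941, U+0947 and U+0948 from the same testcase are non-spacing.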
At any rate, the root of the problem is that we don't support complex text in vte.
How much more is there that we don't support? :)
CJK, zero-width combining => okay
RTL, spacing mark => not okay
What else is there?

As for spacing marks (or anything new, fwiw), there are several sides to the story. Before we get to rendering, the terminal behavior should probably be clarified. What if the base char is printed in the last column and is then followed by a spacing mark? Should they be combined (as if they were a CJK-like double-wide char), moving the base char to the next line? Should they then stay together for good, so that copy-pasting or rewrapping never separates them, the cursor over them is double wide, etc., just as with CJK? Or is a small rendering fix enough, i.e. for such a character we'd align the glyph relative to the previous cell? That could be a not-that-hard fix. Logically, these vowel signs would then live a separate life from their base characters.
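The last-column question can be made concrete with a toy model in which each code point is placed independently. This is a hedged Python sketch, not vte's code: it assumes a simplified wcwidth() where only Mn/Me marks are zero-width, so a non-spacing mark joins the preceding cell while a spacing mark takes a cell of its own and can be wrapped away from its base. `cell_width` and `layout_per_codepoint` are hypothetical helpers.

```python
import unicodedata

def cell_width(ch: str) -> int:
    # Simplified wcwidth(): Mn/Me marks take no cell, everything else one.
    return 0 if unicodedata.category(ch) in ("Mn", "Me") else 1

def layout_per_codepoint(text: str, cols: int) -> list:
    """Place code points one at a time into rows of `cols` cells.

    Zero-width marks are appended to the preceding cell; any other
    code point takes a new cell, wrapping when the row is full.
    """
    rows = [[]]
    for ch in text:
        w = cell_width(ch)
        if w == 0:
            if rows[-1]:
                rows[-1][-1] += ch  # combine with the preceding cell
            continue  # a mark with no preceding cell is dropped in this toy model
        if len(rows[-1]) + w > cols:
            rows.append([])  # row is full: wrap
        rows[-1].append(ch)
    return rows

if __name__ == "__main__":
    # HA lands in the last of 3 columns; the spacing mark O wraps alone.
    print(layout_per_codepoint("ab\u0939\u094b", 3))
```

With 3 columns, HA ends up in the last cell of the first row and VOWEL SIGN O starts the second row, which is exactly the separation the questions above are about.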
It appears that there are spacing marks that put the vowel sign in front of the base character (e.g. U+093F, http://www.fileformat.info/info/unicode/category/Mc/list.htm). So the "simple" approach from the previous comment cannot work. I believe the right approach is the CJK-like one from the previous comment: the base character and the vowel sign are combined into a vteunistr and stored in a double-wide cell. Rendering probably wouldn't be that hard, since we already render non-spacing-mark accents correctly and we also render CJK; this would be the combination of the two. Chances are it'd work out of the box. The logic to combine the base char and the vowel sign (especially across a line wrap) is a bit tricky, but not hopeless. We'd need to check whether, say, multiple spacing marks on one base character are allowed (I hope not!) and add corresponding safety guards.
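The CJK-like approach can be sketched as: group each base character with the marks that follow it, give the group a width, and wrap the group as a whole. This is a minimal Python sketch under stated assumptions, not vte's implementation: `clusters`, `cluster_width` and `layout_clusters` are hypothetical helpers, a Python string stands in for a vteunistr, and capping a cluster at 2 cells is one possible safety guard for the multiple-spacing-marks case, not a decided rule.

```python
import unicodedata

def cell_width(ch: str) -> int:
    # Simplified wcwidth(): Mn/Me take no cell, East Asian W/F take two.
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1

def clusters(text: str) -> list:
    """Attach every mark (Mn, Mc, Me) to the preceding base character."""
    out = []
    for ch in text:
        if unicodedata.category(ch).startswith("M") and out:
            out[-1] += ch
        else:
            out.append(ch)
    return out

def cluster_width(cluster: str) -> int:
    # Base plus spacing marks, capped at 2 cells (assumed safety guard).
    return min(2, sum(cell_width(ch) for ch in cluster))

def layout_clusters(text: str, cols: int) -> list:
    """Place whole clusters; a cluster that doesn't fit wraps, like CJK."""
    rows, used = [[]], 0
    for cl in clusters(text):
        w = cluster_width(cl)
        if used + w > cols:
            rows.append([])
            used = 0
        rows[-1].append(cl)
        used += w
    return rows

if __name__ == "__main__":
    print(layout_clusters("ab\u0939\u094b", 3))  # HA + O wraps together
```

Because U+093F is stored after its base but drawn to its left, keeping base and mark in one cell lets the renderer receive the whole cluster at once, which is what the alignment-to-previous-cell idea cannot provide.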
Leonid: the incorrect cursor position (which the original report is about) is a bug in vim. It's buggy under vte, konsole and xterm, and correct in all of these terminals with emacs. (The report is quite old and your screenshot is no longer available; chances are that xterm was also broken back then.) Could you please report the bug to vim's developers? As for the rendering issue discovered here, it's discussed in more detail in bug 584160, so I recommend we continue there.