Bug 535896 – Some unicode characters cause incorrect cursor position




Summary:	Some unicode characters cause incorrect cursor position


Status:	RESOLVED NOTGNOME

Product:	vte
Classification:	Core
Component:	general
Version:	0.16.x
Hardware:	Other All

Importance:	Normal normal
Target Milestone:	---
Assigned To:	VTE Maintainers
QA Contact:	VTE Maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2008-05-31 10:56 UTC by Leonid Evdokimov
Modified:	2014-06-12 17:06 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
the testcase from the URL in comment 0 (130 bytes, text/plain) 2014-04-27 07:48 UTC, Christian Persch

Description Leonid Evdokimov 2008-05-31 10:56:34 UTC

Please describe the problem:
Try to edit http://darkk.net.ru/garbage/xfce-vte-bug with vim and move the
cursor to the end of the first line with `$`: the cursor is displayed in the
wrong place.  This looks like a vte bug, since it works without issues in
xterm, and I see a similar bug in weechat when it displays some Unicode characters.

Screenshots of the weechat-related issue:
vte-based Terminal: http://darkk.net.ru/weechat/unicode-bug.png
I see the same bug with plain `vte`.
xterm works correctly: http://darkk.net.ru/weechat/unicode-bug-xterm.png

I assume these issues are related.


Does this happen every time?
Yes, editing the file is an easy way to reproduce the problem.

Other information:
I have x11-libs/vte-0.16.13 built with -debug, +doc, +opengl, +python USE-flags.

Comment 1 Christian Persch 2014-04-27 07:48:08 UTC

Created attachment 275245
the testcase from the URL in comment 0

Confirmed.

Comment 2 Egmont Koblinger 2014-06-11 16:32:09 UTC

I get the same incorrect cursor position in xterm and in vte.

The file contains this:

0924  E0 A4 A4    त      DEVANAGARI LETTER TA      (width: 1)
0941  E0 A5 81    ु      DEVANAGARI VOWEL SIGN U   (width: 0)
092E  E0 A4 AE    म      DEVANAGARI LETTER MA      (width: 1)
094D  E0 A5 8D    ्      DEVANAGARI SIGN VIRAMA   (width: 0)
0020  20                  SPACE
0915  E0 A4 95    क      DEVANAGARI LETTER KA      (width: 1)
0948  E0 A5 88    ै      DEVANAGARI VOWEL SIGN AI   (width: 0)
0938  E0 A4 B8    स      DEVANAGARI LETTER SA      (width: 1)
0947  E0 A5 87    े      DEVANAGARI VOWEL SIGN E   (width: 0)
0020  20                  SPACE
0939  E0 A4 B9    ह      DEVANAGARI LETTER HA      (width: 1)
094B  E0 A5 8B    ो      DEVANAGARI VOWEL SIGN O   (width: 1 - ?!?!?)
003F  3F          ?      QUESTION MARK
000A  0A                  LINE FEED (LF)

According to glibc's wcwidth() or http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c, U+094B has a width of 1.
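
The widths in the listing can be approximated without calling the C library. The sketch below is only a rough model: it assumes the rule described in the comments of Markus Kuhn's wcwidth.c, where characters of General_Category Mn, Me, and Cf take zero columns and everything else printable takes one. `approx_wcwidth` and `describe` are hypothetical helper names, East Asian wide characters are deliberately ignored, and results for other code points may vary with the Unicode data bundled with Python.

```python
import unicodedata

# First line of the test case, rebuilt from the UTF-8 bytes in the listing.
LINE = bytes.fromhex(
    "e0a4a4 e0a581 e0a4ae e0a58d 20 "
    "e0a495 e0a588 e0a4b8 e0a587 20 "
    "e0a4b9 e0a58b 3f 0a"
).decode("utf-8")


def approx_wcwidth(ch):
    """Rough stand-in for wcwidth(): an assumption, not glibc's real table."""
    category = unicodedata.category(ch)
    if category == "Cc":          # control characters such as LF
        return -1
    if category in ("Mn", "Me", "Cf"):
        return 0                  # non-spacing, enclosing, format
    return 1                      # everything else, including Mc


def describe(text):
    """Return (code point, UTF-8 bytes, name, width) for each character."""
    rows = []
    for ch in text:
        utf8 = " ".join(f"{b:02X}" for b in ch.encode("utf-8"))
        name = unicodedata.name(ch, "<control>")
        rows.append((f"{ord(ch):04X}", utf8, name, approx_wcwidth(ch)))
    return rows


if __name__ == "__main__":
    for row in describe(LINE):
        print(*row, sep="  ")
```

Under this model U+094B comes out with width 1 because its category is Mc, not Mn, which agrees with the listing.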

Separately, the way vte renders it is buggy. The circle shouldn't be rendered at all (it's only the placeholder that shows how the accent is aligned to the base letter). Also, the glyph overflows into the next cell (where the question mark is), and the cursor becomes wider when it is over this character.

Is this character perhaps a combining accent that widens the base character? Is there such a thing at all?

On the other hand, when you walk the cells one by one, vim's column and byte offset indicator shows the byte offset jumping by 6 at this point, exactly as with the preceding (base char + vowel sign) pairs. This looks like a bug in vim: it should respect wcwidth() rather than make its own, different assumption.
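
The jump of 6 bytes can be reproduced with a small model. This is a sketch under stated assumptions: the first set of categories mimics a wcwidth()-like rule (Mn, Me, Cf are zero width), and the second is a hypothetical guess at an editor that treats every mark, including Mc, as attached to its base. It is not taken from vim's source. `cell_start_offsets` is an illustrative helper name.

```python
import unicodedata

# First line of the test case, without the trailing line feed.
LINE = "\u0924\u0941\u092e\u094d \u0915\u0948\u0938\u0947 \u0939\u094b?"

WCWIDTH_LIKE = {"Mn", "Me", "Cf"}        # zero width per a wcwidth()-like rule
ALL_MARKS = {"Mn", "Me", "Mc", "Cf"}     # hypothetical: every mark attaches


def cell_start_offsets(text, zero_width_categories):
    """Byte offsets (in UTF-8) at which a new cursor position begins."""
    offsets = []
    position = 0
    for ch in text:
        if unicodedata.category(ch) not in zero_width_categories:
            offsets.append(position)
        position += len(ch.encode("utf-8"))
    return offsets


if __name__ == "__main__":
    print(cell_start_offsets(LINE, WCWIDTH_LIKE))
    # [0, 6, 12, 13, 19, 25, 26, 29, 32]
    print(cell_start_offsets(LINE, ALL_MARKS))
    # [0, 6, 12, 13, 19, 25, 26, 32]
```

With the wcwidth()-like rule, U+094B starts its own cell at byte 29, so the line occupies 9 columns. With the second rule the step from 26 goes straight to 32 and the line occupies 8 columns, which would put the cursor one cell off from where the terminal draws the text.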

Comment 3 Behdad Esfahbod 2014-06-11 16:36:16 UTC

(In reply to comment #2)
> I get the same incorrect cursor position in xterm and in vte.
> 
> The file contains this:
> 
> 0924  E0 A4 A4    त      DEVANAGARI LETTER TA      (width: 1)
> 0941  E0 A5 81    ु      DEVANAGARI VOWEL SIGN U   (width: 0)
> 092E  E0 A4 AE    म      DEVANAGARI LETTER MA      (width: 1)
> 094D  E0 A5 8D    ्      DEVANAGARI SIGN VIRAMA   (width: 0)
> 0020  20                  SPACE
> 0915  E0 A4 95    क      DEVANAGARI LETTER KA      (width: 1)
> 0948  E0 A5 88    ै      DEVANAGARI VOWEL SIGN AI   (width: 0)
> 0938  E0 A4 B8    स      DEVANAGARI LETTER SA      (width: 1)
> 0947  E0 A5 87    े      DEVANAGARI VOWEL SIGN E   (width: 0)
> 0020  20                  SPACE
> 0939  E0 A4 B9    ह      DEVANAGARI LETTER HA      (width: 1)
> 094B  E0 A5 8B    ो      DEVANAGARI VOWEL SIGN O   (width: 1 - ?!?!?)
> 003F  3F          ?      QUESTION MARK
> 000A  0A                  LINE FEED (LF)
> 
> According to glibc's wcwidth() or http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c,
> U+094B has a width of 1.
> 
> It's one thing that the way it is rendered in vte is buggy. The circle
> shouldn't be rendered at all (it's the placeholder to see how the accent is
> aligned to the base letter). Plus, the glyph overflows to the next cell (where
> the question mark is) and the cursor also becomes wider when it is over this
> character.
> 
> Is this character perhaps a combining accent that widens the base character?
> Is there such a thing at all???

Correct.  It's a spacing mark, as opposed to a non-spacing mark; i.e., it has Unicode General_Category Mc as opposed to Mn.
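
The Mc/Mn distinction can be checked against the Unicode Character Database, for example with Python's standard `unicodedata` module. This is a minimal sketch; `mark_kind` is a hypothetical helper name, and the data comes from whatever Unicode version the Python build ships.

```python
import unicodedata

# The five vowel signs and the virama from the test file's first line.
SIGNS = ["\u0941", "\u094d", "\u0948", "\u0947", "\u094b"]

KINDS = {"Mn": "non-spacing", "Mc": "spacing", "Me": "enclosing"}


def mark_kind(ch):
    """Return the kind of combining mark, or None if ch is not a mark."""
    return KINDS.get(unicodedata.category(ch))


if __name__ == "__main__":
    for sign in SIGNS:
        print(f"U+{ord(sign):04X}", unicodedata.category(sign), mark_kind(sign))
```

Only U+094B is reported as a spacing mark; the other four are non-spacing.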

> On the other hand, vim's column and byte offset indicator shows when you walk
> the cells one by one that the byte offset jumps by 6, exactly as with the
> preceding (basechar + vowel sign) pairs. This looks like a bug in vim, it
> should respect wcwidth() and not come up with its different assumption.

Comment 4 Behdad Esfahbod 2014-06-11 16:36:44 UTC

At any rate, the root of the problem is that we don't support complex text in vte.

Comment 5 Egmont Koblinger 2014-06-11 16:46:19 UTC

How much more is there that we don't support? :)

CJK, zero-width combining => okay
RTL, spacing-mark => not okay

What else is there?

As for spacing marks (or anything new, fwiw), there are more sides to the story. Before we get to rendering, the terminal behavior should probably be clarified.

What if the base char is printed in the last column, and then is followed by a spacing-mark? Should they be combined (as if they were a CJK-like double wide char) and move the base char to the next line? And then live together forever, copy-pasting or rewrapping shouldn't separate them, the cursor over them should be double wide etc., just as with CJKs?

Or is it okay to make just a small rendering fix, that is, for such a character we'd align the glyph relative to the previous cell? This could be a not-that-hard fix. Logically, these vowel signs would live a separate life from their base characters.

Comment 6 Egmont Koblinger 2014-06-12 16:59:49 UTC

It appears that there are spacing-mark characters that put the vowel sign in front of the base character (e.g. U+093F, http://www.fileformat.info/info/unicode/category/Mc/list.htm).  So the "simple" approach (in the previous comment) cannot work.
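
To see how many such characters a single script has, the Mc code points of a block can be listed from the Unicode Character Database. This is a small sketch using Python's `unicodedata`; `spacing_marks` is a hypothetical helper, and the block range U+0900..U+097F is the Devanagari block. Note that the category alone does not say on which side of the base letter a sign is drawn; that is decided by the font and the shaping engine.

```python
import unicodedata

DEVANAGARI_FIRST, DEVANAGARI_LAST = 0x0900, 0x097F


def spacing_marks(first, last):
    """Code points in [first, last] whose General_Category is Mc."""
    return [
        cp for cp in range(first, last + 1)
        if unicodedata.category(chr(cp)) == "Mc"
    ]


if __name__ == "__main__":
    for cp in spacing_marks(DEVANAGARI_FIRST, DEVANAGARI_LAST):
        print(f"U+{cp:04X}", unicodedata.name(chr(cp), "<unnamed>"))
```

The list includes both U+093F (DEVANAGARI VOWEL SIGN I) and U+094B (DEVANAGARI VOWEL SIGN O).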

I believe the right approach would be the CJK-like approach from the previous comment.  The base character and the vowel are combined into a vteunistr and stored in a double-wide cell.  Rendering probably wouldn't be that hard: we already render non-spacing-mark accents correctly and we also render CJK, and this would be the combination of the two.  Chances are it'd work out of the box.

The logic to combine the base char and the vowel (especially across a line wrap) is a bit tricky, but not that hopeless.

We'd need to study whether, say, multiple spacing marks on one base character are allowed (I hope not!) and add corresponding safety guards.
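
The CJK-like approach could be modelled roughly as follows. This is only a sketch of the idea discussed here, not vte code: `clusters` and `layout` are hypothetical names, a cell is a plain (string, width) tuple rather than a vteunistr, the whole line is known in advance (a real terminal receives the mark after the base character has already been placed), and the safety guard is simply that a cell never grows beyond width 2 no matter how many spacing marks follow.

```python
import unicodedata

LINE = "\u0924\u0941\u092e\u094d \u0915\u0948\u0938\u0947 \u0939\u094b?"


def clusters(text):
    """Group each base character with its following marks.

    Returns a list of (string, width). Mn/Me marks do not change the width;
    an Mc mark makes the cell double wide, capped at 2 as a safety guard.
    """
    cells = []
    for ch in text:
        category = unicodedata.category(ch)
        if cells and category in ("Mn", "Me"):
            string, width = cells[-1]
            cells[-1] = (string + ch, width)
        elif cells and category == "Mc":
            string, _ = cells[-1]
            cells[-1] = (string + ch, 2)
        else:
            cells.append((ch, 1))
    return cells


def layout(cells, columns):
    """Wrap cells into rows; a cell that does not fit moves to the next row."""
    rows = [[]]
    used = 0
    for string, width in cells:
        if used + width > columns:
            rows.append([])
            used = 0
        rows[-1].append(string)
        used += width
    return rows


if __name__ == "__main__":
    print(clusters(LINE))
    print(layout(clusters(LINE), 7))
```

With 7 columns, the base letter HA would fit in the last column on its own, but the combined HA + VOWEL SIGN O cell needs two columns, so the whole cell moves to the next row, just as a CJK character would.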

Comment 7 Egmont Koblinger 2014-06-12 17:06:40 UTC

Leonid: the incorrect cursor position (what the original report is about) is a bug in vim.  It's buggy under vte, konsole, and xterm, whereas emacs is correct in all of these terminals.  (The report is quite old and your screenshot is no longer available; chances are that xterm was also broken in those days.)  Could you please report the bug to vim's developers?

As for the rendering issue discovered, it's discussed in more detail in bug 584160 so I recommend we continue there.