GNOME Bugzilla – Bug 162262
decomposed UTF8 characters display with spurious spaces
Last modified: 2006-05-13 16:46:21 UTC
Displaying text that contains decomposed UTF-8 characters shows a spurious space after each such character. Copied and pasted:

    $ touch $(echo -e 'ta\314\210st')
    $ ls
    täst
    $

What actually gets displayed:

    $ ls
    tä st
    $

The problem does not show up with precomposed characters:

    $ touch $(echo -e 't\303\244st')
    $ ls
    täst

Needless to say, this messes up command-line editing.
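The two `touch` commands create names that should look identical but differ at the code-point level. The first spells "ä" as "a" followed by U+0308 COMBINING DIAERESIS (bytes CC 88, written `\314\210` in octal). The second uses the precomposed U+00E4 (bytes C3 A4, `\303\244`). Here is a minimal C sketch with hypothetical helper names, assuming well-formed UTF-8 and treating only the Combining Diacritical Marks block U+0300 to U+036F as zero-width (a real terminal would consult wcwidth() or the full Unicode tables). It shows that both names should occupy four columns, even though the decomposed one has five code points.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence starting at *s, advance *s past it and
 * return the code point. Assumes well-formed input (sketch only). */
static uint32_t decode_utf8(const unsigned char **s)
{
    const unsigned char *p = *s;
    uint32_t cp;
    int extra;

    if (p[0] < 0x80)      { cp = p[0];        extra = 0; }
    else if (p[0] < 0xE0) { cp = p[0] & 0x1F; extra = 1; }
    else if (p[0] < 0xF0) { cp = p[0] & 0x0F; extra = 2; }
    else                  { cp = p[0] & 0x07; extra = 3; }

    for (int i = 1; i <= extra; i++)
        cp = (cp << 6) | (p[i] & 0x3F);   /* append 6 payload bits */

    *s = p + 1 + extra;
    return cp;
}

/* Simplified width rule: combining diacritics take no column. */
static int is_combining(uint32_t cp)
{
    return cp >= 0x0300 && cp <= 0x036F;
}

/* Count code points and display columns in a NUL-terminated string.
 * Every non-combining code point is assumed to take one column
 * (double-width characters are ignored in this sketch). */
static void measure(const char *str, size_t *code_points, size_t *columns)
{
    const unsigned char *s = (const unsigned char *)str;
    *code_points = 0;
    *columns = 0;
    while (*s) {
        uint32_t cp = decode_utf8(&s);
        (*code_points)++;
        if (!is_combining(cp))
            (*columns)++;
    }
}
```

For "ta\314\210st" this yields five code points and four columns; for "t\303\244st", four and four. Rendering the decomposed name in five columns is exactly the spurious space reported here.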
If you see this in the report above, täst, you're seeing a browser bug, which I'm going to file separately in a minute. :-/ That line should look like this: täst, i.e. "t", "a-umlaut", "s", "t". The "what gets displayed" text has a space in the middle.
I ran into the same problem. It also occurs with many other characters for which glibc's wcwidth() returns zero. This causes trouble not only in command-line editing but also in full-screen text editors (e.g. joe-3.1 in UTF-8 mode): if such characters appear in a long line, the line may overflow and wrap onto the next line, which in turn scrolls joe's top status bar off the screen and leaves the whole display garbled and unusable. xterm and konsole do not suffer from this bug. Anyway, someone please move this from gnome-terminal to vte :-)
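The overflow follows from a column mismatch. The application (joe, or the shell's line editor) counts columns with wcwidth(), so it expects "ta\314\210st" to take four cells. If the terminal gives the zero-width mark a cell of its own, each such mark pushes the rest of the line one column to the right, and a line the application believes fits will wrap. The minimal C sketch below stores a zero-width character on the preceding cell instead of advancing the cursor. The struct and function names are hypothetical, only widths 0 and 1 are handled, and the caller is assumed to supply the width (e.g. from wcwidth() in a UTF-8 locale).

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_COLS     80
#define MAX_COMBINING 2   /* arbitrary per-cell limit for this sketch */

/* One screen cell: a base character plus any combining marks on it. */
struct cell {
    uint32_t base;
    uint32_t marks[MAX_COMBINING];
    int nmarks;
};

struct line {
    struct cell cells[LINE_COLS];
    int cursor;   /* column where the next base character goes */
};

/* Place a character whose column width is given by the caller. */
static void put_char(struct line *ln, uint32_t cp, int width)
{
    if (width == 0) {
        /* Zero-width: attach to the previous cell and leave the cursor
         * alone. Advancing here would produce the spurious space. */
        if (ln->cursor > 0) {
            struct cell *prev = &ln->cells[ln->cursor - 1];
            if (prev->nmarks < MAX_COMBINING)
                prev->marks[prev->nmarks++] = cp;
        }
        return;
    }
    if (ln->cursor < LINE_COLS) {
        struct cell *c = &ln->cells[ln->cursor++];
        c->base = cp;
        c->nmarks = 0;
    }
}
```

Feeding t, a, U+0308 (width 0), s, t leaves the cursor at column 4 with the dieresis stored on the "a" cell, which matches the four columns the application computed.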
Is this a Pango issue rather than a vte issue? It seems to affect everything that displays text, e.g.:

    echo -e 'Spin\314\210al Tap' > tap.txt
    gedit tap.txt

This appears as "Spin<umlaut over space>al Tap". And yes, I really think GNOME should correctly render the heavy metal umlaut :)
Created attachment 65306: Incorrectly rendered heavy metal umlaut
Jon, could you try a font that has the combining dieresis with zero width, i.e. a font that is not monospaced? Only a few fonts actually place the dieresis in the right position.
In gedit I changed the font to several of the ones I have, and they all still render the umlaut to the right of the "n". Some of them are from the Microsoft "corefonts" package, e.g. Arial, which renders this text correctly on Windows. Fonts tried (all failed): Arial, Bitstream Vera Sans Roman, Comic Sans MS, URW Chancery L Medium Italic. It does seem plausible that this is a font issue rather than a renderer issue.
This is more closely related to bug 322234. Arial doesn't have a combining dieresis; try U+0300, U+0301 or U+0303 instead. Bitstream Vera Sans and Comic Sans don't have any combining diacritics, so those glyphs come from another font on your machine. URW Chancery L is not a Unicode font and has no combining diacritics either. The dieresis you see probably comes from a font that gives it a non-zero width, which is either a font bug or a sign that the font is monospaced.
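Repeating the gedit test with the marks suggested above requires writing each code point as UTF-8 bytes for `echo -e`. A small C sketch, with hypothetical function names and handling only code points up to U+FFFF, encodes a code point and prints it as backslash-octal escapes. U+0300, U+0301 and U+0303 come out as \314\200, \314\201 and \314\203, and U+0308 as the \314\210 used earlier in this report.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Encode a code point as UTF-8. Sketch only: handles U+0000..U+FFFF
 * (no surrogate check). Returns the number of bytes written to out. */
static size_t utf8_encode(uint32_t cp, unsigned char out[3])
{
    if (cp < 0x80) {               /* 1 byte:  0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {              /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Print the UTF-8 bytes of cp as backslash-octal escapes followed by a
 * newline, e.g. \314\201 for U+0301, the form passed to echo -e above. */
static void print_octal_escapes(uint32_t cp)
{
    unsigned char bytes[3];
    size_t n = utf8_encode(cp, bytes);
    for (size_t i = 0; i < n; i++)
        printf("\\%03o", (unsigned)bytes[i]);
    putchar('\n');
}
```

The printed escape can be substituted into the earlier test, e.g. `echo -e 'Spin\314\201al Tap' > tap.txt`, to check a font that does carry U+0301.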
Because the dieresis comes from a different font, it is shaped separately, so it is expected not to combine with the base letter. As for the vte problem: in a typical install vte does not use Pango for its main rendering, and vte does not support combining marks. Duping. *** This bug has been marked as a duplicate of 149631 ***