GNOME Bugzilla – Bug 781676
U+1F3DB CLASSICAL BUILDING incorrectly rendered as a single width character
Last modified: 2017-04-30 20:43:19 UTC
I'm forwarding this from https://bugs.debian.org/860445. I'm not sure whether you would consider this to be a bug in vte, glib's unicode functions, the Unicode spec, or not a bug at all.

----- original report -----
I noticed recently that U+1F3DB CLASSICAL BUILDING is incorrectly being rendered as a single-width character when it, like most (all?) emoji, is actually double-width. The result is that it overlaps with the subsequent character. It should be rendered as double-width.
---------------------------

The Unicode spec is ambiguous about some of these characters. The 9.0 EastAsianWidth.txt (ftp://ftp.unicode.org/Public/9.0.0/ucd/EastAsianWidth.txt) lists the 1F3D4..1F3DF range as Neutral, and therefore single width, though in a few other places it says that all emoji are Wide. http://www.unicode.org/reports/tr11/tr11-31.html says: "Emoji style standardized variation sequences behave as though they were East Asian Wide, regardless of their assigned East_Asian_Width property value."

The original reporter has reproduced this in Terminator. I have reproduced this in Terminator and gnome-terminal. If I modify glib's g_unichar_iswide function so that it returns true for this character, it renders correctly as a wide character.
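The property values described above can be checked without touching glib. This is a minimal sketch, assuming Python's standard `unicodedata` module, which reflects the Unicode version bundled with the interpreter rather than glib's or glibc's tables. The helper name `east_asian_width_cells` is hypothetical, and it deliberately ignores combining and control characters.

```python
import unicodedata


def east_asian_width_cells(ch: str) -> int:
    """Return 2 for East Asian Wide/Fullwidth characters, otherwise 1.

    Hypothetical helper for illustration only: it looks solely at the
    East_Asian_Width property and ignores zero-width characters.
    """
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


classical_building = "\U0001F3DB"  # U+1F3DB CLASSICAL BUILDING
grinning_face = "\U0001F600"       # U+1F600 GRINNING FACE, for comparison

for ch in (classical_building, grinning_face):
    print(
        f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
        f"EAW={unicodedata.east_asian_width(ch)}, "
        f"cells={east_asian_width_cells(ch)}"
    )
```

A width function built only on East_Asian_Width gives the two characters different cell counts, even though both are commonly drawn with equally wide emoji glyphs.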
Do you know what "Emoji style standardized variation sequences" means? I don't understand these words in this context :-) Also, is there a list somewhere of the codepoints that this applies to?

I have a "wtf" feeling... there's a "width" property, Unicode folks fill that out with "narrow" and somewhere else they make a note that "hey, that's actually wide"? Why not mark them as wide, then? I really don't understand this entire situation.

Anyway, VTE uses glib's character database, although bug 772890 suggests it should probably use glibc's. I think both databases claim that the characters in question are narrow. If they are really wide, IMO it should be fixed in glibc and glib. VTE should not override what these libraries say; that can only lead to even worse display corruption than just an overflowing glyph.
So this is the last paragraph of bug 767529.

http://unicode.org/Public/emoji/5.0/emoji-variation-sequences.txt
http://unicode.org/emoji/charts/emoji-variants.html
http://unicode.org/reports/tr51/proposed.html
So digging into Unicode TR11 (linked above), it says that any emoji with Emoji_Presentation, with a series of exceptions, are classified as East Asian Wide. Others are given Neutral as that is the default. Unicode says "Strictly speaking, it makes no sense to talk of narrow and wide for neutral characters, but because for all practical purposes they behave like Na, they are treated as narrow characters (the same as Na) under the recommendations below."

The Emoji_Presentation property is defined in TR51 (http://www.unicode.org/reports/tr51/) as marking a character as displayed as emoji (the colourful, filled-in glyph) by default rather than as text (black-and-white glyph). Characters without this property may still be emoji characters as defined by the Emoji property. CLASSICAL BUILDING is only one of many such characters, which have an East Asian Width of N, not W. Emoji presentation can be explicitly requested by way of U+FE0F VARIATION SELECTOR-16, although adding this to my file had no effect on the display of the character.

I believe that the font my terminal is using is unifont, which specifies width 2 for these characters. I have not looked into other fixed-width fonts, but I imagine they mostly behave similarly; having to encode the seemingly arbitrary Unicode distinction between text presentation and emoji presentation by default into the width of your characters seems rather ridiculous.

The ideal approach would be to get the width from the font, rather than assuming that width is consistent among fonts. Assuming that is not an option, the correct approach may be to mark all characters with the Emoji property as wide for terminals.
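The TR11 rule quoted in this thread, that emoji style variation sequences behave as East Asian Wide, could be sketched roughly as follows. This is an illustration under stated assumptions, not what vte or glib do: the function name `display_width` is hypothetical, and the sketch widens any narrow character followed by U+FE0F, whereas a faithful implementation would only do so for the pairs listed in emoji-variation-sequences.txt.

```python
import unicodedata

VS16 = "\uFE0F"  # U+FE0F VARIATION SELECTOR-16 (requests emoji presentation)


def display_width(text: str) -> int:
    """Estimate terminal cells for text, widening a base character + VS16.

    ASSUMPTION: every narrow base followed by VS16 is treated as wide.
    A real implementation would consult emoji-variation-sequences.txt.
    """
    total = 0
    prev_cells = 0  # cells used by the most recent base character
    for ch in text:
        if ch == VS16 and prev_cells == 1:
            # Upgrade the preceding narrow base character to two cells.
            total += 1
            prev_cells = 2
            continue
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            # Combining marks and format characters take no cell of their own.
            continue
        prev_cells = 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        total += prev_cells
    return total


print(display_width("\U0001F3DB"))         # bare CLASSICAL BUILDING
print(display_width("\U0001F3DB" + VS16))  # same character with VS16 appended
```

Under this sketch the bare character still occupies one cell, so the rule only helps for text that actually contains the variation selector.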
(In reply to Alexis Hunt from comment #3)
> The ideal approach would be to get the width from the font,
> rather than assuming that width is consistent among fonts.

No way! I mean: if you want the font to influence the layout to such an extent, you can do it. Pretty much every graphical app does that. Maybe that's what you want. :)

In terminal emulation, though, no way! In order for the displayed contents not to fall apart, apps need to know how much space their output will consume, and of course this needs to match the actual behavior of the terminal emulator. We already have way too many problems due to this not always working as expected, see e.g. bug 772890 and the bugs linked from there, but that's not an excuse to make it even worse.

We also have a setting Compatibility -> Ambiguous width characters: narrow/wide (in gnome-terminal's UI; and an API method in VTE). Unfortunately the "wide" setting hardly works correctly with apps because there's no matching locale definition available, that is, the terminal emulator behaves differently than apps expect. We could introduce a similar option for emojis, but I'm afraid that would similarly be a source of even more trouble rather than a solution or workaround to some problems. But maybe that's an acceptable compromise.

There's no way the width defined in the font could reasonably be communicated to the apps running inside the terminal emulator, so apps would have to second-guess, and would surely guess wrong quite often. I'm firmly against introducing the possibility of the overall layout of an app being correct vs. broken depending on the user's choice of their preferred font.
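To make the layout argument concrete, here is a minimal sketch of an app padding a table cell according to its own width table while a terminal draws one character wider than the app expects. All names (`cells`, `pad_to`, `wide_overrides`) are hypothetical, and the override set merely stands in for a terminal or font that disagrees with the app's Unicode data.

```python
import unicodedata


def cells(text: str, wide_overrides: frozenset = frozenset()) -> int:
    """Count cells, treating characters in wide_overrides as two cells.

    Simplified on purpose: zero-width characters are not handled.
    """
    total = 0
    for ch in text:
        if ch in wide_overrides:
            total += 2
        elif unicodedata.east_asian_width(ch) in ("W", "F"):
            total += 2
        else:
            total += 1
    return total


def pad_to(text: str, columns: int) -> str:
    """Pad text with spaces using the app's (override-free) view of width."""
    return text + " " * max(0, columns - cells(text))


# The app lays out a 12-column field followed by a separator.
row = pad_to("\U0001F3DB museum", 12) + "|"

app_view = cells(row)                                   # what the app expects
terminal_view = cells(row, frozenset({"\U0001F3DB"}))   # a disagreeing terminal

print(app_view, terminal_view)
```

Whenever the two numbers differ, the separator lands in a different column than the app intended, which is the kind of breakage that grows with every additional point of disagreement between app and terminal.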
(In reply to Alexis Hunt from comment #3)
> So digging into Unicode TR11 (linked above), it says that any emoji with
> Emoji_Presentation, with a series of exceptions, are classified as East
> Asian Wide. Others are given Neutral as that is the default. Unicode says
> "Strictly speaking, it makes no sense to talk of narrow and wide for neutral
> characters, but because for all practical purposes they behave like Na, they
> are treated as narrow characters (the same as Na) under the recommendations
> below."
>
> The Emoji_Presentation property is defined in TR51
> (http://www.unicode.org/reports/tr51/) as marking a character as displayed
> as emoji (the colourful, filled-in glyph) by default rather than as text
> (black-and-white glyph), even though they may be emoji characters as defined
> by the Emoji property. CLASSICAL BUILDING is only one of many such
> characters, which have East Asian Width of N, not W. Emoji presentation can
> be explicitly requested by way of U+FE0F VARIATION SELECTOR-16, although
> adding this to my file had no effect on the display of the character.

That's because vte doesn't implement any special emoji handling yet.

*** This bug has been marked as a duplicate of bug 767529 ***