GNOME Bugzilla – Bug 787284
Unicode Double Diacritics Not Positioned Correctly
Last modified: 2017-12-02 14:15:05 UTC
Created attachment 359142 [details] Sample with double diacritic typed between 'a' and 'b'.

I am using GNOME with the default installation on the latest stable release of Debian GNU/Linux, Debian 9 (stretch).

Unicode has some special diacritics that are supposed to span across the preceding glyph and the following glyph. Gedit is placing the double diacritic under the first glyph, extending to the left of the first glyph rather than positioning it half-way under the first glyph and half-way under the glyph that follows. Here is an example: a᷼bcd. [Well what do you know, it displays even differently in this window--maybe because the font is different--but it is still incorrect!]

The double diacritic is typed in after the 'a', so it should span across the 'a' and 'b'. Some double diacritics can be seen in this Unicode code chart in the range U+035C..U+0362: http://www.unicode.org/charts/PDF/U0300.pdf

To test this, I recommend using my Unifont TrueType font, either the current version (10.0.06) or a later version, because I know the offset (left one charcell width) and spacing (zero) are set correctly. I have looked at other fonts and they do not have these parameters set correctly.

Relevant files are:
The font: ftp.gnu.org/gnu/unifont/unifont-10.0.06/unifont-10.0.06.ttf
GPG Signature: ftp.gnu.org/gnu/unifont/unifont-10.0.06/unifont-10.0.06.ttf.sig
Public Key: http://unifoundry.com/1A09227B1F435A33_public.asc

My profile sets "LC_ALL=en_US.UTF-8".

LibreOffice, the terminal application under GNOME, and Emacs also display double diacritics incorrectly, each in their own way. If there is some commonality in font rendering technology that they share with GNOME and Gedit, please let me know. Otherwise I will try to track this down to its various sources in those other applications as well.

I am also attaching a sample UTF-8 file with the same character sequence. The file does not start with the UTF-8 signature, but I have tried with and without that and it has made no difference.

Thank you,
Paul Hardy
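For readers who want to check which code points count as double diacritics, one approach is to look at the canonical combining class: Unicode assigns 233 (Double Below) and 234 (Double Above) to these marks. The following is only a minimal sketch using Python's standard unicodedata module; the function name is made up for illustration, and the result depends on the Unicode version bundled with the interpreter.

```python
import unicodedata

# Canonical combining classes that Unicode assigns to double diacritics:
# 233 = Double Below, 234 = Double Above.
DOUBLE_CLASSES = {233, 234}


def double_diacritics(first, last):
    """Return code points in [first, last] whose combining class is 233 or 234."""
    return [
        cp
        for cp in range(first, last + 1)
        if unicodedata.combining(chr(cp)) in DOUBLE_CLASSES
    ]


if __name__ == "__main__":
    # List the double diacritics in the Combining Diacritical Marks block.
    for cp in double_diacritics(0x0300, 0x036F):
        print(f"U+{cp:04X} {unicodedata.name(chr(cp), '<unnamed>')}")
```

Scanning U+0300..U+036F this way should report the U+035C..U+0362 range mentioned above.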
gedit doesn't do any rendering itself but uses GtkSourceView to render source code to the screen, which is a fork of GtkTextView. gnome-terminal uses VTE: https://github.com/GNOME/vte. I have no idea about LibreOffice (apart from the fact they use Qt instead of GTK+) and Emacs. I'll move this over to GtkSourceView.
GtkSourceView is a subclass of GtkTextView, not a fork. LibreOffice uses GTK+ for the GtkWindow and a few other things, but I think it renders text with its own toolkit, VCL. Can you reproduce the bug with pango-view? GtkTextView uses Pango to render the text. If the bug doesn't occur with pango-view, does it happen in GtkTextView by using gtk3-demo?
Created attachment 359249 [details] pango-view rendering of a_bcd.txt Attached is a snapshot of pango-view with the a_bcd.txt file on my system. The double diacritic spans across 'b' and 'c', instead of across 'a' and 'b'. Thank you, Paul Hardy
Created attachment 359250 [details] pango-view rendering of a_bcd.txt with --font=unifont The previous screenshot was not with Unifont; this one is. I am not deleting the other image because the double diacritic is displayed wrong in that font too, but in a different position. The pango-view image using Unifont shows the double diacritic under the 'a' and extending to the left of the 'a'. It should show it spanning across the 'a' and 'b'. Thank you, Paul Hardy
OK, so the bug happens with pango-view. Moving the bug to pango.
This is a font bug. We cannot do anything about it.
What do you believe the bug is in the font? I checked over the glyph metrics with Fontforge before submitting this bug, verified that the widths of the combining marks in question are all zero as they should be, and that their x-offsets are negative half the width of the combining mark as they should be. I also tried fonts that did not have metrics set correctly and whatever gedit uses for rendering rendered those fonts in a similar way. The font rendering engine (pango?) seems to be making up its own mind that because these are combining characters, they need to be handled the same as all other combining characters.
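The metric reasoning in this comment can be illustrated with a little arithmetic. This is a sketch with hypothetical cell-based units (an 8-unit base glyph and a 16-unit mark), not values read from the actual TrueType file, and the "shifted" case is only an assumption chosen to mimic the reported symptom, not a description of what the renderer really computes.

```python
def mark_extent(pen_x, mark_width, x_offset):
    """Return the (left, right) horizontal span of a zero-advance mark drawn at pen_x."""
    left = pen_x + x_offset
    return (left, left + mark_width)


# Hypothetical units: 'a' and 'b' are each 8 units wide; the double
# diacritic is 16 units wide, has zero advance, and an x-offset of
# negative half its width.
BASE_ADVANCE = 8
MARK_WIDTH = 16
X_OFFSET = -(MARK_WIDTH // 2)

# After 'a' is drawn, the pen sits on the boundary between 'a' and 'b'.
# Using the font metrics as-is, the mark straddles that boundary.
from_metrics = mark_extent(BASE_ADVANCE, MARK_WIDTH, X_OFFSET)

# Assumption for illustration: a renderer that treats the mark like an
# ordinary combining mark and moves it back over the preceding base.
shifted = mark_extent(BASE_ADVANCE - BASE_ADVANCE, MARK_WIDTH, X_OFFSET)

print("from font metrics:", from_metrics)  # spans 'a' (0..8) and 'b' (8..16)
print("shifted over base:", shifted)       # starts left of 'a'
```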
(In reply to Paul Hardy from comment #7)
> What do you believe the bug is in the font? I checked over the glyph metrics
> with Fontforge before submitting this bug, verified that the widths of the
> combining marks in question are all zero as they should be, and that their
> x-offsets are negative half the width of the combining mark as they should
> be.

The width and x-offset of the combining mark are irrelevant. There are two cases:

1. If the font does NOT have a GPOS table, HarfBuzz tries to position things as it sees fit. If that's not good enough, the fix is to add a GPOS table to the font.
2. If there is a GPOS table, then HarfBuzz just applies that. In particular, the MarkBase attachment tables, and any possible contextual adjustments.

> I also tried fonts that did not have metrics set correctly and whatever
> gedit uses for rendering rendered those fonts in a similar way. The font
> rendering engine (pango?) seems to be making up its own mind that because
> these are combining characters, they need to be handled the same as all
> other combining characters.

They are handled the same as all other combining characters.
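Which of the two cases applies to a given font can be checked by reading the font's table directory. The sketch below parses the sfnt header with only the Python standard library; the function names are illustrative, it assumes a single-font TrueType/OpenType file (not a TTC collection), and the file name in the usage note is just an example path.

```python
import struct


def sfnt_table_tags(data):
    """Return the table tags listed in an sfnt (TrueType/OpenType) table directory."""
    # Offset table, big-endian: sfntVersion (4 bytes), then numTables,
    # searchRange, entrySelector, rangeShift (uint16 each) = 12 bytes.
    _version, num_tables = struct.unpack_from(">4sH", data, 0)
    tags = []
    for i in range(num_tables):
        # Each 16-byte table record: tag, checksum, offset, length.
        tag, _checksum, _offset, _length = struct.unpack_from(
            ">4sIII", data, 12 + 16 * i
        )
        tags.append(tag.decode("latin-1"))
    return tags


def has_gpos(path):
    """Report whether the font file at path lists a GPOS table."""
    with open(path, "rb") as f:
        return "GPOS" in sfnt_table_tags(f.read())
```

For example, `has_gpos("unifont-10.0.06.ttf")` would indicate whether that file falls under case 1 or case 2, assuming the file is in the current directory.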
Thank you for the detailed reply. Now I understand the situation better. GPOS is an OpenType extension to TrueType. Therefore, I would not consider it a bug that Unifont does not have a GPOS table. Unifont just uses TrueType's glyph metrics for horizontal positioning. Would you consider modifying HarfBuzz to use TrueType glyph metrics in the absence of a GPOS table? Alternatively, I could consider adding a GPOS table to Unifont. However, I first would have to figure out how to generate one in FontForge. Also, Unifont is already gigantic. I would be hesitant to add a table that will make it even larger when the font already contains horizontal spacing information. Modifying Unifont also will not change the rendering of HarfBuzz with other plain old TrueType (non-OpenType) fonts.
You can add an empty GPOS table with FontTools after building the font; it shouldn’t take much space. Alternatively, you can add some empty GPOS lookups in FontForge, though I'm not sure whether that will create an empty GPOS table or not.
(In reply to Paul Hardy from comment #9)
> Thank you for the detailed reply. Now I understand the situation better.
>
> GPOS is an OpenType extension to TrueType. Therefore, I would not consider
> it a bug that Unifont does not have a GPOS table. Unifont just uses
> TrueType's glyph metrics for horizontal positioning.

OpenType has been the dominant industry font format of the last 20 years. Without it, scripts used by billions of people CANNOT be rendered correctly. Hence, why Unifont is really nothing more than a collection of bitmaps, not a font useful for rendering real text. Sorry if I'm blunt.

> Would you consider modifying HarfBuzz to use TrueType glyph metrics in the
> absence of a GPOS table?
>
> Alternatively, I could consider adding a GPOS table to Unifont. However, I
> first would have to figure out how to generate one in FontForge. Also,
> Unifont is already gigantic. I would be hesitant to add a table that will
> make it even larger when the font already contains horizontal spacing
> information. Modifying Unifont also will not change the rendering of
> HarfBuzz with other plain old TrueType (non-OpenType) fonts.
(In reply to Behdad Esfahbod from comment #11)
> (In reply to Paul Hardy from comment #9)
> > Thank you for the detailed reply. Now I understand the situation better.
> >
> > GPOS is an OpenType extension to TrueType. Therefore, I would not consider
> > it a bug that Unifont does not have a GPOS table. Unifont just uses
> > TrueType's glyph metrics for horizontal positioning.
>
> OpenType has been the dominant industry font format of the last 20 years.
> Without it, scripts used by billions of people CANNOT be rendered correctly.
> Hence, why Unifont is really nothing more than a collection of bitmaps, not
> a font useful for rendering real text. Sorry if I'm blunt.

Certainly, Unifont is only a font of last resort to render something rather than nothing for an otherwise unknown code point. 20+ years ago, many fonts only supported Latin-1 and font rendering engines could not render complex scripts dynamically on a screen. The font world is certainly a much nicer place today with OpenType.

> > Would you consider modifying HarfBuzz to use TrueType glyph metrics in the
> > absence of a GPOS table?

So I conclude that you do not want to spend the time getting HarfBuzz to work with pre-OpenType fonts that only contain TrueType horizontal spacing information, which is understandable. At least now I know that HarfBuzz is where the rendering is happening. I did not have your depth of experience to determine that initially.

The problem with adding anything to Unifont is it is built automatically with a make file. Anything that gets added needs to be doable through FontForge's automated scripting commands. Given the very low quality of Unifont, and its inability to properly render complex scripts anyways, it might not be worth the time even if it turns out that FontForge's scripting capabilities can handle automatically adding a GPOS table. And trying to add full GPOS capabilities to Unifont would definitely be going too far, because of its low resolution.

So I suppose the situation will just remain as is.
(In reply to Paul Hardy from comment #12)
> (In reply to Behdad Esfahbod from comment #11)
>
> > > Would you consider modifying HarfBuzz to use TrueType glyph metrics in the
> > > absence of a GPOS table?
>
> So I conclude that you do not want to spend the time getting HarfBuzz to
> work with pre-OpenType fonts that only contain TrueType horizontal spacing
> information, which is understandable.

HarfBuzz actually *does* handle TrueType fonts. It stacks the diacritic marks on top of each other. That doesn't seem to do what you want in the case of double diacritics. For the double diacritics it tries to do its best centering. I'll check if it can be improved.

Which character is it btw in your screenshot? I don't see that in the U+035C..U+0362 range.

> The problem with adding anything to Unifont is it is built automatically
> with a make file. Anything that gets added needs to be doable through
> FontForge's automated scripting commands. Given the very low quality of
> Unifont, and its inability to properly render complex scripts anyways, it
> might not be worth the time even if it turns out that FontForge's scripting
> capabilities can handle automatically adding a GPOS table. And trying to
> add full GPOS capabilities to Unifont would definitely be going too far,
> because of its low resolution.
>
> So I suppose the situation will just remain as is.

Looks like my understanding of the double diacritics was wrong. I thought they should be positioned over the two previous characters, not the previous and following ones. I was also confused by your screenshot, because that didn't show the actual problem, probably because the diacritic was rendered using a different font.

Fixed: https://github.com/harfbuzz/harfbuzz/commit/8d55340593ce32e55cfbd86a17c0be8750e8fb72

Sorry about that!
Thank you for continuing to pursue this. I did not realize the misunderstanding about Unicode's double diacritic positioning or I would have explained that in more detail. Your original interpretation of how they should be handled actually makes more sense, given how Unicode handles most of its combining marks. I'm glad that this turned out to be a small change in HarfBuzz.
(In reply to Behdad Esfahbod from comment #13)
>
> Which character is it btw in your screenshot? I don't see that in the
> U+035C..U+0362 range.

I had forgotten, so I just now copied the text and piped it into "od -c". The UTF-8 output from od of that double diacritic in octal is "341 267 274", so the Unicode code point is U+1DFC. That is in the Combining Diacritical Marks Supplement range, which contains more double diacritics.

Here are links to the Unicode code charts for ranges that either do or I think possibly could contain double diacritics in the future:

U+0300..U+036F Combining Diacritical Marks: https://www.unicode.org/charts/PDF/U0300.pdf
U+1AB0..U+1AFF Combining Diacritical Marks Extended: https://www.unicode.org/charts/PDF/U1AB0.pdf
U+1DC0..U+1DFF Combining Diacritical Marks Supplement: https://www.unicode.org/charts/PDF/U1DC0.pdf

Of those three, the U+1AB0..U+1AFF range does not currently contain any double diacritics, though it is reasonable that it could contain them in the future.

Thanks again.
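The step from the octal bytes printed by od to the code point can be reproduced in a few lines. This is a minimal sketch using only the Python standard library; the helper name is made up for illustration.

```python
def codepoints_from_od_octal(octal_bytes):
    """Decode whitespace-separated octal byte values (as shown by od) as UTF-8."""
    raw = bytes(int(token, 8) for token in octal_bytes.split())
    # Format each decoded character as a U+XXXX code point label.
    return [f"U+{ord(ch):04X}" for ch in raw.decode("utf-8")]


# Octal 341 267 274 is hex E1 B7 BC, a three-byte UTF-8 sequence.
print(codepoints_from_od_octal("341 267 274"))  # ['U+1DFC']
```

The three bytes carry 4 + 6 + 6 payload bits (0x1, 0x37, 0x3C), which combine to 0x1DFC.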