GNOME Bugzilla – Bug 472657
Renders U+200B (zero width space) visibly under certain conditions
Last modified: 2012-08-18 17:46:39 UTC
[ From http://bugs.debian.org/439767 by Rich Felker ] "After upgrading my system, the latest Pango renders U+200B (zero-width space) visibly under certain conditions, as a "missing glyph" box containing the hex value. Particularly, Pango seems to be looking for a glyph for this character matching the current language/script. In my case, Tibetan characters adjacent to U+200B cause the misrendering to happen.

I first observed the problem on a Google search (in Iceweasel) but have since been able to reproduce it in GTK+ text widgets by first typing U+200B (using an input method) then moving the cursor before the U+200B character and typing any Tibetan character. Thus, I am fairly confident that the bug is in Pango itself and not GTK+ or Iceweasel.

I suspect the Tibetan fonts I am using lack a glyph for U+200B, but Pango should not be insisting on trying to find a "Tibetan version" of this character. It probably shouldn't even look for glyphs at all, but instead always treat it as a zero-width character with no visible glyph... but if it is going to use a glyph it should grab one from any available font.

Screenshot of the issue: http://www.aerifal.cx/~dalias/images/200b.png

I'm using Monlam Uni Ochan1, from lobsangmonlam.org. Indeed, the problem goes away if I remove it so that Jomolhari is used. But in the past, the problem didn't exist even though I was using Monlam's font. And, opening up both fonts in FontForge, I see that Jomolhari has blank glyphs for all the various spaces while Monlam's fonts lack them.

Still, I think the old behavior was correct. Space characters (especially zero-width whose "glyph" is 100% defined by Unicode and cannot vary) should never appear as [200B] etc. replacement glyphs regardless of what's present or missing in the font, and especially when the font wasn't even selected manually but rather used as a fallback for scripts not in the selected font."
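The behavior the reporter asks for can be sketched in a few lines of Python. This is only a toy model of the requested policy, not Pango's code: `render_token`, `render`, and the `cmap` dictionary (code point to glyph name) are hypothetical names, and the `ZERO_WIDTH` set is an illustrative, non-exhaustive selection of zero-width characters.

```python
# Illustrative, non-exhaustive set of zero-width characters (an assumption
# for this sketch, not a list taken from Pango or from the report).
ZERO_WIDTH = {0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF}


def render_token(ch, cmap):
    """Return what a toy renderer would draw for one character.

    cmap is a hypothetical dict mapping code points to glyph names,
    standing in for a font's character map.
    """
    cp = ord(ch)
    if cp in ZERO_WIDTH:
        # Decided before any font lookup: nothing is drawn,
        # whether or not the font has a glyph for it.
        return ""
    if cp in cmap:
        return cmap[cp]
    # Only genuinely missing, visible characters get the hex box.
    return "[%04X]" % cp


def render(text, cmap):
    """Concatenate the toy rendering of every character in text."""
    return "".join(render_token(ch, cmap) for ch in text)
```

With a font map that covers only the Tibetan letter U+0F40, the zero-width space after it produces no output in this sketch, while an uncovered visible character still gets its hex box.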
Do you also see it in gedit?
This is the response I got: "Yes, see screenshot attached. Typing any Latin character between the Tibetan text and the U+200B makes the visible glyph vanish. The Monlam Uni Ochan1 font I'm using can be obtained from the Monlam Bod-yig v2 zip file on my Tibetan fonts site, http://www.aerifal.cx/~dalias/bodyig/fonts/ (The alternative is the upstream site that distributes it only as a Windows .exe file, www.lobsangmonlam.org.) My site has it listed under legacy fonts, but this is just because it has glyphs on top of other script ranges that do not belong to Tibetan, as well as other non-Unicode fonts packaged with it; the font in question here does have valid OpenType tables and works fine for displaying Unicode-encoded Tibetan on both Linux/Pango and Windows." Screenshot: http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=22;filename=gedit_pango_tib_bug.png;att=1;bug=439767
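The observation that a Latin character between the Tibetan text and U+200B hides the box fits the reporter's hypothesis that U+200B is matched against the script of the text before it. The sketch below is a toy model of that hypothesis only, not Pango's itemizer: `script_of` and `assign_scripts` are hypothetical helpers, and script detection is reduced to the Tibetan block and ASCII letters.

```python
def script_of(ch):
    """Very rough script guess: Tibetan block, ASCII letters, or neutral."""
    cp = ord(ch)
    if 0x0F00 <= cp <= 0x0FFF:
        return "Tibetan"
    if ch.isascii() and ch.isalpha():
        return "Latin"
    return None  # treated as script-neutral in this toy model


def assign_scripts(text):
    """Pair each character with a script.

    Neutral characters (such as U+200B) take the script of the nearest
    preceding character that has one, or None if there is none yet.
    """
    result = []
    current = None
    for ch in text:
        script = script_of(ch)
        if script is not None:
            current = script
        result.append((ch, current))
    return result
```

Under this assumption, U+200B directly after a Tibetan letter is grouped with Tibetan and would be looked up in the Tibetan font, which lacks the glyph. With a Latin letter in between, it is grouped with Latin instead and would be looked up in a font that may well have a blank glyph for it.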
We've merged the HarfBuzz branch now, which fixes this.