GNOME Bugzilla – Bug 170180
workaround for Arabic harakat ligature bug in Tahoma
Last modified: 2009-08-05 19:42:19 UTC
Please describe the problem:
Pango mistakenly forms harakat ligatures even when a base letter (or a sequence of them) appears between the harakat. A sequence of <Meem, Shadda, Noon, Fatha> should result in two separate harakat, one on Meem and one on Noon, but Pango currently renders it with a Shadda+Fatha ligature on Meem.

Steps to reproduce:
1. Install an Arabic font with harakat ligature glyphs (some are available from http://www.farsiweb.info/font/farsifonts-0.4.zip).
2. Open the attached file in gedit.

Actual results:
It shows a harakat ligature of Shadda+Fatha over Meem.

Expected results:
It should show a Shadda over Meem and a Fatha over Noon.

Does this happen every time?
Yes.

Other information:
Created attachment 38641 [details] Text file showing the bug
Well, my diagnosis suggests that it's a bug in your fonts AND in Tahoma. The mark ligature lookups have a LookupFlag value of 7, which according to the spec (http://www.microsoft.com/typography/otspec/chapter2.htm) means RightToLeft+IgnoreBaseGlyphs+IgnoreLigatures, while what you really want is 1 (RightToLeft). I fixed this in your fonts and it works perfectly. That leaves the problem with Tahoma. I believe we can forget about it if the latest Tahoma has fixed this; otherwise we're stuck with yet another MS bug :(.
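To make the flag arithmetic above concrete, here is a minimal sketch that decodes a LookupFlag value into the bit names discussed in this bug. The bit values are the ones quoted from the spec in this thread, plus RightToLeft = 0x0001; the function name `decode_lookup_flag` is made up for illustration and is not part of Pango or ottest.

```python
# Low four LookupFlag bits, as named in the OpenType spec.
LOOKUP_FLAG_BITS = [
    (0x0001, "RightToLeft"),
    (0x0002, "IgnoreBaseGlyphs"),
    (0x0004, "IgnoreLigatures"),
    (0x0008, "IgnoreMarks"),
]


def decode_lookup_flag(value):
    """Return the names of the low four LookupFlag bits set in `value`.

    Hypothetical helper for illustration only; higher bits are not decoded.
    """
    return [name for bit, name in LOOKUP_FLAG_BITS if value & bit]


if __name__ == "__main__":
    print(7, decode_lookup_flag(7))  # the value found in the fonts
    print(1, decode_lookup_flag(1))  # the value that was wanted
```

Decoding 7 gives RightToLeft, IgnoreBaseGlyphs and IgnoreLigatures, while 1 gives RightToLeft alone.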
Created attachment 38746 [details] [review] Support LookupFlag for ottest With this patch ottest shows the LookupFlags.
Just to make the situation clear, we need a patch to deviate from the OpenType spec and get closer to Uniscribe. Looks like Uniscribe is ignoring the IgnoreBaseGlyphs flag in LookupFlag. Don't know whether IgnoreLigatures should be ignored.
For the record, here is the response I got from Eric Mader:

Date: Wed, 6 Apr 2005 14:04:10 -0400
From: Eric Mader
To: Behdad Esfahbod
Subject: Re: Question about LookupFlag

Hi Behdad,

I have to say that I find the description of the lookup flags somewhat confusing as well. I think they were invented before any real OT fonts had been built, and were not well thought out - or at least not well documented :-) I'll answer some of your questions below.

Regards,
Eric

Behdad Esfahbod wrote:
> Hi Eric,
>
> Sorry for the noise. I sent the following question to the
> OpenType list two times, with no response. Somebody suggested
> that I ask you, since you have been the main author of OT in ICU.
>
> Thanks,
>
> --behdad
> http://behdad.org/
>
> ---------- Forwarded message ----------
> Date: Tue, 22 Mar 2005 03:47:07 -0500
> From: Behdad Esfahbod <behdad>
> Reply-To: opentype-list
> To: Multiple recipients of opentype <opentype-list>
> Subject: [OpenType] Question about LookupFlag
>
> OpenType list address: opentype-list
>
> Hi,
> I'm all confused about how one is supposed to handle LookupFlag.
> The OpenType spec once says [1]: "The LookupFlag specifies lookup
> qualifiers that assist a text-processing client in substituting
> or positioning glyphs." and the LookupFlag bit enumeration
> defines:
>
> 0x0002 IgnoreBaseGlyphs If set, skips over base glyphs
> 0x0004 IgnoreLigatures If set, skips over ligatures
> 0x0008 IgnoreMarks If set, skips over combining marks
>
> The Arabic example following the table suggests that one is
> supposed to skip some glyphs when matching lookups, which is what
> one would expect, but then how one justifies these:
>
>
> * In Example 4 in the same page, a fictional implementation of
> the ffi and fi ligatures, sets LookupFlag to 0x000C =
> IgnoreLigatures, IgnoreMarks. What does it mean to
> IgnoreLigatures here? Does it mean if a ligatures comes in
> between an "f" and an "i", the "fi" ligature should be used?
> IgnoreLigatures would mean that following the Arabic example
> mentioned above.

I have to say that I've never been clear on what IgnoreLigatures means. Of course it means to ignore glyphs which are marked as ligatures in the GDEF glyph class table, but I don't really understand how one would use them. Your interpretation of the above example matches my understanding, and seems nonsensical.

> * In the page on GSUB [2], it says: "When a string of glyphs
> can be replaced with a single ligature glyph, the first glyph is
> substituted with the ligature. The remaining glyphs in the string
> are deleted, this includes those glyphs that are skipped as a
> result of lookup flags." Is it true? So for example if one maps
> LAM+ALEF to LAM-ALEF LIGATURE and sets LookupFlags to
> IgnoreMarks, then the marks will be lost? I'm pretty sure the
> current implementations do not follow this.

I agree with you. My understanding of the purpose of IgnoreMarks is that it allows you to form the LAM-ALEF ligature even if there are marks applied to the LAM. Of course, you don't want to delete the marks after you've ignored them - they're still important. My engine *does not* delete ignored glyphs, only the glyphs that were explicitly matched to form the ligature.

> * Tahoma sets LookupFlag to 7 = RightToLeft, IgnoreBaseGlyphs,
> IgnoreLigatures for its mark ligatures. Isn't it wrong? For
> example, a sequence of <Meem, Shadda, Noon, Fatha> should result
> in two different harakat on Meem and Noon, while an
> implementation (Pango) currently renders it with a Shadda+Fatha
> ligature on Meem, followed by Noon [3]. If following the
> deletion note above, the conforming rendering should be a
> Shadda+Fatha ligature followed by Noon, since the Meem is between
> two components of a ligature and should be deleted!

Yes, this seems strange. It may be this way because Tahoma was built before all of the subtle details of OT Arabic layout were worked out.

I did a couple of experiments and discovered that Uniscribe renders the above sequence correctly, and my code does not - it ligates the Shadda and Fatha. I can think of a few reasons why Uniscribe gets this right - either they ignore the IgnoreBaseGlyphs flag, they're clever about how they tag the glyphs, or they process Meem Shadda separately from Noon Fatha... I find this sort of thing all the time - Uniscribe gets things right because they know what the spec. is *supposed* to say - the rest of us get it wrong because we only know what the spec. *does* say :-)

> Thanks in advance,
> --behdad
> http://behdad.org/
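The behaviour discussed in this exchange can be reproduced with a toy model. The sketch below is not Pango's or ICU's code: the glyph names, the `GLYPH_CLASS` table (standing in for a font's GDEF glyph classes) and the function `apply_ligature` are all assumptions made for illustration. It follows the approach Eric describes for his engine: glyphs skipped because of the lookup flag are kept, and only the explicitly matched components are replaced.

```python
BASE, MARK = "base", "mark"

# Hypothetical glyph classes, standing in for a font's GDEF class table.
GLYPH_CLASS = {
    "Meem": BASE,
    "Noon": BASE,
    "Shadda": MARK,
    "Fatha": MARK,
    "Shadda+Fatha": MARK,
}

IGNORE_BASE_GLYPHS = 0x0002


def apply_ligature(glyphs, components, ligature, lookup_flag):
    """Replace the first match of `components` in `glyphs` with `ligature`.

    Base glyphs are skipped while matching if IgnoreBaseGlyphs is set.
    Skipped glyphs are kept in the output; only matched components go away.
    """
    def skipped(glyph):
        return bool(lookup_flag & IGNORE_BASE_GLYPHS) and GLYPH_CLASS[glyph] == BASE

    for start in range(len(glyphs)):
        if glyphs[start] != components[0]:
            continue
        matched = [start]
        pos = start + 1
        while len(matched) < len(components) and pos < len(glyphs):
            if skipped(glyphs[pos]):
                pos += 1  # step over the ignored glyph, keep matching
                continue
            if glyphs[pos] != components[len(matched)]:
                break
            matched.append(pos)
            pos += 1
        if len(matched) == len(components):
            out = []
            for i, glyph in enumerate(glyphs):
                if i == matched[0]:
                    out.append(ligature)  # first component becomes the ligature
                elif i not in matched:
                    out.append(glyph)  # untouched and skipped glyphs survive
            return out
    return list(glyphs)


if __name__ == "__main__":
    text = ["Meem", "Shadda", "Noon", "Fatha"]
    print(apply_ligature(text, ["Shadda", "Fatha"], "Shadda+Fatha", 7))
    print(apply_ligature(text, ["Shadda", "Fatha"], "Shadda+Fatha", 1))
```

With LookupFlag 7 the Noon is skipped during matching and the result is Meem, Shadda+Fatha, Noon, which is the rendering reported in this bug. With LookupFlag 1 the Noon blocks the match and the sequence is left unchanged.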
Apparently pango ligates these harakat even across word boundaries when IgnoreBaseGlyphs is on. This happens on both pango 1.13.4 and 1.8.1.
Created attachment 69867 [details] Sample file showing the wide span of IgnoreBaseMarks with pango Attached a sample file
Sergey Malkin says on the OpenType list: "OTLS (Uniscribe) limits positioning by checking whether two marks belong to the same base (or ligature component) and only then applies mark-to-mark rule." Need to carve that exception in our beautifully generic OpenType engine...
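As a rough illustration of the check Sergey describes, the sketch below decides whether two marks attach to the same base before a mark-to-mark rule would be applied. It is a simplified assumption, not the OTLS or harfbuzz-ng implementation: glyphs carry only a "base" or "mark" class, ligature components are not modelled, and the helper names are hypothetical.

```python
def base_index(glyph_classes, i):
    """Return the index of the nearest base at or before position i, or None."""
    for j in range(i, -1, -1):
        if glyph_classes[j] == "base":
            return j
    return None


def marks_share_base(glyph_classes, first_mark, second_mark):
    """True if both marks attach to the same preceding base glyph.

    Simplified model: ligature components are not considered.
    """
    a = base_index(glyph_classes, first_mark)
    b = base_index(glyph_classes, second_mark)
    return a is not None and a == b


if __name__ == "__main__":
    # <Meem, Shadda, Noon, Fatha>: the two marks sit on different bases.
    print(marks_share_base(["base", "mark", "base", "mark"], 1, 3))
    # <Meem, Shadda, Fatha>: both marks sit on Meem.
    print(marks_share_base(["base", "mark", "mark"], 1, 2))
```

In this model the marks of <Meem, Shadda, Noon, Fatha> do not share a base, so a mark-to-mark rule would not be applied to them, while the marks of <Meem, Shadda, Fatha> do.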
Ok, apparently what Sergey describes is part of the intended semantics of the GPOS mechanisms. I have just merged harfbuzz-ng, which does this correctly.