Bug 170180 – workaround for Arabic harakat ligature bug in Tahoma

Summary:	workaround for Arabic harakat ligature bug in Tahoma


Status:	RESOLVED FIXED

Product:	pango
Classification:	Platform
Component:	general
Version:	1.8.x
Hardware:	Other
OS:	All

Importance:	High (priority), normal (severity)
Target Milestone:	Medium fix
Assigned To:	pango-maint
QA Contact:	pango-maint

URL:
Whiteboard:

Depends on:
Blocks:	Persian

Reported:	2005-03-13 13:46 UTC by Roozbeh Pournader
Modified:	2009-08-05 19:42 UTC

See Also:
GNOME target:	---
GNOME version:	2.9/2.10

Attachments
Text file showing the bug (9 bytes, text/plain) 2005-03-13 13:47 UTC, Roozbeh Pournader
Support LookupFlag for ottest (3.12 KB, patch) 2005-03-15 12:36 UTC, Behdad Esfahbod	committed
Sample file showing the wide span of IgnoreBaseMarks with pango (32 bytes, text/plain) 2006-07-29 10:59 UTC, Roozbeh Pournader

Description Roozbeh Pournader 2005-03-13 13:46:15 UTC

Please describe the problem:
Pango mistakenly renders harakat ligatures even when a base letter (or a
sequence of them) appears between the two harakat. A sequence of <Meem, Shadda, Noon, Fatha>
should result in two different harakat on Meem and Noon, while pango currently
renders it with a Shadda+Fatha ligature on Meem.

Steps to reproduce:
1. Install an Arabic font with harakat ligature glyphs (some are available from
http://www.farsiweb.info/font/farsifonts-0.4.zip).
2. Open the attached file in gedit.

Actual results:
It shows a harakat ligature of Shadda+Fatha over Meem.

Expected results:
It should show a Shadda over Meem and a Fatha over Noon.

Does this happen every time?
Yes.

Other information:

Comment 1 Roozbeh Pournader 2005-03-13 13:47:31 UTC

Created attachment 38641
Text file showing the bug

Comment 2 Behdad Esfahbod 2005-03-15 12:32:53 UTC

Well, my diagnosis suggests that it's a bug in your fonts AND in Tahoma.  You
are passing a LookupFlag value of 7 which according to the spec
(http://www.microsoft.com/typography/otspec/chapter2.htm) means
RightToLeft+IgnoreBaseGlyphs+IgnoreLigatures, while what you really want is 1
(RightToLeft).  I fixed this in your fonts and it works perfectly.  That leaves the
problem with Tahoma.  I believe we can forget about it if the latest Tahoma has
fixed this; otherwise we're stuck with yet another MS bug :(.
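As a quick check of the arithmetic in the comment above, here is a small decoder that turns a LookupFlag value into flag names. It is a sketch that covers only the four low-order bits (0x0001 RightToLeft through 0x0008 IgnoreMarks) and ignores the high byte (MarkAttachmentType). The function name `decode_lookup_flag` is hypothetical.

```python
# Low-order LookupFlag bits as named in the OpenType common table formats
# chapter. The MarkAttachmentType high byte is deliberately ignored here.
LOOKUP_FLAG_BITS = {
    0x0001: "RightToLeft",
    0x0002: "IgnoreBaseGlyphs",
    0x0004: "IgnoreLigatures",
    0x0008: "IgnoreMarks",
}


def decode_lookup_flag(flag: int) -> list:
    """Return the names of the low-order LookupFlag bits set in `flag`."""
    return [name for bit, name in LOOKUP_FLAG_BITS.items() if flag & bit]


# Tahoma's value versus the value the fonts actually need.
for value in (7, 1):
    print(value, decode_lookup_flag(value))
# 7 ['RightToLeft', 'IgnoreBaseGlyphs', 'IgnoreLigatures']
# 1 ['RightToLeft']
```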

Comment 3 Behdad Esfahbod 2005-03-15 12:36:29 UTC

Created attachment 38746
Support LookupFlag for ottest

With this patch, ottest shows the LookupFlag values.

Comment 4 Behdad Esfahbod 2005-06-14 19:55:53 UTC

Just to make the situation clear, we need a patch to deviate from the OpenType
spec and get closer to Uniscribe.  Looks like Uniscribe is ignoring the
IgnoreBaseGlyphs flag in LookupFlag.  I don't know whether IgnoreLigatures should
be ignored as well.
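The following is a simplified model of why the flag matters for <Meem, Shadda, Noon, Fatha>. A lookup only "sees" glyphs whose GDEF class its LookupFlag does not skip. With IgnoreBaseGlyphs set (Tahoma's value 7), Meem and Noon become invisible, Shadda and Fatha look adjacent, and the Shadda+Fatha ligature matches. Clearing IgnoreBaseGlyphs, as Uniscribe appears to do, makes the match fail. This is a hypothetical sketch, not Pango's or Uniscribe's code. Real engines match from the current glyph position instead of searching the whole run, and the glyph names and `find_ligature` are illustrative.

```python
# GDEF glyph classes (GlyphClassDef values) and LookupFlag bits used below.
BASE, LIGATURE, MARK = 1, 2, 3
IGNORE_BASE_GLYPHS = 0x0002
IGNORE_LIGATURES = 0x0004
IGNORE_MARKS = 0x0008


def is_skipped(glyph_class, flag):
    """True if a glyph of this GDEF class is invisible to a lookup with `flag`."""
    return ((glyph_class == BASE and bool(flag & IGNORE_BASE_GLYPHS))
            or (glyph_class == LIGATURE and bool(flag & IGNORE_LIGATURES))
            or (glyph_class == MARK and bool(flag & IGNORE_MARKS)))


def find_ligature(glyphs, components, flag):
    """Return the indices of the first match of `components`, or None.

    The components must be consecutive among the glyphs the lookup can see,
    so skipped glyphs may sit between them.
    """
    seen = [i for i, (_, cls) in enumerate(glyphs) if not is_skipped(cls, flag)]
    names = [glyphs[i][0] for i in seen]
    n = len(components)
    for start in range(len(seen) - n + 1):
        if names[start:start + n] == list(components):
            return seen[start:start + n]
    return None


text = [("meem", BASE), ("shadda", MARK), ("noon", BASE), ("fatha", MARK)]
mark_ligature = ("shadda", "fatha")

print(find_ligature(text, mark_ligature, 7))                        # [1, 3]
print(find_ligature(text, mark_ligature, 7 & ~IGNORE_BASE_GLYPHS))  # None
```

Because Noon is a base glyph too, the same skipping lets the ligature reach across any number of intervening letters (or spaces).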

Comment 5 Behdad Esfahbod 2005-07-22 18:37:59 UTC

For the record, here is the response I got from Eric Mader:

Date: Wed, 6 Apr 2005 14:04:10 -0400
From: Eric Mader
To:     Behdad Esfahbod
Subject: Re: Question about LookupFlag

Hi Behdad,

I have to say that I find the description of the lookup flags somewhat
confusing as well. I think they were invented before any real OT fonts
had been built, and were not well thought out - or at least not well
documented :-)

I'll answer some of your questions below.

Regards,
Eric

Behdad Esfahbod wrote:
> Hi Eric,
>
> Sorry for the noise.  I sent the following question to the
> OpenType list two times, with no response.  Somebody suggested
> that I ask you, since you have been the main author of OT in ICU.
>
> Thanks,
>
> --behdad
> http://behdad.org/
>
> ---------- Forwarded message ----------
> Date: Tue, 22 Mar 2005 03:47:07 -0500
> From: Behdad Esfahbod <behdad>
> Reply-To: opentype-list
> To: Multiple recipients of opentype <opentype-list>
> Subject: [OpenType] Question about LookupFlag
>
> OpenType list address: opentype-list
>
> Hi,
> I'm all confused about how one is supposed to handle LookupFlag.
> The OpenType spec once says [1]: "The LookupFlag specifies lookup
> qualifiers that assist a text-processing client in substituting
> or positioning glyphs."  and the LookupFlag bit enumeration
> defines:
>
> 0x0002        IgnoreBaseGlyphs        If set, skips over base glyphs
> 0x0004        IgnoreLigatures         If set, skips over ligatures
> 0x0008        IgnoreMarks             If set, skips over combining marks
>
> The Arabic example following the table suggests that one is
> supposed to skip some glyphs when matching lookups, which is what
> one would expect, but then how does one justify these:
>
>   * In Example 4 in the same page, a fictional implementation of
> the ffi and fi ligatures, sets LookupFlag to 0x000C =
> IgnoreLigatures, IgnoreMarks.  What does it mean to
> IgnoreLigatures here?  Does it mean that if a ligature comes in
> between an "f" and an "i", the "fi" ligature should be used?
> That is what IgnoreLigatures would mean following the Arabic
> example mentioned above.

I have to say that I've never been clear on what IgnoreLigatures means.
Of course it means to ignore glyphs which are marked as ligatures in the
GDEF glyph class table, but I don't really understand how one would use
them. Your interpretation of the above example matches my understanding,
and seems nonsensical.

>   * In the page on GSUB [2], it says: "When a string of glyphs
> can be replaced with a single ligature glyph, the first glyph is
> substituted with the ligature. The remaining glyphs in the string
> are deleted, this includes those glyphs that are skipped as a
> result of lookup flags."  Is it true?  So for example if one maps
> LAM+ALEF to LAM-ALEF LIGATURE and sets LookupFlags to
> IgnoreMarks, then the marks will be lost?  I'm pretty sure the
> current implementations do not follow this.

I agree with you. My understanding of the purpose of IgnoreMarks is that
it allows you to form the LAM-ALEF ligature even if there are marks
applied to the LAM. Of course, you don't want to delete the marks after
you've ignored them - they're still important. My engine *does not*
delete ignored glyphs, only the glyphs that were explicitly matched to
form the ligature.

>   * Tahoma sets LookupFlag to 7 = RightToLeft, IgnoreBaseGlyphs,
> IgnoreLigatures for its mark ligatures.  Isn't it wrong?  For
> example, a sequence of <Meem, Shadda, Noon, Fatha> should result
> in two different harakat on Meem and Noon, while an
> implementation (Pango) currently renders it with a Shadda+Fatha
> ligature on Meem, followed by Noon [3].  If following the
> deletion note above, the conforming rendering should be a
> Shadda+Fatha ligature followed by Noon, since the Meem is between
> two components of a ligature and should be deleted!

Yes, this seems strange. It may be this way because Tahoma was built
before all of the subtle details of OT Arabic layout were worked out.

I did a couple of experiments and discovered that Uniscribe renders the
above sequence correctly, and my code does not - it ligates the Shadda
and Fatha. I can think of a few reasons why Uniscribe gets this right -
either they ignore the IgnoreBaseGlyphs flag, they're clever about how
they tag the glyphs, or they process Meem Shadda separately from Noon
Fatha...

I find this sort of thing all the time - Uniscribe gets things right
because they know what the spec. is *supposed* to say - the rest of us
get it wrong because we only know what the spec. *does* say :-)

> Thanks in advance,
> --behdad
> http://behdad.org/
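Eric's answer on ligature deletion can be illustrated with a toy substitution for LAM+ALEF under IgnoreMarks, with a Fatha on the LAM. The `delete_skipped` switch contrasts a literal reading of the GSUB text, where skipped glyphs between the components are deleted, with the behavior Eric describes for his engine, where only the explicitly matched glyphs are replaced. This is a hypothetical sketch: `apply_ligature` and the glyph names are illustrative, and emitting the kept mark right after the ligature is a simplification of how real engines keep marks attached to ligature components.

```python
# GDEF glyph classes (GlyphClassDef values) and the one LookupFlag bit used.
BASE, LIGATURE, MARK = 1, 2, 3
IGNORE_MARKS = 0x0008


def apply_ligature(glyphs, components, ligature, flag, delete_skipped):
    """Replace the first match of `components` with one `ligature` glyph.

    glyphs is a list of (name, gdef_class) pairs. Marks are skipped during
    matching when `flag` has IgnoreMarks set. If `delete_skipped` is True,
    skipped glyphs lying between the components are deleted as well (a
    literal reading of the GSUB text); otherwise they are kept.
    """
    def skipped(cls):
        return cls == MARK and bool(flag & IGNORE_MARKS)

    seen = [i for i, (_, cls) in enumerate(glyphs) if not skipped(cls)]
    names = [glyphs[i][0] for i in seen]
    n = len(components)
    for start in range(len(seen) - n + 1):
        if names[start:start + n] == list(components):
            matched = seen[start:start + n]
            first, last = matched[0], matched[-1]
            if delete_skipped:
                between = []
            else:
                # Keep the skipped glyphs; simplification: emit them after
                # the ligature instead of tracking which component they hit.
                between = [glyphs[i] for i in range(first + 1, last)
                           if i not in matched]
            return glyphs[:first] + [(ligature, LIGATURE)] + between + glyphs[last + 1:]
    return list(glyphs)  # no match: return an unchanged copy


text = [("lam", BASE), ("fatha", MARK), ("alef", BASE)]
for delete in (True, False):
    out = apply_ligature(text, ("lam", "alef"), "lam_alef", IGNORE_MARKS, delete)
    print(delete, [name for name, _ in out])
# True ['lam_alef']
# False ['lam_alef', 'fatha']
```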

Comment 6 Roozbeh Pournader 2006-07-29 10:55:12 UTC

Apparently pango ligates these harakat even across word boundaries when IgnoreBaseGlyphs is on. This happens on both pango 1.13.4 and 1.8.1.

Comment 7 Roozbeh Pournader 2006-07-29 10:59:45 UTC

Created attachment 69867
Sample file showing the wide span of IgnoreBaseMarks with pango

Attached a sample file

Comment 8 Behdad Esfahbod 2008-06-30 22:37:12 UTC

Sergey Malkin says on the OpenType list:

"OTLS (Uniscribe) limits positioning by checking whether two marks belong to the same base (or ligature component) and only then applies mark-to-mark rule."

Need to carve out that exception in our beautifully generic OpenType engine...
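The following is a minimal sketch of the rule Sergey Malkin describes. Each mark is assigned to the nearest preceding base glyph, and two marks may interact only if they share that base. The sketch tracks bases only, not ligature components, and it is not HarfBuzz's actual bookkeeping. `attach_bases` and `may_combine_marks` are hypothetical names.

```python
# GDEF glyph classes (GlyphClassDef values) used below.
BASE, MARK = 1, 3


def attach_bases(glyphs):
    """Map each glyph index to the index of the base it belongs to.

    A non-mark glyph owns itself; a mark belongs to the closest preceding
    non-mark glyph. Marks that precede every base get None.
    """
    owner = []
    current = None
    for i, (_, cls) in enumerate(glyphs):
        if cls != MARK:
            current = i
        owner.append(current)
    return owner


def may_combine_marks(glyphs, i, j):
    """Allow two marks to interact only if both sit on the same base."""
    owner = attach_bases(glyphs)
    return (glyphs[i][1] == MARK and glyphs[j][1] == MARK
            and owner[i] is not None and owner[i] == owner[j])


text = [("meem", BASE), ("shadda", MARK), ("noon", BASE), ("fatha", MARK)]
print(may_combine_marks(text, 1, 3))       # False: Shadda on Meem, Fatha on Noon

same_base = [("meem", BASE), ("shadda", MARK), ("fatha", MARK)]
print(may_combine_marks(same_base, 1, 2))  # True: both marks on Meem
```

A check like this still lets IgnoreBaseGlyphs do its job of skipping glyphs during matching. It only refuses the result when the two marks turn out to belong to different letters.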

Comment 9 Behdad Esfahbod 2009-08-05 19:42:19 UTC

Ok, apparently what Sergey describes is part of the intended semantics of the GPOS mechanisms.  I just merged harfbuzz-ng, which does this correctly.