Bug 595539 – Regressions in rendering certain Thai sequences with OpenType font

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 595539 - Regressions in rendering certain Thai sequences with OpenType font


Summary:	Regressions in rendering certain Thai sequences with OpenType font


Status:	RESOLVED FIXED

Product:	pango
Classification:	Platform
Component:	general
Version:	1.25.x
Hardware:	Other Linux

Importance:	Normal major
Target Milestone:	---
Assigned To:	pango-maint
QA Contact:	pango-maint

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2009-09-18 05:30 UTC by Theppitak Karoonboonyanan
Modified:	2009-09-22 01:25 UTC

See Also:
GNOME target:	2.28.x
GNOME version:	2.27/2.28

Attachments
Sample text (466 bytes, text/plain) 2009-09-18 05:31 UTC, Theppitak Karoonboonyanan		Details
Sample font (72.00 KB, application/x-font-ttf) 2009-09-18 05:32 UTC, Theppitak Karoonboonyanan		Details
Screenshot as rendered by pango master (44.53 KB, image/png) 2009-09-18 05:34 UTC, Theppitak Karoonboonyanan		Details
Screenshot as rendered by pango 1.24.5 (expected result) (44.52 KB, image/png) 2009-09-18 05:35 UTC, Theppitak Karoonboonyanan		Details
first patch fixing wrong GSUB triggering (916 bytes, patch) 2009-09-18 11:20 UTC, Theppitak Karoonboonyanan	none	Details \| Review
Patch fixing case 1 and 2 (616 bytes, patch) 2009-09-21 10:50 UTC, Theppitak Karoonboonyanan	none	Details \| Review
Completed patch (1009 bytes, patch) 2009-09-21 11:54 UTC, Theppitak Karoonboonyanan	none	Details \| Review

Description Theppitak Karoonboonyanan 2009-09-18 05:30:24 UTC

Some certain Thai sequences are rendered incorrectly in Pango master:

1. cons + tone + vowel
   rendered with excessive NIKHAHIT under the tone mark

2. cons + tone + whitespace/punct
   rendered with excessive NIKHAHIT under the tone mark

3. cons + lower-vowel + tone
   rendered with tone mark at high position, rather than lowered down

These used to be rendered correctly in Pango 1.24.x. Case 1 and 2 are serious, as they change the meanings of several words. Case 3 is about elegance.

Sample text, sample font, and screenshots will be attached soon.

Comment 1 Theppitak Karoonboonyanan 2009-09-18 05:31:17 UTC

Created attachment 143417 [details]
Sample text

Comment 2 Theppitak Karoonboonyanan 2009-09-18 05:32:23 UTC

Created attachment 143418 [details]
Sample font

Comment 3 Theppitak Karoonboonyanan 2009-09-18 05:34:59 UTC

Created attachment 143419 [details]
Screenshot as rendered by pango master

Comment 4 Theppitak Karoonboonyanan 2009-09-18 05:35:57 UTC

Created attachment 143420 [details]
Screenshot as rendered by pango 1.24.5 (expected result)

Comment 5 Theppitak Karoonboonyanan 2009-09-18 11:13:22 UTC

First anomaly I have found: the excessive NIKHAHIT in case 1 and 2 is caused by wrong GSUB rule triggering.

Part of the 'ccmp' rules for handling upper combining characters are (input is in bracket, left to it is backtrack)

rule 0: cons + [SARA AM]
  - SARA AM -> NIKHAHIT + SARA AA
rule 1: cons + [tone + SARA AM]
  - tone -> NIKHAHIT + tone
  - SARA AM -> SARA AA
rule 2: cons + [tone]
  - tone -> tone.low_variant
rule 3: upper-vowel + [upper-dacritic]
  - upper-diacritic -> upper-diacritic.high_variant

Here, rule 1 is wrongly triggered, because of this loop in match_input() [hb-ot-layout-gsubgpos-private.hh]:

  for (i = 1, j = buffer->in_pos + 1; i < count; i++, j++)
  {
     ...
  }

The loop is iterated only count-1 times, and only the first input of rule 1 is checked, and it happily matches no matter what follows it.

Comment 6 Theppitak Karoonboonyanan 2009-09-18 11:20:27 UTC

Created attachment 143433 [details] [review]
first patch fixing wrong GSUB triggering

This patch fixes the loop iteration count, with compensation to retain previous loop logic.

This only prevents wrong GSUB triggering, and eliminates the excessive NIKHAHIT for case 1 and 2, but it does not totally fixes this bug. The rules for lowing down tone marks are still not applied properly. I'll still have to check more.

Comment 7 Theppitak Karoonboonyanan 2009-09-19 13:18:50 UTC

The loop may already be correct, and my patch may be wrong. It's said somewhere else in the code that input[] starts with second glyph. So, I wonder where the first glyph is matched.

Comment 8 Theppitak Karoonboonyanan 2009-09-19 15:08:52 UTC

Comment on attachment 143433 [details] [review]
first patch fixing wrong GSUB triggering

Obsolete the wrong patch.

Comment 9 Theppitak Karoonboonyanan 2009-09-19 16:11:52 UTC

I've got another clue: by adding some unclassified glyphs as a new class, the rules work.

In a proof-of-concept font, I add some Thai vowels and punctuations, namely PAIYANNOI, SARA A, SARA AA, SARA E, SARA AE, SARA O, MAIMUAN, MAIMALAI, LAKKHANGYAO, and MAIYAMOK, which were previously not explicitly classified, as a new explicit class. And case 1 is gone, but not for case 2, as I haven't added whitespace and double quote to this new class.

This shouldn't be done in real life. Just a proof that there must be some funny thing in the matching function, where unclassified glyphs (class 0) are compared as equal to other classes.

Comment 10 Theppitak Karoonboonyanan 2009-09-20 04:15:37 UTC

Add some version info. Raise importance, as it's a serious regression.

Comment 11 Theppitak Karoonboonyanan 2009-09-20 08:59:41 UTC

By inserting debug message printouts, I find match_class() sometimes compare against input class 0, which is not expected for this font, where class 0 is never mentioned in any rule.

If I inserted this code, it would fix case 1 and 2 with the font:

  if (class_def.get_class (glyph_id) == 0)
    return false;

That is, completely ignore input class 0 matching. But this is not the correct fix, as it can fail on fonts which actually want to match class 0.

How come class 0 is wrongly issued here? Hmm...

Comment 12 Theppitak Karoonboonyanan 2009-09-21 10:50:57 UTC

Created attachment 143581 [details] [review]
Patch fixing case 1 and 2

OK. I think I've found the cause of the wrong matching.

In ChainRule::apply(), 'input.array + 1' is passed to chain_context_lookup(), while 'input' itself is already a HeadlessArrayOf<>, that is, without the first element. So, just 'input.array' should be passed instead. (Compare: e.g. ChainContextFormat3::apply().)

With this patch applied, case 1 and 2 are gone. But case 3 becomes partially done. That is, instead of no substitution is done at all, only one of the two substitutions in the rule is applied. This leaves the text at intermediate state and becomes serious error, instead of just elegance issue as before.

So, there is still another bug left to solve.

Comment 13 Theppitak Karoonboonyanan 2009-09-21 11:54:28 UTC

Created attachment 143587 [details] [review]
Completed patch

And here it is: in apply_lookup():

  for (unsigned int i = 0; i < count; i++)
  {
    ...
  }

'i' is double-incremented. Once at usual for-loop increment, and another within the loop (either 'i += buffer->in_pos - old_pos;' or 'i++').

So, this patch solves all regression cases as reported in this bug.

Comment 14 Behdad Esfahbod 2009-09-21 17:41:58 UTC

I actually think the other invocation of chain_context_lookup() needs to use input+1.  So, yeah, my bad, those two should be switched.

Comment 15 Behdad Esfahbod 2009-09-21 17:47:09 UTC

Ok, pushed the fixes.  Please test.

Comment 16 Theppitak Karoonboonyanan 2009-09-22 01:25:32 UTC

I have no idea about that other chain_context_lookup() call. But for the regression cases in this bug, it works. Verified.