GNOME Bugzilla – Bug 595539
Regressions in rendering certain Thai sequences with OpenType font
Last modified: 2009-09-22 01:25:32 UTC
Some certain Thai sequences are rendered incorrectly in Pango master: 1. cons + tone + vowel rendered with excessive NIKHAHIT under the tone mark 2. cons + tone + whitespace/punct rendered with excessive NIKHAHIT under the tone mark 3. cons + lower-vowel + tone rendered with tone mark at high position, rather than lowered down These used to be rendered correctly in Pango 1.24.x. Case 1 and 2 are serious, as they change the meanings of several words. Case 3 is about elegance. Sample text, sample font, and screenshots will be attached soon.
Created attachment 143417 [details] Sample text
Created attachment 143418 [details] Sample font
Created attachment 143419 [details] Screenshot as rendered by pango master
Created attachment 143420 [details] Screenshot as rendered by pango 1.24.5 (expected result)
First anomaly I have found: the excessive NIKHAHIT in case 1 and 2 is caused by wrong GSUB rule triggering. Part of the 'ccmp' rules for handling upper combining characters are (input is in bracket, left to it is backtrack) rule 0: cons + [SARA AM] - SARA AM -> NIKHAHIT + SARA AA rule 1: cons + [tone + SARA AM] - tone -> NIKHAHIT + tone - SARA AM -> SARA AA rule 2: cons + [tone] - tone -> tone.low_variant rule 3: upper-vowel + [upper-dacritic] - upper-diacritic -> upper-diacritic.high_variant Here, rule 1 is wrongly triggered, because of this loop in match_input() [hb-ot-layout-gsubgpos-private.hh]: for (i = 1, j = buffer->in_pos + 1; i < count; i++, j++) { ... } The loop is iterated only count-1 times, and only the first input of rule 1 is checked, and it happily matches no matter what follows it.
Created attachment 143433 [details] [review] first patch fixing wrong GSUB triggering This patch fixes the loop iteration count, with compensation to retain previous loop logic. This only prevents wrong GSUB triggering, and eliminates the excessive NIKHAHIT for case 1 and 2, but it does not totally fixes this bug. The rules for lowing down tone marks are still not applied properly. I'll still have to check more.
The loop may already be correct, and my patch may be wrong. It's said somewhere else in the code that input[] starts with second glyph. So, I wonder where the first glyph is matched.
Comment on attachment 143433 [details] [review] first patch fixing wrong GSUB triggering Obsolete the wrong patch.
I've got another clue: by adding some unclassified glyphs as a new class, the rules work. In a proof-of-concept font, I add some Thai vowels and punctuations, namely PAIYANNOI, SARA A, SARA AA, SARA E, SARA AE, SARA O, MAIMUAN, MAIMALAI, LAKKHANGYAO, and MAIYAMOK, which were previously not explicitly classified, as a new explicit class. And case 1 is gone, but not for case 2, as I haven't added whitespace and double quote to this new class. This shouldn't be done in real life. Just a proof that there must be some funny thing in the matching function, where unclassified glyphs (class 0) are compared as equal to other classes.
Add some version info. Raise importance, as it's a serious regression.
By inserting debug message printouts, I find match_class() sometimes compare against input class 0, which is not expected for this font, where class 0 is never mentioned in any rule. If I inserted this code, it would fix case 1 and 2 with the font: if (class_def.get_class (glyph_id) == 0) return false; That is, completely ignore input class 0 matching. But this is not the correct fix, as it can fail on fonts which actually want to match class 0. How come class 0 is wrongly issued here? Hmm...
Created attachment 143581 [details] [review] Patch fixing case 1 and 2 OK. I think I've found the cause of the wrong matching. In ChainRule::apply(), 'input.array + 1' is passed to chain_context_lookup(), while 'input' itself is already a HeadlessArrayOf<>, that is, without the first element. So, just 'input.array' should be passed instead. (Compare: e.g. ChainContextFormat3::apply().) With this patch applied, case 1 and 2 are gone. But case 3 becomes partially done. That is, instead of no substitution is done at all, only one of the two substitutions in the rule is applied. This leaves the text at intermediate state and becomes serious error, instead of just elegance issue as before. So, there is still another bug left to solve.
Created attachment 143587 [details] [review] Completed patch And here it is: in apply_lookup(): for (unsigned int i = 0; i < count; i++) { ... } 'i' is double-incremented. Once at usual for-loop increment, and another within the loop (either 'i += buffer->in_pos - old_pos;' or 'i++'). So, this patch solves all regression cases as reported in this bug.
I actually think the other invocation of chain_context_lookup() needs to use input+1. So, yeah, my bad, those two should be switched.
Ok, pushed the fixes. Please test.
I have no idea about that other chain_context_lookup() call. But for the regression cases in this bug, it works. Verified.