GNOME Bugzilla – Bug 549818
Sequence 0D16,0D4D,0D30, 0D16,D4D,0D30, etc of Malayalam, comparison with uniscribe,icu,harfbuzz
Last modified: 2012-08-18 17:44:11 UTC
Please describe the problem:
An analysis of rendering 0d16, 0d4d, 0d30 with uniscribe, icu, harfbuzz and pango. If we want to render Malayalam the same as uniscribe, then I suspect that reclassifying 0x0d30 as not having a post-base form may be a hack-around, and that there is deeper magic at work. Might be of interest anyway.

Steps to reproduce:
In the attachment:
D16_D4D_D30.ttf is a font which just contains fake glyphs for those unicode points.
D16_D4D_D30_random.ttf is a font which just contains fake glyphs for those unicode points, and a pstf entry for some random combination of glyphs which we're not going to use.
D16_combined_D4D_D30.ttf is a font which has a pstf table with combos D4D + D30 and D30 + D4D.
combined_D16_D4D_D30.ttf is a font which has a pstf table with combos D16 + D4D + D30 and D16 + D30 + D4D.

Attached are screenshots of the string D16,D4D,D30 (D16_D4D_D30.txt) rendered with these fonts using...
vanilla icu 4.0
vanilla pango 1.21.5
vanilla uniscribe 1.0420.2600.2180

a) The first interesting thing is that with a font with no gsubs the pure software reordering for pango and uniscribe *appears* to be the same, i.e. no reordering at all of 0d16,0d4d,0d30, while the icu and harfbuzz reordering results in glyphs 0d16, 0d30, 0d4d.

Reading http://www.microsoft.com/typography/otfntdev/indicot/shaping.aspx
"The shaping engine finds the base consonant of the syllable, using the following algorithm: starting from the end of the syllable, move backwards until a consonant is found that does not have a below-base or post-base form (post-base forms have to follow below-base forms), or arrive at the first consonant. The consonant stopped at will be the base."
suggests that the base consonant should be 0xd16, given that according to icu and http://www.microsoft.com/typography/otfntdev/indicot/appen.aspx 0d30 (RA) has a post-base form.
"If the base consonant is not the last one, Uniscribe moves the halant from the base consonant to the last one."
giving 0d16 0d30 0d4d, which is the order that icu gets, and the order that harfbuzz gets. FWIW pango gets different results because 0xd30 has been tweaked to be tagged as a normal consonant, making 0xd30 the base consonant for the algorithm, instead of 0xd16.

b) Things get interesting when repeating with D16_D4D_D30_random.ttf. That now shows that uniscribe is ordering the glyphs as 0d16, 0d30, 0d4d, i.e. agreeing with icu and harfbuzz and disagreeing with pango. Given that the only difference is the existence of a pstf table, it suggests that uniscribe does agree with the basic re-ordering mechanism of icu/harfbuzz, except that it has a quirk in that it doesn't appear to do it if there is no pstf table in the font.

c) Looking at the same text with D16_combined_D4D_D30.ttf, *both* icu and uniscribe select the 0d30+0d4d pstf replacement, suggesting that the sequence sent for gsub processing by uniscribe is actually 0d30,0d4d, matching the order of icu and harfbuzz for that subsequence, and not 0d4d,0d30 as used by pango. The clear difference between uniscribe and icu is that the replacement glyph is ordered at the beginning of the syllable in uniscribe and to the right in icu. Given b), that seems to suggest that uniscribe may have a magic extra step in moving the output glyph to the start of the sequence if there has been a pstf replacement, and that step takes place *after* gsub processing.

d) Looking at the text with combined_D16_D4D_D30.ttf shows the same results in icu and uniscribe of 0d16,0d30,0d4d, with neither entry in the pstf table used, while pango used the pstf table for the 0d16+0d4d+0d30 combo. This further reinforces that uniscribe agrees with icu (and harfbuzz), and not pango, that 0d30 should be '_pb'.
Summary: I'm clueless about Malayalam, but if uniscribe compatibility is of interest, it looks like icu/pango/harfbuzz need some sort of additional post-gsub step, vaguely along the lines of re-ordering the result of a glyph substitution of this special type of sequence to the beginning of the syllable. It also looks like the pango change to the classification of 0x0d30 moves it further away from uniscribe.
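The base-consonant rule and the halant move quoted in the report can be sketched roughly as follows. This is only an illustration of the quoted text, not code from any of the engines: the names `POST_BASE`, `find_base` and `reorder` are hypothetical, the consonant range is simplified, and below-base forms are ignored.

```python
HALANT = 0x0D4D
# Assumption for illustration: RA (0x0D30) is the only consonant
# classified as having a post-base form.
POST_BASE = frozenset({0x0D30})


def is_consonant(cp):
    # Simplified: Malayalam KA (0x0D15) through HA (0x0D39).
    return 0x0D15 <= cp <= 0x0D39


def find_base(syllable, post_base):
    """Walk backwards until a consonant without a post-base form is
    found, or the first consonant is reached. Assumes the syllable
    contains at least one consonant."""
    first = next(i for i, cp in enumerate(syllable) if is_consonant(cp))
    for i in range(len(syllable) - 1, first, -1):
        if is_consonant(syllable[i]) and syllable[i] not in post_base:
            return i
    return first


def reorder(syllable, post_base=POST_BASE):
    """If the base is not the last consonant, move the halant that
    follows the base to just after the last consonant."""
    out = list(syllable)
    base = find_base(out, post_base)
    last = max(i for i, cp in enumerate(out) if is_consonant(cp))
    if base != last and base + 1 < len(out) and out[base + 1] == HALANT:
        halant = out.pop(base + 1)
        # After the pop, the last consonant sits at index last - 1,
        # so inserting at index last places the halant right after it.
        out.insert(last, halant)
    return out


icu_like = reorder([0x0D16, 0x0D4D, 0x0D30])
pango_like = reorder([0x0D16, 0x0D4D, 0x0D30], post_base=frozenset())
print([hex(cp) for cp in icu_like])
print([hex(cp) for cp in pango_like])
```

With RA in `POST_BASE` the sketch yields the 0d16, 0d30, 0d4d order reported for icu and harfbuzz; with an empty set (RA treated as a normal consonant, as in pango) the input order is left unchanged.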
Created attachment 117587 [details] test-case
I confirm that the combination works perfectly with pango.
(In reply to comment #1)
> Created an attachment (id=117587) [details]
> test-case

It took a while for me to figure out the issue from the attached test cases and scrot pics. :)

The issue you have raised exists. But I don't think, IMHO, that it is a good idea to kowtow to the algorithm, or even the standard, put forward by proprietary software vendors. Different ideas can co-exist. The prime focus should be proper rendering, consistent with the language in question.

Coming to the issue: icu and harfbuzz use the post-base form of RA (0x0d30) and the below-base form of LA (0x0d32), while pango doesn't. Since these consonants are classified as 'HAVE_POST(BELOW)_FORMS' by the shaping engine beforehand, and the engine moves the halant (0x0d4d) to get the post(below)-base form by applying the respective feature, these forms appear invariably, disregarding the orthography of the script involved. For example, there are base consonants that won't take post(below)-base forms (e.g. YA, RA, NNA etc.). Therefore, pango gives the most acceptable results in this regard.

Now, as per the new opentype specs (v.1.6), uniscribe uses a new algorithm to dynamically assign classes for consonants. So the halant-moving exercise for post(below)-base forms is done away with. The gsub rules and features will finally decide the contextual nature of the characters. In this way, the post-base form of RA carrying the 'pref' feature tag, if found, is moved to the pre-base position after the higher-order substitutions are made.

I have tried to incorporate the new opentype specs in the recent version of my font, Suruma (http://suruma.freeflux.net/blog/archive/2009/09/22/new-suruma-font.html).

Thanks
Suresh
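A rough sketch of the step described here (and guessed at in the original report): substitute first, then move the substituted glyph in front of the base. The glyph names, the single ligature rule and both function names are hypothetical, and the base is assumed to be the first glyph of the syllable; no engine's actual implementation is implied.

```python
def substitute(glyphs, rules):
    """Apply simple ligature-style rules left to right.
    rules maps a tuple of input glyph names to one output glyph."""
    out, i = [], 0
    while i < len(glyphs):
        for seq, lig in rules.items():
            if tuple(glyphs[i:i + len(seq)]) == seq:
                out.append(lig)
                i += len(seq)
                break
        else:
            # No rule matched at this position; keep the glyph as-is.
            out.append(glyphs[i])
            i += 1
    return out


def move_to_pre_base(glyphs, reordering, base_index=0):
    """After substitution, move any glyph named in `reordering`
    to just before the base (assumed to be at base_index)."""
    moved = [g for g in glyphs if g in reordering]
    rest = [g for g in glyphs if g not in reordering]
    return rest[:base_index] + moved + rest[base_index:]


# Hypothetical rule: RA + halant forms a single reordering glyph.
rules = {("ra", "halant"): "ra_form"}

shaped = substitute(["kha", "ra", "halant"], rules)
print(shaped)
print(move_to_pre_base(shaped, {"ra_form"}))
```

The point of the ordering is that the move happens only after substitution, so a font without a matching rule leaves the glyph order untouched.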
QT bug on this issue http://bugreports.qt.nokia.com/browse/QTBUG-1887
*** Bug 679198 has been marked as a duplicate of this bug. ***
We've merged the HarfBuzz branch. Closing obsolete.