GNOME Bugzilla – Bug 121882
decomposed vowels (AU, O, OO) in Tamil, Malayalam, Oriya, Bengali are not rendered correctly.
Last modified: 2004-12-22 21:47:04 UTC
Vowels AU, O, and OO have two parts (left and right) and can be represented either composed or decomposed in Unicode. Pango renders syllables correctly when a two-part vowel sign is represented as a single Unicode character (composed), but not when the vowel sign is represented decomposed. For instance, in Malayalam, sequence A is rendered differently from sequence B.

A : JA + OO : <U+0D1C, U+0D4B>
B : JA + OO : <U+0D1C, U+0D47, U+0D3E>

Sequence B is rendered as if it were <U+0D1C, U+0D47> (JA + EE) followed by <U+0020, U+0D3E>.
Created attachment 19838 [details] Malayalam example (with ThoolikaUnicode font)
What is strange is that CA+AU and JA+OO are regarded as a single syllable whether AU and OO are represented composed or decomposed.
The image is an example of the way it currently looks, right? Can you attach an example of the way it should look?
The correct rendering is included in the screenshot. The dependent vowel signs O, OO, and AU (each with a left part and a right part) should be rendered identically whether they're represented with a sequence of two Unicode characters (decomposed) or with a single Unicode character (composed), because the two representations are canonically equivalent. I've tried the same example on Win2k with the same font (with Mozilla) and it worked fine there. So, it's surely not a font issue.
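The canonical equivalence claimed above can be checked outside Pango. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `canonically_equivalent` is made up for illustration and is not part of Pango or any shaping library.

```python
import unicodedata

JA = "\u0d1c"                        # MALAYALAM LETTER JA
composed = JA + "\u0d4b"             # sequence A: JA + VOWEL SIGN OO
decomposed = JA + "\u0d47\u0d3e"     # sequence B: JA + VOWEL SIGN EE + VOWEL SIGN AA


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


if __name__ == "__main__":
    print(canonically_equivalent(composed, decomposed))
    # Show the code points that NFC produces for the decomposed sequence.
    nfc = unicodedata.normalize("NFC", decomposed)
    print(", ".join(f"U+{ord(ch):04X}" for ch in nfc))
```

This only demonstrates that the two sequences are the same text as far as Unicode normalization is concerned; it says nothing about how a particular shaper handles them.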
Pango doesn't make any attempt to normalize before feeding text to the shape engines, so the fact that the sequences are canonically equivalent is only marginally relevant. It's probably just something that needs to be handled within the indic code.
hmmm...

> It's probably just something that needs to be handled within
> the indic code.

That's exactly the point of this bug, if you thought otherwise.
Adding a patch that fixes this. It has two parts:

- First, change the syllable-state table to allow a dependent vowel to be followed by another dependent vowel.
- Second, change the reordering code to allow several dependent vowel signs to combine to give

The main problem with the new code is overpermissiveness:

- It allows mixing improper combinations of left and right matras.
- It allows the left and right matras to be reversed.
- It allows sequences of three or more dependent vowels (though sequences of three are needed for Kannada?)

Fixing that would require significantly more complexity in the state table.
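To illustrate the idea behind the two changes, and the overpermissiveness they bring, here is a toy sketch in Python. It is not the Pango code: the character classes, the `allow_matra_after_matra` flag, and the functions `syllable_end` and `reorder` are all hypothetical, and only a handful of Malayalam code points are covered.

```python
# Toy model only: hypothetical classes covering a few Malayalam code points.
LEFT_MATRAS = {"\u0d46", "\u0d47", "\u0d48"}   # signs E, EE, AI (drawn left of the base)
RIGHT_MATRAS = {"\u0d3e", "\u0d57"}            # sign AA, AU length mark (drawn right)


def is_consonant(ch: str) -> bool:
    return "\u0d15" <= ch <= "\u0d39"          # KA .. HA


def is_matra(ch: str) -> bool:
    return ch in LEFT_MATRAS or ch in RIGHT_MATRAS


def syllable_end(text: str, start: int, allow_matra_after_matra: bool) -> int:
    """Return the index one past the syllable that starts at text[start]."""
    i = start + 1
    if not is_consonant(text[start]):
        return i
    seen_matra = False
    while i < len(text) and is_matra(text[i]):
        if seen_matra and not allow_matra_after_matra:
            break                               # old behaviour: second matra starts a new cluster
        seen_matra = True
        i += 1
    return i


def reorder(syllable: str) -> str:
    """Move every left matra in front of the base consonant; keep the rest in order."""
    base, marks = syllable[0], syllable[1:]
    left = [ch for ch in marks if ch in LEFT_MATRAS]
    rest = [ch for ch in marks if ch not in LEFT_MATRAS]
    return "".join(left) + base + "".join(rest)


if __name__ == "__main__":
    seq_b = "\u0d1c\u0d47\u0d3e"                # JA + EE + AA (decomposed OO)
    print(syllable_end(seq_b, 0, allow_matra_after_matra=False))
    print(syllable_end(seq_b, 0, allow_matra_after_matra=True))
    print([f"U+{ord(ch):04X}" for ch in reorder(seq_b)])
```

Note that this toy accepts the reversed input JA + AA + EE and produces the same visual order for it, which is the kind of overpermissiveness described above.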
Created attachment 19944 [details] [review] Two-part vowels patch
Created attachment 30092 [details] [review] Simpler version
The last patch was larger than it needed to be, since the tags and cluster index are the same for all matras in a syllable.
Fri Jul 30 14:05:25 2004  Owen Taylor  <otaylor@redhat.com>

        Improve handling of decomposed two-part vowels
        (#121882, Jungshik Shin)

        * modules/indic/indic-ot-class-tables.c (stateTable): allow
        a dependent vowel to be followed by another dependent vowel.

        * modules/indic/indic-ot.c (indic_ot_reorder): Handle
        multiple vowel matras.

Filed as ICU bug: http://www.jtcsv.com/cgibin/icu-bugs?findid=4026