GNOME Bugzilla – Bug 569244
Support for tsa/dza (U+0C58/9) and lu/luu-matras (U+0C62/3,U+0CE2/3) in Telugu & Kannada
Last modified: 2012-08-25 20:48:01 UTC
In http://svn.gnome.org/svn/pango/trunk/modules/indic/indic-ot-class-tables.c the character class of U+0CBC is set to _xx, which is different from the way nukta is handled for other scripts, where it is set to _nu. Was there a conscious decision taken to handle it this way?

Unicode 5.0 introduced the lu- and luu-matras in Kannada at U+0CE2 and U+0CE3 (http://unicode.org/charts/PDF/Unicode-5.0/U50-0C80.pdf). Unicode 5.1 introduced the corresponding Telugu lu- and luu-matras at U+0C62 and U+0C63, and also introduced tsa and dza (alveolar variants of ca and ja) at U+0C58 and U+0C59. However, http://svn.gnome.org/svn/pango/trunk/modules/indic/indic-ot-class-tables.c sets the character classes for these code points to _xx, which needs to be changed.

The class of U+0C62/U+0C63 and U+0CE2/U+0CE3 can be set to _lm, as the class of U+0CC3/U+0CC4 is now and as the class of U+0C43/U+0C44 is requested to be in bug 565599. Alternatively, as they attach below the consonant, they could be set to _db. These are probably never used with consonant clusters anyway.

U+0C58/U+0C59 form ligatures with vowel matras, so their class should at least include _ct. The Pothana2000 font from http://www.kavya-nandanam.com/dload.htm includes select ligatures of U+0C58/U+0C59 with vowel matras, which don't render properly with their class set to _xx. The font maps the below-base ligation of U+0C58/U+0C59 with U+0C4D (halanth) to the same glyphs as the below-base ligation of U+0C1A/U+0C1C (ca/ja) respectively, so there have probably never been separate below-base representations of tsa/dza in Telugu. Hence there are a few options for handling the below-base forms of these:

1. Set the character class to only _ct and expect users to use U+0C1A/U+0C1C where below-base forms are needed.
2. Set the character class to _bb and expect the font to have proper below-base ligation rules for U+0C58/U+0C59 with U+0C4D.
3. Set the character class to _bb and include special-case code in indic_ot_reorder() in http://svn.gnome.org/svn/pango/trunk/modules/indic/indic-ot.c that substitutes U+0C1A/U+0C1C for U+0C58/U+0C59 whenever their below-base form is required.
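Option 3 could look roughly like the following. This is only a minimal standalone sketch, not the code of indic_ot_reorder(): the function names are hypothetical, and it assumes a below-base form is needed exactly when tsa/dza directly follows U+0C4D in logical order, which is a simplification of the syllable state the real shaper tracks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TELUGU_HALANT 0x0C4D  /* U+0C4D TELUGU SIGN VIRAMA (halanth) */

/* Hypothetical helper: map tsa/dza to ca/ja.
 * Any other code point is returned unchanged. */
static uint32_t below_base_fallback(uint32_t ch)
{
    switch (ch) {
    case 0x0C58: return 0x0C1A;  /* tsa -> ca */
    case 0x0C59: return 0x0C1C;  /* dza -> ja */
    default:     return ch;
    }
}

/* Hypothetical helper: rewrite `text` in place so that tsa/dza
 * immediately after a halant becomes ca/ja.
 * Assumption: "follows U+0C4D" stands in for "below-base form required".
 * Returns the number of code points that were substituted. */
static size_t substitute_below_base(uint32_t *text, size_t len)
{
    size_t changed = 0;

    assert(text != NULL || len == 0);

    for (size_t i = 1; i < len; i++) {
        if (text[i - 1] == TELUGU_HALANT) {
            uint32_t repl = below_base_fallback(text[i]);
            if (repl != text[i]) {
                text[i] = repl;
                changed++;
            }
        }
    }
    return changed;
}
```

With this sketch, the sequence U+0C15 U+0C4D U+0C58 would be shaped as U+0C15 U+0C4D U+0C1A, while a syllable-initial U+0C58 is left untouched and can still ligate with vowel matras.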
(In reply to comment #0)
> In http://svn.gnome.org/svn/pango/trunk/modules/indic/indic-ot-class-tables.c
> the character class of U+0CBC is set to _xx, which is different from the way
> nukta is handled for other scripts, where it is set to _nu. Was there a
> conscious decision taken to handle it this way?

Oops! Ignore this paragraph; it was pasted here in error.
Created attachment 127291
Patch to change character classes and substitute below-base forms

The diffs are taken against the current versions of the files available at http://svn.gnome.org/svn/pango/trunk/modules/indic/indic-ot.c and http://svn.gnome.org/svn/pango/trunk/modules/indic/indic-ot-class-tables.c

The patch uses _lm for the character classes of the lu/luu-matras and adds code that substitutes ca/ja for tsa/dza respectively wherever the below-base form is required. The other options (_db instead of _lm, _ct instead of _bb, not substituting) are straightforward changes to the attached patch.

However, I found that even with these changes the rendering works only in the context of other Telugu/Kannada characters. Fixing this required a change to my copy of pango/pango-script-table.h so that it recognises tsa, dza and the lu-/luu-matras as belonging to Telugu/Kannada. This table has been superseded by g_unichar_get_script() in http://svn.gnome.org/viewvc/pango?view=revision&revision=2406, so that function might require a corresponding change.
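To illustrate the script-recognition problem, here is a minimal sketch of a range-table lookup. It is not the contents of pango-script-table.h nor the implementation of g_unichar_get_script(); the type names, the table and the helper functions are all hypothetical and cover only the code points discussed in this bug.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical script tags; these are not Pango's or GLib's enums. */
typedef enum { SCRIPT_UNKNOWN, SCRIPT_TELUGU, SCRIPT_KANNADA } Script;

typedef struct {
    uint32_t start;   /* first code point of the run */
    uint32_t count;   /* number of consecutive code points in the run */
    Script   script;
} ScriptRange;

/* Illustrative entries only: the runs this bug asks to be recognised.
 * Must stay sorted by `start` for the binary search below. */
static const ScriptRange new_ranges[] = {
    { 0x0C58, 2, SCRIPT_TELUGU  },  /* tsa, dza */
    { 0x0C62, 2, SCRIPT_TELUGU  },  /* Telugu lu- and luu-matras */
    { 0x0CE2, 2, SCRIPT_KANNADA },  /* Kannada lu- and luu-matras */
};

/* Binary search over a sorted range table.
 * Returns SCRIPT_UNKNOWN when no run contains `ch`. */
static Script lookup_script(const ScriptRange *table, size_t n, uint32_t ch)
{
    size_t lo = 0, hi = n;

    assert(table != NULL || n == 0);

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ch < table[mid].start)
            hi = mid;
        else if (ch >= table[mid].start + table[mid].count)
            lo = mid + 1;
        else
            return table[mid].script;
    }
    return SCRIPT_UNKNOWN;
}

/* Convenience wrapper over the illustrative table above. */
static Script script_for(uint32_t ch)
{
    return lookup_script(new_ranges,
                         sizeof new_ranges / sizeof new_ranges[0], ch);
}
```

A code point with no entry falls through to SCRIPT_UNKNOWN. In a run of text, that kind of result can split the new characters away from their neighbours, which would match the symptom that rendering only worked next to other Telugu/Kannada characters.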
Thanks, Padmanabhan, for your bug report. I will include this fix in the patch for bug 594101.
HarfBuzz has been merged, Indic shaper removed. Marking OBSOLETE.