Bug 569244 – Support for tsa/dza (U+0C58/9) and lu/luu-matras (U+0C62/3,U+0CE2/3) in Telugu & Kannada




Summary:	Support for tsa/dza (U+0C58/9) and lu/luu-matras (U+0C62/3,U+0CE2/3) in Telug...


Status:	RESOLVED OBSOLETE

Product:	pango
Classification:	Platform
Component:	indic
Version:	1.22.x
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	Pango Indic
QA Contact:	pango-maint

URL:
Whiteboard:

Depends on:
Blocks:	594101

Reported:	2009-01-26 20:31 UTC by Padmanabhan V. K.
Modified:	2012-08-25 20:48 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Patch to change character classes and substitute below-base forms (1.97 KB, patch) 2009-01-26 21:04 UTC, Padmanabhan V. K.

Description Padmanabhan V. K. 2009-01-26 20:31:52 UTC

In http://svn.gnome.org/svn/pango/trunk/modules/indic/indic-ot-class-tables.c the character class of U+0CBC is set to _xx, which is different from the way nukta is handled for other scripts, where it is set to _nu. Was there a conscious decision taken to handle it this way?

Unicode 5.0 introduced the characters lu- & luu-matras in Kannada at U+0CE2 & U+0CE3 -- http://unicode.org/charts/PDF/Unicode-5.0/U50-0C80.pdf. Unicode 5.1 introduced the corresponding Telugu lu- & luu-matras at U+0C62 & U+0C63, and also introduced tsa & dza (alveolar variants of ca & ja) at U+0C58 & U+0C59.

However, http://svn.gnome.org/svn/pango/trunk/modules/indic/indic-ot-class-tables.c sets the character classes for these code points to _xx, which needs to be changed.

The class of U+0C62/63/E2/E3 can be set to _lm, as U+0CC3/C4's is now and as U+0C43/44's is requested to be in bug 565599. Alternatively, as they attach below the consonant, they could be set to _db. These are probably never used with consonant clusters anyway.
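
The proposed change for the matras can be sketched as a small lookup. This is only an illustration, not the actual table from indic-ot-class-tables.c: the enum and function names are hypothetical stand-ins for the _xx and _lm classes, and only the four code points discussed here are covered.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the _xx and _lm classes named in this report;
 * the real definitions live in indic-ot-class-tables.c. */
typedef enum {
    SKETCH_CLASS_XX,  /* class currently assigned to these code points */
    SKETCH_CLASS_LM   /* class proposed for the lu/luu-matras */
} SketchCharClass;

/* Return the class proposed in this report for the lu/luu-matras;
 * every other code point falls through to the _xx stand-in. */
static SketchCharClass
sketch_matra_class (uint32_t ch)
{
    switch (ch) {
    case 0x0C62: /* Telugu lu-matra */
    case 0x0C63: /* Telugu luu-matra */
    case 0x0CE2: /* Kannada lu-matra */
    case 0x0CE3: /* Kannada luu-matra */
        return SKETCH_CLASS_LM;
    default:
        return SKETCH_CLASS_XX;
    }
}
```

Switching to the _db alternative would only mean returning a different class for the same four cases.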

U+0C58/U+0C59 form ligatures with vowel matras, so their class should at least include _ct. The Pothana2000 font from http://www.kavya-nandanam.com/dload.htm includes select ligatures of U+0C58/U+0C59 with vowel matras, which don't render properly with their class set to _xx.

The font maps the below-base ligation of U+0C58/U+0C59 with U+0C4D (halanth) to the same glyphs as the below-base ligation of U+0C1A/U+0C1C (ca/ja) respectively, so there have probably never been separate below-base representations of tsa/dza in Telugu.

Hence there are a few options for handling the below-base forms of these:
1. Set the character class to only _ct and expect users to use U+0C1A/U+0C1C where below-base forms are needed.
2. Set the character class to _bb and expect the font to have proper below-base ligation rules for U+0C58/U+0C59 U+0C4D.
3. Set the character class to _bb and include special case code in indic_ot_reorder() in http://svn.gnome.org/svn/pango/trunk/modules/indic/indic-ot.c that substitutes U+0C1A/U+0C1C for U+0C58/U+0C59 whenever their below-base form is required.
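
Option 3 could look roughly like the sketch below. It is not the code from the attached patch: the function name and the needs_below_base flag are hypothetical, and in practice indic_ot_reorder() would have to work out from the syllable structure whether the below-base form is required.

```c
#include <stdbool.h>
#include <stdint.h>

/* Option 3 in miniature: when a below-base form is required, replace
 * tsa/dza with ca/ja so the font's existing below-base rules apply.
 * `needs_below_base` is a hypothetical flag supplied by the caller. */
static uint32_t
sketch_below_base_substitute (uint32_t ch, bool needs_below_base)
{
    if (!needs_below_base)
        return ch;              /* base position: keep tsa/dza as typed */

    switch (ch) {
    case 0x0C58: return 0x0C1A; /* tsa -> ca */
    case 0x0C59: return 0x0C1C; /* dza -> ja */
    default:     return ch;     /* all other characters are untouched */
    }
}
```

With this approach the font needs no extra below-base rules for U+0C58/U+0C59, at the cost of a script-specific special case in the shaper.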

Comment 1 Padmanabhan V. K. 2009-01-26 20:32:54 UTC

(In reply to comment #0)
> In http://svn.gnome.org/svn/pango/trunk/modules/indic/indic-ot-class-tables.c
> the character classes of U+0CBC is set to _xx, which is different from the way
> nukta is handled for other scripts where it is set to _nu. Was there a
> conscious decision taken to handle it this way?

Oops! Ignore this paragraph pasted here in error.

Comment 2 Padmanabhan V. K. 2009-01-26 21:04:46 UTC

Created attachment 127291
Patch to change character classes and substitute below-base forms

The diffs are taken against the current version of the files available at
http://svn.gnome.org/svn/pango/trunk/modules/indic/indic-ot.c & http://svn.gnome.org/svn/pango/trunk/modules/indic/indic-ot-class-tables.c

The patch uses _lm for the character classes of the lu/luu-matras and adds code to substitute ca/ja for tsa/dza respectively wherever their below-base forms are required. The other options (_db instead of _lm, _ct instead of _bb, not substituting) are straightforward changes to the attached patch.

However, I found that even with these changes the rendering works only in the context of other Telugu/Kannada characters. This turned out to be because my copy of pango/pango-script-table.h also needed a change to recognise tsa, dza and the lu/luu-matras as belonging to Telugu/Kannada. This table has been superseded by g_unichar_get_script() in http://svn.gnome.org/viewvc/pango?view=revision&revision=2406, so that function might require a change.
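
To illustrate the script-lookup issue, here is a minimal sketch that assigns a script from the Unicode block range alone. This is an assumption made for brevity: real script tables, and g_unichar_get_script(), assign the script per code point rather than per block, and the names below are hypothetical.

```c
#include <stdint.h>

/* Hypothetical script tags for this sketch only. */
typedef enum {
    SKETCH_SCRIPT_OTHER,
    SKETCH_SCRIPT_TELUGU,
    SKETCH_SCRIPT_KANNADA
} SketchScript;

/* Approximate the script from the Unicode block: Telugu is U+0C00..U+0C7F
 * and Kannada is U+0C80..U+0CFF. A table that predates these characters
 * would not report the new code points as Telugu/Kannada, so they would be
 * split off from their neighbours before shaping. */
static SketchScript
sketch_script_for (uint32_t ch)
{
    if (ch >= 0x0C00 && ch <= 0x0C7F)
        return SKETCH_SCRIPT_TELUGU;
    if (ch >= 0x0C80 && ch <= 0x0CFF)
        return SKETCH_SCRIPT_KANNADA;
    return SKETCH_SCRIPT_OTHER;
}
```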

Comment 3 Parag AN 2009-10-01 08:09:00 UTC

Thanks, Padmanabhan, for your bug report. I will include this fix in the patch for bug 594101.

Comment 4 Behdad Esfahbod 2012-08-25 20:48:01 UTC

HarfBuzz has been merged, Indic shaper removed.  Marking OBSOLETE.