Bug 385168 – indic, khmer, and tibetan modules don't apply ccmp



Summary:	indic, khmer, and tibetan modules don't apply ccmp


Status:	RESOLVED FIXED

Product:	pango
Classification:	Platform
Component:	general
Version:	unspecified
Hardware:	Other Linux

Importance:	High normal
Target Milestone:	---
Assigned To:	Behdad Esfahbod
QA Contact:	pango-maint

URL:
Whiteboard:

Duplicates:	356006
Depends on:
Blocks:

Reported:	2006-12-12 19:35 UTC by Behdad Esfahbod
Modified:	2007-05-16 02:27 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Illustrates requirement for CCMP processing for Tibetan (164.62 KB, application/pdf) 2006-12-17 13:58 UTC, Christopher Fynn

Description Behdad Esfahbod 2006-12-12 19:35:14 UTC

They probably should. There is a report that the tibetan module doesn't work because of this.

Comment 1 Christopher Fynn 2006-12-13 14:52:48 UTC

Support for the ccmp feature is required for Tibetan, as there are a number of complex compound vowel characters (U+0F73, U+0F76, U+0F77, U+0F78, U+0F79, and U+0F81) with glyph elements above and below the base stack which need decomposing (GSUB lookup type 2) before other lookups proceed. It is also useful to pre-compose (GSUB lookup type 4) other compound characters if they are entered by their elements, since this greatly simplifies later lookups under the blws and abvs features.

Typical ccmp lookups for a Tibetan font:

feature ccmp { # Glyph Composition/Decomposition
script tibt; # Tibetan
lookup decompose {
sub uni0F73 by uni0F71 uni0F72;
sub uni0F76 by uni0FB2 uni0F80;
sub uni0F77 by uni0FB2 uni0F71 uni0F80;
sub uni0F78 by uni0FB3 uni0F80;
sub uni0F79 by uni0FB3 uni0F71 uni0F80;
sub uni0F81 by uni0F71 uni0F80;
} decompose;
lookup compose {
sub uni0F40 uni0FB5 by uni0F69;
sub uni0F42 uni0FB7 by uni0F43;
sub uni0F4C uni0FB7 by uni0F4D;
sub uni0F51 uni0FB7 by uni0F52;
sub uni0F56 uni0FB7 by uni0F57;
sub uni0F7A uni0F7A by uni0F7B;
sub uni0F7C uni0F7C by uni0F7D;
sub uni0F90 uni0FB5 by uni0FB9;
sub uni0F92 uni0FB7 by uni0F93;
sub uni0F9C uni0FB7 by uni0F9D;
sub uni0FA1 uni0FB7 by uni0FA2;
sub uni0FA6 uni0FB7 by uni0FA7;
} compose;
} ccmp;    

Without support for ccmp many Tibetan combinations will not be rendered properly. Note that in decomposition a single glyph may need to be replaced by as many as three glyphs.
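
As a rough illustration of the two lookup types mentioned above, here is a minimal Python sketch. It is not Pango code: the helper names (`apply_multiple`, `apply_ligature`, `apply_ccmp`) are hypothetical, and the tables hold only a few of the rules from the feature file above. It assumes decomposition runs before composition, as in the lookup order shown.

```python
# Hypothetical helpers for illustration; not part of Pango or any font tool.
# The rules are a small subset of the feature file quoted above.
DECOMPOSE = {
    "uni0F73": ["uni0F71", "uni0F72"],
    "uni0F77": ["uni0FB2", "uni0F71", "uni0F80"],
    "uni0F81": ["uni0F71", "uni0F80"],
}
COMPOSE = {
    ("uni0F42", "uni0FB7"): "uni0F43",
    ("uni0F7A", "uni0F7A"): "uni0F7B",
}


def apply_multiple(glyphs, table):
    """GSUB type 2 style: replace one glyph by a sequence of glyphs."""
    out = []
    for glyph in glyphs:
        out.extend(table.get(glyph, [glyph]))
    return out


def apply_ligature(glyphs, table):
    """GSUB type 4 style: replace a pair of adjacent glyphs by one glyph."""
    out = []
    i = 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in table:
            out.append(table[pair])
            i += 2  # both glyphs of the pair are consumed
        else:
            out.append(glyphs[i])
            i += 1
    return out


def apply_ccmp(glyphs):
    """Run the decompose lookup first, then the compose lookup."""
    return apply_ligature(apply_multiple(glyphs, DECOMPOSE), COMPOSE)
```

Note how `apply_ccmp(["uni0F40", "uni0F77"])` grows the run from two glyphs to four, which is the one-to-three replacement described above.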

- Chris

Comment 2 Christopher Fynn 2006-12-17 13:58:01 UTC

Created attachment 78510
Illustrates requirement for CCMP processing for Tibetan

This document explains & illustrates the use of CCMP feature in OT Tibetan fonts.

Comment 3 Behdad Esfahbod 2007-01-22 02:50:15 UTC

*** Bug 356006 has been marked as a duplicate of this bug. ***

Comment 4 Christopher Fynn 2007-03-04 20:11:45 UTC

Whenever someone gets round to this - I don't know what "pref", "blwf", "abvf" and "pstf" are doing in the Tibetan module. These features are *not* needed for Tibetan, which only uses "ccmp", "blws", "abvs", "calt", "blwm", "abvm" and "kern".

Comment 5 Mathieu Pellerin 2007-03-14 02:40:46 UTC

For the record, Khmer Unicode also needs the ccmp feature to be fully supported by Pango.

I'm trying to set up a team in Cambodia to create a Khmer-translated Gnome OS, but people are hard to convince because the KDE Unicode engine offers full Khmer Unicode support.

It would be great to have an ETA on the implementation of ccmp (i.e. pango 1.8? :o) )

Comment 6 Behdad Esfahbod 2007-03-14 14:52:17 UTC

I'll try to get this in 1.16.2.

Comment 7 Christopher Fynn 2007-03-14 16:37:38 UTC

According to the OpenType Specification, lookups under the ccmp feature should take precedence over lookups under any other feature. Lack of support for ccmp is therefore a fairly major bug in Pango, since it can affect proper processing of all subsequent features.
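
To make the ordering point concrete, here is a small Python sketch. Everything in it is hypothetical: the glyph names, the single ccmp rule, and the later "blws" ligature rule are invented for illustration and do not come from a real font or from Pango. It only shows that a later lookup written against decomposed glyphs never matches if ccmp has not run first.

```python
# Hypothetical lookups for illustration only; not taken from a real font.
CCMP = {"uni0F73": ["uni0F71", "uni0F72"]}             # decomposition rule
BLWS = {("uni0F40", "uni0F71"): "uni0F40_uni0F71"}     # later ligature rule


def decompose(glyphs, table):
    """Replace each glyph by its listed components, if any."""
    out = []
    for glyph in glyphs:
        out.extend(table.get(glyph, [glyph]))
    return out


def ligate(glyphs, table):
    """Replace each matching adjacent pair by a single glyph."""
    out = []
    i = 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in table:
            out.append(table[pair])
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out


def shape(glyphs, order):
    """Apply the named features in the given order."""
    steps = {
        "ccmp": lambda run: decompose(run, CCMP),
        "blws": lambda run: ligate(run, BLWS),
    }
    for tag in order:
        glyphs = steps[tag](glyphs)
    return glyphs


run = ["uni0F40", "uni0F73"]
ccmp_first = shape(run, ["ccmp", "blws"])  # the blws rule sees uni0F71
ccmp_last = shape(run, ["blws", "ccmp"])   # the blws rule never matches
```

With ccmp applied first, the later rule fires on the decomposed pair. With ccmp applied last (or not at all), the run is left as loose components that no later lookup has handled.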

Latin script:
In Latin the ccmp feature is used e.g. to form the dotless i (used when the 'i' is followed by an above-base diacritic mark).
see: <http://www.microsoft.com/typography/otfntdev/standot/features.aspx>

Arabic script:
In Arabic script the ccmp feature may be used e.g. to decompose the individual elements in the glyphs for characters such as U+0623.
see: <http://www.microsoft.com/typography/otfntdev/arabicot/features.aspx>

Hebrew script:
In Hebrew script ccmp may be used to compose a number of glyphs into one glyph (GSUB lookup type 4), e.g. uni05F2 + uni05B7 -> uniFB1F,
or to decompose one glyph into a number of glyphs.
see: <http://www.microsoft.com/typography/otfntdev/hebrewot/features.aspx>

Hangul script:
In Hangul script the ccmp feature is used in the composition of Old Hangul Jamos.
see: <http://www.microsoft.com/typography/otfntdev/hangulot/features.htm>

Lao script:
In Lao script ccmp is used to decompose characters like U+0EB3 into their component parts (U+0ECD + U+0EB2) for individual positioning.
see: <http://www.microsoft.com/typography/otfntdev/laoot/features.htm>

Thai script:
Similarly, in Thai script ccmp is used to decompose characters like U+0E33 into their component parts (U+0E4D + U+0E32) for individual positioning, and also for altering a base glyph when it is followed by a combining mark.
see: <http://www.microsoft.com/typography/otfntdev/thaiot/features.htm>
 
ccmp may also be useful for decomposing (GSUB lookup type 2) any of the following characters so that their individual glyph elements can be placed separately according to the dimensions of the different base glyphs with which they can combine.

Syriac:
U+0734 - combining glyph elements above & below base glyph
see: <http://www.microsoft.com/typography/otfntdev/syriacot/features.aspx>

Bengali:
U+09CB - combining glyph elements before & after base glyph
U+09CC - combining glyph elements before & after base glyph

Tamil:
U+0BCA - combining glyph elements before & after base glyph
U+0BCB - combining glyph elements before & after base glyph
U+0BCC - combining glyph elements before & after base glyph

Telugu:
U+0C48 - combining glyph elements above & below base glyph

Malayalam:
U+0D4A - combining glyph elements before & after base glyph
U+0D4B - combining glyph elements before & after base glyph
U+0D4C - combining glyph elements before & after base glyph

Sinhala:  
U+0DDC - combining glyph elements before & after base glyph
U+0DDD - combining glyph elements before & after base glyph
U+0DDE - combining glyph elements before & after base glyph

Tibetan:
U+0F73 - combining glyph elements above & below base glyph
U+0F76 - combining glyph elements above & below base glyph
U+0F77 - combining glyph elements above & below base glyph
U+0F78 - combining glyph elements above & below base glyph
U+0F79 - combining glyph elements above & below base glyph
U+0F81 - combining glyph elements above & below base glyph

Khmer:
U+17BE - combining glyph elements before & after base glyph
U+17BF - combining glyph elements before & after base glyph
U+17C0 - combining glyph elements before & after base glyph
U+17C4 - combining glyph elements before & after base glyph
U+17C5 - combining glyph elements before & after base glyph

Balinese:
U+1B3B - combining glyph elements above & after base glyph
U+1B3C - combining glyph elements above & below base glyph
U+1B3D - combining glyph elements above, below and after base glyph
U+1B40 - combining glyph elements before & after base glyph
U+1B41 - combining glyph elements before & after base glyph
U+1B43 - combining glyph elements above & after base glyph
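
Some of the characters listed above have a canonical decomposition defined in the Unicode Character Database, while others (the Khmer vowels, for example) do not, so for those the split can only come from the font's own ccmp lookups. The following Python sketch uses the standard `unicodedata` module to check this for a few sample code points. The helper name `canonical_parts` and the choice of samples are illustrative assumptions, and the result depends on the Unicode version bundled with the Python interpreter.

```python
import unicodedata

# A few code points picked from the lists above (illustrative selection).
SAMPLES = [0x0BCA, 0x0D4B, 0x0F73, 0x17BE, 0x1B3B]


def canonical_parts(cp):
    """Return the canonical decomposition of cp as a list of code points.

    Returns an empty list when the Unicode database defines no
    decomposition, or only a compatibility one (tagged like "<compat>").
    """
    fields = unicodedata.decomposition(chr(cp)).split()
    if not fields or fields[0].startswith("<"):
        return []
    return [int(field, 16) for field in fields]


for cp in SAMPLES:
    parts = canonical_parts(cp)
    shown = " ".join(f"U+{p:04X}" for p in parts) or "(none)"
    print(f"U+{cp:04X} -> {shown}")
```

A shaping engine or font may still choose different components than the Unicode data suggests. The check is only a quick way to see which characters have a standard split at all.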

e.g. Microsoft's Tibetan script font "Microsoft Himalaya" uses
the ccmp feature to decompose glyphs for U+0F43, U+0F4D, U+0F52, U+0F57, U+0F5C, U+0F69, U+0F73, U+0F76, U+0F77, U+0F78, U+0F79, U+0F81, U+0F93, U+0F9D, U+0FA2, U+0FA7, U+0FAC and U+0FB9 to their component parts so that they may be positioned separately and/or to simplify subsequent lookups. ccmp is also used in that font to compose ligatures of a number of vowel combinations.

Of course, in individual fonts it may be possible for a font developer to work around the lack of support for ccmp; but, IMO, the burden should not be placed on font developers to provide workarounds for the lack of support for a particular feature in individual OT shaping engines. Even if such a workaround were provided in fonts, it would then force users to use fonts tied to specific OT layout engines.

Comment 8 Behdad Esfahbod 2007-05-03 01:46:02 UTC

Can you cook a patch?

Comment 9 Behdad Esfahbod 2007-05-16 01:43:47 UTC

2007-05-15  Behdad Esfahbod  <behdad@gnome.org>

        Bug 385168 – indic, khmer, and tibetan modules don't apply ccmp
        Bug 385477 – kern feature is not supported in OpenType layout for
        Tibetan.

        * modules/khmer/khmer-fc.c (khmer_engine_shape):
        * modules/tibetan/tibetan-fc.c (tibetan_engine_shape):
        Port to new OpenType APIs.  Add standard features (ccmp,
        locl, calt, kern, mark, mkmk).

2007-05-15  Behdad Esfahbod  <behdad@gnome.org>

        * modules/indic/indic-fc.c:
        Add ccmp, locl, calt, kern, mark, and mkmk features.



Please test.

Comment 10 Mathieu Pellerin 2007-05-16 02:27:26 UTC

Yeah, rock on Behdad! Thanks for the time spent coding this feature; I wish I had the knowledge to do it myself.

I'll give it an extended try as soon as 0.17.1 goes out and I manage to compile it on my Ubuntu box.