GNOME Bugzilla – Bug 385168
indic, khmer, and tibetan modules don't apply ccmp
Last modified: 2007-05-16 02:27:26 UTC
They probably should. There is a report that the tibetan module doesn't work because of this.
Support for the ccmp feature is required for Tibetan, as there are a number of complex compound vowel characters (U+0F73, U+0F76, U+0F77, U+0F78, U+0F79, and U+0F81) with glyph elements above and below the base stack which need decomposing (GSUB lookup type 2) before other lookups proceed. It is also useful to pre-compose (GSUB lookup type 4) other compound characters if they are entered by their elements, since this greatly simplifies other lookups down the line under the blws and abvs features.

Typical ccmp lookups for a Tibetan font:

feature ccmp { # Glyph Composition/Decomposition
  script tibt; # Tibetan
  lookup decompose {
    sub uni0F73 by uni0F71 uni0F72;
    sub uni0F76 by uni0FB2 uni0F80;
    sub uni0F77 by uni0FB2 uni0F71 uni0F80;
    sub uni0F78 by uni0FB3 uni0F80;
    sub uni0F79 by uni0FB3 uni0F71 uni0F80;
    sub uni0F81 by uni0F71 uni0F80;
  } decompose;
  lookup compose {
    sub uni0F40 uni0FB4 by uni0F69;
    sub uni0F42 uni0FB7 by uni0F43;
    sub uni0F4C uni0FB7 by uni0F4D;
    sub uni0F51 uni0FB7 by uni0F52;
    sub uni0F58 uni0FB7 by uni0F59;
    sub uni0F7A uni0F7A by uni0F7B;
    sub uni0F7C uni0F7C by uni0F7D;
    sub uni0F90 uni0FB4 by uni0FB9;
    sub uni0F92 uni0FB7 by uni0F93;
    sub uni0F9C uni0FB7 by uni0F9D;
    sub uni0FA1 uni0FB7 by uni0FA2;
    sub uni0FA8 uni0FB7 by uni0FA9;
  } compose;
} ccmp;

Without support for ccmp many Tibetan combinations will not be rendered properly. Note that in decomposition a single glyph may need to be replaced by as many as three glyphs.

- Chris
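To make the two lookup types concrete, here is a minimal Python sketch of what a shaper would have to do with such a ccmp feature: run the decompose lookup (one glyph to many) and then the compose lookup (many glyphs to one) over a glyph run. This is an illustration only, not Pango's code. The function names and the list-of-glyph-names representation are assumptions made for the example, and only two of the compose rules are copied from the feature file above. A real engine also honours lookup flags and operates on glyph IDs.

```python
# Sketch of ccmp processing on a glyph run.
# Function names and data layout are hypothetical, for illustration only.

# GSUB lookup type 2 (multiple substitution): one glyph -> several glyphs.
# Rules copied from the "decompose" lookup in the comment above.
DECOMPOSE = {
    "uni0F73": ["uni0F71", "uni0F72"],
    "uni0F76": ["uni0FB2", "uni0F80"],
    "uni0F77": ["uni0FB2", "uni0F71", "uni0F80"],
    "uni0F78": ["uni0FB3", "uni0F80"],
    "uni0F79": ["uni0FB3", "uni0F71", "uni0F80"],
    "uni0F81": ["uni0F71", "uni0F80"],
}

# GSUB lookup type 4 (ligature substitution): several glyphs -> one glyph.
# Only two rules from the "compose" lookup are reproduced here.
COMPOSE = {
    ("uni0F42", "uni0FB7"): "uni0F43",
    ("uni0F7A", "uni0F7A"): "uni0F7B",
}


def apply_decompose(glyphs):
    """Replace each glyph that has a decomposition by its components."""
    out = []
    for glyph in glyphs:
        out.extend(DECOMPOSE.get(glyph, [glyph]))
    return out


def apply_compose(glyphs):
    """Scan left to right, replacing matching glyph pairs by one glyph."""
    out = []
    i = 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in COMPOSE:
            out.append(COMPOSE[pair])
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out


def apply_ccmp(glyphs):
    """Apply the lookups in the order they appear in the feature."""
    return apply_compose(apply_decompose(glyphs))


if __name__ == "__main__":
    # One glyph (uni0F77) becomes three, as noted in the comment above.
    print(apply_ccmp(["uni0F40", "uni0F77"]))
```

Because ccmp runs first, later features such as blws and abvs only ever see the decomposed or pre-composed forms, which is why their lookups can stay simple.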
Created attachment 78510: Illustrates requirement for CCMP processing for Tibetan. This document explains & illustrates the use of the CCMP feature in OT Tibetan fonts.
*** Bug 356006 has been marked as a duplicate of this bug. ***
Whenever someone gets round to this - I don't know what "pref", "blwf", "abvf" and "pstf" are doing in the Tibetan module. These features are *not* needed for Tibetan, which only uses "ccmp", "blws", "abvs", "calt", "blwm", "abvm" and "kern".
For historical purposes, Khmer Unicode also needs the ccmp feature to be fully supported by Pango. I'm trying to set up a team in Cambodia to create a Khmer-translated GNOME OS, but people are hard to convince, as the KDE Unicode engine offers full Khmer Unicode support. It would be great to have an ETA on the implementation of ccmp (i.e. pango 1.8? :o) )
I'll try to get this in 1.16.2.
According to the OpenType Specification, lookups under the ccmp feature should take precedence over lookups under any other feature. Lack of support for ccmp is therefore a fairly major bug in Pango, since it can affect proper processing of all subsequent features.

Latin script: ccmp is used e.g. to form the dotless i (used when the 'i' is followed by an above-base diacritic mark).
see: <http://www.microsoft.com/typography/otfntdev/standot/features.aspx>

Arabic script: ccmp may be used e.g. to decompose the individual elements in the glyphs for characters such as U+0623.
see: <http://www.microsoft.com/typography/otfntdev/arabicot/features.aspx>

Hebrew script: ccmp may be used to compose a number of glyphs into one glyph (GSUB lookup type 4), e.g. uni05F2 + uni05B7 -> uniFB1F, or to decompose one glyph into a number of glyphs.
see: <http://www.microsoft.com/typography/otfntdev/hebrewot/features.aspx>

Hangul script: ccmp is used in the composition of Old Hangul Jamos.
see: <http://www.microsoft.com/typography/otfntdev/hangulot/features.htm>

Lao script: ccmp is used to decompose characters like U+0EB3 to their component parts (U+0ECD + U+0EB2) for individual positioning.
see: <http://www.microsoft.com/typography/otfntdev/laoot/features.htm>

Thai script: similarly, ccmp is used to decompose characters like U+0E33 to their component parts (U+0E4D U+0E32) for individual positioning, and also for altering a base glyph when it is followed by a combining mark.
see: <http://www.microsoft.com/typography/otfntdev/thaiot/features.htm>

ccmp may also be useful for decomposing (GSUB lookup type 2) any of the following characters so that their individual glyph elements can be placed separately according to the dimensions of the different base glyphs with which they can combine.

Syriac:
U+0734 - combining glyph elements above & below base glyph
see: <http://www.microsoft.com/typography/otfntdev/syriacot/features.aspx>

Bengali:
U+09CB - combining glyph elements before & after base glyph
U+09CC - combining glyph elements before & after base glyph

Tamil:
U+0BCA - combining glyph elements before & after base glyph
U+0BCB - combining glyph elements before & after base glyph
U+0BCC - combining glyph elements before & after base glyph

Telugu:
U+0C48 - combining glyph elements above & below base glyph

Malayalam:
U+0D4A - combining glyph elements before & after base glyph
U+0D4B - combining glyph elements before & after base glyph
U+0D4C - combining glyph elements before & after base glyph

Sinhala:
U+0DDC - combining glyph elements before & after base glyph
U+0DDD - combining glyph elements before & after base glyph
U+0DDE - combining glyph elements before & after base glyph

Tibetan:
U+0F73 - combining glyph elements above & below base glyph
U+0F76 - combining glyph elements above & below base glyph
U+0F77 - combining glyph elements above & below base glyph
U+0F78 - combining glyph elements above & below base glyph
U+0F79 - combining glyph elements above & below base glyph
U+0F81 - combining glyph elements above & below base glyph

Khmer:
U+17BE - combining glyph elements before & after base glyph
U+17BF - combining glyph elements before & after base glyph
U+17C0 - combining glyph elements before & after base glyph
U+17C4 - combining glyph elements before & after base glyph
U+17C5 - combining glyph elements before & after base glyph

Balinese:
U+1B3B - combining glyph elements above & after base glyph
U+1B3C - combining glyph elements above & below base glyph
U+1B3D - combining glyph elements above, below and after base glyph
U+1B40 - combining glyph elements before & after base glyph
U+1B41 - combining glyph elements before & after base glyph
U+1B43 - combining glyph elements above & after base glyph

For example, Microsoft's Tibetan script font "Microsoft Himalaya" uses the ccmp feature to decompose glyphs for U+0F43, U+0F4D, U+0F52, U+0F57, U+0F5C, U+0F69, U+0F73, U+0F76, U+0F77, U+0F78, U+0F79, U+0F81, U+0F93, U+0F9D, U+0FA2, U+0FA7, U+0FAC and U+0FB9 to their component parts so that they may be positioned separately and/or to simplify subsequent lookups. ccmp is also used in that font to compose ligatures of a number of vowel combinations.

Of course, in individual fonts it may be possible for a font developer to work around the lack of support for ccmp; but, IMO, the burden should not be placed on font developers to provide workarounds for the lack of support for a particular feature in individual OT shaping engines. Even if such a workaround were provided in fonts, it would then force users to use fonts tied to specific OT layout engines.
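Some of the decompositions mentioned above are also recorded in the Unicode Character Database, which gives a quick way to see the component parts a ccmp lookup would typically produce. The following Python sketch uses the standard `unicodedata` module to print them; the sample list and the helper name `decomposition_of` are chosen only for this illustration. Characters with no decomposition in the database return an empty string, in which case any splitting can only come from the font's own ccmp lookups.

```python
import unicodedata

# A few of the code points named in this report (illustrative selection).
SAMPLES = [0x0E33, 0x0EB3, 0x0BCA, 0x0F73, 0x0F77, 0x17BE]


def decomposition_of(cp):
    """Return the UnicodeData decomposition field for a code point.

    The result is a space-separated string of hex code points, optionally
    prefixed by a tag such as '<compat>', or '' if there is none.
    """
    return unicodedata.decomposition(chr(cp))


if __name__ == "__main__":
    for cp in SAMPLES:
        parts = decomposition_of(cp) or "(no decomposition in the database)"
        print(f"U+{cp:04X} -> {parts}")
```

Note that the database only describes character-level equivalence. Whether and how a glyph is actually split is still decided by the lookups in each font.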
Can you cook a patch?
2007-05-15  Behdad Esfahbod  <behdad@gnome.org>

        Bug 385168 – indic, khmer, and tibetan modules don't apply ccmp
        Bug 385477 – kern feature is not supported in OpenType layout for
        Tibetan.

        * modules/khmer/khmer-fc.c (khmer_engine_shape):
        * modules/tibetan/tibetan-fc.c (tibetan_engine_shape):
        Port to new OpenType APIs.  Add standard features (ccmp, locl, calt,
        kern, mark, mkmk).

2007-05-15  Behdad Esfahbod  <behdad@gnome.org>

        * modules/indic/indic-fc.c: Add ccmp, locl, calt, kern, mark, and
        mkmk features.

Please test.
Yeah, rock on Behdad! Thanks for the time spent coding this feature; I wish I had the knowledge to do it myself. I'll give it an extended try as soon as 0.17.1 goes out and I manage to compile it on my Ubuntu box.