GNOME Bugzilla – Bug 125605
Pango should support rendering of Khmer script
Last modified: 2005-06-22 15:06:24 UTC
There are people interested in providing GNOME translations in Khmer [km], but reportedly Pango doesn't support the rendering of the Khmer script yet (http://lists.gnome.org/archives/gnome-i18n/2003-October/msg00232.html).
More info on Khmer script: http://en.wikipedia.org/wiki/Cambodian_language http://www.ethnologue.com/show_language.asp?code=KMR
Having free fonts available is a prerequisite to such work. http://www.microsoft.com/typography/otfntdev/khmerot/default.htm describes the standard for developing OpenType Khmer fonts.
There are OpenType fonts available for testing. Part of the project that I am developing includes buying fonts into the public domain, but they are not yet available as freeware; all we have is permission from the developer to copy them for testing purposes. I talked to Eric at IBM (ICU). They don't know whether or when ICU will support Khmer. They are considering rewriting a part of ICU to make it easier to integrate into Pango, and as part of that effort they may include Khmer. But everything is pending later decisions. Javier SOLA - fjsola@aui.es
I have negotiated a set of Khmer fonts to be put in the public domain, but I don't know:
- What is the necessary basic set of fonts? Normal and cursive fonts have already been developed and are available. Do we also need bold and cursive bold?
- What information should be included in the fonts to state that they are in the public domain?
A public domain OpenType Khmer language font (KhmerOS) is now available. Bold, cursive and cursive-bold variants will be available within a month. This is a very clear schematic font specially designed for display at small sizes on computer screens. I will soon put it up on a website; meanwhile, I will send it to anybody who requests it.
Khmer support has been developed by Lin Chear <linchear@rogers.com> and is available at http://unicode.khmer.cc/khmerpango/indic-khmer.tar.gz (approximately 16kb). In its current incarnation, this module will 'break' Indic Pango support and replace it with Khmer (it is a modification of the Indic module and replaces it). It now needs to be integrated into Pango. If integration by the maintainer is not possible at this time, help or pointers would be most welcome.
Created attachment 29557: convert the source to a diff against current indic module

Had a quick look at this. Is the eventual aim to integrate it into the indic module or to create a separate khmer module? I'm attaching a diff of the modified module against the current indic module. Issues I can see (if the khmer parts are going to be part of the indic module) are that (1) the changes to indic-fc.c need to be cleaned up to merge with recent changes to the indic modules, and (2) in indic-ot.h the feature flags should not be broken for the indic scripts. Is there a way to do this within the indic module, or is this difference enough to require a separate module?
As I recall, one of the main reasons that "breaking" the current Indic feature bits was needed is that there only used to be 16 total feature bits available. With the switch to PangoOTBuffer I switched from a 16-bit quantity to an unsigned long, so there should portably be 32 bits, and this should no longer be a problem. It would be interesting to see how much remains of the patch when the recent changes are merged in. If it's a small patch, it seems to make more sense to do it this way than to create an entirely separate patch. One thing is that Lin Chear also did a port of the Qt module as an independent Pango module: http://mail.gnome.org/archives/gtk-i18n-list/2004-January/msg00042.html It would be interesting to know how this compares for completeness, accuracy, etc., with the indic module modification.
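The bit-count argument above can be illustrated with a small sketch. This is not the code from indic-ot.h; the helper names `feature_bit` and `fits_in_16_bits` are hypothetical and exist only for this example. It shows why a seventeenth feature flag cannot live in a 16-bit mask but fits in an unsigned long, which the C standard guarantees to be at least 32 bits wide.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: returns the mask for feature number `index`.
 * An unsigned long is at least 32 bits wide on every conforming C
 * implementation, so indices 0..31 are always safe. */
static unsigned long
feature_bit (unsigned int index)
{
  assert (index < sizeof (unsigned long) * CHAR_BIT);
  return 1UL << index;
}

/* Hypothetical helper: true if every set bit of `mask` would survive
 * being stored in a 16-bit quantity. */
static int
fits_in_16_bits (unsigned long mask)
{
  return (mask & ~0xFFFFUL) == 0;
}
```

Under this sketch, `feature_bit (15)` still fits in the old 16-bit layout, while `feature_bit (16)` does not, which is the situation in which a script needing extra flags would have had to reuse bits already assigned to other scripts.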
Owen. Yes, this is correct, the reason for the independent module was the flags. Two or three more non-Indian Indic or similar CTL languages are trying to develop Pango support: Lao and Dzongkha (Bhutan, Tibetan script) and maybe Myanmar. If a split was considered, maybe these languages should be bundled with Khmer (non-Indian Indic module), but if Khmer is integrated into Indic, they should also be easy to integrate. Bug 141983 depends on this one. Its shape will depend on this decision. There is another exception to be handled, the letter ROBAT. It was changed in Unicode 4.0. I will file a bug on this after the module decision.
I spent some time reading the Unicode and OpenType Khmer specs, and I think it should be done as a separate module... while there are clearly some strong parallels with the Indic scripts, Khmer also has quite a few distinct features. Trying to force Khmer into the Indic modules will produce a situation where fixes for Khmer will break Indic and fixes for Indic break Khmer. We see that even among the languages that the Indic modules currently support. I don't think there is any significant advantage to combining Lao, Dzongkha support with Khmer. I suspect that Myanmar is also distinct enough to really require a separate module. (Lao will be in 1.8 as part of the Thai module... Lao and Thai are very similar. Tibetan will be a separate module.) If TrollTech formally agreed to allow the Qt module to be used under the LGPL, then the Qt port could be used, otherwise Khmer support would have to be written from scratch. (Another approach would be to take the Indic module, delete all the unnecessary code, make changes like virama => coeng mark in the functions and docs, and go from that. I don't know if that is easier than starting from scratch.) A Khmer module from scratch looks like roughly a week of work. The first item of work is to create a set of input test cases with images of expected output.
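To make the "virama => coeng" remark concrete, here is a minimal sketch of the kind of check a Khmer shaper has to perform. It is not taken from any of the patches discussed here; the function names are hypothetical. It relies only on the Unicode code points for the Khmer coeng sign (U+17D2) and the Khmer consonant range (U+1780 to U+17A2), and counts the coeng + consonant pairs that are rendered as subscript consonants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KHMER_COENG 0x17D2u   /* KHMER SIGN COENG */

/* Khmer consonants occupy U+1780 (KA) through U+17A2 (QA). */
static int
is_khmer_consonant (uint32_t cp)
{
  return cp >= 0x1780u && cp <= 0x17A2u;
}

/* Hypothetical helper: counts COENG + consonant pairs in a buffer of
 * Unicode code points. Each such pair stands for one subscript
 * consonant in the rendered cluster. */
static size_t
count_coeng_pairs (const uint32_t *text, size_t len)
{
  size_t i, n = 0;

  assert (text != NULL || len == 0);

  for (i = 0; i + 1 < len; i++)
    {
      if (text[i] == KHMER_COENG && is_khmer_consonant (text[i + 1]))
        {
          n++;
          i++;  /* skip the consonant that the coeng applies to */
        }
    }

  return n;
}
```

For the word U+1781 U+17D2 U+1798 U+17C2 U+179A (the name "Khmer" written in Khmer script), this finds one pair: the coeng followed by MO. Short inputs like this one are also natural candidates for the input test cases mentioned above.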
We reached the same conclusion. Khmer needs a separate module. We have developed a Khmer module for ICU (starting from the ICU indic module), and sent it to Eric for review. It works quite well for us, after extensive testing. http://www.khmeros.info/download/issues/ICU-khmer-layout.zip We are now trying to adapt this module to Pango. We tried to use the old Pango-Khmer module as a shell structure, but it did not integrate into present Pango, so we are working on integration. We expect to finish before the end of the year. Even if Dzongkha and Myanmar have similar characteristics, they are different enough that trying to put them together in a single module would require a lot of extra work. Myanmar has two-code-point subscript vowels like Khmer... but (unlike Khmer) two-code-point split matras, and an almost-indian-but-not group in which NGO+virama+consonant ends up in consonant+diacritic shape (not in Khmer either). It should be easy to work on, starting from a Khmer module, and taking a couple of ideas from the indic module.
Date: Fri, 8 Apr 2005 01:43:06 -0400
From: Jens Herden <jens@khmeros.info>
To: gtk-i18n-list@gnome.org
Cc: Owen Taylor <otaylor@redhat.com>
Subject: Re: khmer modul for Pango

Hi List, hi Owen,

I found a bug and I also saw that my code layout was still not perfect. So I updated the patch for the Khmer module again. You can still get it here: http://www.khmeros.info/download/pango-khmer-patch.zip

Please use this version for the integration into Pango, thanks

Jens
I've committed code based on the last version of the patch. Changes are:

- Reordered changes to build files to keep modules in alphabetical order
- Changed the copyright notices to include:
  * Partially based on Indic shaper
  * Copyright (C) 2001, 2002 IBM Corporation
  * Author: Eric Mader <mader@jtcsv.com>
- Fixed various minor code-style problems (mostly excess {} around single line conditionals, and missing spaces in function calls: "foo(" rather than "foo (").

No substantive changes.

2005-06-21 Owen Taylor <otaylor@redhat.com>
* modules/khmer configure.in modules/Makefile.am modules/makefile.msc: Add a Khmer module by Jens Herden and Javier Sola. (#125605)
Owen: Somehow the buffer initialization code in khmer_engine_shape has been dropped from the patch you applied. Both the patch in comment 12 and other modules have a line like this: buffer = pango_ot_buffer_new (fc_font); As gcc warns, buffer may currently be used uninitialized, and I can confirm that buffer is never assigned in khmer_engine_shape.
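The failure mode described above can be sketched in isolation. The following is not the Pango code; `DemoBuffer`, `demo_buffer_new`, `demo_buffer_add_glyph` and `demo_shape` are hypothetical stand-ins used only to show the pattern. A local pointer has an indeterminate value until it is assigned, so dropping the constructor call leaves every later use of the pointer undefined.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a glyph buffer; not PangoOTBuffer. */
typedef struct
{
  int n_glyphs;
} DemoBuffer;

static DemoBuffer *
demo_buffer_new (void)
{
  DemoBuffer *b = malloc (sizeof *b);

  if (b != NULL)
    b->n_glyphs = 0;
  return b;
}

static void
demo_buffer_add_glyph (DemoBuffer *b)
{
  assert (b != NULL);
  b->n_glyphs++;
}

/* Hypothetical shaping routine: adds one glyph per input character
 * and returns the glyph count, or -1 if allocation fails. */
static int
demo_shape (int n_chars)
{
  DemoBuffer *buffer;  /* indeterminate until the assignment below */
  int i, result;

  /* This is the kind of line that was dropped. Without it, the calls
   * below would dereference an uninitialized pointer. */
  buffer = demo_buffer_new ();
  if (buffer == NULL)
    return -1;

  for (i = 0; i < n_chars; i++)
    demo_buffer_add_glyph (buffer);

  result = buffer->n_glyphs;
  free (buffer);
  return result;
}
```

Removing the `buffer = demo_buffer_new ();` line from this sketch reproduces the situation the compiler warning was pointing at: the variable is declared and used, but never assigned.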
2005-06-22 Owen Taylor <otaylor@redhat.com> * modules/khmer/khmer-fc.c (khmer_engine_shape): Add back accidentally dropped line (Pointed out by Behdad Esfahbod)