GNOME Bugzilla – Bug 619753
Mongolian shaper
Last modified: 2011-01-22 20:39:02 UTC
Would be nice, and seems like there's a patch.
Created attachment 162051: Initial patch. This is a very early version. It will be updated!
Created attachment 162055: Updated code. New, updated code. It displays correctly with the Mongolian Baiti font.
I'd like to merge this with our Arabic/Syriac/N'Ko shapers. Thanks for the code.
I think I need to check the Unicode standard. The behavior around NNBSP doesn't look right to me. Isn't it non-joining?
Ok, read it. Unfortunately it does look like NNBSP has a very unusual joining behavior. I'll see what I can do.
A few comments: The code should be updated to just skip over variation_selectors instead of handling them. I liked the first patch more. Can you explain the joining rules around NNBSP and the vowel separator?
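To illustrate the first point, skipping over variation selectors when deciding how a letter joins could look roughly like the sketch below. It is Python rather than the patch's C, and it is not taken from either patch: the names `FVS`, `is_letter`, and `positional_forms` are hypothetical, and treating U+1820..U+1877 as the full set of joining letters is a simplifying assumption. NNBSP and the vowel separator are deliberately left out.

```python
# Mongolian free variation selectors FVS1..FVS3 (U+180B..U+180D).
FVS = {0x180B, 0x180C, 0x180D}


def is_letter(cp):
    # Assumption for this sketch: every code point in U+1820..U+1877
    # is a letter that joins on both sides.
    return 0x1820 <= cp <= 0x1877


def positional_forms(text):
    """Return one entry per character: 'isol', 'init', 'medi', 'fina',
    or None for characters that are not joining letters."""
    cps = [ord(c) for c in text]

    def neighbor_joins(i, step):
        # Walk left (step=-1) or right (step=+1), skipping over any
        # variation selectors, and report whether a letter is found.
        j = i + step
        while 0 <= j < len(cps) and cps[j] in FVS:
            j += step
        return 0 <= j < len(cps) and is_letter(cps[j])

    names = {
        (False, False): "isol",
        (False, True): "init",
        (True, True): "medi",
        (True, False): "fina",
    }
    forms = [None] * len(cps)
    for i, cp in enumerate(cps):
        if is_letter(cp):
            forms[i] = names[(neighbor_joins(i, -1), neighbor_joins(i, +1))]
    return forms
```

With this sketch, `positional_forms("\u1820\u180B\u1821")` gives `['init', None, 'fina']`: the selector gets no form of its own and does not break the join between the two letters.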
Both of them look like an ordinary space, but they are NOT word boundaries. The vowels "a" (U+1820) and "e" (U+1821) each have two glyphs in final position: one used after MVS (the Mongolian Vowel Separator) and one used without it (see the attached picture "MVP"). In that picture, the first word has an MVS before the final vowel 'a'. NNBSP is used for joining suffixes (see the attached picture "SUFFIXES"). I'm working on a table that maps the joining rules to suffixes and will post it here when it is ready.
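A rough way to picture these rules in code is below. This is only a sketch under stated assumptions, not part of either patch: the helper names `split_words`, `stem_and_suffixes`, and `has_separated_final_vowel` are hypothetical, and splitting on ASCII space, tab, and newline only is a simplification. The code points themselves (U+180E for MVS, U+202F for NNBSP, U+1820 and U+1821 for the two vowels) come from Unicode.

```python
import re

MVS = "\u180E"    # MONGOLIAN VOWEL SEPARATOR
NNBSP = "\u202F"  # NARROW NO-BREAK SPACE
FINAL_VOWELS = {"\u1820", "\u1821"}  # MONGOLIAN LETTER A, MONGOLIAN LETTER E


def split_words(text):
    # Split on ASCII space, tab, and newline only, so that MVS and NNBSP
    # stay inside the word they belong to.
    return [w for w in re.split(r"[ \t\n]+", text) if w]


def stem_and_suffixes(word):
    # NNBSP separates a stem from the suffixes attached to it.
    stem, *suffixes = word.split(NNBSP)
    return stem, suffixes


def has_separated_final_vowel(word):
    # True when the word ends in 'a' or 'e' directly preceded by MVS,
    # i.e. the case that needs the alternate final glyph.
    return len(word) >= 2 and word[-1] in FINAL_VOWELS and word[-2] == MVS
```

For example, `split_words("\u1820\u1821\u180E\u1820 \u1820\u202F\u1821")` returns two words, not four, because only the ASCII space counts as a boundary.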
Created attachment 162157: MVP. The first word has the Mongolian Vowel Separator before the last vowel 'a'.
Created attachment 162158: SUFFIXES. Sample suffixes that use NNBSP.
Found a much better link that explains this 1000 times better than I can :) http://www.iist.unu.edu/www/docs/techreports/reports/report170a.tgz And here are two nice images that show the MVS and NNBSP rules.
Created attachment 162174: MVS rule
Created attachment 162175: NNBSP rule
Thanks. I'll get to read that and implement soon (I hope!).
I am new to text rendering, but I have a few comments that might help.

Chapter 13.2, "Mongolian", of The Unicode Standard, Version 5.2.0 briefly explains the various shaping rules, including those for NNBSP.
http://www.unicode.org/versions/Unicode5.2.0/ch13.pdf
http://babelstone.blogspot.com/2006/10/manchu-letter-lha.html

I tried the second patch above and found that it works surprisingly well with the Mongolian Baiti font. Almost all shaping rules are encoded in the font, so there appears to be little left for the Mongolian module to do. But I found three problems.

(1) U+200D ZERO WIDTH JOINER (and U+200C ZERO WIDTH NON-JOINER) is not handled in the second patch. I guess ZWJ should behave like a letter with regard to the shaping of the preceding and following letters.
(2) MVS should be displayed as a narrow non-breaking whitespace instead of the dummy glyph. The shaping of the preceding and following letters seems correct.
(3) Any free variation selector not immediately preceded by one of its defined base characters should be ignored, but is currently displayed as the dummy glyph. The specified combinations of base characters and FVSs are correctly interpreted by the substitution rules encoded in the font.
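Points (1) and (3) could be handled before the font's substitution rules run, roughly as in the sketch below. It is a minimal Python illustration, not the patch's code: `is_base`, `joins_like_letter`, and `drop_stray_fvs` are hypothetical names, and accepting any letter in U+1820..U+1877 as a valid base is an assumption (the set of base characters actually defined for each FVS is narrower).

```python
FVS = {"\u180B", "\u180C", "\u180D"}  # free variation selectors FVS1..FVS3
ZWJ = "\u200D"                        # ZERO WIDTH JOINER


def is_base(ch):
    # Assumption for this sketch: any letter in U+1820..U+1877 may carry
    # a variation selector. The real list of valid bases is narrower.
    return len(ch) == 1 and "\u1820" <= ch <= "\u1877"


def joins_like_letter(ch):
    # Point (1): ZWJ shapes its neighbours as a letter would.
    # ZWNJ (U+200C) is neither a base nor ZWJ, so it does not join.
    return is_base(ch) or ch == ZWJ


def drop_stray_fvs(text):
    # Point (3): remove every selector that is not immediately preceded
    # by a base character, so that it is never drawn as a dummy glyph.
    kept = []
    for i, ch in enumerate(text):
        if ch in FVS and (i == 0 or not is_base(text[i - 1])):
            continue
        kept.append(ch)
    return "".join(kept)
```

Here `drop_stray_fvs("\u180B\u1820\u180B \u180C")` keeps only the selector that follows the letter and returns `"\u1820\u180B "`.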
I already added Mongolian support to harfbuzz. Pango will switch to harfbuzz soon (hopefully for March release). No new shapers for pango itself are accepted. You can test Mongolian support by using the harfbuzz-ng-external branch of pango.