Bug 619753 - (mongolian) Mongolian shaper
Status: RESOLVED WONTFIX
Product: pango
Classification: Platform
Component: general
Version: unspecified
Hardware: Other Linux
Importance: Normal enhancement
Target Milestone: ---
Assigned To: pango-maint
pango-maint
Depends on:
Blocks:
 
 
Reported: 2010-05-26 20:08 UTC by Behdad Esfahbod
Modified: 2011-01-22 20:39 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Initial patch (12.72 KB, patch)
2010-05-26 20:48 UTC, nagi
Updated code (18.41 KB, patch)
2010-05-26 22:50 UTC, nagi
MVP (2.96 KB, image/png)
2010-05-27 23:29 UTC, nagi
SUFFIXES (4.03 KB, image/png)
2010-05-27 23:31 UTC, nagi
MVS rule (188.24 KB, image/jpeg)
2010-05-28 08:09 UTC, nagi
NNBSP rule (152.28 KB, image/jpeg)
2010-05-28 08:10 UTC, nagi

Description Behdad Esfahbod 2010-05-26 20:08:59 UTC
Would be nice, and seems like there's a patch.
Comment 1 nagi 2010-05-26 20:48:51 UTC
Created attachment 162051
Initial patch

This is a very early version. It will be updated!
Comment 2 nagi 2010-05-26 22:50:05 UTC
Created attachment 162055
Updated code

New updated code. It displays correctly with the Mongolian Baiti font.
Comment 3 Behdad Esfahbod 2010-05-26 23:47:30 UTC
I'd like to merge this with our Arabic/Syriac/N'Ko shapers.  Thanks for the code.
Comment 4 Behdad Esfahbod 2010-05-27 00:03:31 UTC
I think I need to check the Unicode standard.  The behavior around NNBSP doesn't look right to me.  Isn't it non-joining?
Comment 5 Behdad Esfahbod 2010-05-27 00:32:33 UTC
Ok, read it.  Unfortunately it does look like NNBSP has a very unusual joining behavior.  I'll see what I can do.
Comment 6 Behdad Esfahbod 2010-05-27 00:49:16 UTC
A few comments:

The code should be updated to just skip over variation_selectors instead of handling them.  I liked the first patch more.

Can you explain the joining rules around NNBSP and the vowel separator?
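
For illustration, the "just skip over variation selectors" idea might look roughly like the sketch below. This is not the code from either patch. It assumes only the basic Mongolian letters U+1820..U+1842 join, treats the three free variation selectors U+180B..U+180D as transparent, and uses the hypothetical names `positional_forms` and `is_letter`; the form labels follow the usual OpenType feature names.

```python
# MONGOLIAN FREE VARIATION SELECTOR ONE..THREE
FVS = {0x180B, 0x180C, 0x180D}

def is_letter(ch):
    # Simplifying assumption: only the basic Mongolian letters take part in joining.
    return 0x1820 <= ord(ch) <= 0x1842

def positional_forms(text):
    """Return (char, form) for each letter, looking through variation selectors."""
    # Indices of the characters that matter for joining; selectors are skipped.
    visible = [i for i, ch in enumerate(text) if ord(ch) not in FVS]
    forms = []
    for pos, i in enumerate(visible):
        ch = text[i]
        if not is_letter(ch):
            continue
        prev_joins = pos > 0 and is_letter(text[visible[pos - 1]])
        next_joins = pos + 1 < len(visible) and is_letter(text[visible[pos + 1]])
        if prev_joins and next_joins:
            form = "medi"
        elif prev_joins:
            form = "fina"
        elif next_joins:
            form = "init"
        else:
            form = "isol"
        forms.append((ch, form))
    return forms
```

With this approach a selector between two letters does not break the join; picking the variant glyph is left to the font's substitution rules.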
Comment 7 nagi 2010-05-27 23:27:50 UTC
Both of them look like a normal space, but they are NOT word boundaries.

The vowels "a" (U+1820) and "e" (U+1821) each have two glyphs in final position: one used after the MVS (Mongolian Vowel Separator, labelled "MVP" in the attachment) and one used without it. (See the attached picture MVP.) In the picture, the first word has the vowel separator before the last vowel 'a'.

NNBSP is used for joining suffixes (see the attached picture SUFFIXES). I'm trying to build a table that maps joining rules to suffixes, and I will post it here when it is ready.
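
A minimal sketch of the two points above, under stated assumptions: words are split only on an ordinary space, so the Mongolian Vowel Separator (U+180E) and NNBSP (U+202F) stay inside a word, and a final "a" or "e" gets a different variant when the vowel separator comes directly before it. The function names and the labels "final" and "final-after-separator" are hypothetical and are not taken from the patch or from any font.

```python
VOWEL_SEPARATOR = "\u180E"  # MONGOLIAN VOWEL SEPARATOR
NNBSP = "\u202F"            # NARROW NO-BREAK SPACE
SEPARABLE_VOWELS = {"\u1820", "\u1821"}  # MONGOLIAN LETTER A, MONGOLIAN LETTER E

def split_words(text):
    # Split on the plain space only; str.split() with no argument would also
    # split on U+202F, which is exactly what must not happen here.
    return [w for w in text.split(" ") if w]

def final_vowel_variant(word):
    """Pick a label for the final-vowel variant, or None if the rule does not apply."""
    if not word or word[-1] not in SEPARABLE_VOWELS:
        return None
    if len(word) >= 2 and word[-2] == VOWEL_SEPARATOR:
        return "final-after-separator"
    return "final"
```

For example, a stem followed by NNBSP and a suffix comes back from `split_words` as a single word, which lets the shaper see the suffix in the same run as its stem.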
Comment 8 nagi 2010-05-27 23:29:38 UTC
Created attachment 162157
MVP

The first word has the Mongolian Vowel Separator before the last vowel 'a'.
Comment 9 nagi 2010-05-27 23:31:02 UTC
Created attachment 162158
SUFFIXES

Sample suffixes that use NNBSP.
Comment 10 nagi 2010-05-28 08:08:19 UTC
I found a much better link that explains this 1000 times better than I can :)
http://www.iist.unu.edu/www/docs/techreports/reports/report170a.tgz

And here are two nice images that show the MVS and NNBSP rules.
Comment 11 nagi 2010-05-28 08:09:17 UTC
Created attachment 162174
MVS rule
Comment 12 nagi 2010-05-28 08:10:25 UTC
Created attachment 162175
NNBSP rule
Comment 13 Behdad Esfahbod 2010-06-04 05:23:33 UTC
Thanks.  I'll get to read that and implement soon (I hope!).
Comment 14 Murawaki 2011-01-22 02:28:34 UTC
I am new to text rendering but I have a few comments that might help.

Chapter 13.2 "Mongolian" of The Unicode Standard, Version 5.2.0 briefly explains various shaping rules including those for NNBSP.
http://www.unicode.org/versions/Unicode5.2.0/ch13.pdf
http://babelstone.blogspot.com/2006/10/manchu-letter-lha.html

I tried the second patch above and found it worked surprisingly well with the Mongolian Baiti font.  Almost all shaping rules are encoded in the font, so there appears to be little left for the Mongolian module to do.  But I found three problems.

(1) U+200D ZERO WIDTH JOINER (and U+200C ZERO WIDTH NON-JOINER) is not handled in the second patch.  I guess ZWJ is to behave like a 'letter' with regard to the shaping of the preceding and following letters.

(2) MVS should be displayed as a narrow non-breaking whitespace instead of the dummy glyph.  The shaping of the preceding and following letters seems correct.

(3) Any free variation selector not immediately preceded by one of its defined base characters should be ignored but is currently displayed as the dummy glyph.  The specified combinations of base characters and FVSs are correctly interpreted with the substitution rules encoded in the font.
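
Point (3) could be handled before shaping with a filter like the sketch below. It is only an illustration: `drop_stray_fvs` is a hypothetical name, and the set of valid (base, selector) pairs is passed in by the caller because the real list lives in Unicode's standardized variation sequence data, which is not reproduced here.

```python
FVS = {"\u180B", "\u180C", "\u180D"}  # MONGOLIAN FREE VARIATION SELECTOR ONE..THREE

def drop_stray_fvs(text, valid_sequences):
    """Remove selectors that do not directly follow one of their base characters.

    valid_sequences is a set of (base, selector) pairs supplied by the caller.
    """
    out = []
    prev = None  # previous character of the original text, kept or not
    for ch in text:
        if ch in FVS and (prev is None or (prev, ch) not in valid_sequences):
            prev = ch
            continue  # stray selector: ignore it instead of showing a dummy glyph
        out.append(ch)
        prev = ch
    return "".join(out)
```

A selector that follows a valid base is kept so that the font's substitution rules can still see it; anything else is dropped silently.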
Comment 15 Behdad Esfahbod 2011-01-22 20:39:02 UTC
I already added Mongolian support to harfbuzz.  Pango will switch to harfbuzz soon (hopefully for March release).  No new shapers for pango itself are accepted.

You can test Mongolian support by using the harfbuzz-ng-external branch of pango.