Bug 701652 - wrong shaped classic Mongolian word suffixes
Status: RESOLVED FIXED
Product: pango
Classification: Platform
Component: general
Version: unspecified
Hardware/OS: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: pango-maint
QA Contact: pango-maint
Depends on:
Blocks:
 
 
Reported: 2013-06-05 14:59 UTC by Erdene-Ochir Tuguldur
Modified: 2017-07-12 02:33 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
patch for Ubuntu 13.04 (782 bytes, patch)
2013-06-05 14:59 UTC, Erdene-Ochir Tuguldur
Proposed patch (1.36 KB, patch)
2013-06-06 23:12 UTC, Behdad Esfahbod

Description Erdene-Ochir Tuguldur 2013-06-05 14:59:25 UTC
Created attachment 246085
patch for Ubuntu 13.04

In classic Mongolian, the Unicode character U+202F (NARROW NO-BREAK SPACE, NNBSP) separates a word from its suffixes. To shape classic Mongolian correctly, the word and its suffix should be shaped as a single unit, so that the suffix starts with either the MEDI or the FINA OpenType feature.

Currently, pango shapes the word and the suffix separately, so the suffix starts with the INIT feature. That is wrong.

I have attached a simple patch that solves the problem for Ubuntu.

Maybe this problem with U+202F is a general problem in Unicode. Chrome has the same problem: https://code.google.com/p/chromium/issues/detail?id=155334

Tuguldur
Comment 1 Behdad Esfahbod 2013-06-05 19:18:07 UTC
That doesn't make any sense.  The whole point of putting NNBSP is to break shaping.  Unless you can point me to Unicode or historical documents showing the inverse to be true, this is a WONTFIX.
Comment 2 Behdad Esfahbod 2013-06-05 22:39:25 UTC
Ok, I understand what you mean now.  The fonts have GSUB rules to choose the medial/final form, we just have to make sure NNBSP is chosen from the same font as the surrounding text.  Your patch is fairly correct.  We used to do that for all space characters but years ago I removed them to fix another bug.  I'll see if I can reconcile the two approaches.
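
A minimal sketch of the font-selection idea described in this comment, not the patch attached to this bug. It assumes two hypothetical fonts ("Default" and "Mongolian") with made-up coverage, and a hypothetical helper `choose_fonts` in which a space character reuses the font of the preceding character whenever that font covers it:

```python
import unicodedata

NNBSP = "\u202f"  # NARROW NO-BREAK SPACE

# Hypothetical coverage predicates; real fonts expose this through their cmap.
def default_covers(ch):
    return ord(ch) < 0x0250 or ch == NNBSP

def mongolian_covers(ch):
    # U+1800..U+18AF is the Mongolian block.
    return 0x1800 <= ord(ch) <= 0x18AF or ch == NNBSP

# Fallback order: "Default" is tried first, as in the failure described here.
COVERAGE = {"Default": default_covers, "Mongolian": mongolian_covers}

def fallback_font(ch):
    """Return the first font in fallback order that covers ch, or None."""
    for name, covers in COVERAGE.items():
        if covers(ch):
            return name
    return None

def choose_fonts(text):
    """Pick a font per character; spaces inherit the previous font if it covers them."""
    chosen = []
    for ch in text:
        prev = chosen[-1] if chosen else None
        is_space = unicodedata.category(ch) == "Zs"
        if is_space and prev is not None and COVERAGE[prev](ch):
            chosen.append(prev)  # keep NNBSP in the same font as its neighbor
        else:
            chosen.append(fallback_font(ch))
    return chosen

# Two arbitrary Mongolian letters, NNBSP, two more Mongolian letters.
sample = "\u182a\u1820" + NNBSP + "\u1822\u1828"
print(choose_fonts(sample))
```

With this rule the whole sample stays in the Mongolian font, so the font's GSUB rules see the NNBSP together with the letters around it.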
Comment 3 Behdad Esfahbod 2013-06-05 23:12:05 UTC
FWIW this is what the Unicode Standard has to say about NNBSP in Mongolian shaping:

NNBSP affects the form of the preceding and following letters. The final letter of the stem
or suffix preceding the NNBSP takes the final positional form, whereas the first letter of the
suffix following NNBSP may take the normal initial form, a variant initial form, a medial
form, or a final form, depending on the particular suffix.
Comment 4 Badaa 2013-06-06 14:13:12 UTC
Hi,
The original meaning of NNBSP and MVS has been lost in many sources. The purpose of NNBSP is to render Mongolian suffixes correctly. MVS serves a similar purpose.
Every Mongolian word starts with the initial form of a letter and ends with the final form of a letter. Every letter in between takes its medial form, except under the following two conditions.
Condition 1:
There are two types of suffixes.
A) Connected with a small space, like STEM SUFFIX (all noun suffixes and some verb suffixes).
B) Connected directly to the word, like STEMSUFFIX (some verb suffixes).
Problem: Type B poses no problem, but type A has to be written as S[init]TEM[final] S[medial]UFFIX. To solve this problem (also M[final] S[medial]) we introduced NNBSP. That is why this character entered Unicode.
Condition 2:
The final form of each of the vowels A and E has two variants.
A) After certain Mongolian letters such as "y, n, g, h, l, u", the vowel is written in its isolated form, called ORHITS, preceded by a small space, like WORD[final] A[isol]. Thus, the letter before the space takes its final form.
B) In all other cases, the vowel is written as SUUL: WORDA, WORDE.
Problem: Case B poses no problem, but case A requires showing a final form in the middle of a word. To solve this problem we introduced MVS.
There is another example in the bug that I submitted to ICU: http://bugs.icu-project.org/trac/ticket/10212
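
The type A and type B cases from Condition 1 can be illustrated with a toy labeling function. This is only a sketch under stated assumptions: the names `forms_for_part` and `label_forms` are hypothetical, Latin letters stand in for Mongolian ones, and the suffix's first letter is always labeled medial as in the STEM SUFFIX example, whereas a real font chooses the form per suffix through its GSUB rules. Condition 2 (MVS) is not modeled.

```python
NNBSP = "\u202f"  # NARROW NO-BREAK SPACE

def forms_for_part(part, is_suffix):
    """Positional forms for one stem or one suffix (toy model)."""
    n = len(part)
    if n == 0:
        return []
    if n == 1:
        # Assumption: a one-letter suffix is final, a one-letter word is isolated.
        return ["fina" if is_suffix else "isol"]
    first = "medi" if is_suffix else "init"
    return [first] + ["medi"] * (n - 2) + ["fina"]

def label_forms(word):
    """Label each letter of STEM[NNBSP SUFFIX...] with its positional form."""
    labels = []
    for i, part in enumerate(word.split(NNBSP)):
        labels.extend(zip(part, forms_for_part(part, is_suffix=i > 0)))
    return labels

# Type A: stem and suffix separated by NNBSP.
print(label_forms("STEM" + NNBSP + "SUFFIX"))
# Type B: suffix attached directly, so only the word boundaries matter.
print(label_forms("STEMSUFFIX"))
```

In the type A output, M is final and the following S is medial, which matches S[init]TEM[final] S[medial]UFFIX.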
Comment 5 Behdad Esfahbod 2013-06-06 17:46:43 UTC
Ok, but please stop adding more details or you will confuse me :).  As I said I understand the problem now and will fix it.
Comment 6 Behdad Esfahbod 2013-06-06 17:47:59 UTC
And please retract your ICU bugreport.  It's wrong.
Comment 7 Behdad Esfahbod 2013-06-06 23:12:49 UTC
Created attachment 246202
Proposed patch

Attaching untested patch.  Can you please test with this?
Comment 8 Badaa 2013-06-06 23:35:50 UTC
(In reply to comment #6)
> And please retract your ICU bugreport.  It's wrong.

Why should I retract this report? ICU has also same problem.
Comment 9 Behdad Esfahbod 2013-06-06 23:40:38 UTC
(In reply to comment #8)
> (In reply to comment #6)
> > And please retract your ICU bugreport.  It's wrong.
> 
> Why should I retract this report? ICU has also same problem.

Yes.  But your report is misleading.  Two things:

1. ICU LayoutEngine is being deprecated in preference for HarfBuzz anyway,

2. We *can't* allow shaping across those two characters.  Their Unicode properties say that they are nonjoiners.  However, what we want is to make sure those two characters are shaped in the same run as the neighboring Mongolian text such that the GSUB tables in the font can do their magic, whatever it is.

Hope that helps.
Comment 10 Erdene-Ochir Tuguldur 2013-06-06 23:47:34 UTC
(In reply to comment #7)
> Created an attachment (id=246202)
> Proposed patch
> 
> Attaching untested patch.  Can you please test with this?

Hallo Behdad,

the patch fixes the problem.

Thanks
Tuguldur

PS: Could you say something about the Chrome problem with U+202F? I think you also work on Chrome?
Comment 11 Behdad Esfahbod 2013-06-06 23:50:20 UTC
AFAIU Chrome is the same issue.  Basically, if the default font has the glyph for U+202F, we end up choosing the default font for that character instead of the Mongolian font, and that breaks the runs and the font can't do its magic.  Please file a bug at crbug.com and CC me (behdad @ chromium.org).
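
The failure mode described here can be sketched as follows. The fonts, their coverage, and the helper names `naive_font` and `font_runs` are all hypothetical and do not reflect Chrome's actual code; the sketch only shows how per-character fallback that tries the default font first splits the text into three runs.

```python
from itertools import groupby

NNBSP = "\u202f"  # NARROW NO-BREAK SPACE

def naive_font(ch):
    """Per-character fallback that tries the default font first (hypothetical coverage)."""
    if ord(ch) < 0x0250 or ch == NNBSP:
        return "Default"      # assumed to contain a glyph for U+202F
    if 0x1800 <= ord(ch) <= 0x18AF:
        return "Mongolian"    # Mongolian block
    return "Fallback"

def font_runs(text):
    """Group consecutive characters that were assigned the same font."""
    return [(font, "".join(chars)) for font, chars in groupby(text, key=naive_font)]

# Two arbitrary Mongolian letters, NNBSP, two more Mongolian letters.
sample = "\u182a\u1820" + NNBSP + "\u1822\u1828"
for font, run in font_runs(sample):
    print(font, [hex(ord(c)) for c in run])
```

Because the NNBSP lands in its own "Default" run, the Mongolian font never sees it next to the suffix, so its GSUB rules cannot pick the right form.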
Comment 12 Erdene-Ochir Tuguldur 2013-06-06 23:54:32 UTC
(In reply to comment #11)
> AFAIU Chrome is the same issue.  Basically, if the default font has the glyph
> for U+202F, we end up choosing the default font for that character instead of
> the Mongolian font, and that breaks the runs and the font can't do its magic. 
> Please file a bug at crbug.com and CC me (behdad @ chromium.org).

Last year, I have filed a bug already: https://code.google.com/p/chromium/issues/detail?id=155334
Should I make a new one?
Comment 13 Behdad Esfahbod 2013-06-06 23:57:40 UTC
(In reply to comment #12)
> (In reply to comment #11)
> > AFAIU Chrome is the same issue.  Basically, if the default font has the glyph
> > for U+202F, we end up choosing the default font for that character instead of
> > the Mongolian font, and that breaks the runs and the font can't do its magic. 
> > Please file a bug at crbug.com and CC me (behdad @ chromium.org).
> 
> Last year, I have filed a bug already:
> https://code.google.com/p/chromium/issues/detail?id=155334
> Should I make a new one?

Ah, *that* one!  Now I understand that bug.  Will follow up there.  No need for a new one.
Comment 14 Behdad Esfahbod 2013-06-07 00:00:47 UTC
(In reply to comment #10)
> (In reply to comment #7)
> > Created an attachment (id=246202)
> > Proposed patch
> > 
> > Attaching untested patch.  Can you please test with this?
> 
> Hallo Behdad,
> 
> the patch fixes the problem.

Thanks for testing.  Pushed to master.
Comment 15 Badaa 2013-06-07 00:18:06 UTC
(In reply to comment #9)
> (In reply to comment #8)
> > (In reply to comment #6)
> > > And please retract your ICU bugreport.  It's wrong.
> > 
> > Why should I retract this report? ICU has also same problem.
> 
> Yes.  But your report is misleading.  Two things:
> 
> 1. ICU LayoutEngine is being deprecated in preference for HarfBuzz anyway,
> 
> 2. We *can't* allow shaping across those two characters.  Their Unicode
> properties say that they are nonjoiners.  However, what we want is to make sure
> those two characters are shaped in the same run as the neighboring Mongolian
> text such that the GSUB tables in the font can do their magic, whatever it is.
> 
> Hope that helps.

Hi Behdad,
Thanks for this fix!
For 1. The current version of OpenOffice (at least on Windows) still uses ICU
breakers. How can I fix it spontaneously? We developed a spellchecker for
Mongolian script but we have massive problems with the OpenOffice plug-in.
For 2. You are quite right, I agree 100%!
Comment 16 Behdad Esfahbod 2013-06-07 00:21:50 UTC
Well, you can still try with ICU LE.  But it occurs to me that this is not a shaping issue.  It's a font selection issue and, I think, needs to be fixed in LibreOffice itself.