GNOME Bugzilla – Bug 701652
wrong shaped classic Mongolian word suffixes
Last modified: 2017-07-12 02:33:18 UTC
Created attachment 246085: patch for Ubuntu 13.04
In classic Mongolian, the Unicode character U+202F (NARROW NO-BREAK SPACE) is used to separate a word from its suffixes. To shape classic Mongolian correctly, the word and its suffix should be shaped as one entity, so that the suffix starts with either the MEDI or the FINA OpenType feature. Currently, Pango shapes the word and the suffix separately, so the suffix starts with the INIT feature. That is wrong. I have attached a simple patch that solves the problem for Ubuntu. Maybe this problem with U+202F is a general problem in Unicode. Chrome has the same problem: https://code.google.com/p/chromium/issues/detail?id=155334
Tuguldur
That doesn't make any sense. The whole point of putting NNBSP is to break shaping. Unless you can point me to Unicode or historical documents showing the inverse to be true, this is a WONTFIX.
Ok, I understand what you mean now. The fonts have GSUB rules to choose the medial/final form, we just have to make sure NNBSP is chosen from the same font as the surrounding text. Your patch is fairly correct. We used to do that for all space characters but years ago I removed them to fix another bug. I'll see if I can reconcile the two approaches.
FWIW this is what the Unicode Standard has to say about NNBSP in Mongolian shaping: NNBSP affects the form of the preceding and following letters. The final letter of the stem or suffix preceding the NNBSP takes the final positional form, whereas the first letter of the suffix following NNBSP may take the normal initial form, a variant initial form, a medial form, or a final form, depending on the particular suffix.
Hi,
The original meaning of NNBSP and MVS has been lost in many sources. The purpose of NNBSP is to present Mongolian suffixes correctly. MVS handles similar logic. Every Mongolian word starts with the initial form of a letter and ends with the final form of a letter. In between, every letter takes its medial form, except under the following two conditions.
Condition 1: There are two types of suffixes.
A) Connected with a small space, like STEM SUFFIX (all noun suffixes and some verb suffixes).
B) Connected directly to the word, like STEMSUFFIX (some verb suffixes).
Problem: Type B is no problem, but type A has to be written as S[init]TEM[final] S[medial]UFFIX. To solve this problem (the M[final] S[medial] boundary) we introduced NNBSP. That is why this character entered Unicode.
Condition 2: The final form of the vowels A and E has two variants.
A) After some Mongolian letters such as "y, n, g, h, l, u", the vowel is written in its isolated form, called ORHITS, preceded by a small space, like WORD[final] A[isol]. Thus the letter before the space takes its final form.
B) In all other cases it is written as SUUL: WORDA, WORDE.
Problem: Case B is no problem, but in case A we need to show a final form in the middle of the word. To solve this problem we introduced MVS.
There is another example in the bug I submitted to ICU: http://bugs.icu-project.org/trac/ticket/10212
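The rule for Condition 1 can be sketched in a few lines of Python. This is only an illustration of the simplified description in the comment above, not how Pango, HarfBuzz, or any font implements it: the function name `positional_forms` is hypothetical, Latin placeholder letters stand in for Mongolian ones, and MVS (Condition 2) is not handled. In a real font the GSUB rules pick the form, and per the Unicode text quoted earlier the first letter of a suffix is not always medial.

```python
NNBSP = "\u202f"  # NARROW NO-BREAK SPACE, separates stem and suffix

def positional_forms(text):
    """Label each letter init/medi/fina/isol following the simplified rule:
    a stem starts with init and ends with fina; a suffix after NNBSP starts
    with medi (not init) and ends with fina. Hypothetical helper."""
    labelled = []
    for part_index, part in enumerate(text.split(NNBSP)):
        is_suffix = part_index > 0
        for i, letter in enumerate(part):
            if i == len(part) - 1:
                # Last letter: final form, or isolated for a one-letter stem.
                form = "fina" if (is_suffix or len(part) > 1) else "isol"
            elif i == 0:
                # First letter: a suffix continues the word, so no init form.
                form = "medi" if is_suffix else "init"
            else:
                form = "medi"
            labelled.append((letter, form))
    return labelled

if __name__ == "__main__":
    for letter, form in positional_forms("STEM" + NNBSP + "SUFFIX"):
        print(f"{letter}[{form}]", end=" ")
    print()
```

For the placeholder input STEM + NNBSP + SUFFIX this labels the boundary as M[fina] S[medi], which is the pattern the comment describes.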
Ok, but please stop adding more details or you will confuse me :). As I said I understand the problem now and will fix it.
And please retract your ICU bugreport. It's wrong.
Created attachment 246202: Proposed patch
Attaching untested patch. Can you please test with this?
(In reply to comment #6)
> And please retract your ICU bugreport. It's wrong.
Why should I retract this report? ICU also has the same problem.
(In reply to comment #8)
> (In reply to comment #6)
> > And please retract your ICU bugreport. It's wrong.
>
> Why should I retract this report? ICU has also same problem.
Yes. But your report is misleading. Two things:
1. ICU LayoutEngine is being deprecated in preference for HarfBuzz anyway,
2. We *can't* allow shaping across those two characters. Their Unicode properties say that they are nonjoiners. However, what we want is to make sure those two characters are shaped in the same run as the neighboring Mongolian text such that the GSUB tables in the font can do their magic, whatever it is.
Hope that helps.
(In reply to comment #7)
> Created an attachment (id=246202)
> Proposed patch
>
> Attaching untested patch. Can you please test with this?
Hello Behdad,
the patch fixes the problem. Thanks
Tuguldur
PS: Could you say something about the Chrome problem with U+202F? I think you also work on Chrome?
AFAIU Chrome is the same issue. Basically, if the default font has the glyph for U+202F, we end up choosing the default font for that character instead of the Mongolian font, and that breaks the runs and the font can't do its magic. Please file a bug at crbug.com and CC me (behdad @ chromium.org).
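A minimal Python sketch of the font-selection problem described here, under stated assumptions: the two "fonts" are hypothetical coverage predicates, `itemize` is an illustrative helper and not Pango's or Chrome's actual code (the patch itself is not shown in this thread), and the fallback order simply puts the default font first. It shows how picking a font per character splits the text into three runs, while letting NNBSP and MVS inherit the previous character's font keeps one run.

```python
from itertools import groupby

NNBSP = "\u202f"  # NARROW NO-BREAK SPACE
MVS = "\u180e"    # MONGOLIAN VOWEL SEPARATOR
INHERITING = {NNBSP, MVS}

# Hypothetical fonts as (name, coverage predicate); both have a glyph for NNBSP.
DEFAULT = ("Default", lambda ch: ch < "\u0250" or ch == NNBSP)
MONGOLIAN = ("Mongolian", lambda ch: "\u1800" <= ch <= "\u18af" or ch == NNBSP)

def itemize(text, fonts, inherit=True):
    """Split text into (font name, substring) runs.

    With inherit=True, NNBSP/MVS reuse the previous character's font when
    that font covers them, so they do not break the run."""
    assigned = []
    prev = None  # font tuple chosen for the previous character
    for ch in text:
        if inherit and ch in INHERITING and prev is not None and prev[1](ch):
            font = prev
        else:
            # First font in the fallback list that covers the character.
            font = next((f for f in fonts if f[1](ch)), None)
        assigned.append(font[0] if font else None)
        prev = font
    runs, pos = [], 0
    for name, group in groupby(assigned):
        n = sum(1 for _ in group)
        runs.append((name, text[pos:pos + n]))
        pos += n
    return runs

if __name__ == "__main__":
    # Mongolian stem + NNBSP + two-letter suffix, written as escapes.
    text = "\u182e\u1823\u1829\u182d\u1823\u182f" + NNBSP + "\u1824\u1828"
    print(len(itemize(text, [DEFAULT, MONGOLIAN], inherit=False)))
    print(len(itemize(text, [DEFAULT, MONGOLIAN], inherit=True)))
```

Only when the whole word, NNBSP included, lands in a single run for the Mongolian font can that font's GSUB rules see the characters on both sides of the space.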
(In reply to comment #11)
> AFAIU Chrome is the same issue. Basically, if the default font has the glyph
> for U+202F, we end up choosing the default font for that character instead of
> the Mongolian font, and that breaks the runs and the font can't do its magic.
> Please file a bug at crbug.com and CC me (behdad @ chromium.org).
I already filed a bug last year: https://code.google.com/p/chromium/issues/detail?id=155334
Should I file a new one?
(In reply to comment #12)
> (In reply to comment #11)
> > AFAIU Chrome is the same issue. Basically, if the default font has the glyph
> > for U+202F, we end up choosing the default font for that character instead of
> > the Mongolian font, and that breaks the runs and the font can't do its magic.
> > Please file a bug at crbug.com and CC me (behdad @ chromium.org).
>
> Last year, I have filed a bug already:
> https://code.google.com/p/chromium/issues/detail?id=155334
> Should I make a new one?
Ah, *that* one! Now I understand that bug. Will follow up there. No need for a new one.
(In reply to comment #10)
> (In reply to comment #7)
> > Created an attachment (id=246202)
> > Proposed patch
> >
> > Attaching untested patch. Can you please test with this?
>
> Hallo Behdad,
>
> the patch fixes the problem.
Thanks for testing. Pushed to master.
(In reply to comment #9)
> (In reply to comment #8)
> > (In reply to comment #6)
> > > And please retract your ICU bugreport. It's wrong.
> >
> > Why should I retract this report? ICU has also same problem.
>
> Yes. But your report is misleading. Two things:
>
> 1. ICU LayoutEngine is being deprecated in preference for HarfBuzz anyway,
>
> 2. We *can't* allow shaping across those two characters. Their Unicode
> properties say that they are nonjoiners. However, what we want is to make sure
> those two characters are shaped in the same run as the neighboring Mongolian
> text such that the GSUB tables in the font can do their magic, whatever it is.
>
> Hope that helps.
Hi Behdad,
Thanks for this fix!
For 1: The current version of OpenOffice (at least on Windows) still uses ICU breakers. How can I fix it spontaneously? We developed a spellchecker for Mongolian script, but we have massive problems with the OpenOffice plug-in.
For 2: You are quite right, I agree 100%!
Well, you can still try with ICU LE. But it occurs to me that this is not a shaping issue. It's a font selection issue and needs to be fixed in LibreOffice itself, I think.