GNOME Bugzilla – Bug 118299
Better handling for BENGALI LETTER A/E
Last modified: 2012-08-18 17:11:09 UTC
Trying to keep track of the four different issues in bug 113551 was pretty much impossible for me, so I am splitting up the comments into separate bug reports.

* unmadindu@Softhome.net (Sayamindu Dasgupta):

1. Yaphala
---------------
b. The sequence 0985 09CD 09AF 09BE (অ্যা) is not rendered properly. I quote from the Unicode Indic FAQ.

Q: What are the Bengali characters used to transcribe the sound "a" (as in English "bat") in Unicode?

A: In Bengali, the sequence "zophola" (U+09CD U+09AF) + the "aa" matra (U+09BE) is used for transcribing the English "a" in "bat". This zophola_aa can be seen as a special "composite" matra to write a new Bengali sound, imported from English. Represent these sequences using a halant (virama):

Vowel_A_zophola_AA = 0985 09CD 09AF 09BE ( a- halant ya -aa )
Vowel_E_zophola_AA = 098F 09CD 09AF 09BE ( e- halant ya -aa )

If you need to add a candrabindu or other combining mark in the sequence, represent the sequence as:

Vowel_A_zophola_AA + candrabindu = 0985 09CD 09AF 09BE 0981 ( a- halant ya -aa candrabindu )

* Additional Comments From Taneem Ahmed 2003-06-01 03:13:

Also, a very quick hack (and a bit ugly) is to set U+985 to _ct from _iv, this will fix the 1b issue. I will also upload an image with the result. There is a small side effect, but I am sure everyone can live with that, instead of pango rendering it wrong.

[ Image is http://bugzilla.gnome.org/showattachment.cgi?attach_id=17030, I don't know what the "small side effect" referred to above is - OT ]

* Additional Comments From Owen Taylor 2003-06-01 04:42:

Two quick thoughts on 1b: Does the 'independent vowel + halant + ya + aa' combination work in Windows? The OT bengali specification strongly implies that uniscribe doesn't handle it. It should be pretty trivial to handle by adding an extra flag to scriptFlags and writing a special case for it in indic_ot_reorder().
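For reference, the code point sequences quoted from the Unicode Indic FAQ above can be inspected with a few lines of Python. This is only an illustrative sketch using the standard `unicodedata` module; the dictionary labels are taken from the FAQ text, and the helper name `describe` is made up for this example.

```python
import unicodedata

# Sequences quoted from the Unicode Indic FAQ in the comment above.
SEQUENCES = {
    "Vowel_A_zophola_AA": "\u0985\u09CD\u09AF\u09BE",
    "Vowel_E_zophola_AA": "\u098F\u09CD\u09AF\u09BE",
    "Vowel_A_zophola_AA + candrabindu": "\u0985\u09CD\u09AF\u09BE\u0981",
}

def describe(text):
    """Return (code point, Unicode name) pairs for each character in text.

    Hypothetical helper for illustration only.
    """
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

for label, seq in SEQUENCES.items():
    print(label)
    for code, name in describe(seq):
        print(f"  {code}  {name}")
```

Each sequence starts with an independent vowel (BENGALI LETTER A or E) followed by the virama, which is exactly the combination the shaper has to treat specially.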
* Additional Comments From Taneem Ahmed 2003-06-01 04:54:

I tried what you said, 1b does not get fixed without the _ct hack. Let me explain this problem. Take the following input:

U+985 U+9CD U+9AF U+9BE

The problem with this is that U+985 is an independent vowel, and right now this input will become three syllables, (U+985) (U+9CD) (U+9AF U+9BE). This is not right obviously. Even if we somehow treat it as one syllable, we end up setting the tag blwf_p to all of them. This is a very very special case for U+985 where it acts as a consonant instead of a vowel. If you want to deal with it properly then we will have to add quite a few checks for U+985 in the reorder code to add proper tags. But as indic-ot.c is used by all the indic scripts, I think it will be an even bigger hack, risk, and extra delay. As this is a pure Bengali issue, I thought it will be better to keep the hack limited to Bengali :) The only side effect for my hack is that U+985 can now take up other independent vowels, which may actually be considered as a feature :)

And I don't have access to a windows box at home, don't know what windows does. Can someone else please check?

* Additional Comments From Owen Taylor 2003-06-01 10:49

It seems to me that the next step for 1b is to:

- Find a uniscribe enabled copy of Microsoft windows
- See if 'U+985 U+9CD U+9AF U+9BE' renders as desired
- Try another sequence that would make sense for a consonant, but doesn't make sense for U+985, say U+985 + halant + <normal consonant> and see how that renders.

Another approach would be simply to ask on the OpenType mailing list (http://www.microsoft.com/typography/otspec/otlist.htm) and ask for clarification of the relationship between the Unicode Indic FAQ item and the Bengali OpenType spec.

* Additional Comments From Taneem Ahmed 2003-06-01 16:50

I just looked at the Bengali part of chapter 9 of Unicode 4.0. It clearly states what to do for 1b. I don't think we need to bring it up with the OpenType mailing list, unless we want to know if they are planning to add some new feature in the OT layout table. And IMHO if uniscribe does not render it properly then we need to let them know, not follow them :)
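The syllable-splitting problem described in the 04:54 comment above can be illustrated with a toy splitter. This is not pango's code and does not reproduce `indic_ot_reorder()`; it is a minimal sketch with a made-up rule (a consonant absorbs following halant + consonant pairs and then matras, everything else stands alone), and the names `char_class` and `split_syllables` are hypothetical. Only the class labels `_iv` and `_ct` come from the thread.

```python
VIRAMA = "\u09CD"
MATRAS = {"\u09BE"}                      # only the AA sign is needed here
CONSONANTS = {"\u09AF"}                  # only YA is needed here
INDEPENDENT_VOWELS = {"\u0985", "\u098F"}

def char_class(ch, hack):
    """Toy classifier; with hack=True, A and E are classed as consonants."""
    if ch in INDEPENDENT_VOWELS:
        return "ct" if hack else "iv"
    if ch in CONSONANTS:
        return "ct"
    if ch == VIRAMA:
        return "vr"
    if ch in MATRAS:
        return "dv"
    return "xx"

def split_syllables(text, hack=False):
    """Split text into clusters using the simplified rule described above."""
    classes = [char_class(ch, hack) for ch in text]
    clusters = []
    i, n = 0, len(text)
    while i < n:
        start = i
        i += 1
        if classes[start] == "ct":
            # Absorb any number of (halant + consonant) pairs.
            while i + 1 < n and classes[i] == "vr" and classes[i + 1] == "ct":
                i += 2
            # Absorb trailing dependent vowel signs (matras).
            while i < n and classes[i] == "dv":
                i += 1
        clusters.append(text[start:i])
    return clusters

sample = "\u0985\u09CD\u09AF\u09BE"
for hack in (False, True):
    parts = split_syllables(sample, hack=hack)
    print(hack, [[f"U+{ord(c):04X}" for c in part] for part in parts])
```

Under this toy rule, treating U+0985 as `_iv` yields the three pieces described in the comment, while the `_ct` hack keeps the sequence together. It also shows the cost: with the hack, U+0985 followed by a vowel sign is accepted as a single cluster.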
On a related note, I think the Bengali letter E (098F) should also be considered as a consonant. This is specified in the Indic FAQ, as well as in Chapter 9 of the Unicode standard (http://www.unicode.org/book/preview/ch09.pdf). Also, I am not very sure about this, but the sequence 09B0 09CD 098B should be allowed to form a reph with the vowel 098B. This is required for the Bengali word "Nairhit" and afaik, the latest beta of Uniscribe forms a reph (We had some discussion with Paul Nelson of Microsoft Typography on this - if you want I can forward the related emails to you) - or do I file this as yet another bug?
Something I would like to point out here. The letter A acts as a consonant *only* when it is followed by halant + ya. In other cases, it should act as a normal vowel. I have just received a file where a user, using a version of pango with the _ct hack, wrote Bengali letter AA as A + AA vowel sign. Visually the result is the same, but it can cause problems while searching and doing other stuff. Example rendering at http://www.peacefulaction.org/sayamindu/images/garbage.png Recently I had the chance to play around with a Microsoft Windows XP box, and it can't handle A + halant + ya, as Microsoft has not yet released an official version of Uniscribe that supports Bengali.
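The context-sensitive rule stated above (consonant-like behaviour only before halant + ya) could be expressed as a small predicate. This is a sketch of the rule as worded in the comment, not code from pango; the function name `acts_as_consonant` is hypothetical, and including BENGALI LETTER E follows the earlier suggestion in this thread rather than this comment.

```python
BENGALI_A = "\u0985"
BENGALI_E = "\u098F"
HALANT = "\u09CD"
YA = "\u09AF"

def acts_as_consonant(text, i):
    """Return True if text[i] is A or E immediately followed by halant + ya.

    Hypothetical helper: in every other context the letter is treated
    as an ordinary independent vowel.
    """
    return text[i] in (BENGALI_A, BENGALI_E) and text[i + 1:i + 3] == HALANT + YA

print(acts_as_consonant("\u0985\u09CD\u09AF\u09BE", 0))  # A + halant + ya + aa
print(acts_as_consonant("\u0985\u09BE", 0))              # A + aa vowel sign
```

A check of this shape would leave A + AA vowel sign uncombined, which is the behaviour the blanket _ct change gives up.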
So, is making the _ct change for A and E better than nothing, or not? I can leave this bug open, but I want to know whether I should make that change for 1.4.0.
My proposal - make the changes. Microsoft is doing the same thing with Uniscribe, and ditto with the QT people. However, we should try to have a better way to do this in the next versions.
Fri Feb 27 14:26:34 2004 Owen Taylor <otaylor@redhat.com>

* modules/indic/indic-ot-class-tables.c (bengCharClasses): Mark BENGALI LETTER A (U+0985) and BENGALI LETTER E (U+098F) as consonants which gives better behavior when they are combined with halant, though it isn't exactly right. (#118299, Sayamindu Dasgupta)

(Filed as ICU bug 3626 (http://www.jtcsv.com/cgibin/icu-bugs/))
Has this bug already been fixed? What problem has it?
(In reply to comment #6)
> Has this bug already been fixed?
> What problem has it?

Is this CLOSED?
First, I would like to give a short summary of the bug.

Bug: 0985 (vowel) + 09BE (matra) = অা should not combine, as it may allow spoofing: a person can enter either 0986 or 0985 + 09BE, and both give the same rendered output.

Bug origin: This happens because the character class of 0985 (vowel) was changed to consonant in pango in order to handle an exceptional Bengali combination, 0985 + 09CD + 09AF + 09BE (IMHO that change is wrong), and it produces the regression described above.

Solution:
1) Change the character class of 0985 back to vowel.
2) Add a rule in the font to handle this exceptional condition of the Bengali script.
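The spoofing concern can be checked outside the renderer: the two spellings are different strings, and Unicode normalization does not unify them, because U+0986 has no canonical decomposition. A minimal sketch with Python's standard `unicodedata` module follows; the constant and function names are made up for this example.

```python
import unicodedata

LETTER_AA = "\u0986"             # BENGALI LETTER AA
A_PLUS_AA_SIGN = "\u0985\u09BE"  # BENGALI LETTER A + BENGALI VOWEL SIGN AA

def same_after_normalization(a, b, form="NFC"):
    """Compare two strings after applying the given normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

for form in ("NFC", "NFD", "NFKC", "NFKD"):
    print(form, same_after_normalization(LETTER_AA, A_PLUS_AA_SIGN, form))
```

So if a shaper renders both spellings identically, a plain text search or comparison cannot tell that the visually identical words differ, even after normalizing.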
Created attachment 138441 [details]
Attachment showing the correct rendering result

Changes in Pango: changed the character class of 0985 and 098F back to vowel.
Changes in the Lohit font: added a GSUB rule for handling this exceptional case.
Created attachment 138443 [details] [review]
Patch to solve the bug

Just changed the character classes of 0985 and 098F back to vowel.
I will make the corresponding changes in lohit-fonts as well, so that things work fine from the next version onwards.
Created attachment 138446 [details] lohit font for testing with pango changes
While the rendering shown in your last screenshot is correct, I'm not sure if this will work as we want it to. The reason being, this is a special-casing done essentially only in a single font (namely Lohit). We cannot possibly go and change each and every Bengali OpenType font that is out there. We can try with the Open Source fonts - but if a user downloads fonts like Vrinda and ShonarBangla from Microsoft, they will get unexpected rendering from Pango, and I don't think that is acceptable. A possible way forward with what you have done is to coordinate with the people who wrote the Bengali OpenType related specs (in Microsoft's typography division) - get this in as a recommendation (I don't know how easy this will be) in the official Bengali OpenType specs, and then we can move ahead with the bug.
This is getting interesting now. However, from comments #11 and #12 it seems pango has managed to look up the GSUB tables and do the actual work of rearranging the glyphs. It's difficult for me to comprehend why U+0985 and U+098F shouldn't be declared as independent vowels (_iv) in pango.
(In reply to comment #13)
> While the rendering shown in your last screenshot is correct, I'm not sure if
> this will work as we want it to. The reason being, this is a special-casing
> done essentially only in a single font (namely Lohit). We cannot possibly go
> and change each and every Bengali OpenType font that is out there. We can try
> with the Open Source fonts - but if a user downloads fonts like Vrinda and
> ShonarBangla from Microsoft, they will get unexpected rendering from Pango, and
> I don't think that is acceptable.

Can you update me with the rendering results on Microsoft Windows, with Lohit (not modified by me) and, say, the local MS fonts? Are they working as expected, with none of the above bugs and no regressions? If so, it means uniscribe is somehow handling this in a better way.

> A possible way forward with what you have done is to coordinate with the people
> who wrote the Bengali OpenType related specs (in Microsoft's typography
> division) - get this in as a recommendation (I don't know how easy this will be)
> in the official Bengali OpenType specs, and then we can move ahead with the
> bug.

That is a long process, though I will surely try for it.
We've merged the HarfBuzz branch. Closing obsolete.