Bug 118299 – Better handling for BENGALI LETTER A/E



Summary:	Better handling for BENGALI LETTER A/E


Status:	RESOLVED OBSOLETE

Product:	pango
Classification:	Platform
Component:	indic
Version:	1.2.x
Hardware:	Other Linux

Importance:	Normal minor
Target Milestone:	Small fix
Assigned To:	Pango Indic
QA Contact:	pango-maint

URL:
Whiteboard:

Depends on:
Blocks:	113551

Reported:	2003-07-25 14:12 UTC by Owen Taylor
Modified:	2012-08-18 17:11 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
attachment showing result correct rendering result (34.25 KB, image/png) 2009-07-15 13:35 UTC, Pravin Satpute		Details
patch to solve bug (937 bytes, patch) 2009-07-15 13:43 UTC, Pravin Satpute	none	Details | Review
lohit font for testing with pango changes (136.01 KB, application/octet-stream) 2009-07-15 13:59 UTC, Pravin Satpute		Details

Description Owen Taylor 2003-07-25 14:12:13 UTC

Trying to keep track of the four different issues in 
bug 113551 was pretty much impossible for me, so splitting
up the comments into separate bug reports.

* unmadindu@Softhome.net (Sayamindu Dasgupta):

1. Yaphala
---------------
b. The sequence 0985 09CD 09AF 09BE (অ্যা) is not
rendered properly.

    I quote from the Unicode Indic FAQ.

	Q: What are the Bengali characters used to transcribe the sound "a" (as in
	English "bat") in Unicode?

	A: In Bengali, the sequence "zophola" (U+09CD U+09AF) + the "aa" matra
	(U+09BE) is used for transcribing the English "a" in "bat". This
	zophola_aa can be seen as a special "composite" matra to write a new
	Bengali sound, imported from English. Represent these sequences using a
	halant (virama):

		Vowel_A_zophola_AA = 0985 09CD 09AF 09BE ( a- halant ya -aa )
		Vowel_E_zophola_AA = 098F 09CD 09AF 09BE ( e- halant ya -aa )

	If you need to add a candrabindu or other combining mark in the sequence,
	represent the sequence as:

		Vowel_A_zophola_AA + candrabindu = 0985 09CD 09AF 09BE 0981
		( a- halant ya -aa candrabindu )
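
[ For anyone who wants to reproduce the sequences quoted from the FAQ, here is a minimal Python sketch that builds them from their code points and lists the Unicode name of each character. The constant and function names are made up for illustration; only the code points come from the FAQ text. ]

```python
import unicodedata

# Code points copied from the FAQ excerpt above.
VOWEL_A_ZOPHOLA_AA = [0x0985, 0x09CD, 0x09AF, 0x09BE]
VOWEL_E_ZOPHOLA_AA = [0x098F, 0x09CD, 0x09AF, 0x09BE]
CANDRABINDU = 0x0981


def to_text(code_points):
    """Join a list of code points into a Python string."""
    return "".join(chr(cp) for cp in code_points)


def describe(text):
    """Return (U+XXXX, Unicode name) pairs, one per character of text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


if __name__ == "__main__":
    sequences = (
        VOWEL_A_ZOPHOLA_AA,
        VOWEL_E_ZOPHOLA_AA,
        VOWEL_A_ZOPHOLA_AA + [CANDRABINDU],
    )
    for seq in sequences:
        # Print only ASCII (code point and name) so this works on any console.
        for code, name in describe(to_text(seq)):
            print(code, name)
        print()
```

Pasting the string returned by to_text() into a Pango-based text widget is one way to check how a given build renders the sequence.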

* Additional Comments From Taneem Ahmed 2003-06-01 03:13:

Also, a very quick hack (and a bit ugly) is to set U+985 to _ct from _iv, 
this will fix the 1b issue. I will also upload an image with the result. There 
is a small side effect, but I am sure everyone can live with that, instead 
of pango rendering it wrong. 

[ Image is http://bugzilla.gnome.org/showattachment.cgi?attach_id=17030,
  I don't know what the "small side effect" referred to above is  - OT ]

* Additional Comments From Owen Taylor 2003-06-01 04:42:

Two quick thoughts on 1b:

 Does the 'independent vowel + halant + ya + aa' combination
 work in Windows? The OT bengali specification strongly implies
 that uniscribe doesn't handle it.

 It should be pretty trivial to handle by adding an extra
 flag to scriptFlags and writing a special case for it
 in indic_ot_reorder().

* Additional Comments From Taneem Ahmed 2003-06-01 04:54:

I tried what you said; 1b does not get fixed without the _ct hack. Let me 
explain this problem. Take the following input: 
 
U+985 U+9CD U+9AF U+9BE 
 
The problem with this is that U+985 is an independent vowel, and right 
now this input will become three syllables, (U+985) (U+9CD) (U+9AF 
U+9BE). This is not right obviously. Even if we somehow treat it as one 
syllable, we end up setting the tag blwf_p to all of them. 
 
This is a very very special case for U+985 where it acts as a consonant 
instead of a vowel. If you want to deal with it properly then we will have 
to add quite a few checks for U+985 in the reorder code to add proper 
tags. But as indic-ot.c is used by all the indic scripts, I think it will be 
even a bigger hack, risk, and extra delay. As this is a pure Bengali 
issue, I thought it will be better to keep the hack limited to Bengali :) The 
only side effect for my hack is that U+985 can now take up other 
independent vowels, which may actually be considered as a feature :) 
And I don't have access to a windows box at home, don't know what 
windows does. Can someone else please check? 
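
[ The syllable-splitting problem described above can be sketched with a toy model. This is NOT the code in indic-ot.c: the class ranges, the syllable grammar and every name below are simplifying assumptions, chosen only to show why U+985 classed as an independent vowel yields three clusters while the _ct override yields one. ]

```python
# Toy character classes; the names loosely echo the _iv/_ct labels in the thread.
IV, CT, VIRAMA, DV, MOD, OTHER = "iv", "ct", "virama", "dv", "mod", "other"


def base_class(cp):
    """Very rough Bengali classification (hypothetical, not Pango's table)."""
    if 0x0985 <= cp <= 0x0994:
        return IV        # independent vowels
    if 0x0995 <= cp <= 0x09B9:
        return CT        # consonants
    if cp == 0x09CD:
        return VIRAMA    # halant
    if 0x09BE <= cp <= 0x09CC:
        return DV        # dependent vowel signs (matras)
    if 0x0981 <= cp <= 0x0983:
        return MOD       # candrabindu, anusvara, visarga
    return OTHER


def split_syllables(code_points, overrides=None):
    """Split into syllables using a toy grammar:
    CT (VIRAMA CT)* [DV] [MOD]  |  IV [MOD]  |  any single other character.
    """
    overrides = overrides or {}
    classes = [overrides.get(cp, base_class(cp)) for cp in code_points]
    syllables, i, n = [], 0, len(code_points)
    while i < n:
        start = i
        if classes[i] == CT:
            i += 1
            # Absorb each (halant + consonant) pair into the same syllable.
            while i + 1 < n and classes[i] == VIRAMA and classes[i + 1] == CT:
                i += 2
            if i < n and classes[i] == DV:
                i += 1
            if i < n and classes[i] == MOD:
                i += 1
        elif classes[i] == IV:
            i += 1
            if i < n and classes[i] == MOD:
                i += 1
        else:
            i += 1  # stray halant, matra, or unknown character stands alone
        syllables.append(code_points[start:i])
    return syllables


if __name__ == "__main__":
    seq = [0x0985, 0x09CD, 0x09AF, 0x09BE]
    print(len(split_syllables(seq)))                # U+985 as _iv
    print(len(split_syllables(seq, {0x0985: CT})))  # U+985 forced to _ct
```

In this toy model the override also lets U+985 absorb a following matra, for example split_syllables([0x0985, 0x09BE], {0x0985: CT}) returns a single syllable.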

* Additional Comments From Owen Taylor 2003-06-01 10:49

It seems to me that the next step for 1b is to:

 - Find a uniscribe enabled copy of Microsoft windows
 - See if 'U+985 U+9CD U+9AF U+9BE' renders as desired
 - Try another sequence that would make sense for a 
   consonant, but doesn't make sense for U+985, 
   say 
       U+985 + halant + <normal consonant>
   and see how that renders.

Another approach would be simply to ask on the 
OpenType mailing list
(http://www.microsoft.com/typography/otspec/otlist.htm)
and ask for clarification of the relationship between
the Unicode Indic FAQ item and the Bengali OpenType spec.

* Additional Comments From Taneem Ahmed 2003-06-01 16:50

I just looked at the Bengali part of chapter 9 of Unicode 4.0. It clearly 
states what to do for 1b. I don't think we need to bring it up with 
OpenType mailing list, unless we want to know if they are planning to 
add some new feature in OT layout table. And IMHO if uniscribe does 
not render it properly then we need to let them know, not follow them :)

Comment 1 Sayamindu Dasgupta 2003-07-25 15:35:12 UTC

On a related note, I think the Bengali letter E (098F) should also be
considered as a consonant. This is specified in the Indic FAQ, as well
as in Chapter 9 of the Unicode standard
(http://www.unicode.org/book/preview/ch09.pdf). 
Also, I am not very sure about this, but the sequence 09B0 09CD 098B
should be allowed to form a reph with the vowel 098B. This is required
for the Bengali word "Nairhit" and afaik, the latest beta of Uniscribe
forms a reph (We had some discussion with Paul Nelson of Microsoft
Typography on this - if you want I can forward the related emails to
you) - or do I file this as yet another bug?

Comment 2 Sayamindu Dasgupta 2004-02-24 18:36:47 UTC

Something I would like to point out here. 
The letter A acts as a consonant, *only* when it is followed by halant
+ ya. In other cases, it should act as a normal vowel. I have just
received a file where the user using a version of pango with the _ct
hack wrote Bengali letter AA as A + AA vowel sign. Visually the result
is the same, but can cause problems while searching and doing other
stuff. Example rendering at
http://www.peacefulaction.org/sayamindu/images/garbage.png
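
The condition described here (A or E acting as a consonant only when followed by halant + ya) could be expressed as a context check along these lines. This is a hypothetical Python sketch, not Pango code; the function name and the list-of-code-points input are assumptions made for illustration.

```python
LETTER_A, LETTER_E = 0x0985, 0x098F
HALANT, YA = 0x09CD, 0x09AF


def acts_as_consonant(code_points, i):
    """Return True if code_points[i] is BENGALI LETTER A or E and is
    immediately followed by halant + ya; in any other context the letter
    should keep behaving as an ordinary independent vowel."""
    if code_points[i] not in (LETTER_A, LETTER_E):
        return False
    return list(code_points[i + 1:i + 3]) == [HALANT, YA]


if __name__ == "__main__":
    print(acts_as_consonant([0x0985, 0x09CD, 0x09AF, 0x09BE], 0))  # A + halant + ya + aa
    print(acts_as_consonant([0x0985, 0x09BE], 0))                  # A + aa vowel sign
```

With a check like this, A followed directly by the AA vowel sign would not be treated as a consonant, so the two would not be combined into something that looks like letter AA.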

Recently I had the chance to play around with a Microsoft Windows XP
box, and it can't handle a + halant + ya, as Microsoft has not
released an official Bengali-supporting version of Uniscribe yet.

Comment 3 Owen Taylor 2004-02-24 19:30:58 UTC

So, is making the _ct change for A and E better than nothing or 
not? I can leave this bug open, but I want to know whether
I should make that change for 1.4.0.

Comment 4 Sayamindu Dasgupta 2004-02-25 03:48:49 UTC

My proposal - make the changes. Microsoft is doing the same thing with
Uniscribe, and ditto with the QT people. However, we should try to
have a better way to do this in the next versions.

Comment 5 Owen Taylor 2004-02-27 19:43:39 UTC

Fri Feb 27 14:26:34 2004  Owen Taylor  <otaylor@redhat.com>
 
        * modules/indic/indic-ot-class-tables.c (bengCharClasses):
        Mark BENGALI LETTER A (U+0985) and BENGALI LETTER E (U+098F)
        as consonants which gives better behavior when they
        are combined with halant, though it isn't exactly right.
        (#118299, Sayamindu Dasgupta)

(Filed as ICU bug 3626 (http://www.jtcsv.com/cgibin/icu-bugs/))

Comment 6 LingNing Zhang 2006-03-15 14:19:33 UTC

Has this bug already been fixed?
What problem has it?

Comment 7 Sankarshan Mukhopadhyay 2007-03-07 14:32:55 UTC

(In reply to comment #6)
> Has this bug already been fixed?
> What problem has it?
> 

Is this CLOSED ?

Comment 8 Pravin Satpute 2009-07-15 13:28:49 UTC

I would like to summarize the bug briefly first.

Bug:

0985 (vowel) + 09BE (matra) = অা 
should not combine, as it may allow spoofing: a person can type
either 0986 or 0985 + 09BE, and both give the same rendered output.

Bug origin:

This happens because the character class of 0985 (vowel) was changed to consonant in pango in order to handle an exceptional Bengali combination (IMHO that change is wrong):

0985 + 09CD + 09AF + 09BE, but it produces the regression mentioned above.

Solution:

1) change the character class of 0985 back to vowel
2) add a rule in the font to handle this exceptional condition of Bengali script
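
To illustrate the spoofing concern in the summary above: as far as I know U+0986 has no canonical decomposition, so Unicode normalization does not make 0985 + 09BE equal to 0986 even when both render identically. A small Python sketch (the helper names are hypothetical) that checks this and flags the look-alike sequence in a string:

```python
import unicodedata

LETTER_A = "\u0985"       # BENGALI LETTER A
LETTER_AA = "\u0986"      # BENGALI LETTER AA
VOWEL_SIGN_AA = "\u09BE"  # BENGALI VOWEL SIGN AA


def same_after_nfc(a, b):
    """Compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def has_a_plus_aa_sign(text):
    """Flag A directly followed by the AA vowel sign, which can look like AA."""
    return (LETTER_A + VOWEL_SIGN_AA) in text


if __name__ == "__main__":
    lookalike = LETTER_A + VOWEL_SIGN_AA
    print(same_after_nfc(LETTER_AA, lookalike))
    print(has_a_plus_aa_sign(lookalike))
```

A search for a word typed with 0986 would therefore miss the same-looking word typed with 0985 + 09BE.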

Comment 9 Pravin Satpute 2009-07-15 13:35:56 UTC

Created attachment 138441 [details]
attachment showing result correct rendering result 

changes in Pango: changed character class of 0985 and 098F back to vowel character

changes in lohit font: added gsub rule for handling this exceptional case

Comment 10 Pravin Satpute 2009-07-15 13:43:00 UTC

Created attachment 138443 [details] [review]
patch to solve bug

just changed the character classes of 0985 and 098F back to vowel

Comment 11 Pravin Satpute 2009-07-15 13:45:21 UTC

    I will make the corresponding changes in lohit-fonts as well, so things can
    work fine from the next version onwards

Comment 12 Pravin Satpute 2009-07-15 13:59:59 UTC

Created attachment 138446 [details]
lohit font for testing with pango changes

Comment 13 Sayamindu Dasgupta 2009-07-15 18:19:34 UTC

While the rendering shown in your last screenshot is correct, I'm not sure if this will work as we want it to. The reason being, this is a special-casing done essentially only in a single font (namely Lohit). We cannot possibly go and change each and every Bengali OpenType font that is out there. We can try with the Open Source fonts - but if a user downloads fonts like Vrinda and ShonarBangla from Microsoft, they will get unexpected rendering from Pango, and I don't think that is acceptable.

A possible way forward with what you have done is to coordinate with the people who wrote the Bengali OpenType related specs (in Microsoft's typography division) - get this in as a recommendation (I don't know how easy this will be) in the official Bengali OpenType specs, and then we can move ahead with the bug.

Comment 14 sandeep 2009-07-16 06:23:20 UTC

This is getting interesting now. 

However, from comment #11 and #12 it seems pango has managed to look up the gsub tables and do the actual work of rearranging the glyphs. 

It's difficult for me to comprehend why U+0985 and U+098F shouldn't be declared as independent vowels (_iv) in pango.

Comment 15 Pravin Satpute 2009-07-16 07:10:06 UTC

(In reply to comment #13)
> While the rendering shown in your last screenshot is correct, I'm not sure if
> this will work as we want it to. The reason being, this is a special-casing
> done essentially only in a single font (namely Lohit). We cannot possibly go
> and change each and every Bengali OpenType font that is out there. We can try
> with the Open Source fonts - but if a user downloads fonts like Vrinda and
> ShonarBangla from Microsoft, they will get unexpected rendering from Pango, and
> I don't think that is acceptable.

Can you update me with the rendering results on Microsoft Windows, with Lohit (not modified by me) and, say, the local MS fonts?
Are they working as expected, with neither the above bug nor a possible regression?
If so, it means Uniscribe is somehow handling this in a better way.

> 
> A possible way forward with what you have done is to coordinate with the people
> who wrote the Bengali Opentype related specs (in Microsoft's typography
> division) - get this in as a recommendation (I don't know how easy this will be)
> in the official Bengali OpenType specs, and then we can move ahead with the
> bug.

That is a long process, though I will surely try for that.

Comment 16 Behdad Esfahbod 2012-08-18 17:11:09 UTC

We've merged the HarfBuzz branch.  Closing obsolete.