Bug 121882 - decomposed vowels (AU, O, OO) in Tamil, Malayalam, Oriya, Bengali are not rendered correctly.
Status: RESOLVED FIXED
Product: pango
Classification: Platform
Component: indic
Version: 1.2.x
Hardware: Other Linux
Importance: High normal
Target Milestone: 1.4.1
Assigned To: Pango Indic
Pango Indic
Depends on:
Blocks:
 
 
Reported: 2003-09-10 06:29 UTC by Jungshik Shin
Modified: 2004-12-22 21:47 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Malayalam example (with ThoolikaUnicode font) (10.62 KB, image/jpeg)
2003-09-10 06:47 UTC, Jungshik Shin
Two-part vowels patch (6.64 KB, patch)
2003-09-15 11:49 UTC, Owen Taylor
Simpler version (3.34 KB, patch)
2004-07-30 18:27 UTC, Owen Taylor

Description Jungshik Shin 2003-09-10 06:29:12 UTC
Vowels AU, O, and OO have two parts (left and right) and can be represented
either composed or decomposed in Unicode. Pango correctly renders syllables
whose two-part vowel sign is represented by a single (composed) Unicode
character, but it does not render decomposed two-part vowel signs
correctly.
For instance, in Malayalam, the sequence A is rendered differently from the
sequence B.

A : JA + OO : <U+0D1C, U+0D4B>
B : JA + OO : <U+0D1C, U+0D47, U+0D3E>  

The sequence B is rendered as if it were <U+0D1C, U+0D47>(JA + EE) followed
by <U+0020, U+0D3E>
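
For reference, the two sequences can be inspected with Python's standard
unicodedata module. This is only an illustrative sketch and has nothing to do
with Pango's code; the names SEQ_A, SEQ_B, describe and
canonical_decomposition are made up for this example.

```python
import unicodedata

# The two sequences from the report (variable names are illustrative only).
SEQ_A = "\u0d1c\u0d4b"        # JA + OO, composed vowel sign
SEQ_B = "\u0d1c\u0d47\u0d3e"  # JA + EE + AA, decomposed form of OO


def describe(text):
    """List each code point of text together with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


def canonical_decomposition(ch):
    """Return the canonical decomposition of ch as code points, or [] if none."""
    field = unicodedata.decomposition(ch)
    # Compatibility mappings start with a tag such as "<compat>"; skip those.
    if not field or field.startswith("<"):
        return []
    return [int(cp, 16) for cp in field.split()]


if __name__ == "__main__":
    for label, seq in (("A", SEQ_A), ("B", SEQ_B)):
        print(label, describe(seq))
    print([f"U+{cp:04X}" for cp in canonical_decomposition("\u0d4b")])
```

The decomposition of U+0D4B recorded in the Unicode Character Database is
exactly the pair <U+0D47, U+0D3E> used in sequence B.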
Comment 1 Jungshik Shin 2003-09-10 06:47:33 UTC
Created attachment 19838
Malayalam example (with ThoolikaUnicode font)
Comment 2 Jungshik Shin 2003-09-10 08:04:58 UTC
Strangely, CA+AU and JA+OO are treated as a single syllable
whether AU and OO are represented composed or decomposed.
Comment 3 Noah Levitt 2003-09-10 15:06:00 UTC
The image is an example of the way it currently looks, right? Can you
attach an example of the way it should look?
Comment 4 Jungshik Shin 2003-09-10 15:36:50 UTC
The correct rendering is included in the screenshot.
 
Dependent vowel signs OO, O, and AU (each with a left and a right
part) should be rendered identically whether they are represented as
a sequence of two Unicode characters (decomposed) or as a single
Unicode character (composed), because the two representations are
canonically equivalent.
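
The canonical-equivalence claim can be checked with Unicode normalization.
The sketch below uses Python's unicodedata.normalize; the table
COMPOSED_TWO_PART_SIGNS and the helper names are assumptions made for this
example, and the list of signs is a selection, not necessarily complete.

```python
import unicodedata

# Composed two-part vowel signs in the four scripts named in this bug
# (a selection chosen for illustration).
COMPOSED_TWO_PART_SIGNS = {
    "Bengali": "\u09cb\u09cc",
    "Oriya": "\u0b4b\u0b4c",
    "Tamil": "\u0bca\u0bcb\u0bcc",
    "Malayalam": "\u0d4a\u0d4b\u0d4c",
}


def decomposed_form(sign):
    """Return the canonical decomposition (NFD) of a vowel sign."""
    return unicodedata.normalize("NFD", sign)


def canonically_equivalent(a, b):
    """Two strings are canonically equivalent if their NFD forms are equal."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


if __name__ == "__main__":
    for script, signs in COMPOSED_TWO_PART_SIGNS.items():
        for sign in signs:
            parts = " ".join(f"U+{ord(c):04X}" for c in decomposed_form(sign))
            print(f"{script}: U+{ord(sign):04X} -> {parts}")
    print(canonically_equivalent("\u0d1c\u0d4b", "\u0d1c\u0d47\u0d3e"))
```

A renderer that compares or shapes text after normalizing it this way would
see both spellings of JA + OO as the same string.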

I've tried the same example on Win2k with the same font (with Mozilla)
and it worked fine there. So, it's surely not a font issue.  
Comment 5 Owen Taylor 2003-09-12 16:38:26 UTC
Pango doesn't make any attempt to normalize before feeding text
to the shape engines, so the fact that the sequences are canonically
equivalent is only marginally relevant.

It's probably just something that needs to be handled within
the indic code.
Comment 6 Jungshik Shin 2003-09-12 16:58:38 UTC
hmmm... ...
 
> It's probably just something that needs to be handled within
> the indic code.

  That's exactly the point of this bug, in case you thought otherwise.
Comment 7 Owen Taylor 2003-09-15 11:49:14 UTC
Adding a patch that fixes this - it has two parts:

 - First change the syllable-state table to allow
   a dependent vowel to be followed by another dependent
   vowel.

 - Second, change the reordering code to allow several
   dependent vowel signs to combine to give 

The main problem with the new code is overpermissiveness:

 - It allows mixing improper combinations of left 
   and right matras.
 - It allows the left and right matras to be reversed
 - It allows sequences of three or more dependent vowels
   (though sequences of three are needed for Kannada?)

Fixing that would require significantly more complexity
in the state table.
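
To make the two parts and the overpermissiveness concrete, here is a toy
model in Python. It is not the code in modules/indic: the character classes,
the three states, the Malayalam-only matra sets and every function name are
assumptions invented for this sketch, and the real state table is much
larger.

```python
# Toy character classes (hypothetical, far coarser than the real class tables).
CONSONANT, DEP_VOWEL, OTHER = range(3)

# Malayalam matras only, split by the side of the consonant they attach to.
LEFT_MATRAS = {"\u0d46", "\u0d47", "\u0d48"}
RIGHT_MATRAS = {"\u0d3e", "\u0d57"}

# Toy states of the syllable scanner.
START, AFTER_CONSONANT, AFTER_VOWEL = range(3)


def char_class(ch):
    """Classify a character for the toy scanner."""
    if "\u0d15" <= ch <= "\u0d39":
        return CONSONANT
    if ch in LEFT_MATRAS or ch in RIGHT_MATRAS:
        return DEP_VOWEL
    return OTHER


def make_table(allow_vowel_after_vowel):
    """Build the toy state table, with or without the vowel-after-vowel edge."""
    after_vowel = {DEP_VOWEL: AFTER_VOWEL} if allow_vowel_after_vowel else {}
    return {
        START: {CONSONANT: AFTER_CONSONANT},
        AFTER_CONSONANT: {DEP_VOWEL: AFTER_VOWEL},
        AFTER_VOWEL: after_vowel,
    }


def split_syllables(text, table):
    """Split text into syllables by running the state table greedily."""
    syllables, start = [], 0
    while start < len(text):
        state, end = START, start
        while end < len(text):
            nxt = table[state].get(char_class(text[end]))
            if nxt is None:
                break
            state, end = nxt, end + 1
        end = max(end, start + 1)  # always consume at least one character
        syllables.append(text[start:end])
        start = end
    return syllables


def reorder(syllable):
    """Move every left matra before the base and every right matra after it."""
    left = [ch for ch in syllable if ch in LEFT_MATRAS]
    right = [ch for ch in syllable if ch in RIGHT_MATRAS]
    base = [ch for ch in syllable
            if ch not in LEFT_MATRAS and ch not in RIGHT_MATRAS]
    return "".join(left + base + right)


OLD_TABLE = make_table(False)
NEW_TABLE = make_table(True)
DECOMPOSED = "\u0d1c\u0d47\u0d3e"  # JA + EE + AA

if __name__ == "__main__":
    print(split_syllables(DECOMPOSED, OLD_TABLE))   # matra AA is split off
    print(split_syllables(DECOMPOSED, NEW_TABLE))   # one syllable
    print(split_syllables("\u0d1c\u0d3e\u0d47", NEW_TABLE))  # reversed, still accepted
```

In this model the extra table edge keeps both matras in one syllable, and the
reordering step then handles each matra by its side. The same edge also
accepts reversed or repeated matras, which is the overpermissiveness listed
above.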
Comment 8 Owen Taylor 2003-09-15 11:49:45 UTC
Created attachment 19944
Two-part vowels patch
Comment 9 Owen Taylor 2004-07-30 18:27:40 UTC
Created attachment 30092
Simpler version

Last patch was larger than it needed to be since the tags and 
cluster index are the same for all matras in a syllable.
Comment 10 Owen Taylor 2004-07-30 18:31:04 UTC
Fri Jul 30 14:05:25 2004  Owen Taylor  <otaylor@redhat.com>
 
        Improve handling of decomposed two-part vowels
        (#121882, Jungshik Shin)
 
        * modules/indic/indic-ot-class-tables.c (stateTable):
        allow a dependent vowel to be followed by another
        dependent vowel.
 
        * modules/indic/indic-ot.c (indic_ot_reorder): Handle
        multiple vowel matras.

Filed as ICU bug:

 http://www.jtcsv.com/cgibin/icu-bugs?findid=4026