Bug 118302 - Handling of ZWJ and ZWNJ
Status: RESOLVED FIXED
Product: pango
Classification: Platform
Component: indic
Version: 1.2.x
Hardware: Other Linux
Importance: Normal normal
Target Milestone: 1.4.2
Assigned To: pango-maint
QA Contact: pango-maint
Depends on: 91542
Blocks: 113551
Reported: 2003-07-25 14:37 UTC by Owen Taylor
Modified: 2004-12-22 21:47 UTC


Attachments
Eyelash Ra, still some problems (618 bytes, image/png)
2003-09-23 23:48 UTC, Owen Taylor

Description Owen Taylor 2003-07-25 14:37:03 UTC
Trying to keep track of the four different issues in
bug 113551 was pretty much impossible for me, so I'm splitting
the comments up into separate bug reports.

* unmadindu@Softhome.net (Sayamindu Dasgupta):

3. ZWNJ & ZWJ
---------------------

Rendering of certain strings has led us to believe that Pango is somehow
confusing the Zero Width Non-Joiner (ZWNJ, U+200C) with the Zero Width Joiner
(ZWJ, U+200D). <consonant> <halant> <ZWJ> <consonant> is rendered exactly the
same way as <consonant> <halant> <ZWNJ> <consonant>. This should not happen,
as the screenshot taken in Yudit shows. <consonant> <halant> <ZWJ> should
render the "half form" of the consonant, while Pango renders the "halant
form" instead (or it may simply be drawing the consonant followed by the
halant; I am not very sure). This issue becomes important when we handle
the khanda-ta character in Bengali; a short write-up on this can be found
in the Unicode Indic FAQ.
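To make the two sequences concrete, here is a minimal Python sketch using the Devanagari KA (U+0915), VIRAMA/halant (U+094D) and SSA (U+0937) code points. The helper `expected_form` is hypothetical and not part of Pango; it only encodes the expectation stated above (ZWJ after a halant asks for the half form, ZWNJ asks for the visible halant form).

```python
import unicodedata

ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"   # ZERO WIDTH JOINER


def expected_form(seq: str) -> str:
    """Hypothetical helper: for <consonant> <halant> <joiner> <consonant>,
    name the form the first consonant should take."""
    if len(seq) != 4:
        raise ValueError("expected exactly four code points")
    _first, halant, joiner, _next = seq
    if not unicodedata.name(halant, "").endswith("SIGN VIRAMA"):
        raise ValueError("second code point must be a virama (halant)")
    if joiner == ZWJ:
        return "half form"        # ZWJ asks for the half form
    if joiner == ZWNJ:
        return "explicit halant"  # ZWNJ asks for the visible halant
    raise ValueError("third code point must be ZWJ or ZWNJ")


# Devanagari KA, VIRAMA (halant), SSA
ka, halant, ssa = "\u0915", "\u094D", "\u0937"
with_zwj = ka + halant + ZWJ + ssa
with_zwnj = ka + halant + ZWNJ + ssa

for seq in (with_zwj, with_zwnj):
    print([f"U+{ord(c):04X}" for c in seq], "->", expected_form(seq))
```

The two strings differ in a single code point, so a shaper that never sees the joiner has no way to tell them apart.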

* Additional Comments From Taneem Ahmed 2003-06-01 01:05

Owen, can you please take a look at the third issue? It seems that a word
containing ZWJ or ZWNJ is broken into three items (in pango_itemize), and
the two joiners are then treated alike.
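The splitting described here can be reproduced with a toy itemizer that gives every character a script and starts a new item whenever the script changes. This is a sketch, not Pango's code: the `script_of` table is hypothetical and only knows the Devanagari and Bengali blocks, labelling everything else (including ZWJ and ZWNJ) "Common".

```python
from itertools import groupby


def script_of(ch: str) -> str:
    """Toy script lookup covering only two Indic blocks (hypothetical)."""
    cp = ord(ch)
    if 0x0900 <= cp <= 0x097F:
        return "Devanagari"
    if 0x0980 <= cp <= 0x09FF:
        return "Bengali"
    return "Common"  # ZWJ/ZWNJ land here in this toy table


def naive_itemize(text: str) -> list[tuple[str, str]]:
    """Split text wherever the per-character script changes."""
    return [(script, "".join(chars)) for script, chars in groupby(text, key=script_of)]


word = "\u0915\u094D\u200D\u0937"  # Devanagari KA, VIRAMA, ZWJ, SSA
for script, run in naive_itemize(word):
    print(script, [f"U+{ord(c):04X}" for c in run])
```

The ZWJ ends up alone in its own item, so whatever shapes the Devanagari runs never sees it, which is consistent with ZWJ and ZWNJ producing identical output.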

* Additional Comments From Owen Taylor 2003-06-01 04:04

Your issue 3 is bug 91542: in Pango currently,
every character has to be assigned to *some* script.

Is there an easy workaround short of fixing 91542?

We can't assign ZWNJ to indic-fc, because it is
needed, e.g., for displaying Persian in Arabic
script, but perhaps we can add ZWJ to the list of
characters that indic-fc.c handles? As it turns
out, that won't work either, because the Indic engine
advertises itself as a separate engine for each Indic
language, so only one Indic script can
get ZWJ...

So, in the end, I don't have any idea other than fixing
bug 91542.
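The engine-registration problem can be illustrated with a toy registry in which each code point has exactly one owning engine. The class `EngineRegistry` and the engine names `indic-deva` and `indic-beng` are hypothetical; this is not Pango's module API, only a model of the "one owner per character" constraint described above.

```python
class EngineRegistry:
    """Hypothetical registry in which every code point has exactly one owner."""

    def __init__(self) -> None:
        self.owner: dict[str, str] = {}

    def register(self, engine: str, chars: str) -> None:
        # Reject the whole registration if any character belongs to another engine.
        conflicts = [ch for ch in chars if self.owner.get(ch, engine) != engine]
        if conflicts:
            first = conflicts[0]
            raise ValueError(
                f"{engine} cannot claim U+{ord(first):04X}: "
                f"already owned by {self.owner[first]}"
            )
        for ch in chars:
            self.owner[ch] = engine


ZWJ = "\u200D"
devanagari = "".join(chr(cp) for cp in range(0x0900, 0x0980))
bengali = "".join(chr(cp) for cp in range(0x0980, 0x0A00))

registry = EngineRegistry()
registry.register("indic-deva", devanagari + ZWJ)  # first Indic engine gets ZWJ
try:
    registry.register("indic-beng", bengali + ZWJ)  # second one is refused
except ValueError as err:
    print(err)  # indic-beng cannot claim U+200D: already owned by indic-deva
```

Whichever per-script engine registers first keeps ZWJ, which is why adding it to one engine's character list does not help the others.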

* Additional Comments From Taneem Ahmed 2003-06-01 04:54

As for 3, today was my first day hacking on Pango, so there is no way I can make a
meaningful comment on this one. The only idea that crossed my mind is
to consider ZWJ part of the script of the character to its left (or to its
right in RTL text), i.e. the preceding character. Most of the code in the
indic directory seems to check for CC_ZERO_WIDTH_MARK, but currently this
case cannot happen. I am not sure about other engines.
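The suggestion above (give the joiner the script of the preceding character) can be sketched as a small change to a script-run itemizer. Working in logical (memory) order makes the LTR/RTL distinction disappear: the preceding character is simply the previous one in the string. The `script_of` table and the `itemize` function are hypothetical illustrations, not Pango code.

```python
from itertools import groupby

JOINERS = {"\u200C", "\u200D"}  # ZWNJ, ZWJ


def script_of(ch: str) -> str:
    """Toy script lookup covering only two Indic blocks (hypothetical)."""
    cp = ord(ch)
    if 0x0900 <= cp <= 0x097F:
        return "Devanagari"
    if 0x0980 <= cp <= 0x09FF:
        return "Bengali"
    return "Common"


def itemize(text: str) -> list[tuple[str, str]]:
    """Give each joiner the script of the character before it in logical
    order, then split the text into runs of one script."""
    labelled = []
    prev = "Common"  # a joiner at the very start has nothing to inherit
    for ch in text:
        script = prev if ch in JOINERS else script_of(ch)
        labelled.append((script, ch))
        prev = script
    return [
        (script, "".join(ch for _, ch in group))
        for script, group in groupby(labelled, key=lambda pair: pair[0])
    ]


word = "\u0915\u094D\u200D\u0937"  # Devanagari KA, VIRAMA, ZWJ, SSA
print(itemize(word))  # one Devanagari item instead of three
```

Because the joiner follows whatever script precedes it, it stays with that run whichever script it is: a ZWJ after a Bengali halant stays in the Bengali item, without the joiner having to be owned by any single engine.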

* Additional Comments From Taneem Ahmed 2003-06-01 16:50

Issue 3 is quite important, for Bengali at least. Unicode 4.0 seems to
use ZWJ/ZWNJ to deal with a few commonly used cases.
Comment 1 Owen Taylor 2003-09-23 23:48:47 UTC
Created attachment 20231
Eyelash Ra, still some problems
Comment 2 Owen Taylor 2003-09-23 23:50:31 UTC
Pango now has code to pass ZWJ to the Indic shaper;
the attached screenshot shows the Indic shaper handling
the Devanagari eyelash ra, the only place where the Indic
shaper currently cares about ZWJ.

However, as the attached screenshot also shows, what the Indic
shaper does with the ZWJ is not quite correct.
Comment 3 Sayamindu Dasgupta 2004-01-09 04:56:24 UTC
Adding Karunakar; he would be able to understand the eyelash ra
issues better.
Comment 4 Owen Taylor 2004-12-15 18:50:39 UTC
I think the remaining issues here are really:

 bug 121670 - Chillu in Malayalam not properly rendered
 bug 145233 - Zero-width non-joiner is displayed.

So I'm going to close this one.