GNOME Bugzilla – Bug 118302
Handling of ZWJ and ZWNJ
Last modified: 2004-12-22 21:47:04 UTC
Trying to keep track of the four different issues in bug 113551 was pretty much impossible for me, so I am splitting up the comments into separate bug reports.

* unmadindu@Softhome.net (Sayamindu Dasgupta):

3. ZWNJ & ZWJ
---------------------
The rendering of certain strings has led us to believe that Pango is somehow confusing Zero Width Non Joiner (ZWNJ) and Zero Width Joiner (ZWJ). <consonant> <halant> <ZWJ> <consonant> is rendered in exactly the same way as <consonant> <halant> <ZWNJ> <consonant>. This should not happen, as the screenshot taken in Yudit shows. <consonant> <halant> <ZWJ> should render the "half form" of the consonant, while Pango is rendering the "halant form" instead (or it may simply be putting the consonant followed by the halant; I am not very sure). This issue becomes important when we handle the khanda-ta character in Bengali. A short write-up on this can be found in the Unicode Indic FAQ.

* Additional Comments From Taneem Ahmed 2003-06-01 01:05

Owen, can you please take a look at the third issue? It seems that a word with ZWJ or ZWNJ is broken into three items (in pango_itemize), and the two cases are then treated alike.

* Additional Comments From Owen Taylor 2003-06-01 04:04

Your issue 3 is bug 91542: in Pango currently, every character has to be assigned to *some* script. Is there an easy workaround short of fixing 91542? We can't assign ZWNJ to indic-fc, because it is needed, e.g., for displaying Persian in Arabic script, but perhaps we can add ZWJ to the list of characters that indic-fc.c handles? As it turns out, that won't work either, because the Indic engine advertises itself as one engine for each different Indic language. So only one Indic script can get ZWJ... In the end, I don't have any idea other than fixing bug 91542.

* Additional Comments From Taneem Ahmed 2003-06-01 04:54

As for 3, today was my first day hacking pango... no way I can make a meaningful comment on this one. The only idea that crossed my mind is to treat ZWJ as part of the language to its left (or to its right in the case of LTR). Most of the code in the indic directory seems to check for CC_ZERO_WIDTH_MARK, but currently this case cannot happen. I am not sure about other engines.

* Additional Comments From Taneem Ahmed 2003-06-01 16:50

Issue 3 is quite important for Bengali at least. Unicode 4.0 seems to be using ZWJ/ZWNJ to deal with a few commonly used cases.
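To make the itemization problem discussed above concrete, here is a minimal sketch in Python. It is not Pango's code: script_of, itemize, the "common" label and the two code point ranges are hypothetical names and simplifications chosen only for illustration. It shows how giving ZWJ/ZWNJ a script of their own splits <consonant> <halant> <ZWJ> <consonant> into three runs, and how letting the joiner inherit the script of the preceding character keeps the word in one run.

```python
# Illustrative only: a toy itemizer, not Pango's pango_itemize.
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER


def script_of(ch):
    """Toy script lookup (assumption: only the ranges needed for the demo)."""
    cp = ord(ch)
    if 0x0900 <= cp <= 0x097F:
        return "devanagari"
    if 0x0980 <= cp <= 0x09FF:
        return "bengali"
    if ch in (ZWJ, ZWNJ):
        return "common"
    return "other"


def itemize(text, inherit_joiners):
    """Split text into (script, run) pairs.

    With inherit_joiners=False, ZWJ/ZWNJ keep their own "common" script
    and therefore break the run. With inherit_joiners=True they take the
    script of the character that precedes them, if there is one.
    """
    runs = []
    for ch in text:
        script = script_of(ch)
        if inherit_joiners and script == "common" and runs:
            script = runs[-1][0]
        if runs and runs[-1][0] == script:
            # Same script as the previous run: extend it.
            runs[-1] = (script, runs[-1][1] + ch)
        else:
            runs.append((script, ch))
    return runs


# Bengali TA + VIRAMA (halant) + ZWJ + KA
word = "\u09a4\u09cd" + ZWJ + "\u0995"

split_runs = itemize(word, inherit_joiners=False)   # three runs
joined_runs = itemize(word, inherit_joiners=True)   # one run
```

With inherit_joiners=False the shaper for the Bengali run never sees the ZWJ, which matches the symptom reported above; with inherit_joiners=True the whole sequence reaches a single engine. Whether inheriting from the preceding character is the right rule in every context is exactly the open question in this bug.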
Created attachment 20231 [details] Eyelash Ra, still some problems
The code is now in Pango to pass ZWJ to the Indic shaper; the attached screenshot shows the indic shaper handling the Devanagari eyelash ra - the only place where the indic shaper currently cares about ZWJ. However, as the attached screenshot also shows, what the indic shaper does with the ZWJ is not quite correct.
Adding Karunakar - he would be able to understand the eyelash ra issues better.
I think the remaining issues here are really:

bug 121670 - Chillu in Malayalam not properly rendered
bug 145233 - Zero-width non-joiner is displayed

So I'm going to close this one.