Bug 111166 – Tamil: Left matra placement

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 111166 - Tamil: Left matra placement


Summary:	Tamil: Left matra placement


Status:	RESOLVED FIXED

Product:	pango
Classification:	Platform
Component:	general
Version:	1.2.x
Hardware:	Other Linux

Importance:	Normal major
Target Milestone:	1.2.4
Assigned To:	pango-maint
QA Contact:	pango-maint

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2003-04-19 20:07 UTC by prabu
Modified:	2003-07-25 16:12 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Fix left matra placement bug for Tamil in indic engine (1007 bytes, patch) 2003-05-04 05:42 UTC, Sivaraj D	none	Details \| Review
Port of mprefixups code from ICU engine. Also includes Owen's patch from bug# 112433. (14.66 KB, patch) 2003-07-19 03:33 UTC, Sivaraj D	none	Details \| Review
Cleaned up version of port patch (13.94 KB, patch) 2003-07-24 23:30 UTC, Owen Taylor	none	Details \| Review

Description prabu 2003-04-19 20:07:48 UTC

Tamil unicode rendering is broken in the current Pango library. This 
causes all gtk2+ based applcations nusable if localised to tamil Unicode. 
This URL http://tamil-Mandrake.sourceforge.net/screenshots/pango.jpg shows 
the problem in current pango.

With more applications like Mozilla, openoffice and other applications 
depending on pango for rendering it is very important to fix asap.

Comment 1 Owen Taylor 2003-04-19 22:30:28 UTC

To describe the problem in more detail - in Tamil
(and Malayalam) when you have a vowel with a left
part, that should go to the left of the base glyph
rather than to the left of the entire syllable.

This is a known issue. Eric Mader has plans to work on
this, but won't have time to do so for another month or two. 
If you want it to fix it before then, you might want to contact
him (mader@jtcsv.com) and ask him what is involved.

[ Neither Mozilla nor OpenOffice uses Pango currently, 
and I don't know of plans's to do so for either ]

Comment 2 Dinesh Nadarajah 2003-04-21 15:55:53 UTC

Bug was first reported more than 12 months ago to Indic module 
author. (I think when RedHat 7.3 came out). 

Thanks

Comment 3 Owen Taylor 2003-04-21 16:03:07 UTC

Very unlikely that *this* particular bug was reported
that long ago, since the current OpenType Indic support
was only released as part of Pango-1.2, which came
out a couple of months ago :-)

[ The older bitmap font support for Tamil could have
  had a similar bug, of course ]

Comment 4 Dinesh Nadarajah 2003-04-21 16:11:48 UTC

Yes. You may be right. But the rendering issue has been around for 
sometime. It was not there when Vikram Subramanian introduced the 
first Tamil module for Pango (way back when). The image rendered by 
the link above (provided by Prabu) was generated on RedHat 7.3 (when 
it had just been released). 

Thanks.

-D

Comment 5 Owen Taylor 2003-04-21 16:23:22 UTC

Well, Red Hat 7.3 had *no* capability to render antialiased
Tamil. So, if they were rendered on Red Hat 7.3, the
software must have been installed separately (Pango CVS?)

Still, I don't think debating when the problem first appeared
is really at all important here :-)

Comment 6 Sivaraj D 2003-05-04 05:42:17 UTC

Created attachment 16244 [details] [review]
Fix left matra placement bug for Tamil in indic engine

Comment 7 Sivaraj D 2003-05-04 05:55:34 UTC

This is a preliminary fix.  It is against pango-1.1.1 rpm for RH8.0.
But I did verify it against the current CVS version and found them to
be same.

I have only tested it for Tamil rendering.  The changes may possibly
affect other scripts.  This fix will not correct the problem for
Malayalam.  I will consult with some Malayalam friends and get another
patch later.

Comment 8 Owen Taylor 2003-05-06 19:35:38 UTC

I discussed the patch above with Eric Mader some;
Fundamentally, the patch above is not in the right place,
since it affects the ordering of the glyphs *before*
the GSUB stage, not after the GSUB stage, so 
conjunct consonants won't be formed properly. 
Since Tamil has very few conjunct consonants, this
is only a minor problem there, but it is a big problem
for Malayalam.

For Tamil, the problem is apparently with the KA+SSA
conjunct.

If you have, say:

 KA + VIRAMA + SSA + VOWEL_SIGN_O

The display should be:

 O_LEFT KA_SSA_CONJUNCT + O_RIGHT

But with the above patch, you'll get

 KA + VIRAMA + O_LEFT + KA + O_RIGHT

It may still make sense to try to fix things in something
like this simple fashion for Pango-1.2.

It should be possible to extend things to 
properly hand KA+SSA since as I understand, that's
the only conjunct for Tamil and should be present
n every font. The code should simply recognize
KA+VIRAMA+SSA and place the left matra to the
left of all the glyphs before applying GSUB.

Here's a description of the right fix from Eric:

===
The Indic code which I added to Pango is based on the Indic OpenType code
in the ICU LayoutEngine (http://oss.software.ibm.com/icu). I have fixed
this problem in the ICU code, but cannot easilly port the change to Pango
because the details of the information available are different...
 
In brief, the ICU solution involves remembering the location of the matra
and the base consonant before the GSUB table is applied, and using the
saved information to move the matra afterwords. The complication in Pango
is two-fold: the first complication is that in ICU for each output glyph I
record the character index of the character which originally produced that
glyph - in Pango, what's remembered for each glyph is the character index
of the first character in the syllable. The other complication is that in
ICU there are always as many glyphs as there were input characters - if I
form a ligature, I replace the glyph for the first component of the
ligature with the glyph for the ligature, and the glyphs for the rest of
the components are replaced by special "place holder" glyphs - this means
that the position of a particular glyph in the output glyph array will
remain the same during GSUB processing. In Pango, when a ligature is
formed, the glyph array is shortened to contain *only* the glyph for the
ligature. (i.e. no "place holder" glyphs are used) This makes it harder to
find the glyphs for the matra and the base consonant.
 
So, what needs to change is that for each glyph, the Pango code needs to
remember the actual character offset for that glyph rather than the offset
for the first character in the syllable, and the code which locates the
matra and the base glyph after GSUB processing will need to look for these
character offsets in the glyph output rather than just using them as
indices into the array. Also, after the matra fixups have been run, the
character offsets for each glyph need to be changed back to the offset for
the first character in the syllable. I don't expect any of this to be
particularly hard, though I haven't worked out the low-level details of
how to do it yet.
===

Implementation note on the above:

To keep track of which input glyph each character corresponds
to without changing the pango-ot.h interfaces, you can
use glyphs->log_clusters.

What you would do is in indic-fc.c:set_glyphs(), instead
of setting log_clusters[i] = indices[i], set 
log_clusters[i]=i. Then, on exit from pango_ot_shape(),
the value of log_clusters[] for each output glyph
gives the index of the corresponding input glyph.

You can then compute the real log_clusters by returning
from indic_engine_shape, by looping over the output
glyphs and doing:

 log_clusters[i] = indices[log_clusters[i]]

Comment 9 Owen Taylor 2003-05-27 16:55:22 UTC

If we want a quick-fix for 1.2.x, someone will need to
fix up the patch above for KA+SSA (or tell me that it
doesn't matter)

Comment 10 Sivaraj D 2003-05-27 19:09:10 UTC

I have raised the issue in tamilinix mailing list.  There is a 
problem with the way KSHA is handled elsewhere.  

The sequence is not always supposed to form a ligature.  There are 
many instances where it should not form a ligature, but rendered as 
indivitual characters.  So I would prefer it to be ligated only when 
it has a ZWJ within: 

  KA + VIRAMA (pulli) + ZWJ + SSA

I will update this bug based on the response from the list.

Comment 11 Owen Taylor 2003-05-27 19:25:24 UTC

It's not clear to me that the above is allowed by the
Unicode standard; the Unicode standard explicitely 
describes using ZWNJ to prevent conjunct forms and
does *not* mention a use of ZWJ as above.

Sticking to the Unicode standard here is important,
or text from various implementations won't be
interchangable.

Comment 12 Owen Taylor 2003-05-28 20:37:59 UTC

Links to currently relevant code:

http://oss.software.ibm.com/cvs/icu/icu/source/layout/MPreFixups.h
http://oss.software.ibm.com/cvs/icu/icu/source/layout/MPreFixups.cpp
http://oss.software.ibm.com/cvs/icu/icu/source/layout/IndicReordering.cpp

It looks like this really should be close to trival to port to
Pango with the small patch I posted to bug 112433 ... an hour
or two perhaps. With that in mind, I'm not sure a quick-hack
fix really makes sense.

Comment 13 Sivaraj D 2003-06-03 13:27:56 UTC

This issue is currently being debated in INFITT working group 2 
for Unicode related matters.  The choices currently seem to be 
either define a new character XA for the ligature or let the 
current definition alone.  Use of ZWJ doesn't seem to be favoured.
So I am retracting from my previous position.

In either of the above cases MPreFixup code makes sense.  
Besides, this code is also needed for Malayalam anyway.

Comment 14 Sivaraj D 2003-07-19 03:33:08 UTC

Created attachment 18428 [details] [review]
Port of mprefixups code from ICU engine.  Also includes Owen's patch from bug# 112433.

Comment 15 Sivaraj D 2003-07-19 03:58:25 UTC

The above patch contains port of MPreFixup code from ICU engine to
Pango.  I have tested it for Tamil script only.  It corrects the
pre-modifier problem and doesn't break KSHA ligature.  The patch is
against pango cvs as of July 11.

However, this doesn't fix an additional problem reported in Tamilinix
list.

Behaviour of backspace wrt Tamil is incorrect in Pango.  Currently
backspace will delete the whole syllable.  While this behaviour might
make sense in other indic scripts, it is unacceptable in Tamil.
Backspace should only delete one Tamil character, namely a consonant
followed by its modifiers, except in case of KSHA and SRI, which are
by definition /foreign/ characters.

One of the main reasons this is incorrect is, in pure Tamil, starting
any  word with a CONSONANT + VIRAMA is illegal.  This rule is
fundamental to the Tamil language and is like, first grade grammer.

If I understand correctly, with the current setup in Pango (and ICU),
many soft-hyphenation will occur right before a CONSONANT + VIRAMA.

I do not know how we would be able to solve this problem with this
mprefixups code.  Only suitable way I can think of right now is to
change the indic_ot_class_tables logic to handle this situation.
I am looking into that alternative.  Please let me know your comments
on this.

Comment 16 Owen Taylor 2003-07-19 12:34:55 UTC

I'll try to look at the MPre code soon.

I'm not sure if I'm understanding your other issue correctly
(and the golden rule of bugzilla is one issue per bug report),
but the Indic modules have nothing to do with the delete key
handling. The delete and cursor navigation boundaries are
determined solely by the results of pango_break()

The plan for handling the problem of Delete deleting too
much in Indic languages is to add another flag to 
PangoLogAttr, perhaps "is_delete_break"; using
"is_cursor_position" as we do currently doesn't work very
well, but you can't simply delete one *character* - 
European combinations of Character+accent are seen by
users as being a single letter.

Comment 17 Sivaraj D 2003-07-21 13:04:47 UTC

Thanks Owen.  I will open another bug report for this problem, and 
also look into the code you have mentioned above.  In that case, if 
this mpre code works, we can close this bug.

Comment 18 Owen Taylor 2003-07-24 23:29:01 UTC

I've gone through your patch and adjusted it a bit. Here's
what I did in some detail:

 - Renamed mprefix_add/remove to indic_mprefixups_add/
   indic_mprefixups_remove, since all exported symbols
   need to be in a unique namespace.

 - Gave mprefixups Pango style constructors/destructors
   indic_mprefixups_new()/indic_mprefixups_destroy()
 
 - Moved the _FixupData structure definition into
   mprefixups.c.

 - Moved various variables in mprefixups_apply() to
   be closer to their location in the original
   instead of at the outermost block.

 - Restore a comment to indic_mprefixups_add() that
   got lost along the way.
 
 - Removed the charIndices parameter from mprefixups_apply()
   since it was unused

 - Made mprefixups_apply() move around glyphs->log_clusters[],
   since that corresponds to charIndices, and needs to be
   reordered in order for the GPOS processing to work right.

 - make outMPrefixups in indic_ot_reorder() a 
   MPreFixups ** as it is in the ICU code. Allow it to be
   NULL in the case where we don't need the result.
   (Your patch seems to leak the MPrefixups from the
   first call to indic_ot_reorder())

 - Remove indic_ot_adjust_mpres(); it didn't add anything
   over calling mprefixups_apply() directly, and made
   the memory management more confusing.

The patch seems to work for me in very limited testing;
I'd appreciate it if someone could give it a more thorough
runthrough.

Comment 19 Owen Taylor 2003-07-24 23:30:16 UTC

Created attachment 18585 [details] [review]
Cleaned up version of port patch

Comment 20 Owen Taylor 2003-07-25 16:12:44 UTC

I've committed my patch now, both to the stable pango-1.2
branch and to HEAD. Testing still much appreciated. I
made one small change beyond the above - if outMPreFixups 
is NULL when passed in, I don't bother creating it at
all, rather than creating it, storing information in it,
then freeing it.

Fri Jul 25 12:07:21 2003  Owen Taylor  <otaylor@redhat.com>
 
        * modules/indic/mprefixups.[ch] modules/indic/indic-ot.[ch]
        modules/indic/indic-{xft,ft2}.c: Port pre-base-mantra
        fixup code from ICU, as needed for Tamil and Malayalam.
        (Based on patch from Sivaraj Doddannan, #111166)