Bug 117282 – Pango can't render Arabic accents




Summary:	Pango can't render Arabic accents


Status:	RESOLVED FIXED

Product:	pango
Classification:	Platform
Component:	general
Version:	1.2.x
Hardware:	Other Linux

Importance:	Normal enhancement
Target Milestone:	1.4 API freeze
Assigned To:	pango-maint
QA Contact:	pango-maint

URL:
Whiteboard:

Duplicates:	69329 (view as bug list)
Depends on:
Blocks:	121060

Reported:	2003-07-12 20:16 UTC by Arafat Medini
Modified:	2004-12-22 21:47 UTC

See Also:
GNOME target:	---
GNOME version:	Unversioned Enhancement

Attachments
Unfinished patch for arabic accents and GPOS (25.14 KB, patch) 2003-08-25 13:41 UTC, Owen Taylor
a patch to fix three new Unicode 4 letters (to be applied after patch 19487) (575 bytes, patch) 2003-08-26 10:17 UTC, Roozbeh Pournader
Centering marks (2.51 KB, patch) 2003-08-28 10:37 UTC, Soheil Hassas Yeganeh

Description Arafat Medini 2003-07-12 20:16:36 UTC

When I use Arabic script with "umlauts", Pango renders it incorrectly: instead of
rendering the umlaut above the character, it renders the character, then the umlaut
after a gap, and then the next character. It looks like this:

http://silverpen.de/Screenshot_modi.png

It's a major bug, as it's really annoying when reading letters, and for little
children, where these "umlauts" are used for EVERY char, the text becomes
completely unreadable...

Comment 1 Owen Taylor 2003-07-12 21:34:51 UTC

It's just something that's not been implemented

(In English umlaut means specifically the two dots used in German)

Comment 2 Arafat Medini 2003-07-13 07:56:09 UTC

Well, I discussed it with Telsa and the consensus was to name it umlaut,
even if there are six types of it ;)
I hope this will get implemented soon...

Comment 3 Owen Taylor 2003-07-15 11:47:42 UTC

"Blocker: We should fix and push an update immediately. This will
mostly be used for security fixes."

Comment 4 Owen Taylor 2003-07-29 18:49:11 UTC

Notes about fixing this:

In order to get simple (no GPOS) accents working, what needs
to be done is:

 - Pango currently orders marks for RTL base characters
   visually after their base character.

   TrueType fonts are apparently designed with the opposite
   convention. (Not positive of this)

 - The advance width for mark glyphs needs to be forced
   to zero. Whether a glyph is a mark glyph or not can
   be determined by looking at the GDEF table.

To get GPOS working, some more fixes are needed:

 - pango_ot_ruleset_shape() always passes false for the
   rtl parameter of TT_GPOS_Apply_String(), which is
   wrong. 

 - The interpretation of the results of
   TT_GPOS_Apply_String() is wrong for RTL segments.

What I think is right for a change to Pango is to
add a separate:

 void pango_ot_ruleset_position (PangoOTRuleset      *ruleset,
                                 PangoGlyphString    *glyphs,
                                 gboolean             rtl,
                                 PangoOTPositionFlags flags);

That:

 - Positions the glyphs according to their advance widths
   and the classification in the GDEF table

 - If there are any GPOS rules for the ruleset, applies
   them to the string

 - Flips the orders of the glyphs in the string into visual
   order if 'rtl' is set.
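
The default positioning steps listed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the patch: SketchGlyph, SketchGlyphClass and sketch_position() are hypothetical names, the glyph class is assumed to have been read from the GDEF table already, and the GPOS step is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the glyph class read from the GDEF table. */
typedef enum { SKETCH_BASE, SKETCH_MARK } SketchGlyphClass;

/* Hypothetical glyph record; the real PangoGlyphString is richer. */
typedef struct {
    unsigned         glyph;  /* glyph index in the font */
    SketchGlyphClass klass;  /* base or mark */
    int              width;  /* advance width */
} SketchGlyph;

/* Reverse the run in place: logical order -> visual order for RTL. */
static void sketch_reverse(SketchGlyph *glyphs, size_t n)
{
    for (size_t i = 0; i < n / 2; i++) {
        SketchGlyph tmp = glyphs[i];
        glyphs[i] = glyphs[n - 1 - i];
        glyphs[n - 1 - i] = tmp;
    }
}

/* Default positioning: zero the mark advances, then flip if RTL.
 * Applying GPOS rules would go between the two steps; omitted here. */
void sketch_position(SketchGlyph *glyphs, size_t n, bool rtl)
{
    assert(glyphs != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
        if (glyphs[i].klass == SKETCH_MARK)
            glyphs[i].width = 0;

    if (rtl)
        sketch_reverse(glyphs, n);
}
```

With a logical run of base, mark, base and rtl set, the mark keeps its place next to its base after the flip but no longer opens a gap, because its advance is zero.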

Comment 5 Owen Taylor 2003-08-25 13:41:16 UTC

To prevent duplication of work, I'm attaching an unfinished
patch that *mostly* gets things working for Arabic accents
and Arabic fonts with GPOS that I've been fooling around
with over the last few weeks.

What's in the patch:

 - pango_ot_ruleset_substitute() and pango_ot_ruleset_position()
   as described above.

 - Changes to the default positioning algorithm for Arabic
   to put marks logically after (to the left of) base glyphs
   and assign them a width of 0.

 - Changes to the Arabic shaper to use these functions and
   to apply many more features relevant to Arabic fonts

 - A fix to some joining classes relevant for Urdu in
   the Arabic shaper. (The data there should be comprehensively
   reviewed against Unicode-4.0)

 - A fix to the code for propagating character classes 
   in ftxgdef.c

 - A partial fix for bug 118639.

 - A major change to how cursive connections are handled - 
   the existing code couldn't handle multiple lookups
   doing a cursive connection
 
 - Some fixes to the computed GDEF tables to make them
   work better (if we don't know the class for a character,
   don't add it to the computed GDEF table - there is
   code to propagate the classes as we apply the GSUB
   features)

What needs fixing in the patch:

 - Needs splitting up into separate pieces

 - I still don't think the mset features in many Arabic
   fonts are getting applied correctly; needs debugging.

 - Docs for the new API functions

 - It's not clear to me that the approach of simply
   assigning mark glyphs a zero width is good enough
   for positioning in the absence of a GPOS table.
   The problem is quite visible for marks positioned over
   final beh, teh, etc.: the mark should be centered
   over the middle of the bowl, but instead appears
   at the left edge of the character. Centering the
   mark on the base glyph is probably a better approach,
   but you only want to do that if the font doesn't
   have any GPOS tables.

Comment 6 Owen Taylor 2003-08-25 13:41:49 UTC

Created attachment 19487
Unfinished patch for arabic accents and GPOS

Comment 7 Roozbeh Pournader 2003-08-26 10:12:46 UTC

Owen,

- For positioning marks when there's no table, Persian typographic
practice is to center the marks unless they're at the end of a word,
where you should put them at the left end of the letter.

- Regarding Unicode 4.0, you forgot to handle three new letters. They
are U+06EE, U+06EF, and U+06FF (the first two being right-joining, the
last being dual-joining). I'll attach a patch to be applied after your patch.
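
As an illustration of the joining classes mentioned above, a lookup covering only these three letters might look like the sketch below. The enum and function names are hypothetical and this is not the table from the Arabic shaper; any character outside the three is simply reported as unknown.

```c
#include <assert.h>

/* Hypothetical joining classes; only the ones needed for this sketch. */
typedef enum {
    SKETCH_JOIN_UNKNOWN,  /* not covered by this sketch */
    SKETCH_JOIN_RIGHT,    /* joins only to the preceding letter */
    SKETCH_JOIN_DUAL      /* joins on both sides */
} SketchJoining;

/* Joining class for the three Unicode 4.0 letters named in this comment. */
SketchJoining sketch_joining_class(unsigned ch)
{
    switch (ch) {
    case 0x06EE:  /* ARABIC LETTER DAL WITH INVERTED V */
    case 0x06EF:  /* ARABIC LETTER REH WITH INVERTED V */
        return SKETCH_JOIN_RIGHT;
    case 0x06FF:  /* ARABIC LETTER HEH WITH INVERTED V */
        return SKETCH_JOIN_DUAL;
    default:
        return SKETCH_JOIN_UNKNOWN;
    }
}
```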

Comment 8 Roozbeh Pournader 2003-08-26 10:17:14 UTC

Created attachment 19515
a patch to fix three new Unicode 4 letters (to be applied after patch 19487)

Comment 9 Arafat Medini 2003-08-26 20:30:39 UTC

This looks very good, but I agree with Pournader that centering the
marks is the right way also for Arabic.

Comment 10 Soheil Hassas Yeganeh 2003-08-28 10:35:14 UTC

As Roozbeh said, it's better to place diacritic marks in the middle of
the base glyph box, except Kasra (U+0650) and Kasratan (U+064D), which
have to be at the end.

I created a patch to center marks, but I don't think it has
a good design, because I couldn't find an appropriate routine.

- Unfortunately, I did not find any routine to get the corresponding
character code from a glyph ID, so I added a member variable,
is_eh_or_an, to PangoGlyphVisAttr to indicate whether the glyph stands
for Kasra or Kasratan. I fill in this attribute in the
modules/arabic/arabic-fc.c file.

- I center the marks (except an or eh) only for base glyphs wider
than 36000.
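
The centering rule described in this comment could be sketched as below. This is a guess at the arithmetic, not the attached patch: sketch_mark_x_offset() and the constants are hypothetical names, the offset is assumed to be measured from the left edge of the base glyph, and the units of the 36000 threshold are taken as-is from the comment. The code points follow the Unicode chart (Kasratan U+064D, Kasra U+0650).

```c
#include <assert.h>

#define SKETCH_KASRATAN       0x064D  /* ARABIC KASRATAN */
#define SKETCH_KASRA          0x0650  /* ARABIC KASRA */
#define SKETCH_MIN_BASE_WIDTH 36000   /* threshold quoted in the comment */

/* Horizontal shift for a zero-advance mark, relative to the left edge
 * of its base glyph. Returns 0 when the mark should stay at the edge. */
int sketch_mark_x_offset(unsigned mark_char, int base_width, int mark_width)
{
    assert(base_width >= 0 && mark_width >= 0);

    /* Kasra and Kasratan stay at the end of the base glyph. */
    if (mark_char == SKETCH_KASRA || mark_char == SKETCH_KASRATAN)
        return 0;

    /* Only center on base glyphs wider than the threshold. */
    if (base_width <= SKETCH_MIN_BASE_WIDTH)
        return 0;

    /* Center the mark's box within the base glyph's box. */
    return (base_width - mark_width) / 2;
}
```

Keeping the decision in one small function makes it easy to skip the adjustment entirely for fonts that carry their own GPOS tables, which is the case Owen singles out in comment 5.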

Comment 11 Soheil Hassas Yeganeh 2003-08-28 10:37:15 UTC

Created attachment 19578
Centering marks

Comment 12 Owen Taylor 2003-11-17 21:37:26 UTC

*** Bug 69329 has been marked as a duplicate of this bug. ***

Comment 13 Arafat Medini 2003-12-30 14:07:40 UTC

Now that these patches exist, do we have to test them to get them
into CVS, or do we have to wait until we get revisions? (Or something
like that... I've never patched apps or written one, so I have zero idea...)

Comment 14 Owen Taylor 2004-01-05 16:49:49 UTC

There are still some issues that I have to fix up before committing
this stuff. It's all blocking on me.

Comment 15 Behdad Esfahbod 2004-01-05 17:45:24 UTC

Owen, if time-needed-to-fix/your-time-frame-for-it < 2day/2month, then
drop some notes here.  We may be able to give a hand.

Comment 16 Arafat Medini 2004-01-05 20:31:21 UTC

Yeah, it would be wonderful to see this in 2.8 or 3.0. Arabic support
needs so much work and I am sure we can tackle this together!

Comment 17 Owen Taylor 2004-01-05 20:39:13 UTC

I'm going to try hard to get it in 2.4. I'm not sure there is
much that can be done to help me with this ... it's basically
just a question of sitting down and deciding exactly how I 
think some things should be done.

Comment 18 Owen Taylor 2004-02-29 16:01:19 UTC

I've now committed a substantially reworked version of my patch
that uses a PangoOTBuffer object to hold data from the
GSUB stage that is needed in the GPOS stage.

I've filed the accent positioning issue as bug 135753,
and I'm going to close this bug. Testing would be very much
appreciated.

Sun Feb 29 10:54:55 2004  Owen Taylor  <otaylor@redhat.com>
 
        * modules/arabic/arabic-ot.c (arabic): Add joining
        classes for new Unicode-4.0 characters U+06EE, U+06EF, U+06FF.
        (Patch by Roozbeh Pournader from #117282)
 
Sun Feb 29 09:25:13 2004  Owen Taylor  <otaylor@redhat.com>
 
        Rework opentype interfaces and other changes to make GPOS
        work for Arabic. (Most of #117282, #121060)
 
        * pango/opentype/otlbuffer.[ch]: OTL_Buffer that
        acts as a replacement for the separate GSUB and
        GPOS string structures and hides many of the internal
        details.
 
        * pango/opentype/ftxgsub.[ch] pango/opentype/ftxgpos.[ch]:
        Adapt to OTL_Buffer.
 
        * pango/opentype/ftxgpos.c: Redo handling of cursive
        chains so that it actually works.
 
        * pango/pango-ot.h pango/opentype/pango-ot-buffer.c:
        Pango wrapper around OTL_Buffer.
 
        * pango/pango-ot.h pango/pango-ot-ruleset.c
        pango/pango-ot-buffer.c:
        Split pango_ot_ruleset_shape() into pango_ot_ruleset_substitute(),
        pango_ot_ruleset_position(), make them act on
        PangoOTBuffer, add a separate pango_ot_buffer_output()
        which does the default positioning and writes to a
        PangoGlyphString.
 
        * modules/arabic/arabic-fc.c modules/indic/indic-fc.c
        modules/indic/mprefixups.[ch]: Adapt to new OpenType
        interfaces; add GPOS features for Arabic.
 
        * pango/opentype/pango-ot-info.c: Don't derive class information
        from Unicode properties for Arabic presentation forms,
        let the shaping process derive the properties.