GNOME Bugzilla – Bug 117282
Pango can't render Arabic accents
Last modified: 2004-12-22 21:47:04 UTC
When I use Arabic script with "umlauts", Pango renders it incorrectly. Instead of
rendering the mark above the character, it renders the character, then the umlaut with a
gap, and then the character. It actually looks like this:
It's a major bug, as it's really annoying for reading letters, and for little
children, where these "umlauts" are used on EVERY character, the text becomes
It's just something that hasn't been implemented.
(In English, "umlaut" specifically means the two dots used in German.)
Well, I discussed it with Telsa and the consensus was to call it "umlaut"
even if there are six types of it ;)
I hope this will get implemented soon...
"Blocker: We should fix and push an update immediately. This will
mostly be used for security fixes."
Notes about fixing this:
In order to get simple (no GPOS) accents working, what needs
to be done is:
- Pango currently orders marks for RTL base characters
visually after their base character.
TrueType fonts are apparently designed with the opposite
convention. (I'm not positive about this.)
- The advance width for mark glyphs needs to be forced
to zero. Whether a glyph is a mark glyph or not can
be determined by looking at the GDEF table.
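The second point can be sketched in C. The sketch assumes a simplified glyph record (`struct simple_glyph`, a hypothetical type rather than Pango's own) whose GDEF class has already been looked up from the font. The class numbers are the OpenType GDEF GlyphClassDef values, in which 3 means "mark glyph".

```c
#include <assert.h>
#include <stddef.h>

/* GlyphClassDef values from the OpenType GDEF table. */
enum gdef_glyph_class {
    GDEF_UNCLASSIFIED = 0,
    GDEF_BASE = 1,
    GDEF_LIGATURE = 2,
    GDEF_MARK = 3,
    GDEF_COMPONENT = 4
};

/* Hypothetical per-glyph record used only for this sketch. */
struct simple_glyph {
    unsigned int glyph_id;
    int gdef_class;   /* looked up in the font's GDEF table */
    int x_advance;    /* advance width reported by the font */
};

/* Force every mark glyph's advance to zero so it does not push the
 * following glyph along. Returns how many advances were changed. */
static size_t zero_mark_advances(struct simple_glyph *glyphs, size_t n)
{
    size_t changed = 0;

    assert(glyphs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (glyphs[i].gdef_class == GDEF_MARK && glyphs[i].x_advance != 0) {
            glyphs[i].x_advance = 0;
            changed++;
        }
    }
    return changed;
}
```

Base glyphs and ligatures keep their advances. Only glyphs that GDEF classifies as marks are collapsed.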
To get GPOS working, some more fixes are needed:
- pango_ot_ruleset_shape() always passes false for the
rtl parameter of TT_GPOS_Apply_String(), which is
- The interpretation of the results of
TT_GPOS_Apply_String() is wrong for RTL segments.
What I think is the right change to Pango is to
add a separate function:
void pango_ot_ruleset_position (PangoOTRuleset *ruleset,
- Positions the glyphs according to their advance widths
and the classification in the GDEF table
- If there are any GPOS rules for the ruleset, applies
them to the string
- Flips the orders of the glyphs in the string into visual
order if 'rtl' is set.
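The following is a minimal sketch of the default path of such a function, for the case where the font has no GPOS rules. It does not reproduce the signature proposed above; `struct pos_glyph`, `flip_to_visual()` and `position_run()` are hypothetical names. The sketch assumes mark glyphs already carry a zero advance. It flips the logically ordered run into visual order when `rtl` is set, then assigns each glyph a pen position from the advances before it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical glyph record for this sketch only. */
struct pos_glyph {
    unsigned int glyph_id;
    int x_advance;   /* assumed already zero for mark glyphs */
    int x;           /* pen position, filled in by position_run() */
};

/* Reverse a run in place: logical order -> visual order for RTL text. */
static void flip_to_visual(struct pos_glyph *glyphs, size_t n)
{
    if (n < 2)
        return;
    for (size_t i = 0, j = n - 1; i < j; i++, j--) {
        struct pos_glyph tmp = glyphs[i];
        glyphs[i] = glyphs[j];
        glyphs[j] = tmp;
    }
}

/* Default positioning without GPOS: flip RTL runs into visual order,
 * then place each glyph at the running sum of the advances before it. */
static void position_run(struct pos_glyph *glyphs, size_t n, int rtl)
{
    int pen = 0;

    assert(glyphs != NULL || n == 0);
    /* If the font had GPOS rules, they would be applied here,
     * while the run is still in logical order. */
    if (rtl)
        flip_to_visual(glyphs, n);
    for (size_t i = 0; i < n; i++) {
        glyphs[i].x = pen;
        pen += glyphs[i].x_advance;
    }
}
```

For example, take a logical run of base (advance 500), mark (advance 0), base (advance 400) with `rtl` set. It comes out in visual order with pen positions 0, 400 and 400. The zero-width mark therefore starts at the left edge of its own base glyph.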
To prevent duplication of work, I'm attaching an unfinished
patch that *mostly* gets things working for Arabic accents
and for Arabic fonts with GPOS, which I've been fooling around
with over the last few weeks.
What's in the patch:
- pango_ot_ruleset_substitute() and pango_ot_ruleset_position()
as described above.
- Changes to the default positioning algorithm for Arabic
to put marks logically after (to the left of) base glyphs
and assign them a width of 0.
- Changes to the Arabic shaper to use these functions and
to apply many more features relevant to Arabic fonts
- A fix to some joining classes relevant for Urdu in
the Arabic shaper. (The data there should be comprehensively
reviewed against Unicode-4.0.)
- A fix to the code for propagating character classes
- A partial fix for bug 118639.
- A major change to how cursive connections are handled -
the existing code couldn't handle multiple lookups
doing a cursive connection
- Some fixes to the computed GDEF tables to make them
work better (if we don't know the class for a character,
don't add it to the computed GDEF table - there is
code to propagate the classes as we apply the GSUB
What needs fixing in the patch:
- Needs splitting up into separate pieces
- I still don't think the mset features in many Arabic
fonts are getting applied correctly; needs debugging.
- Docs for the new API functions
- It's not clear to me that the approach of simply
assigning mark glyphs a zero width is good enough
for positioning in the absence of a GPOS table -
this is especially visible for marks positioned over
final beh, teh, etc.: the mark should be centered
over the middle of the bowl, but instead appears
at the left edge of the character. Centering the
mark on the base glyph is probably a better approach,
but you only want to do that if the font doesn't
have any GPOS tables.
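That rule can be written as a small helper. The sketch assumes integer coordinates in which the base glyph box spans `[base_x, base_x + base_width]` and `mark_width` is the width of the mark's ink. `center_mark_x` and its parameters are hypothetical names.

```c
#include <assert.h>

/* Hypothetical helper: where to draw a mark over a base glyph whose box
 * spans [base_x, base_x + base_width], in the same integer units. */
static int center_mark_x(int base_x, int base_width, int mark_width,
                         int font_has_gpos, int default_mark_x)
{
    assert(base_width >= 0 && mark_width >= 0);
    /* Fonts with GPOS tables say where their marks go; leave them alone. */
    if (font_has_gpos)
        return default_mark_x;
    /* Otherwise center the mark over the base glyph box. */
    return base_x + (base_width - mark_width) / 2;
}
```

For a 500-unit-wide base at x = 400 and a 200-unit-wide mark, the centered position is 400 + (500 - 200) / 2 = 550. The GPOS check comes first so that fonts that position their own marks are never overridden.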
Created attachment 19487
Unfinished patch for arabic accents and GPOS
- For positioning marks when there's no GPOS table, Persian typographic
practice says to center the marks unless they're at the end of a word,
in which case you should put them at the left end of the letter.
- Regarding Unicode 4.0, you forgot to handle three new letters:
U+06EE, U+06EF, and U+06FF (the first two are right-joining, the last
is dual-joining). I'll attach a patch to be applied after yours.
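The three missing joining types amount to a small lookup. The following is a sketch only, not the shaper's real table: `enum joining_type` and `new_unicode4_joining()` are hypothetical names. `JOINING_UNKNOWN` stands in for every other code point, which the real shaper would look up in its full table.

```c
#include <assert.h>

/* Hypothetical, trimmed-down joining-type enum for this sketch. */
enum joining_type {
    JOINING_UNKNOWN,  /* stand-in: the real shaper consults its full table */
    JOINING_RIGHT,    /* joins only to the preceding letter */
    JOINING_DUAL      /* joins on both sides */
};

/* Joining types of the three letters added in Unicode 4.0 that the
 * shaper's table was missing. */
static enum joining_type new_unicode4_joining(unsigned int cp)
{
    assert(cp <= 0x10FFFF);
    switch (cp) {
    case 0x06EE: /* ARABIC LETTER DAL WITH INVERTED V */
    case 0x06EF: /* ARABIC LETTER REH WITH INVERTED V */
        return JOINING_RIGHT;
    case 0x06FF: /* ARABIC LETTER HEH WITH INVERTED V */
        return JOINING_DUAL;
    default:
        return JOINING_UNKNOWN;
    }
}
```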
Created attachment 19515
a patch to fix three new Unicode 4 letters (to be applied after patch 19487)
This looks very good, but I agree with Pournader that centering the
marks is the right approach for Arabic as well.
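Pournader's rule from the previous comment can be sketched as follows: center the mark, except on word-final letters, where the mark goes at the left end of the letter. This applies only when the font has no GPOS table. The names are hypothetical. "Left end" is interpreted here, as an assumption, as aligning the mark's left edge with the base glyph's left edge.

```c
#include <assert.h>

enum mark_placement { MARK_CENTERED, MARK_AT_LEFT_END };

/* The Persian rule described above, for fonts without a GPOS table. */
static enum mark_placement persian_placement(int base_is_word_final)
{
    return base_is_word_final ? MARK_AT_LEFT_END : MARK_CENTERED;
}

/* Hypothetical helper: x position of a mark over a base glyph whose box
 * spans [base_x, base_x + base_width]. */
static int persian_mark_x(int base_x, int base_width, int mark_width,
                          int base_is_word_final)
{
    assert(base_width >= 0 && mark_width >= 0);
    if (persian_placement(base_is_word_final) == MARK_AT_LEFT_END)
        return base_x;  /* assumption: mark's left edge on the letter's left edge */
    return base_x + (base_width - mark_width) / 2;
}
```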
As Roozbeh said, it's better to place diacritic marks in the middle of
the base glyph box, except Kasra (U+0650) and Kasratan (U+064D), which
have to be at the end.
I've created a patch to center marks, but I don't think it has
a good design, because I couldn't find an appropriate routine.
- Unfortunately, I could not find any routine to get the corresponding
character code from a glyph ID, so I added a member variable,
is_eh_or_an, to PangoGlyphVisAttr to record whether a glyph stands for
Kasra or Kasratan. I fill in this attribute in module/arabic/arabic-fc.c.
- I center the marks (except en or eh) only for base glyphs wider
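The workaround records the information while the character is still known, at shaping time, so that the positioning stage can use it later. The following is a hedged sketch of that idea, not the patch itself. `struct shaped_glyph`, `record_mark_flags()` and `mark_x()` are hypothetical names; only the `is_eh_or_an` flag comes from the comment above. The sketch assumes one glyph per character (no ligatures). It also assumes that "at the end" means the left edge of the base glyph box, since the end of an RTL letter is its left side.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical glyph record; the flag mirrors the patch's is_eh_or_an. */
struct shaped_glyph {
    unsigned int glyph_id;
    unsigned int is_eh_or_an : 1;  /* glyph came from Kasra or Kasratan */
};

static int is_kasra_or_kasratan(unsigned int cp)
{
    return cp == 0x0650 /* ARABIC KASRA */ || cp == 0x064D /* ARABIC KASRATAN */;
}

/* Shaping stage: the code points are still available, so record the flag.
 * Assumes chars[i] produced glyphs[i]. */
static void record_mark_flags(const unsigned int *chars,
                              struct shaped_glyph *glyphs, size_t n)
{
    assert((chars != NULL && glyphs != NULL) || n == 0);
    for (size_t i = 0; i < n; i++)
        glyphs[i].is_eh_or_an = is_kasra_or_kasratan(chars[i]) ? 1 : 0;
}

/* Positioning stage: only the glyph and its flag are available here. */
static int mark_x(const struct shaped_glyph *mark,
                  int base_x, int base_width, int mark_width)
{
    if (mark->is_eh_or_an)
        return base_x;                             /* keep at the end of the letter */
    return base_x + (base_width - mark_width) / 2; /* center everything else */
}
```

Carrying a per-glyph flag avoids any need to map glyph IDs back to characters. The cost is that the flag has to survive every GSUB step that rearranges glyphs.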
Created attachment 19578
*** Bug 69329 has been marked as a duplicate of this bug. ***
Now that these patches are available, do we have to test them to get them
into CVS, or do we have to wait until there are revisions? (I've never
patched apps, or only wrote one once, so I have zero idea...)
There are still some issues that I have to fix up before committing
this stuff. It's all blocking on me.
Owen, if time-needed-to-fix/your-time-frame-for-it < 2day/2month, then
drop some notes here. We may be able to give a hand.
Yeah, it would be wonderful to see this in 2.8 or 3.0. Arabic support
needs so much work, and I am sure we can tackle this together!
I'm going to try hard to get it in 2.4. I'm not sure there is
much that can be done to help me with this ... it's basically
just a question of sitting down and deciding exactly how I
think some things should be done.
I've now committed a substantially reworked version of my patch
that uses a PangoOTBuffer object to hold data from the
GSUB stage that is needed in the GPOS stage.
I've filed the accent positioning issue as bug 135753,
and I'm going to close this bug. Testing would be very much
Sun Feb 29 10:54:55 2004 Owen Taylor <email@example.com>
* modules/arabic/arabic-ot.c (arabic): Add joining
classes for new Unicode-4.0 characters U+06EE, U+06EF, U+06FF.
(Patch by Roozbeh Pournader from #117282)
Sun Feb 29 09:25:13 2004 Owen Taylor <firstname.lastname@example.org>
Rework opentype interfaces and other changes to make GPOS
work for Arabic. (Most of #117282, #121060)
* pango/opentype/otlbuffer.[ch]: OTL_Buffer that
acts as a replacement for the separate GSUB and
GPOS string structures and hides many of the internal
* pango/opentype/ftxgsub.[ch] pango/opentype/ftxgpos.[ch]:
Adapt to OTL_Buffer.
* pango/opentype/ftxgpos.c: Redo handling of cursive
chains so that it actually works.
* pango/pango-ot.h pango/opentype/pango-ot-buffer.c:
Pango wrapper around OTL_Buffer.
* pango/pango-ot.h pango/pango-ot-ruleset.c:
Split pango_ot_ruleset_shape() into pango_ot_ruleset_substitute(),
pango_ot_ruleset_position(), make them act on
PangoOTBuffer, add a separate pango_ot_buffer_output()
which does the default positioning and writes to a
* modules/arabic/arabic-fc.c modules/indic/indic-fc.c
modules/indic/mprefixups.[ch]: Adapt to new OpenType
interfaces; add GPOS features for Arabic.
* pango/opentype/pango-ot-info.c: Don't derive class information
from Unicode properties for Arabic presentation forms,
let the shaping process derive the properties.