GNOME Bugzilla – Bug 117282
Pango can't render Arabic accents
Last modified: 2004-12-22 21:47:04 UTC
When I use Arabic script with "umlauts", Pango renders it incorrectly: instead of rendering the umlaut above the character, it renders the character, then the umlaut with a gap, and then the character (screenshot: http://silverpen.de/Screenshot_modi.png). It's a major bug, as it's really annoying for reading letters, and in texts for little children, where these "umlauts" are used on EVERY character, the text becomes completely unreadable...
It's just something that hasn't been implemented yet. (In English, "umlaut" specifically means the two dots used in German.)
Well, I discussed it with Telsa, and the consensus was to call it "umlaut" even though there are six types of it ;) I hope this will get implemented soon...
"Blocker: We should fix and push an update immediately. This will mostly be used for security fixes."
Notes about fixing this:
In order to get simple (non-GPOS) accents working, what needs to be done is:
- Pango currently orders marks for RTL base characters visually after their base character. TrueType fonts are apparently designed with the opposite convention. (I'm not positive about this.)
- The advance width for mark glyphs needs to be forced to zero. Whether or not a glyph is a mark glyph can be determined by looking at the GDEF table.
To get GPOS working, some more fixes are needed:
- pango_ot_ruleset_shape() always passes false for the rtl parameter of TT_GPOS_Apply_String(), which is wrong.
- The interpretation of the results of TT_GPOS_Apply_String() is wrong for RTL segments.
What I think is the right change for Pango is to add a separate function:
    void pango_ot_ruleset_position (PangoOTRuleset       *ruleset,
                                    PangoGlyphString     *glyphs,
                                    gboolean              rtl,
                                    PangoOTPositionFlags  flags);
that:
- positions the glyphs according to their advance widths and the classification in the GDEF table;
- if there are any GPOS rules for the ruleset, applies them to the string;
- flips the order of the glyphs in the string into visual order if 'rtl' is set.
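The default-positioning part of the proposed pango_ot_ruleset_position() can be illustrated with a minimal sketch. It is not Pango code: the Glyph struct, its is_mark flag (standing in for a GDEF mark-class lookup) and position_default() are hypothetical, and the GPOS step is only marked by a comment.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical glyph record; not a Pango type. */
typedef struct {
  unsigned int glyph;  /* glyph index in the font */
  int advance;         /* advance width */
  int x_offset;        /* horizontal offset from the pen position */
  bool is_mark;        /* stands in for a GDEF "mark" class lookup */
} Glyph;

/* Default positioning without GPOS:
 * 1. mark glyphs get a zero advance width, so they do not move the pen;
 * 2. (GPOS adjustments would be applied here, if the font had any);
 * 3. RTL runs are reversed from logical order into visual order. */
void position_default(Glyph *glyphs, size_t n, bool rtl)
{
  for (size_t i = 0; i < n; i++)
    if (glyphs[i].is_mark)
      glyphs[i].advance = 0;

  if (rtl) {
    /* swap glyph i with glyph j - 1, moving both ends inward */
    for (size_t i = 0, j = n; i + 1 < j; i++, j--) {
      Glyph tmp = glyphs[i];
      glyphs[i] = glyphs[j - 1];
      glyphs[j - 1] = tmp;
    }
  }
}
```

For a logical sequence base, mark, base in an RTL run, the mark ends up with advance 0 and the three glyphs come out in reversed (visual) order.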
To prevent duplication of work, I'm attaching an unfinished patch that *mostly* gets things working for Arabic accents and Arabic fonts with GPOS; I've been fooling around with it over the last few weeks.
What's in the patch:
- pango_ot_ruleset_substitute() and pango_ot_ruleset_position() as described above.
- Changes to the default positioning algorithm for Arabic to put marks logically after (to the left of) base glyphs and assign them a width of 0.
- Changes to the Arabic shaper to use these functions and to apply many more features relevant to Arabic fonts.
- A fix to some joining classes relevant for Urdu in the Arabic shaper. (The data there should be comprehensively reviewed against Unicode 4.0.)
- A fix to the code for propagating character classes in ftxgdef.c.
- A partial fix for bug 118639.
- A major change to how cursive connections are handled; the existing code couldn't handle multiple lookups doing a cursive connection.
- Some fixes to the computed GDEF tables to make them work better (if we don't know the class for a character, don't add it to the computed GDEF table; there is code to propagate the classes as we apply the GSUB features).
What needs fixing in the patch:
- It needs splitting up into separate pieces.
- I still don't think the mset features in many Arabic fonts are getting applied correctly; this needs debugging.
- Docs for the new API functions.
- It's not clear to me that simply assigning mark glyphs a zero width is good enough for positioning in the absence of a GPOS table. This is quite clear for marks positioned over final beh, teh, etc.: the mark should be centered over the middle of the bowl, but instead appears at the left edge of the character. Centering the mark on the base glyph is probably a better approach, but you only want to do that if the font doesn't have any GPOS tables.
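The GDEF change in the list above (leave characters of unknown class out of the computed table, so that classes can be propagated later during GSUB) might look roughly like this. It is a hypothetical sketch, not the code in ftxgdef.c: classify() only recognizes the Arabic letter range U+0621 to U+064A and the marks U+064B to U+0652, and ClassEntry and build_class_table() are invented names. The class numbers follow the OpenType GDEF GlyphClassDef values.

```c
#include <stddef.h>

/* Glyph classes numbered as in the OpenType GDEF GlyphClassDef table;
 * CLASS_UNKNOWN (0) means "no class could be determined". */
enum { CLASS_UNKNOWN = 0, CLASS_BASE = 1, CLASS_LIGATURE = 2,
       CLASS_MARK = 3, CLASS_COMPONENT = 4 };

typedef struct {
  unsigned int glyph;  /* glyph index in the font */
  int klass;           /* one of the CLASS_* values above */
} ClassEntry;

/* Hypothetical, deliberately incomplete classifier based on the code point. */
static int classify(unsigned int ch)
{
  if (ch >= 0x064B && ch <= 0x0652)  /* Arabic harakat, fathatan..sukun */
    return CLASS_MARK;
  if (ch >= 0x0621 && ch <= 0x064A)  /* Arabic letters (range also holds tatweel) */
    return CLASS_BASE;
  return CLASS_UNKNOWN;
}

/* Record a class for each (glyph, character) pair, skipping pairs whose
 * class is unknown instead of storing them as class 0, so a later stage
 * can still assign one. 'out' must have room for n entries.
 * Returns the number of entries written. */
size_t build_class_table(const unsigned int *glyphs, const unsigned int *chars,
                         size_t n, ClassEntry *out)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++) {
    int k = classify(chars[i]);
    if (k == CLASS_UNKNOWN)
      continue;
    out[count].glyph = glyphs[i];
    out[count].klass = k;
    count++;
  }
  return count;
}
```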
Created attachment 19487: Unfinished patch for arabic accents and GPOS
Owen,
- For positioning marks when there's no GPOS table, Persian typographic practice says to center the marks unless they're at the end of a word, in which case you should put them at the left end of the letter.
- On Unicode 4.0: you forgot to handle three new letters, U+06EE, U+06EF, and U+06FF (the first two are right-joining, the last is dual-joining). I'll attach a patch to be applied after your patch.
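The joining classes for the three new letters can be expressed as a small lookup. This is a hypothetical fragment, not the table in the Arabic shaper: JoiningType and unicode4_joining_type() are invented names, and only the three Unicode 4.0 letters are covered.

```c
/* Joining types in the sense of Unicode's ArabicShaping.txt: right-joining
 * letters connect only to the preceding letter, dual-joining letters
 * connect on both sides. */
typedef enum {
  JOIN_UNKNOWN,  /* not covered by this sketch */
  JOIN_RIGHT,
  JOIN_DUAL
} JoiningType;

JoiningType unicode4_joining_type(unsigned int ch)
{
  switch (ch) {
  case 0x06EE:  /* ARABIC LETTER DAL WITH INVERTED V */
  case 0x06EF:  /* ARABIC LETTER REH WITH INVERTED V */
    return JOIN_RIGHT;
  case 0x06FF:  /* ARABIC LETTER HEH WITH INVERTED V */
    return JOIN_DUAL;
  default:
    return JOIN_UNKNOWN;
  }
}
```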
Created attachment 19515: a patch to fix three new Unicode 4 letters (to be applied after patch 19487)
This looks very good, but I agree with Pournader that centering the marks is the right way for Arabic as well.
As Roozbeh said, it's better to place diacritic marks in the middle of the base glyph's box, except Kasra (U+0650) and Kasratan (U+064D), which have to be at the end. I created a patch to center marks, but I don't think it has a good design, because I didn't find an appropriate routine:
- Unfortunately, I did not find any routine to get the character code corresponding to a glyph ID, so I added a member variable, is_eh_or_an, to PangoGlyphVisAttr to record whether a glyph stands for Kasra or Kasratan. I fill in this attribute in modules/arabic/arabic-fc.c.
- I center the marks (except Kasra and Kasratan) only for base glyphs wider than 36000.
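The centering rule in this patch (Kasra and Kasratan stay at the end of the letter, other marks are centered only on sufficiently wide bases) can be sketched as a single function. The function name and the convention of measuring the offset from the left edge of the base glyph are assumptions; 36000 is the threshold quoted above, assumed to be in the same units as the glyph widths.

```c
#define ARABIC_KASRATAN 0x064D
#define ARABIC_KASRA    0x0650

/* Threshold quoted in the bug comment; assumed to use the same units
 * as base_width and mark_width. */
#define CENTER_MIN_BASE_WIDTH 36000

/* Hypothetical helper: horizontal offset of a mark, measured from the
 * left edge of its base glyph. */
int mark_x_offset(unsigned int mark_char, int base_width, int mark_width)
{
  /* Kasra and Kasratan stay at the end of the letter, which in
   * right-to-left text is its left edge. */
  if (mark_char == ARABIC_KASRA || mark_char == ARABIC_KASRATAN)
    return 0;

  /* Narrow bases keep the default left-edge placement. */
  if (base_width <= CENTER_MIN_BASE_WIDTH)
    return 0;

  /* Otherwise center the mark over the base glyph. */
  return (base_width - mark_width) / 2;
}
```

Keying the exception on the character code is what forced the is_eh_or_an attribute in the patch: at positioning time only glyph IDs are available, so the character identity has to be carried along from the shaper.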
Created attachment 19578: Centering marks
*** Bug 69329 has been marked as a duplicate of this bug. ***
Now that these patches exist, do we have to test them to get them into CVS, or do we have to wait for revisions? (Or something like that... I've never patched apps or written one, so I have zero idea...)
There are still some issues that I have to fix up before committing this stuff. It's all blocking on me.
Owen, if time-needed-to-fix/your-time-frame-for-it < 2day/2month, then drop some notes here. We may be able to give a hand.
Yeah, it would be wonderful to see this in 2.8 or 3.0. Arabic support needs so much work, and I am sure we can tackle this together!
I'm going to try hard to get it in 2.4. I'm not sure there is much that can be done to help me with this ... it's basically just a question of sitting down and deciding exactly how I think some things should be done.
I've now committed a substantially reworked version of my patch that uses a PangoOTBuffer object to hold data from the GSUB stage that is needed in the GPOS stage. I've filed the accent positioning issue as bug 135753, and I'm going to close this bug. Testing would be very much appreciated.

Sun Feb 29 10:54:55 2004  Owen Taylor  <otaylor@redhat.com>
    * modules/arabic/arabic-ot.c (arabic): Add joining classes for new Unicode-4.0 characters U+06EE, U+06EF, U+06FF. (Patch by Roozbeh Pournader from #117282)

Sun Feb 29 09:25:13 2004  Owen Taylor  <otaylor@redhat.com>
    Rework opentype interfaces and other changes to make GPOS work for Arabic. (Most of #117282, #121060)
    * pango/opentype/otlbuffer.[ch]: OTL_Buffer that acts as a replacement for the separate GSUB and GPOS string structures and hides many of the internal details.
    * pango/opentype/ftxgsub.[ch] pango/opentype/ftxgpos.[ch]: Adapt to OTL_Buffer.
    * pango/opentype/ftxgpos.c: Redo handling of cursive chains so that it actually works.
    * pango/pango-ot.h pango/opentype/pango-ot-buffer.c: Pango wrapper around OTL_Buffer.
    * pango/pango-ot.h pango/pango-ot-ruleset.c pango/pango-ot-buffer.c: Split pango_ot_ruleset_shape() into pango_ot_ruleset_substitute(), pango_ot_ruleset_position(), make them act on PangoOTBuffer, add a separate pango_ot_buffer_output() which does the default positioning and writes to a PangoGlyphString.
    * modules/arabic/arabic-fc.c modules/indic/indic-fc.c modules/indic/mprefixups.[ch]: Adapt to new OpenType interfaces; add GPOS features for Arabic.
    * pango/opentype/pango-ot-info.c: Don't derive class information from Unicode properties for Arabic presentation forms, let the shaping process derive the properties.
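The substitute / position / output split described in the ChangeLog can be modeled with one shared buffer that carries per-glyph data (cluster and class) from the substitution stage to the positioning stage. This is a hypothetical sketch, not Pango's PangoOTBuffer: Buffer, BufferItem and the three buffer_* functions are invented, substitution is reduced to a single two-glyph ligature, and positioning only zeroes mark advances.

```c
#include <stddef.h>

/* Hypothetical GDEF-style classes carried through the buffer. */
enum { GLYPH_BASE = 1, GLYPH_LIGATURE = 2, GLYPH_MARK = 3 };

typedef struct {
  unsigned int glyph;    /* glyph index */
  unsigned int cluster;  /* index of the first source character */
  int glyph_class;       /* may be changed by the substitution stage */
  int advance;           /* filled in by the positioning stage */
} BufferItem;

typedef struct {
  BufferItem items[64];
  size_t len;
} Buffer;

/* Stage 1 (substitution): replace each adjacent pair first, second with
 * one ligature glyph that keeps the first glyph's cluster. The buffer is
 * compacted in place; 'out' never runs ahead of 'i'. */
void buffer_substitute(Buffer *buf, unsigned int first, unsigned int second,
                       unsigned int ligature)
{
  size_t out = 0;
  for (size_t i = 0; i < buf->len; i++) {
    if (i + 1 < buf->len && buf->items[i].glyph == first &&
        buf->items[i + 1].glyph == second) {
      buf->items[out] = buf->items[i];
      buf->items[out].glyph = ligature;
      buf->items[out].glyph_class = GLYPH_LIGATURE;
      out++;
      i++;  /* skip the second glyph, now part of the ligature */
    } else {
      buf->items[out++] = buf->items[i];
    }
  }
  buf->len = out;
}

/* Stage 2 (positioning): relies on the classes recorded during stage 1. */
void buffer_position(Buffer *buf, int default_advance)
{
  for (size_t i = 0; i < buf->len; i++)
    buf->items[i].advance =
        buf->items[i].glyph_class == GLYPH_MARK ? 0 : default_advance;
}

/* Stage 3 (output): copy the result into plain arrays, which must have
 * room for buf->len entries. Returns the number of glyphs written. */
size_t buffer_output(const Buffer *buf, unsigned int *glyphs, int *advances)
{
  for (size_t i = 0; i < buf->len; i++) {
    glyphs[i] = buf->items[i].glyph;
    advances[i] = buf->items[i].advance;
  }
  return buf->len;
}
```

In this sketch the ligature's class and cluster survive into the positioning stage because both stages operate on the same buffer, which is the kind of GSUB-to-GPOS data passing the comment describes.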