GNOME Bugzilla – Bug 153546
arabic shaping is not done when glyphs come from different fonts
Last modified: 2005-10-13 09:49:59 UTC
When contiguous letters are displayed with glyphs from different fonts, they are incorrectly shaped. Tested in pango 1.2.5; I haven't looked at more recent CVS versions, but there is nothing in the Changelogs that says it has been fixed, so most likely the problem is still there. The internal reasons for the problem are also probably somewhat related to bug #83058.
For example, the word ئۇيغۇرچە (Uyghurche: 064a, 065a, 06c7, 06a4, 063a, 06c7, 0631, 0686, 06d5) is displayed in the attached image.
The first rendering is done with pango, using the "Sans" pseudo font, which takes its glyphs primarily from the "Arial" font; the Arabic letter U (06c7) is missing from Arial and is taken from the "Code2000" font (the only font I have with that letter). Note how the boundaries between 06c7 and the preceding letter (wrongly) become non-joining boundaries.
The second rendering is done with pango and the "Code2000" font. The last letter (06d5) is missing from Code2000, so we see the same unshaping again between the last two letters. The rendering of the first letter is wrong too, and maybe it is a bug in pango (the hamza above (065a) should be ignored in shaping, and pango should look at the following letter (here 06c7) to see whether the first one should connect or not). But note how this time the letter 06c7 is properly shaped as joining on its right, and how the letter 063a is properly joining on its left too, contrary to the first rendering.
The third rendering was done with the yudit editor, with a font selection similar to the "Sans" one used by pango; that is how pango should render Arabic script with glyphs from multiple fonts. Note how all glyphs are properly shaped. Yes, their joining is not perfect, as they come from different fonts where the joining line is at different heights, but it is still much better than the pango rendering on line 1. That is what the first line should look like.
The last rendering is done with yudit, with a font selection similar to the "Code2000" one used by pango (that is, Code2000 for all glyphs but the last one). Look how all letters are properly shaped: there is no problem caused by the hamza above between the first and third letters, and no shaping problem caused by the last glyph coming from a different font. That is what the 2nd rendering should look like.
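To make the description above concrete, here is a minimal sketch of the difference between choosing joining forms over the whole word and choosing them separately inside each font run. It is not Pango's code. The names `shape`, `shape_per_font_run` and `RIGHT_JOINING` are made up for illustration, the joining table covers only the letters of the example word (anything else is assumed to be dual-joining), and the "font" is modelled as nothing more than a set of code points missing from the primary font.

```python
import unicodedata
from itertools import groupby

# Hypothetical mini joining table: the right-joining letters of the example
# word. A real shaper would use the full Unicode ArabicShaping.txt data.
RIGHT_JOINING = {0x0631, 0x06C7, 0x06D5}

FORMS = {
    (False, False): "isolated",
    (False, True): "initial",
    (True, False): "final",
    (True, True): "medial",
}

def joining_type(ch):
    if unicodedata.category(ch) == "Mn":
        return "T"  # combining marks are transparent to joining
    if ord(ch) in RIGHT_JOINING:
        return "R"  # joins only to the preceding letter
    return "D"      # assumption: every other letter is dual-joining

def shape(run):
    """Return one joining form per character, looking only inside `run`."""
    types = [joining_type(c) for c in run]
    forms = []
    for i, t in enumerate(types):
        if t == "T":
            forms.append("mark")
            continue
        # Nearest non-transparent neighbours, in logical order.
        prev = next((p for p in reversed(types[:i]) if p != "T"), None)
        nxt = next((n for n in types[i + 1:] if n != "T"), None)
        joins_prev = prev == "D"
        joins_next = t == "D" and nxt in ("D", "R")
        forms.append(FORMS[(joins_prev, joins_next)])
    return forms

def shape_per_font_run(text, missing):
    """Split `text` wherever the font changes, then shape each run alone."""
    forms = []
    for _, group in groupby(text, key=lambda c: ord(c) in missing):
        forms.extend(shape("".join(group)))
    return forms

# Code points exactly as listed in the report.
WORD = "\u064a\u065a\u06c7\u06a4\u063a\u06c7\u0631\u0686\u06d5"

for cp, whole, split in zip(WORD, shape(WORD), shape_per_font_run(WORD, {0x06C7})):
    print(f"{ord(cp):04x}  whole-word: {whole:9}  per-run: {split}")
```

With 06c7 treated as missing from the primary font, the per-run result loses the joins on both sides of each 06c7, which matches the first rendering in the attachment. Passing `{0x06D5}` instead reproduces the broken join between the last two letters seen in the second rendering. Skipping transparent marks when looking for neighbours is what lets the first letter connect across the hamza above.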
Created attachment 31872: an image showing renderings of the same word with pango and yudit and two different fonts, to illustrate the problem
(too bad that bugzilla doesn't allow UTF-8 input... for pango and i18n problems it would help to be able to type non-ascii in the bug report)
Very hard to fix without major restructuring of Pango. (I've generally seen no problem with UTF-8 in bugzilla recently)
Owen, would you please reconsider this? You can make it low priority or delay it, but it needs to be fixed one day. Sans fonts would not be useful in such situations, and they would lose the purpose they were made for, after all.
To me it's a fairly serious bug or misconfiguration if the font automatically selected for the Arabic script can't properly display your language. Throwing in a character from a different font is a poor substitute, even if it is properly shaped. It's vastly better to use the different font in the first place. If someone is working in Urdu, the Arabic script font they should get for Sans should be one with all the necessary Arabic characters for Urdu. So, I don't see it as a huge problem, and as I said, it's very hard to fix. I'd like to restrict open bugs to bugs that either:
- I'm planning to fix, or
- if someone asked, I could tell them how to go about fixing, even if it would be months of work.
If a bug doesn't fit into either of those categories, I think keeping it open is just clutter.
I'm actually planning to work on this. The problem is that most of the text rendered using Pango is not tagged, so for example when I deal with Persian text, most of the time the font chosen is the one that is best for rendering Arabic, not Persian... I see Mozilla doing this, and I know I appreciate it when browsing sites that (mis)use weird characters they should not be using in Persian (and that our Persian fonts therefore do not have). Anyway, this can stay closed until a patch comes up.