GNOME Bugzilla – Bug 89449
Better handling of precomposed/combining forms in Hebrew shaper
Last modified: 2012-08-07 19:13:55 UTC
The Hebrew characters 0xfb2a and 0xfb2b (shin and sin _with_ dots) are not displayed correctly in Pango. Using 0x05e9 + 0x05c1 (0x05c2 respectively) renders OK.
Oops, forgot to bring the screenshot with me, hopefully tomorrow...
These characters are in the compatibility area of Unicode, so use of them is discouraged. However, yeah, it probably would be nicer if the Hebrew shapers could handle them.
Created attachment 10193 [details] Wrongly rendered Hebrew shin with dot character.
Well, the problem is that in the European scholarly tradition (as opposed to Israeli Hebrew) the shin and sin are _two_ different letters. (So in a biblical Hebrew dictionary you have two chapters, one for each letter; Israeli dictionaries have only one.) Now if I want to handle (and analyse) biblical text, it's much more convenient to use two Unicode chars for the two letters (i.e. 0xfb2a and 0xfb2b). As for the Pango side, since there are more such "presentation forms", not just for Hebrew, which can simply be divided into other Unicode characters, wouldn't it make sense to have Pango take an additional step outside of the language modules? (That said, I don't understand that much about how Pango works, even though I spent quite a while looking in the sources :-(((
Created attachment 10224 [details] [review] Patch to make Hebrew modules deal with presentation forms.
I just added a patch that I believe solves the problem. It is basically just a question of making the Hebrew modules deal with the presentation forms as well. Ok to commit?
I don't think the question of the "European Scholarly tradition" really matters here. The question isn't how the user thinks of the character, how it is input, or how it is edited, but simply how it is represented in the text. Yes, it would be good if the basic shapers could handle decomposition and composition, but that isn't really relevant here because we *do* have special shaping engines for Hebrew, primarily to handle combining mark placement (e.g., vowels). I think the right approach to presentation forms in the input text is to decompose them; I'm not sure how just passing them through does any better than using the basic shaper for these characters. If it is desired to actually use precomposed glyphs in the font, then they should be used without regard to whether the input text was presentation forms or glyph+combining mark. (I haven't studied the Hebrew shapers in detail however, so maybe the patch above handles this...)
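To illustrate the "decompose them" suggestion, here is a minimal sketch in Python (not Pango's C, and not what the attached patch does). It assumes canonical decomposition (NFD) from the standard `unicodedata` module is an acceptable stand-in for whatever decomposition step a shaper would use; the function name `decompose_presentation_forms` is hypothetical.

```python
import unicodedata

def decompose_presentation_forms(text):
    """Return text with canonically decomposable characters split into
    base letter + combining marks (Unicode normalization form NFD).

    Hypothetical helper for illustration only; a real shaper would do
    this on its own internal buffers.
    """
    return unicodedata.normalize("NFD", text)

# U+FB2A (shin with shin dot) and U+FB2B (shin with sin dot) decompose
# to U+05E9 followed by U+05C1 or U+05C2 respectively.
shin_dot = decompose_presentation_forms("\ufb2a")
sin_dot = decompose_presentation_forms("\ufb2b")
```

After this step the shaper only ever sees the letter plus combining mark, so both ways of writing the text take the same code path.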
It certainly would make sense to have the shapers check if the font includes precomposed characters (presentation forms) and in that case use them. This is under the assumption that the font designer was able to make a smarter decision about dot placement than the shaper, which only has access to bounding boxes. But this is actually not related to the problem that is shown in the above screenshot. In the screenshot the two glyphs U+FB2B HEBREW LETTER SHIN WITH SIN DOT and U+05B4 HEBREW POINT HIRIQ were not combined because the Hebrew shaper didn't receive the U+FB2B character. The simple patch will solve that problem. Whether to look for precomposed glyphs is a different problem. About what the Hebrew shapers do, it all boils down to using bounding boxes and guesswork in order to place the vowel marks in aesthetic places. Did this convince you to let me commit the patch? 8-)
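As a rough picture of what "bounding boxes and guesswork" can mean, the following Python sketch centers a mark horizontally under a base glyph and hangs it just below the base. It is an assumption-laden illustration, not the heuristics the Pango Hebrew shapers actually use; `BBox`, `place_below` and the `gap` parameter are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class BBox:
    """Ink bounding box of a glyph, y axis pointing up (hypothetical type)."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float

def place_below(base, mark, gap=0.0):
    """Return (dx, dy) to add to the mark's position so that it is
    horizontally centered on the base and its top edge sits `gap`
    units below the bottom of the base."""
    base_center = (base.x_min + base.x_max) / 2
    mark_center = (mark.x_min + mark.x_max) / 2
    dx = base_center - mark_center
    dy = (base.y_min - gap) - mark.y_max
    return dx, dy

# A 10-unit-wide base and a small 2-unit-wide mark drawn near the origin.
offset = place_below(BBox(0, 10, 0, 10), BBox(0, 2, -3, -1), gap=1)
```

A rule like this knows nothing about the letter's shape, which is why a font designer's precomposed glyph can look better than the computed placement.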
I'm fine with the patch, I just don't think it really resolves the bug... if the precomposed input character form isn't rendered _exactly_ the same way as the uncomposed input characters, then there is something to fix.
I just committed something similar to the patch above to CVS. Regarding the wish that presentation forms are rendered the same as accents joined by the shaper, that is of course what we would like to happen. But only the font designer has the knowledge to make that true in the general case. Whenever a precomposed glyph exists, we may use that. (Just like using á for a+' in iso-latin1.) It would be easy to change the shaper to do that. But that only solves the problem for those combinations that have precomposed characters. There are lots of combinations that don't have presentation forms that we have to deal with. The only real solution to that is to use a kerning table with delta-distances in x and y, and as far as I understand that is only possible in OpenType fonts. Actually, for such fonts (with a proper kerning table) the Hebrew shaper becomes redundant.
We can certainly ensure that:

  presentation form in the input

renders the same as

  decomposed form of presentation form in the input

by decomposing as the first step. Any recomposition to use precomposed forms in the font would then treat both input sequences identically.

What's harder is ensuring:

  input sequence with presentation form in font

renders similarly to:

  input sequence without presentation form in font

(though there is a trivial way of ensuring that ... never use presentation forms in the font.)
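The decompose-first, recompose-later idea can be sketched as follows, in Python for brevity. Everything here is an illustrative assumption rather than Pango code: the `PRECOMPOSED` table covers only the two characters from this bug, `font_coverage` is a hypothetical set of code points the font has glyphs for, and the function names are made up.

```python
import unicodedata

# Hypothetical table: (base letter, combining mark) -> presentation form.
PRECOMPOSED = {
    ("\u05e9", "\u05c1"): "\ufb2a",  # shin + shin dot
    ("\u05e9", "\u05c2"): "\ufb2b",  # shin + sin dot
}

def clusters(text):
    """Group text into [base, mark, mark, ...] lists."""
    out = []
    for ch in text:
        if unicodedata.combining(ch) and out:
            out[-1].append(ch)   # combining mark joins the current cluster
        else:
            out.append([ch])     # anything else starts a new cluster
    return out

def to_shaper_input(text, font_coverage):
    """Decompose first, then recompose only where the font has the
    precomposed form, so both input spellings give the same result."""
    result = []
    for cluster in clusters(unicodedata.normalize("NFD", text)):
        base, marks = cluster[0], cluster[1:]
        for mark in marks:
            pre = PRECOMPOSED.get((base, mark))
            if pre is not None and pre in font_coverage:
                base = pre
                marks = [m for m in marks if m != mark]
                break
        result.append(base + "".join(marks))
    return "".join(result)

font_with_forms = {"\ufb2a", "\ufb2b"}
composed_input = to_shaper_input("\ufb2b\u05b4", font_with_forms)
decomposed_input = to_shaper_input("\u05e9\u05c2\u05b4", font_with_forms)
```

With a font that has the presentation forms, both spellings come out as U+FB2B followed by the hiriq; with an empty `font_coverage`, both come out fully decomposed. The second guarantee in the comment above (similar rendering with and without the precomposed glyph in the font) is not addressed by this sketch.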
Retitling since the original Subject: should be fixed now.
There's a more generic problem here of trying to assure this equivalency, though a Hebrew-specific solution may be possible here.
This is mostly done in HarfBuzz now. Closing obsolete.