Bug 89449 - Better handling of precomposed/combining forms in Hebrew shaper
Status: RESOLVED OBSOLETE
Product: pango
Classification: Platform
Component: general
Version: 1.0.x
Hardware: Other other
Importance: Normal minor
Target Milestone: Medium feature
Assigned To: pango-maint
QA Contact: pango-maint
Depends on:
Blocks:
 
 
Reported: 2002-07-30 19:00 UTC by Petr Tomasek
Modified: 2012-08-07 19:13 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Wrongly rendered Hebrew shin with dot character. (2.24 KB, image/png)
2002-08-01 20:12 UTC, Petr Tomasek
Patch to make Hebrew modules deal with presentation forms. (1.53 KB, patch)
2002-08-02 14:09 UTC, Dov Grobgeld
committed

Description Petr Tomasek 2002-07-30 19:00:40 UTC
The Hebrew characters 0xfb2a and 0xfb2b (shin and sin _with_ dots) are not
displayed correctly in Pango. Using 0x05e9 + 0x05c1 (0x05c2 respectively)
renders OK.
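
For reference, the two input forms described above are canonically equivalent in Unicode. The following is a minimal Python sketch using only the standard `unicodedata` module; it is not Pango code, and the constant and function names are chosen here purely for illustration.

```python
import unicodedata

# Precomposed presentation forms reported in this bug.
SHIN_WITH_SHIN_DOT = "\uFB2A"
SHIN_WITH_SIN_DOT = "\uFB2B"

# Base letter and combining dots of the sequences that render correctly.
SHIN = "\u05E9"
SHIN_DOT = "\u05C1"
SIN_DOT = "\u05C2"


def decompose(text: str) -> str:
    """Return the canonical decomposition (NFD) of text."""
    return unicodedata.normalize("NFD", text)


if __name__ == "__main__":
    for ch in (SHIN_WITH_SHIN_DOT, SHIN_WITH_SIN_DOT):
        parts = " ".join(f"U+{ord(c):04X}" for c in decompose(ch))
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {parts}")
```

Since both spellings decompose to the same sequence, a renderer that decomposes its input sees no difference between them.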
Comment 1 Petr Tomasek 2002-07-30 19:04:50 UTC
Oops, forgot to bring the screenshot with me, hopefully tomorrow...
Comment 2 Owen Taylor 2002-07-30 23:37:16 UTC
These characters are in the compatibility area of Unicode, so
use of them is discouraged. However, yeah, it probably
would be nicer if the Hebrew shapers could handle them.

Comment 3 Petr Tomasek 2002-08-01 20:12:17 UTC
Created attachment 10193
Wrongly rendered Hebrew shin with dot character.
Comment 4 Petr Tomasek 2002-08-01 20:32:02 UTC
Well, the problem is that in the European scholarly tradition (as opposed
to Israeli Hebrew) the shin and sin are _two_ different letters. (So
in a Biblical Hebrew dictionary you have a separate chapter for each
letter, while Israeli dictionaries have only one.)
Now if I want to handle (and analyse) biblical text, it's much more
convenient to use two Unicode chars for the two letters (i.e. 0xfb2a
and 0xfb2b).

As for the Pango side, since there are more such "presentation forms",
not just for Hebrew, which can simply be divided into other Unicode
characters, wouldn't it make sense to have Pango perform an additional
step outside of the language modules? (That said, I don't understand
much of how Pango works, even though I spent quite a while looking at
the sources :-(((

Comment 5 Dov Grobgeld 2002-08-02 14:09:53 UTC
Created attachment 10224
Patch to make Hebrew modules deal with presentation forms.
Comment 6 Dov Grobgeld 2002-08-02 14:12:14 UTC
I just added a patch that I believe solves the problem. It is
basically just a question of making the Hebrew modules deal with the
presentation forms as well. Ok to commit?
Comment 7 Owen Taylor 2002-08-02 15:41:16 UTC
I don't think the question of the "European Scholarly tradition"
really matters here. The question isn't how the user thinks
of the character, how it is input, or how it is edited,
but simply how it is represented in the text.

Yes, it would be good if the basic shapers could handle 
decomposition and composition, but that isn't really relevant
here because we *do* have special shaping engines for Hebrew;
primarily to handle combining mark placement. (e.g., vowels)

I think the right approach to presentation forms in the
input text is to decompose them; I'm not sure how just passing
them through does any better than using the basic shaper
for these characters. If it is desired to actually use precomposed
glyphs in the font, then they should be used without regard
to whether the input text was presentation forms or 
glyph+combining mark.

(I haven't studied the Hebrew shapers in detail however, so
maybe the patch above handles this...)
Comment 8 Dov Grobgeld 2002-08-03 20:51:07 UTC
It certainly would make sense to have the shapers check if the font
includes precomposed characters (presentation forms) and in that case
use them. This is under the assumption that the font designer was able
to make a smarter decision about dot placement than the shaper, which
only has access to bounding boxes. But this is actually not related to
the problem that is shown in the above screenshot. In the screenshot
the two glyphs U+FB2B;HEBREW LETTER SHIN WITH SIN DOT and
U+05B4;HEBREW POINT HIRIQ were not combined because the Hebrew shaper
didn't receive the U+FB2B character. The simple patch will solve that
problem. Whether to look for precomposed glyphs is a different problem.
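
A rough illustration of the idea described above (a sketch, not the attached patch): the shaper has to accept code points from the Hebrew presentation forms range as Hebrew, otherwise a following point such as U+05B4 is never grouped with its base letter. The function names are hypothetical; the ranges are the Hebrew block and the Hebrew part of the Alphabetic Presentation Forms block.

```python
import unicodedata

HEBREW_BLOCK = range(0x0590, 0x0600)         # U+0590..U+05FF
HEBREW_PRESENTATION = range(0xFB1D, 0xFB50)  # U+FB1D..U+FB4F


def is_hebrew(ch: str) -> bool:
    """True for the Hebrew block and the Hebrew presentation forms."""
    cp = ord(ch)
    return cp in HEBREW_BLOCK or cp in HEBREW_PRESENTATION


def is_mark(ch: str) -> bool:
    """True for nonspacing marks such as Hebrew points."""
    return unicodedata.category(ch) == "Mn"


def clusters(text: str) -> list:
    """Group each Hebrew base letter with the Hebrew marks that follow it."""
    result = []
    for ch in text:
        attach = (
            bool(result)
            and is_mark(ch)
            and is_hebrew(ch)
            and is_hebrew(result[-1][0])
        )
        if attach:
            result[-1] += ch  # mark joins the cluster of the preceding base
        else:
            result.append(ch)  # start a new cluster
    return result
```

With the presentation range included, `clusters("\uFB2B\u05B4")` yields a single cluster, so the hiriq can be positioned relative to the shin.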

About what the Hebrew shapers do, it all boils down to using bounding
boxes and guesswork in order to place the vowel marks in aesthetic places.
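
To make the bounding-box approach concrete, here is a minimal sketch of one such heuristic: centering a mark horizontally under its base letter. The `Rect` type, the `place_below_centered` function and the `gap` parameter are assumptions for illustration and are not taken from the Pango sources; the real shaper applies per-mark rules.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    """Ink bounding box of a glyph; y grows downward."""
    x: int       # left edge
    y: int       # top edge
    width: int
    height: int


def place_below_centered(base: Rect, mark: Rect, gap: int = 1):
    """Return the (dx, dy) shift that centers mark under base.

    Only the two bounding boxes are used, which is all the information
    a shaper has when the font carries no positioning data.
    """
    dx = (base.x + base.width // 2) - (mark.x + mark.width // 2)
    dy = (base.y + base.height + gap) - mark.y
    return dx, dy
```

A heuristic like this cannot know about the actual letter shape inside the box, which is why the result is only approximately right for asymmetric letters.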

Did this convince you to let me commit the patch? 8-)
Comment 9 Owen Taylor 2002-08-03 21:04:04 UTC
I'm fine with the patch, I just don't think it really resolves
the bug... if the precomposed input character form isn't rendered
_exactly_ the same way as the uncomposed input characters, then there
is something to fix.
Comment 10 Dov Grobgeld 2002-08-05 20:28:34 UTC
I just committed something similar to the patch above to CVS.

Regarding the wish that presentation forms be rendered the same as
accents joined by the shaper: of course that is what we would
like to happen. But only the font designer has the knowledge to make
that true in the general case. Whenever a precomposed glyph exists, we
may use that. (Just like using á for a+' in iso-latin1.) It would be
easy to change the shaper to do that. But that only solves the problem
for those combinations that have precomposed characters. There are
lots of combinations that don't have presentation forms that we have to
deal with.

The only real solution to that is to use a kerning table with
delta-distances in x and y, and as far as I understand that is only
possible in OpenType fonts. Actually, for such fonts (with a proper
kerning table) the Hebrew shaper becomes redundant.
Comment 11 Owen Taylor 2002-08-06 02:34:26 UTC
We can certainly ensure that:

 presentation form in the input
 
renders the same as

 decomposed form of presentation form in the input

By decomposing as the first step. Any recomposition
to use precomposed forms in the font would then treat
both input sequences identically.

What's harder is ensuring:

 input sequence with presentation form in font

renders similarly to:

 input sequence without presentation form in font

(though there is a trivial way of ensuring that ... never
use presentation forms in the font.)
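
The decompose-first approach described above can be sketched as follows. This is an illustrative Python sketch, not Pango code: `PRECOMPOSED`, `to_glyph_codepoints` and the `font_coverage` set are hypothetical names, and the table lists only the two characters from this bug. An explicit table is used because Unicode excludes these Hebrew presentation forms from NFC composition.

```python
import unicodedata

# Hypothetical recomposition table: base + mark -> precomposed form.
PRECOMPOSED = {
    "\u05E9\u05C1": "\uFB2A",  # shin + shin dot
    "\u05E9\u05C2": "\uFB2B",  # shin + sin dot
}


def to_glyph_codepoints(text, font_coverage):
    """Decompose first, then recompose only where the font has the glyph.

    font_coverage is an assumed set of characters the font can render.
    """
    out = []
    base_index = None  # position in out of the current base letter
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) == 0:
            base_index = len(out)
            out.append(ch)
            continue
        composed = None
        if base_index is not None:
            composed = PRECOMPOSED.get(out[base_index] + ch)
        if composed is not None and composed in font_coverage:
            out[base_index] = composed  # the mark is absorbed into the base
        else:
            out.append(ch)
    return out
```

Because every input goes through the same decomposition, a presentation form and its decomposed spelling yield the same output for a given font. Whether a font with the precomposed glyph renders like one without it is the harder problem and is not addressed by this sketch.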
Comment 12 Owen Taylor 2002-11-26 00:23:56 UTC
Retitling since the original Subject: should be fixed now.
Comment 13 Owen Taylor 2004-02-19 15:17:14 UTC
There's a more generic problem here of trying to assure 
this equivalency, though a Hebrew-specific solution may
be possible here.
Comment 14 Behdad Esfahbod 2012-08-07 19:13:55 UTC
This is mostly done in HarfBuzz now.  Closing obsolete.