Bug 101079 - opentype font support for diacritics for Latin/Greek/Cyrillic letters
Status: RESOLVED FIXED
Product: pango
Classification: Platform
Component: general
Version: 1.1.x
Hardware: Other Linux
Importance: Normal normal
Target Milestone: Medium feature
Assigned To: pango-maint
pango-maint
Depends on:
Blocks:
 
 
Reported: 2002-12-13 00:41 UTC by Jungshik Shin
Modified: 2006-03-07 05:54 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
first attempt (5.59 KB, patch)
2003-09-20 23:36 UTC, Noah Levitt
needs-work
sample output (23.63 KB, image/png)
2003-09-20 23:38 UTC, Noah Levitt
screenshot demonstrating the bug (22.86 KB, image/png)
2004-10-24 07:44 UTC, greg_aumann
UTF-8 encoded IPA test data (272 bytes, text/plain)
2004-10-24 07:45 UTC, greg_aumann
expect output from opentype in notepad (9.48 KB, image/png)
2004-10-24 07:47 UTC, greg_aumann
Ideal output (requires graphite rendering) (12.87 KB, image/png)
2004-10-24 07:48 UTC, greg_aumann
patch that fix mark positioning (2.43 KB, patch)
2005-09-26 17:29 UTC, Denis Jacquerye
Code2000 mark (2.19 KB, image/png)
2005-09-26 17:33 UTC, Denis Jacquerye
Doulos SIL mark (1.97 KB, image/png)
2005-09-26 17:34 UTC, Denis Jacquerye
Code2000 mark and mkmk (4.08 KB, image/png)
2005-09-26 17:36 UTC, Denis Jacquerye
Gedit with Junicode and the utf8 file (17.99 KB, image/png)
2005-10-25 22:50 UTC, Denis Jacquerye
cleaned up patch for OpenType features for Latin and other basic Scripts (8.25 KB, patch)
2005-10-27 19:42 UTC, Denis Jacquerye
log from ftglue.c (157.07 KB, text/plain)
2005-11-08 11:18 UTC, greg_aumann
screenshot of qt's rendering of Doulos SIL (43.55 KB, image/png)
2005-11-08 11:44 UTC, greg_aumann
screenshot of qt placing Gentium diacritics (34.12 KB, image/png)
2005-11-08 14:34 UTC, Denis Jacquerye
backtrace of failure to load GPOS table in Doulos SIL (4.88 KB, text/plain)
2005-11-14 23:14 UTC, greg_aumann
screenshot of Charis SIL 4.0.2 (50.02 KB, image/png)
2005-11-23 02:26 UTC, greg_aumann
screenshot of Doulos SIL 4.0.10 (50.47 KB, image/png)
2005-11-23 02:27 UTC, greg_aumann
Committed patch. (13.00 KB, patch)
2005-11-23 11:55 UTC, Behdad Esfahbod
Comparing bluefish with yudit under pango-1.10.4 (12.48 KB, image/png)
2006-03-07 05:01 UTC, Hydonsingore Cia

Description Jungshik Shin 2002-12-13 00:41:43 UTC
Combining diacritical marks for Latin/Greek/Cyrillic letters
are not supported. Code2000 fonts have opentype tables
for some (if not all) of them and Yudit 2.6 can render
them correctly with Code2000 font. However, Pango doesn't seem
to be able to. A sample text is available at
<http://www.columbia.edu/kermit/st-erkenwald.html>.

The text has sequences like <U+0068, U+0305> and <U+0069, U+0304>.
U+0304 and U+0305 have to be rendered above base characters
<U+0068> and <U+0069>, but they're rendered to the left of them,
instead.
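
For reference, the two sequences above can be inspected with Python's standard unicodedata module. This is only an illustrative sketch and has nothing to do with Pango's own code; the describe() helper is a made-up name. It shows that the second character of each pair is a combining mark (non-zero canonical combining class) that belongs with the preceding base letter.

```python
import unicodedata

# The two sequences quoted in the report: a base letter followed by a combining mark.
sequences = ["\u0068\u0305", "\u0069\u0304"]

def describe(seq):
    """Return (code point, Unicode name, canonical combining class) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))
        for ch in seq
    ]

for seq in sequences:
    for code_point, name, ccc in describe(seq):
        # A non-zero combining class means the character attaches to the base before it.
        role = "combining mark" if ccc else "base"
        print(code_point, name, ccc, role)
```

Both marks have combining class 230 ("above"), which is why they are expected over the base letter rather than beside it.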
Comment 1 Jungshik Shin 2002-12-13 02:07:07 UTC
It seems like basic-xft does take care of diacritic combining
marks with simple overstriking and some heuristics. 
Unfortunately, it doesn't work well for the sample
text I tried with Code2000 font. I'm  changing the summary line
because diacritic combining marks are supported but 
opentype tables are not made use of when present in a font.
Comment 2 Jungshik Shin 2003-02-20 18:07:30 UTC
I found a family of fonts with opentype tables for virtually all
Latin/Greek/Cyrillic diacritical combining marks at

http://www.sil.org/~gaultney/gentium/index.html

When implementing opentype support for Latin/Greek/Cyrillic, these fonts
would be of great help.


Comment 3 Noah Levitt 2003-09-20 00:31:24 UTC
I'm confused about this Gentium font. pango/pango/opentype/ottest
doesn't list any opentype tables for it.

$ ./ottest /home/nlevitt/.fonts/Gentium\ Release\ 1/GenR1.ttf 
----> GSUB <----
TT_Load_GSUB_Table 8e
----> GPOS <----
TT_Load_GPOS_Table 8e
Comment 4 Jungshik Shin 2003-09-20 00:46:16 UTC
Sorry for the confusion. I forgot to clarify.

Until a week ago, I thought it was an opentype font, but it turned out
NOT to be. When I mentioned it as an opentype font, the download link to
the font didn't work and I somehow assumed that it was an opentype font
(I returned to the site several times, but it didn't work). A week ago,
when I finally downloaded it and read the README file, I realized that
it's a dumb truetype font.
 
Code2000 font by James Kass may have some support of diacritics for
Latin/Greek/Cyrillic, but I'm not sure. 
 
Comment 5 Jungshik Shin 2003-09-20 00:54:34 UTC
oops. I forgot what I had written earlier. Code2000 font does have OT
tables for some diacritic marks for Latin/Greek/Cyrillic.
 
Comment 6 Noah Levitt 2003-09-20 23:36:58 UTC
Created attachment 20146
first attempt
Comment 7 Noah Levitt 2003-09-20 23:38:01 UTC
Created attachment 20147
sample output
Comment 8 Noah Levitt 2003-09-20 23:49:13 UTC
The font in the sample image is code2000. I don't know if the
rendering is correct or not (in the sense that the opentype rules
are applied correctly). We really need a font with some sample
strings and correct renderings to check against.

This patch is just sort of a proof of concept. Stuff still needs
to be worked out. For starters, there is opentype kerning
and (I guess) "regular" kerning. This patch skips the regular
kerning if the font has opentype kerning. But I'm not sure that's
the right thing to do if the font has kerning for one or more but
not all the scripts (latn, cyrl, grek, armn, geor, runr, ogam).
Another question is whether and which discretionary ligatures
should be on by default. 
Comment 9 Noah Levitt 2003-09-21 00:33:45 UTC
Oh, there is the erkenwald link in the first comment. I won't flood
you with more attachments. Suffice it to say that without the patch it
renders wrong, and with the patch it renders more than half right.
Comment 10 Jungshik Shin 2003-09-21 10:59:53 UTC
> This patch skips the regular kerning if the font has 
> opentype kerning.

 I'm not sure either, but it's  likely that you should not skip
'regular' kerning. Have you tried the other way around (i.e. turn off
OT kerning and leave alone 'regular' kerning)? 


  BTW, basic shaper has a 'best-effort guessing' code for combining
characters and your patch invokes the OT shaping function after that.
You might have to block it 'selectively' (??). To determine whether or
not to block it, we may have to  go  really 'deep' into OT
'internals'. Alternatively, we may  block it  per script (or unicode
block)  

 
Comment 11 Noah Levitt 2003-09-21 15:45:48 UTC
>   BTW, basic shaper has a 'best-effort guessing' code for combining
> characters and your patch invokes the OT shaping function after that.
> You might have to block it 'selectively' (??). To determine whether or
> not to block it, we may have to  go  really 'deep' into OT
> 'internals'. Alternatively, we may  block it  per script (or unicode
> block)

That's a good point. Fortunately, I did a tiny bit of testing, and it
appears that the opentype rules override the heuristics. I tried my
sample file and the only part that changed was the Greek part
(Code2000 has no tables for Greek). Tests without Greek, like the
Erkenwald one, turned out identical. I don't know if we just got lucky
or what...
Comment 12 Owen Taylor 2004-02-19 15:21:28 UTC
Doesn't sound like this patch is really ready to go in for 1.4.
Comment 13 greg_aumann 2004-10-24 07:41:19 UTC
Pango's rendering of IPA using the Doulos SIL font also has this problem. I have
attached several files that demonstrate the problem and what it should look like. 

The Doulos SIL font is freeware and can be downloaded from
http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=DoulosSILfont
I am using version 4.0.4 for these screenshots but the only difference between
4.0.10 and 4.0.4 is that 4.0.10 has a couple of supplementary plane characters.

Particular attention has been paid to the opentype and graphite tables in this
font so that it renders well on Windows with recent versions of uniscribe. Thus
it is a good test for the opentype rendering in pango.

ipa_test.txt - UTF-8 encoded IPA test file

gedit-2.6.2_pango-1.6.0.png - screenshot of gedit 2.6.2 and pango 1.6.0 showing
the test data. Diacritics are about the width of an average glyph too far to
the left. Also when there are multiple diacritics there is no vertical
separation and so they just go on top of each other in a confused mess (line 4).
Also ligatures (line 3) are not working.

notepad.txt - same test file (but converted to DOS linebreaks) in Notepad.exe
with a recent version of Uniscribe. This is what gedit and pango should be able
to output using just the opentype tables in the font.

worldpad.png - This is the same data rendered using the graphite tables in the
font. What it would ideally look like. Using Worldpad 2.0.2004.6259 on Windows 2000.
Comment 14 greg_aumann 2004-10-24 07:44:26 UTC
Created attachment 32983
screenshot demonstrating the bug
Comment 15 greg_aumann 2004-10-24 07:45:25 UTC
Created attachment 32984
UTF-8 encoded IPA test data
Comment 16 greg_aumann 2004-10-24 07:47:46 UTC
Created attachment 32985
expect output from opentype in notepad
Comment 17 greg_aumann 2004-10-24 07:48:41 UTC
Created attachment 32986
Ideal output (requires graphite rendering)
Comment 18 Denis Jacquerye 2005-06-03 21:34:08 UTC
It seems the first attempt patch looks for the 'kern' tag.

Doulos SIL seems to be using the 'mark' tag for base+mark glyphs and the 'mkmk'
tag for *+mark+mark glyphs.
I don't know how the code should be changed for pango to let the font handle
anchors.

This issue is quite serious, since some languages use diacritics on base
characters that Unicode doesn't have like 'à'. Unicode leaves the issue for the
fonts to handle. Maybe the severity should be more than "normal".
Comment 19 Behdad Esfahbod 2005-09-26 13:28:46 UTC
Noah, any news on this?  Handling mark and mkmk should be easy these days.  Do
you have time to finish this?
Comment 20 Denis Jacquerye 2005-09-26 17:29:05 UTC
Created attachment 52690
patch that fix mark positioning

This patch fixes the mark positioning. It is followed by three sample outputs.
Code2000 is printed properly for marks, but mkmk doesn't work. Doulos SIL should
work but mark isn't working. Here is the error when using pangoft2topgm
--font="Doulos SIL" on a file with diacritics.
Comment 21 Denis Jacquerye 2005-09-26 17:33:02 UTC
Created attachment 52691
Code2000 mark

Code2000 mark, diacritics on latin extended letters
Comment 22 Denis Jacquerye 2005-09-26 17:34:58 UTC
Created attachment 52692
Doulos SIL mark

Doulos SIL mark, diacritics with latin-extended letters.
This will return an error:

(process:11612): Pango-WARNING **: Error loading GPOS table 4096
Comment 23 Denis Jacquerye 2005-09-26 17:36:23 UTC
Created attachment 52693
Code2000 mark and mkmk

Code2000 mark and mkmk, the IPA sample. As you can see mkmk are handled.
Comment 24 Denis Jacquerye 2005-09-26 17:37:33 UTC
sorry about the spam.

Comment #23 "As you can see mkmk are handled."
I meant they _aren't_ handled.
Comment 25 Denis Jacquerye 2005-10-08 13:08:06 UTC
Thanks to behdad for his help.

Doulos SIL seems to have a bug, ttx isn't able to convert it to xml due to a
GPOS error.
Behdad mentioned we could work around the OT specs to have this work.
Code2000 has a lack of ligatures for accented Is, it does have definition for
diacritics placement.

I'm currently working on a few fonts.
Junicode 6.5 beta has diacritics placement definitions for vowels; the mark and
liga features seem to work properly. Test for yourselves:
http://home.sus.mcgill.ca/~moyogo/lingala/fonts/Junicode-20051005-patch.zip
It contains Regular, Italic, Bold and Bold Italic.
The first three have the mark diacritics placed correctly. mkmk seems to be an
issue; test with attachment (id=32984), but the following attachment is clearer
for testing Junicode.

Pango does not place the diacritics correctly with Junicode Bold Italic, nor
does it give any error, and yet the font has the definitions. Ttx does not have
any problem converting Junicode Bold Italic to xml.

I'm also working on another font. Similar problem as Junicode Bold Italic, the
marks are set, but pango does not use them and does not print any errors. Yet
Ttx finds a bug in GPOS.
test it : http://home.sus.mcgill.ca/~moyogo/lingala/fonts/Ubuntu-Title.otf

Ligatures and diacritics are very important, not only in non Latin-scripts. Why
aren't they always used? Or at least why aren't they used for Latin-scripts and
private Unicode blocks?
Comment 26 Denis Jacquerye 2005-10-10 04:11:25 UTC
Actually, Code2000 and Pango not rendering i+dieresis (U+0069 + U+0308) as ï
(U+00EF) is not a bug in Code2000 but in Pango.
Pango should know from Unicode that they are canonically equivalent and
therefore use the precomposed glyph if available in that font.

Should I open another bug or leave that for here?
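
To illustrate the canonical equivalence mentioned here, a small sketch using Python's standard unicodedata module (not Pango code; has_precomposed() is a hypothetical helper name). NFC normalization composes i + U+0308 into U+00EF, while a sequence such as h + U+0305 has no precomposed form and stays as two code points, so only the first case could be served by a precomposed glyph.

```python
import unicodedata

decomposed = "\u0069\u0308"   # i + COMBINING DIAERESIS
precomposed = unicodedata.normalize("NFC", decomposed)

def has_precomposed(base_plus_marks):
    """True if NFC composes the whole sequence into a single code point."""
    return len(unicodedata.normalize("NFC", base_plus_marks)) == 1

print(f"U+{ord(precomposed):04X}", unicodedata.name(precomposed))
print(has_precomposed("\u0069\u0308"))   # i + diaeresis
print(has_precomposed("\u0068\u0305"))   # h + overline, from the original report
```

Whether the font actually contains a glyph for the composed code point is a separate check that this sketch does not cover.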
Comment 27 Denis Jacquerye 2005-10-10 08:57:12 UTC
I'm currently adding anchors for mark and base glyphs to some DejaVu Fonts.
Pango has a really strange behaviour. For DejaVu Mono without anchors it will
give a 4097 error, yet ttx translates the font to xml without any warning or error.
I've only managed to get DejaVu Sans and Sans Bold to have the mark working. All
my other attempts simply don't render them, with no error message from either
Pango or ttx.
Comment 28 Denis Jacquerye 2005-10-25 22:47:22 UTC
It seems the error with the fonts that didn't work in Pango was due to the
version of Fontforge I was using. Another Fontforge related bug is still there,
but only occurs depending on the order one adds anchors types.

There is a Junicode-Regular ttf file that has the OpenType features for Latin
script, and it can be fetched at
http://home.sus.mcgill.ca/~moyogo/lingala/fonts/Junicode-Regular.ttf

Use the gzipped patch attachment (id=52690) with this font and the file
attachment (id=32984) or http://home.sus.mcgill.ca/~moyogo/lingala/fonts/text.utf8
The result, as far as the features are included in the font, is close to
Notepad, or even Worldpad for some features.

I will provide more fonts with similar features if needed for testing, but it
seems to me the patch can be reviewed and tested in CVS.
Comment 29 Denis Jacquerye 2005-10-25 22:50:59 UTC
Created attachment 53890
Gedit with Junicode and the utf8 file

Here's a screenshot of Junicode-Regular with OpenType features for Latin script
diacritics in Gedit with the previous utf8 example with extra characters.
The extra characters are there because I could not reproduce Doulos SIL's exact
behaviour with one sequence of characters, but managed to with other sequences.
This is a font issue, or even an OpenType issue.
Comment 30 Denis Jacquerye 2005-10-27 19:42:50 UTC
Created attachment 53961
cleaned up patch for OpenType features for Latin and other basic Scripts

Here's the cleaned up version of the patch.
It handles mark, mkmk, kern for GPOS and clig, liga  and ccmp for GSUB.
I think it can be reviewed and go into CVS. I can provide more fonts and more
text samples if you need to test it more extensively.
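
As a rough illustration of the feature set listed above: OpenType identifies features by four-byte tags, which are commonly packed into 32-bit integers. The sketch below is not taken from the patch; the list names and the make_tag() helper are hypothetical and only show the tags and their packed values.

```python
# Feature tags named in the comment above, grouped by the table they live in.
GPOS_FEATURES = ["kern", "mark", "mkmk"]
GSUB_FEATURES = ["ccmp", "clig", "liga"]

def make_tag(tag):
    """Pack a four-character OpenType tag into a big-endian 32-bit integer."""
    raw = tag.encode("ascii")
    if len(raw) != 4:
        raise ValueError("OpenType tags are exactly four bytes")
    return int.from_bytes(raw, "big")

for table, features in (("GPOS", GPOS_FEATURES), ("GSUB", GSUB_FEATURES)):
    for feature in features:
        print(table, feature, hex(make_tag(feature)))
```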

All the fonts that have the features I've tested work. I assumed too much
previously, when display was incorrect it was due to the font not having the
right features and not due to Pango randomly acting up.

Doulos SIL is broken for Pango but that's another issue, it should not delay
other OpenType fonts from being displayed correctly for the languages needing
them. Code2000 isn't substituting for dotless i and j when accented, so that
could be another bug. Should Pango use Unicode data to know i is composed and
therefore use the right glyph according to context or should it always be
defined in the font?
Comment 31 greg_aumann 2005-11-04 03:33:21 UTC
Can you please explain more fully why you think Doulos SIL is broken? The font
has been extensively tested with the uniscribe (Microsoft) and the InDesign
(Adobe) shaping engines and works correctly with them. We know it doesn't work
correctly with pango (released versions), qt or the version of icu that is in
openoffice2 on linux. But that is due to limitations of those shaping engines.
It may work correctly with more recent versions of icu. We will try to find out
about that. Of course this is no guarantee that there is not a bug in the font
but it makes it less likely. 

If there is a problem in Doulos SIL that you can pinpoint then we can arrange to
get it fixed. And now is a good time for that as there is currently a new
version in beta. Anyway wherever the problem is it would be good to get it fixed.

Doulos SIL uses 4 GPOS lookups and 13 GSUB lookups:
GPOS - type1 single adjustment for advance width (this may be new), mark, mkmk
udia, mkmk ldia
GSUB - context for dotless i, subst for dotless i, replacement dotless i when
precomposed with lower diacritic, precomposed replacements, romanian overstrikes,
romanian precomposed, multi-way alternates, single alternates, vietnamese
overstrikes, vietnamese precomposed, ffi replacement, pitch ligatures replacement

It also uses lots of features and other stuff.

thanks for your work on this bug. I am very interested to see it resolved.

Hope this is somewhat helpful
Comment 32 greg_aumann 2005-11-04 03:34:22 UTC
cc myself
Comment 33 Nguyen Thai Ngoc Duy 2005-11-05 06:34:20 UTC
Did the new patch solve comment #26  i+dieresis (U+0069 + U+0308) => ï
(U+00EF)?
Comment 34 Denis Jacquerye 2005-11-05 08:45:50 UTC
The patch only uses kern, mark, mkmk for GPOS and ccmp, clig, liga for GSUB as
defined on http://www.microsoft.com/typography/otfntdev/standot/features.htm .
Comment #26 is a bug in the font, as far as this patch is concerned. Should
Pango do the extra work for fonts missing the feature?

With the patch Pango can load Doulos SIL's GSUB without any problem. Ligatures
and substitutions like i+dieresis (U+0069 + U+0308) => ï work, except the
diacritic is misplaced since Doulos SIL's GPOS triggers an error see comment #22.
I was able to have other fonts working with both GPOS and GSUB.
Greg, can you run ttx from fonttools on DoulosSILR.ttf? I don't know what to do
with the error.

Should other features be available for basic scripts?
Comment 35 greg_aumann 2005-11-08 11:15:20 UTC
I have tested the patch with Doulos SIL. I am getting two GPOS errors the same
as in comment 22. I have also run ttx on the font and am getting a stack trace
that ends with an assertion error 
"assert r.StartCoverageIndex == len(glyphs), \
AssertionError: (20, 0)".

I have turned on the logging in pango/opentype/ftglue.c line 13 and run gedit. I
will attach the output of the log. That should help in tracking down the source
of the problem.

I have checked with some of the font designers and they say that the font is
fine with current versions of icu and also Mellel's shaping engine. This is in
addition to uniscribe and indesign (except for two or more stacked diacritics).

It seems to me that there is some incompatibility between Doulos SIL and
freetype. Your patch is just causing the OT tables to be loaded and thus
triggering the error. So I think the problem must be in either freetype or the
font. Behdad, do you agree with this, and if so should we move the
discussion elsewhere?

About comment 26. Windows always substitutes the precomposed form for the
decomposed form. OS X does not. Of course if the font and the OS both support
the right shaping the final result will look the same. I am not sure that either
way is the "correct" way for this.

As for your question about the other features I will have to check about that.  
Comment 36 greg_aumann 2005-11-08 11:18:50 UTC
Created attachment 54458
log from ftglue.c

log from ftglue.c of gedit opening attachment 32984 with the Doulos SIL font
selected
Comment 37 Denis Jacquerye 2005-11-08 11:23:16 UTC
The patch gives the same GPOS error with Charis SIL.
http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=CharisSIL_download
Comment 38 greg_aumann 2005-11-08 11:41:12 UTC
Yes, I expected that. Doulos SIL and Charis SIL are not independent fonts. The
glyphs are different but the opentype tables are as similar as they can be made.
This is primarily to reduce the amount of work involved.
Comment 39 greg_aumann 2005-11-08 11:41:49 UTC
This comment is quite separate from my previous comments. It has nothing to do
with Doulos SIL or Charis SIL. 

I think that the fallback positioning used in your patch can be improved. That is
how diacritics are positioned when there are no opentype tables. Currently you
are putting the diacritic back a fixed amount each time. This works well if the
previous character is of average width. However it is not so good if the
previous character is wider or narrower than usual. It would be better to center
the diacritic over (or under) the previous character. This will not be exactly
right in every case (e.g. if the previous character is a j) but it will look
better for a lot more cases. Also if there is more than one diacritic they land
on top of each other. It would be good if second and subsequent diacritics could
move up (or down) a bit so they don't land on top of each other.

Qt does this and it really looks better. I will attach a screenshot so you can see.

The Gentium font (also from scripts.sil.org) is a good font to test this with as
it has no opentype tables. Contrary to what was suggested in comment 2.
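
The fallback suggested above (center each diacritic over the base, and move later diacritics up so they don't collide) could be sketched roughly as follows. This is a hypothetical illustration, not Pango or Qt code: the Glyph class, its width/height metrics, the place_marks() name and the gap value are all assumptions, and only marks above the base are handled (marks below would mirror the vertical step).

```python
from dataclasses import dataclass

@dataclass
class Glyph:
    width: int    # ink width in font units (assumed metric)
    height: int   # ink height in font units (assumed metric)

def place_marks(base, marks, gap=50):
    """Return an (x, y) offset for each mark, relative to the base glyph origin.

    x centers the mark horizontally over the base, whatever the base width is;
    y stacks each mark above the previous one, separated by `gap` units.
    """
    positions = []
    y = base.height + gap
    for mark in marks:
        x = (base.width - mark.width) // 2
        positions.append((x, y))
        y += mark.height + gap
    return positions

wide_base = Glyph(width=600, height=1000)
narrow_base = Glyph(width=200, height=1000)
macron = Glyph(width=400, height=200)

print(place_marks(wide_base, [macron, macron]))
print(place_marks(narrow_base, [macron]))
```

A fixed backward shift would give the same x for both bases; centering adapts to the base width, which is the improvement being asked for.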
Comment 40 greg_aumann 2005-11-08 11:44:05 UTC
Created attachment 54460
screenshot of qt's rendering of Doulos SIL

screen shot of Doulos SIL in kedit showing how good it can look without the use
of opentype tables
Comment 41 Behdad Esfahbod 2005-11-08 12:50:37 UTC
Greg, would you please attach the test text used in comment 40, and a pointer to
the font used?  I want to get it working in Pango!

About Doulos, I'm investigating.
Comment 42 Denis Jacquerye 2005-11-08 14:33:21 UTC
The text used in comment 40 is the same UTF-8 encoded IPA test data: attachment
(id=32984)
The font used in the screenshot attachment 54460 is Doulos SIL, but since QT
doesn't use GPOS or GSUB, it is as if they weren't there. QT places the
diacritics by itself, it doesn't use the tables (see comment 39).
Gentium is at http://www.sil.org/~gaultney/gentium/index.html (see comment 2 )
Comment 43 Denis Jacquerye 2005-11-08 14:34:51 UTC
Created attachment 54465
screenshot of qt placing Gentium diacritics

here's a screenshot of the exact same file with Gentium instead of Doulos SIL
in Kedit
Comment 44 greg_aumann 2005-11-09 01:52:56 UTC
Denis's summary in comment 42 is spot on. 
Comment 45 Nguyen Thai Ngoc Duy 2005-11-10 02:28:12 UTC
About comment #39, as a fallback case, substituting decomposed characters by 
composed characters if available would yield better results (at least with GPL 
Vietnamese fonts i have)
Comment 46 greg_aumann 2005-11-14 23:14:59 UTC
Created attachment 54758
backtrace of failure to load GPOS table in Doulos SIL

I have tracked down the point in the code at which pango fails to load the GPOS
table in Doulos SIL. The attachment is a backtrace with the values of local
variables made by setting a breakpoint where the error is first detected.

To duplicate this problem:
1) apply the patch in attachment 53961 to pango 1.10.1
2) install Doulos SIL 4.0.10
3) set the font in gedit to Doulos SIL
4) open gedit (I used 2.10.5) but I doubt that the exact version matters much
Comment 47 Behdad Esfahbod 2005-11-17 06:34:06 UTC
Ok, I committed two patches:

2005-11-17  Behdad Esfahbod  <behdad@gnome.org>

        Part of #101079:

        * pango/opentype/ftxopen.c (Load_Lookup): In extension subtables,
        offset is relative to the extension subtable, not the original
        table. (Greg Aumann)

        * pango/opentype/ftxgpos.c (Load_BaseArray): When reading BaseAnchor,
        skip offsets that are zero.  Works around bug in Doulos SIL Regular.

============

I believe Doulos SIL Regular is wrong here:

(01:03:07) behdad: the GPOS BaseArray is at 0x0E40
(01:03:29) behdad: ClassCount is 6
(01:03:41) behdad: BaseCount is 0x2C1 which is 705
(01:03:52) behdad: so we expect 705 records of 6 offsets each
(01:04:09) behdad: and look what follows, all two zero bytes, followed by 10
nonzero, repeat
(01:04:24) behdad: the zero bytes are wrong


Testing is appreciated.  See if this fixes your favorite problem.
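
A minimal sketch of the BaseArray layout discussed in the IRC excerpt, assuming the layout given in the OpenType spec: a 16-bit BaseCount followed by BaseCount records of ClassCount 16-bit offsets, all big-endian. This is not the ftxgpos.c code; read_base_array() and base_array_size() are hypothetical names and the sample table is made-up data, not bytes from Doulos SIL. Zero offsets are treated as "no anchor" and skipped rather than followed, which is the idea behind the second patch.

```python
import struct

def base_array_size(base_count, class_count):
    """Size in bytes of the BaseCount field plus all BaseRecords."""
    return 2 + base_count * class_count * 2

def read_base_array(data, class_count):
    """Parse BaseCount and the BaseRecords; a zero offset becomes None."""
    (base_count,) = struct.unpack_from(">H", data, 0)
    records = []
    pos = 2
    for _ in range(base_count):
        offsets = struct.unpack_from(f">{class_count}H", data, pos)
        # A zero offset means "no anchor for this mark class": skip it.
        records.append([off if off != 0 else None for off in offsets])
        pos += 2 * class_count
    return records

# Synthetic table: 2 base records, ClassCount = 3, class 0 offsets are zero.
table = struct.pack(">H3H3H", 2, 0, 0x20, 0x26, 0, 0x2C, 0x32)
print(read_base_array(table, 3))

# With the numbers from the log: 705 records of 6 offsets, 12 bytes per record,
# i.e. 2 zero bytes (class 0) followed by 10 non-zero bytes in each record.
print(base_array_size(0x2C1, 6))
```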
Comment 48 greg_aumann 2005-11-23 02:25:10 UTC
I have tested pango 1.10.1 with the patch in attachment 53961 and the two fixes
mentioned in comment 47.

I tested with Doulos SIL 4.0.10 and 4.0.14 and Charis SIL 4.0.2. 

These have fixed the incompatibility problems with pango and these fonts. I will
attach two screen shots so you can see.

There is still one minor issue with the diacritics under U+0260 LATIN SMALL
LETTER G WITH HOOK (end of the fourth line of attachment 32984). The two
diacritics are being placed on top of each other. However this is also
happening in the uniscribe screenshot (attachment 32985). And in fact pango is
rendering the test data a little better than Notepad.
Comment 49 greg_aumann 2005-11-23 02:26:31 UTC
Created attachment 55124
screenshot of Charis SIL 4.0.2

Screenshot of patched pango rendering Charis SIL 4.0.2
Comment 50 greg_aumann 2005-11-23 02:27:53 UTC
Created attachment 55125
screenshot of Doulos SIL 4.0.10

Screenshot of patched pango rendering Doulos SIL 4.0.10
Comment 51 greg_aumann 2005-11-23 02:49:07 UTC
Re: comment 47 and the zeros in the GPOS BaseArray.

response is a summary of comments from Bob Hallissy

As Behdad noted all of the offsets for BaseAnchor[0] are 0.

The key to understanding this is to look at the MarkArray and notice that none
of the covered mark glyphs are given class 0. Therefore there is no need to
provide a base glyph anchor for class 0 marks.

Thus the font is internally consistent. In effect class 0 is unused. The reason
for not using class 0 is to use the same glyph classes everywhere in the font.
For GDEF the mark classes are always 1..n (omitting 0) (see the example in the 
spec).  We use the same class numbers for all other uses, so we end up 
with 0 being unused.

Not sure the spec explicitly discusses the case where a given mark class 
is empty. The closest thing would be the statement in the BaseArray table 
description that says: "A BaseRecord declares one Anchor table for each 
mark class (including Class 0) identified in the MarkRecords of the 
MarkArray."  Noting that, in Doulos, Class 0 is not "identified in the 
MarkRecords of the MarkArray", no Anchor table is needed for it.

However it should be noted that other opentype shaping engines have no problem
with null offsets here.

Thus it is not really a bug in the font nor is it a pango bug but really a grey
area in the spec. Given that other opentype shaping engines are fine with null
offsets it is probably best that pango includes the second of Behdad's patches.
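
The consistency argument above can be expressed as a small check: a null base anchor is harmless as long as no mark glyph is assigned to that class. The sketch below is a hypothetical illustration with made-up data (null_anchors_are_unused() is not a Pango or fontTools function); mark_classes stands for the class values found in the MarkRecords, and each base record is a list of anchors indexed by mark class, with None for a null offset.

```python
def null_anchors_are_unused(mark_classes, base_records):
    """True if every null base anchor belongs to a class no mark glyph uses."""
    used = set(mark_classes)
    for record in base_records:
        for mark_class, anchor in enumerate(record):
            if anchor is None and mark_class in used:
                return False
    return True

# Made-up example in the spirit of the comment: marks use classes 1 and 2 only,
# and every base record leaves the class 0 anchor null.
mark_classes = [1, 2, 1, 2]
base_records = [[None, (300, 1200), (300, -100)],
                [None, (250, 1400), (250, -120)]]
print(null_anchors_are_unused(mark_classes, base_records))

# If some mark were assigned class 0, the null anchors would be a real problem.
print(null_anchors_are_unused([0, 1, 2], base_records))
```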
Comment 52 Behdad Esfahbod 2005-11-23 07:52:17 UTC
Thanks Greg for the clarifications.  That's exactly what I thought, that class 0
is not used.  Still it wouldn't harm to point it to a useless small (zero-item?)
lookup table.  Anyway, the current approach in Pango takes care of that.  All good.

I'll review and commit the shaper patch soon.
Comment 53 Behdad Esfahbod 2005-11-23 11:55:52 UTC
Created attachment 55143
Committed patch.

A reworked patch committed.  Sweet rendering of Doulos confirmed.  Happy all :)


2005-11-23  Behdad Esfahbod  <behdad@gnome.org>

	* modules/basic/basic-fc.c: Reworked basic shaper with OpenType
	support. (#101079, based on patch from Denis Jacquerye and Noah Levitt)


	* modules/basic/basic-fc.c (basic_scripts): Added Unicode 4.1 addition
	script PANGO_SCRIPT_GLAGOLITIC that is a "simple" script.

	* modules/arabic/arabic-fc.c, modules/syriac/syriac-fc.c: Replace
	g_utf8_to_ucs4_fast() with g_utf8_strlen()!

	* pango/opentype/pango-ot-ruleset.c (pango_ot_ruleset_add_feature):
	Remove reference in docs to pango_ot_ruleset_shape() that was
	removed long ago.
Comment 54 Denis Jacquerye 2005-11-23 21:39:33 UTC
I opened a couple of bugs that come from this discussion:
Bug 322234: Diacritics should not overlap
Bug 322273: Pango should use canonical decomposition data
Comment 55 Hydonsingore Cia 2006-03-07 05:01:07 UTC
Created attachment 60815
Comparing bluefish with yudit under pango-1.10.4

I still experience a similar problem on my Gentoo Linux box with Pango-1.10.4 installed. I tried DejaVu, Doulos SIL, and Gentium fonts but none of them seemed to display these combining characters correctly. Here is a screenshot comparing bluefish with yudit while displaying the same characters.
Comment 56 Denis Jacquerye 2006-03-07 05:53:01 UTC
Pango-1.11.1 fixes this for fonts with the OpenType features.

Gentium and some other fonts actually don't have anchors for diacritics; Pango doesn't handle that yet.

Doulos SIL, Charis SIL and DejaVu should work.
Comment 57 Behdad Esfahbod 2006-03-07 05:54:39 UTC
The fix is only in 1.11.x and later.