Bug 149438 – metrics in ja_JP vs. latin text




Summary:	metrics in ja_JP vs. latin text


Status:	RESOLVED FIXED

Product:	pango
Classification:	Platform
Component:	general
Version:	1.4.x
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	pango-maint
QA Contact:	pango-maint

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2004-08-05 20:59 UTC by Felipe Heidrich
Modified:	2004-12-22 21:47 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
snippet (2.36 KB, text/plain) 2004-08-05 21:00 UTC, Felipe Heidrich
screenshot (3.53 KB, image/png) 2004-08-09 15:08 UTC, Felipe Heidrich
Fontset itemization patch (9.04 KB, patch) 2004-09-13 21:28 UTC, Owen Taylor
Itemize in pango_context_get_metrics() (4.81 KB, patch) 2004-10-18 19:14 UTC, Owen Taylor

Description Felipe Heidrich 2004-08-05 20:59:29 UTC

Compile and run the snippet in a Japanese locale with pango 1.4.
Notice that the font extents are actually larger than the ascent+descent returned by
pango_context_get_metrics.
I understand that the code is getting the metrics for the ja-jp language while
drawing a Latin string, but this should work given that Japanese includes Latin.
This breaks existing code, see:
https://bugs.eclipse.org/bugs/show_bug.cgi?id=71517
I also found a few places in the gtk code base where the same approach is used
to calculate the font metrics, for example:
gtkcellrenderertext.c#gtk_cell_renderer_text_get_size
gtkclist.c#gtk_clist_set_row_height
gtkentry.c#gtk_entry_size_request
etc...

I believe all those places are broken just like my simple snippet is.
Older versions of pango are working.

(Ps. I didn't test pango 1.4.1, would it be good ?)

Comment 1 Felipe Heidrich 2004-08-05 21:00:10 UTC

Created attachment 30252
snippet

Comment 2 Owen Taylor 2004-08-08 14:54:51 UTC

I can't reproduce with 1.5.x. Something may have changed in this area,
though I can't really think of anything.

- Can you attach a screenshot?
- What Japanese font are you using?

Comment 3 Felipe Heidrich 2004-08-09 15:08:39 UTC

Created attachment 30368
screenshot

Comment 4 Felipe Heidrich 2004-08-09 15:09:14 UTC

Another screenshot:
https://bugs.eclipse.org/bugs/attachment.cgi?id=13785&action=view

Comment 5 Felipe Heidrich 2004-08-09 15:16:08 UTC

In gnome-font-properties I have Sans 11 set as the application font. If I
select some other font, I can't reproduce the bug myself.
Some fonts I tested:
can reproduce the bug: Sans, Serif, Monospace.
can't reproduce the bug: Courier, Fixed, Arial, Kochi Mincho, Marumoji, etc.

Comment 6 Felipe Heidrich 2004-08-17 20:21:03 UTC

Billy, can you reproduce this bug?
It failed on GTK 2.4.7/Pango 1.4.1.
Don't forget to export LANG=ja_JP before running the snippet.

Comment 7 Billy Biggs 2004-08-23 19:35:16 UTC

I can reproduce this using GTK+-2.4.4/pango-1.4.0 and GTK+-2.4.7/pango-1.4.1.

Comment 8 Owen Taylor 2004-09-13 20:07:57 UTC

OK, while I couldn't reproduce this with the test case, I did track
down what was going on while looking at:

 https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=131218

Which is about the same problem in xchat. It's actually quite simple;
the problem is that the way that fonts are being selected in 
pango_fontset_real_get_metrics() corresponds to the old font
selection algorithm (pre-1.4) not the new font selection algorithm.

The reason that this shows up for CJK in particular is that
pango_language_includes_script("ja-jp", PANGO_SCRIPT_LATIN) is
FALSE. If it was TRUE, then this wouldn't matter.

I'm not entirely sure favoring Latin fonts over CJK fonts for Latin
characters is right in a CJK locale; I'm inclined to leave it
that way for now because the Latin glyphs in the commonly used
CJK fonts on Linux are so bad. Fixing the PangoFontset code
shouldn't hurt even if we decide to change that later; well, at
least except for a small once-per-font speed hit.
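
A minimal sketch of the per-run language choice described above, written in plain C rather than against the Pango API. The table in language_includes_script(), the fallback in sample_language_for_script(), and the name language_for_run() are all hypothetical; the only fact taken from this comment is that ja-jp does not include the Latin script.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a script enum; not a Pango type. */
typedef enum { SCRIPT_LATIN, SCRIPT_HIRAGANA } Script;

/* Hypothetical table: does this language tag "include" the script?
 * Mirrors the statement that ja-jp does not include Latin. */
static int language_includes_script(const char *lang, Script script)
{
    if (strcmp(lang, "ja-jp") == 0)
        return script == SCRIPT_HIRAGANA;
    if (strcmp(lang, "en") == 0)
        return script == SCRIPT_LATIN;
    return 0;
}

/* Hypothetical fallback language for a script (assumed values). */
static const char *sample_language_for_script(Script script)
{
    return script == SCRIPT_HIRAGANA ? "ja-jp" : "en";
}

/* Language used to look up fonts for one run of text: the context
 * language if it includes the run's script, otherwise a language
 * derived from the script itself. */
const char *language_for_run(const char *context_lang, Script script)
{
    assert(context_lang != NULL);
    if (language_includes_script(context_lang, script))
        return context_lang;
    return sample_language_for_script(script);
}
```

In this model, language_for_run("ja-jp", SCRIPT_LATIN) yields "en", so Latin text is looked up as if it were English, while a metrics query made only for "ja-jp" never goes through that step. That gap is the mismatch this comment describes.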

Comment 9 Felipe Heidrich 2004-09-13 20:58:09 UTC

My opinion:
1) ja-jp does include PANGO_SCRIPT_LATIN.
2) ugly glyphs are better than truncated strings.

Comment 10 Owen Taylor 2004-09-13 21:22:02 UTC

Felipe - leaving this broken isn't an option... it will be fixed,
just maybe not by making ja-jp include PANGO_SCRIPT_LATIN.

I don't see any real justification for saying that ja-jp includes
PANGO_SCRIPT_LATIN unless we want to say *all* languages include
PANGO_SCRIPT_LATIN. I don't think Latin usage in Japanese is higher
than in Russian, Hindi, etc. (And also, the same bug will occur for
other languages as well; it's not CJK specific.)

And saying that LATIN is part of all languages reintroduces one of the major
problems that the Pango-1.4 font selection scheme was trying to resolve - 
if you have

 Font A - Good Arabic font, no or incomplete Latin
 Font B - Good Latin font
 Font C - Crappy Arabic + Latin font

Listed in that order of preference, then a language tag of 'ar' would 
give you A+C rather than A+B.
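
The A/B/C example can be sketched as a small plain-C model. This is not Pango or fontconfig code: the Font struct, pick_font(), and the rule that a font "supports a language" when it covers that language's main script are simplifying assumptions made only to reproduce the outcome described above.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { SCRIPT_ARABIC, SCRIPT_LATIN } Script;

typedef struct {
    char name;          /* 'A', 'B' or 'C', as in the comment */
    int covers_arabic;
    int covers_latin;
} Font;

/* Fonts in the user's order of preference. */
static const Font fonts[] = {
    { 'A', 1, 0 },      /* good Arabic font, no Latin */
    { 'B', 0, 1 },      /* good Latin font */
    { 'C', 1, 1 },      /* crappy Arabic + Latin font */
};

static int covers(const Font *f, Script s)
{
    return s == SCRIPT_ARABIC ? f->covers_arabic : f->covers_latin;
}

/* Pick a font for a run in run_script when the lookup language is
 * represented by lang_script. Fonts that support the lookup language
 * are tried first; any font covering the run is the fallback. */
char pick_font(Script lang_script, Script run_script)
{
    size_t i;
    size_t n = sizeof fonts / sizeof fonts[0];

    assert(run_script == SCRIPT_ARABIC || run_script == SCRIPT_LATIN);

    for (i = 0; i < n; i++)
        if (covers(&fonts[i], lang_script) && covers(&fonts[i], run_script))
            return fonts[i].name;
    for (i = 0; i < n; i++)
        if (covers(&fonts[i], run_script))
            return fonts[i].name;
    return '?';
}
```

If Latin counts as part of 'ar', the Latin run is looked up with the Arabic language and pick_font(SCRIPT_ARABIC, SCRIPT_LATIN) returns 'C', giving A+C. If the Latin run gets its own derived language, pick_font(SCRIPT_LATIN, SCRIPT_LATIN) returns 'B', giving A+B.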

Comment 11 Owen Taylor 2004-09-13 21:28:23 UTC

Created attachment 31538
Fontset itemization patch

Here's a start at a patch; it's not quite working yet, because there
needs to be a fontset->fontmap back link to make it work fully correctly.
The reason for this is that the fontmap is needed to do shape engine
selection, and shape engines get to vote on font selection.

Comment 12 federic zhang 2004-10-05 16:33:10 UTC

I'm not sure, but the above patch may be too complicated. I ran into a
similar problem: the metrics ratio between a Japanese character in the ja
font and a Latin glyph in the Latin font does not match the expected 2:1.
The problem occurs when the Monospace family is used, for example in gedit.

I worked out one way to address it: check in itemize_state_init whether
monospace is used; if it is, set derived_lang to the current language in
itemize_state_update_new_run instead of the one returned by
computed_derived_lang. That way, the Japanese font is used for both Latin
and Japanese characters in the ja_JP locale.

To let users configure this via fontconfig, code is added to
itemize_state_init to check whether an unofficial flag,
'disable-pango-script', is set in the FcPattern. The old font selection
algorithm is then used only if the Monospace family is in use and
'disable-pango-script' is set in fonts.conf by the user.

I will post the patch on Oct 8th, once I am back in the office.

Comment 13 Owen Taylor 2004-10-05 21:00:44 UTC

My last patch is junk. I have a better patch around that makes
pango_context_get_metrics() call pango_itemize() directly.

If you want to submit a patch about some config option to use Japanese
fonts for latin script, please file that *separately*, not on this 
bug report.

Comment 14 Owen Taylor 2004-10-18 19:14:40 UTC

Created attachment 32745
Itemize in pango_context_get_metrics()

Comment 15 Owen Taylor 2004-11-21 15:59:26 UTC

Sun Nov 21 10:52:03 2004  Owen Taylor  <otaylor@redhat.com>

        * pango/pango-context.c: Don't just call pango_fontset_get_metrics()
        to implement pango_context_get_metrics(), since that skips our
        normal font selection algorithm. Rather itemize the sample string
        and get the metrics from that. (#149438, Felipe Heidrich)
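
A rough plain-C model of the approach in this ChangeLog entry: the sample string is itemized with the normal font selection, and the metrics are then derived from the fonts chosen for the resulting runs. The Metrics and Run structs and metrics_from_runs() are hypothetical, and combining the runs by taking the largest ascent and descent is an assumption; the entry only says the metrics are taken from the itemized string.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int ascent;
    int descent;
} Metrics;

/* One itemized run of the sample string; 'font' holds the metrics of
 * the font that the normal selection algorithm chose for that run. */
typedef struct {
    Metrics font;
} Run;

/* Combine per-run font metrics into a single result. Assumption:
 * the largest ascent and the largest descent win, so a line sized
 * from the result fits every font that itemization selected. */
Metrics metrics_from_runs(const Run *runs, size_t n_runs)
{
    Metrics m = { 0, 0 };
    size_t i;

    assert(runs != NULL || n_runs == 0);

    for (i = 0; i < n_runs; i++) {
        if (runs[i].font.ascent > m.ascent)
            m.ascent = runs[i].font.ascent;
        if (runs[i].font.descent > m.descent)
            m.descent = runs[i].font.descent;
    }
    return m;
}
```

Because the runs come from the same selection path that is used when drawing text, a caller that sizes a widget from ascent + descent sees the same fonts that itemization would pick, rather than whichever fonts a separate lookup happened to choose.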