GNOME Bugzilla – Bug 149438
metrics in ja_JP vs. latin text
Last modified: 2004-12-22 21:47:04 UTC
Compile and run the snippet in a Japanese locale with Pango 1.4. Notice that the font extents are actually larger than the ascent+descent returned by pango_context_get_metrics(). I understand that the code is getting the metrics for the ja-jp language while drawing a Latin string, but this should work given that Japanese includes Latin. This breaks existing code; see:

https://bugs.eclipse.org/bugs/show_bug.cgi?id=71517

I also found a few places in the GTK code base where the same approach is used to calculate the font metrics, for example:

  gtkcellrenderertext.c#gtk_cell_renderer_text_get_size
  gtkclist.c#gtk_clist_set_row_height
  gtkentry.c#gtk_entry_size_request
  etc.

I believe all those places are broken just like my simple snippet is. Older versions of Pango work. (P.S. I didn't test Pango 1.4.1; would that be worth trying?)
Created attachment 30252: snippet
I can't reproduce with 1.5.x. Something may have changed in this area, though I can't really think of anything.

 - Can you attach a screenshot?
 - What Japanese font are you using?
Created attachment 30368: screenshot
Another screenshot: https://bugs.eclipse.org/bugs/attachment.cgi?id=13785&action=view
In gnome-font-properties I have Sans 11 set as the application font. If I select some other font, I can't reproduce the bug myself. Some fonts I tested:

  can reproduce the bug: Sans, Serif, Monospace.
  can't reproduce the bug: Courier, Fixed, Arial, Kochi Mincho, Marumoji, etc.
Billy, can you reproduce this bug? It fails with GTK 2.4.7/Pango 1.4.1. Don't forget to export LANG=ja_JP before running the snippet.
I can reproduce this using GTK+-2.4.4/pango-1.4.0 and GTK+-2.4.7/pango-1.4.1.
OK, while I couldn't reproduce this with the test case, I did track down what was going on while looking at:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=131218

which is about the same problem in xchat.

It's actually quite simple: the way fonts are selected in pango_fontset_real_get_metrics() corresponds to the old font selection algorithm (pre-1.4), not the new font selection algorithm.

The reason this shows up for CJK in particular is that pango_language_includes_script("ja-jp", PANGO_SCRIPT_LATIN) is FALSE. If it were TRUE, this wouldn't matter.

I'm not entirely sure that favoring Latin fonts over CJK fonts for Latin characters is right in a CJK locale; I'm inclined to leave it that way for now because the Latin glyphs in the commonly used CJK fonts on Linux are so bad. Fixing the PangoFontset code shouldn't hurt even if we decide to change that later; well, at least except for a small once-per-font speed hit.
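The mismatch described in the comment above can be illustrated with a small standalone toy model. This is only a sketch, not Pango code: the fonts FontJ and FontL, their coverage and metric values, the two fontset orderings, and every toy_* / font_for_* name are hypothetical, and the "derived language" step is a simplified reading of the 1.4 itemizer.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: all names, coverage and metric values are hypothetical. */
enum { SCRIPT_LATIN = 1 << 0, SCRIPT_KANA = 1 << 1 };
typedef enum { LANG_EN, LANG_JA } ToyLang;

typedef struct {
    const char *name;
    unsigned covers;   /* bitmask of scripts the font has glyphs for */
    int ascent;
    int descent;
} ToyFont;

static const ToyFont FONT_J = { "FontJ", SCRIPT_LATIN | SCRIPT_KANA, 10, 2 };
static const ToyFont FONT_L = { "FontL", SCRIPT_LATIN, 13, 4 };

/* Fontsets in order of preference for each language tag. */
static const ToyFont *const FONTSET_EN[] = { &FONT_L, &FONT_J };
static const ToyFont *const FONTSET_JA[] = { &FONT_J, &FONT_L };

/* First font in the language's fontset that covers the script. */
const ToyFont *first_covering(ToyLang lang, unsigned script)
{
    const ToyFont *const *fontset = (lang == LANG_JA) ? FONTSET_JA : FONTSET_EN;
    for (size_t i = 0; i < 2; i++)
        if (fontset[i]->covers & script)
            return fontset[i];
    return NULL;
}

/* Stand-in for pango_language_includes_script(): FALSE for (ja, Latin). */
int toy_language_includes_script(ToyLang lang, unsigned script)
{
    return (lang == LANG_JA) ? (script == SCRIPT_KANA) : (script == SCRIPT_LATIN);
}

/* Old-style lookup, as still used for metrics: context language only. */
const ToyFont *font_for_metrics_old(ToyLang context_lang, unsigned script)
{
    return first_covering(context_lang, script);
}

/* New-style lookup used for drawing: if the context language does not
 * include the script, select with a language derived from the script. */
const ToyFont *font_for_drawing(ToyLang context_lang, unsigned script)
{
    ToyLang derived = context_lang;
    if (!toy_language_includes_script(context_lang, script))
        derived = (script == SCRIPT_KANA) ? LANG_JA : LANG_EN;
    return first_covering(derived, script);
}

int toy_height(const ToyFont *font)
{
    assert(font != NULL);
    return font->ascent + font->descent;
}
```

In this model, with a ja context and Latin text, font_for_metrics_old() picks FontJ while font_for_drawing() picks FontL, so the reported ascent+descent is smaller than the height of what actually gets drawn.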
My opinion:

 1) ja-jp does include PANGO_SCRIPT_LATIN.
 2) Ugly glyphs are better than truncated strings.
Felipe - leaving this broken isn't an option... it will be fixed, just maybe not by making ja-jp include PANGO_SCRIPT_LATIN.

I don't see any real justification for saying that ja-jp includes PANGO_SCRIPT_LATIN unless we want to say *all* languages include PANGO_SCRIPT_LATIN. I don't think Latin usage in Japanese is higher than in Russian, Hindi, etc. (And also, the same bug will occur for other languages as well; it's not CJK specific.)

And saying that LATIN is part of all languages reintroduces one of the major problems that the Pango-1.4 font selection scheme was trying to resolve - if you have

  Font A - Good Arabic font, no or incomplete Latin
  Font B - Good Latin font
  Font C - Crappy Arabic + Latin font

listed in that order of preference, then a language tag of 'ar' would give you A+C rather than A+B.
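The A/B/C example above can be worked through with a small standalone sketch. It assumes a deliberately simplified selection rule (among fonts that cover the character's script, fonts that also support the sorting language win, and ties go to the listed order); the real fontconfig/Pango sorting is more elaborate, and pick_font() and the coverage bitmasks are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: coverage data and the selection rule are assumptions. */
enum { SCRIPT_ARABIC = 1 << 0, SCRIPT_LATIN = 1 << 1 };

typedef struct {
    char name;
    unsigned covers;   /* bitmask of scripts the font has usable glyphs for */
} ToyFont;

/* Listed in order of preference, as in the comment above. */
static const ToyFont FONTS[] = {
    { 'A', SCRIPT_ARABIC },                 /* good Arabic, no usable Latin */
    { 'B', SCRIPT_LATIN },                  /* good Latin                   */
    { 'C', SCRIPT_ARABIC | SCRIPT_LATIN },  /* poor Arabic + Latin          */
};

/* lang_script: script of the language tag used for sorting.
 * char_script: script of the character to be drawn.
 * Returns the chosen font's name, or 0 if no font covers char_script. */
char pick_font(unsigned lang_script, unsigned char_script)
{
    assert(lang_script != 0 && char_script != 0);
    char fallback = 0;
    for (size_t i = 0; i < sizeof FONTS / sizeof FONTS[0]; i++) {
        if (!(FONTS[i].covers & char_script))
            continue;                   /* cannot draw the character at all */
        if (FONTS[i].covers & lang_script)
            return FONTS[i].name;       /* supports the language: wins */
        if (fallback == 0)
            fallback = FONTS[i].name;   /* first font that merely covers it */
    }
    return fallback;
}
```

If Latin runs are sorted under the 'ar' tag, pick_font(SCRIPT_ARABIC, SCRIPT_LATIN) yields C, giving A+C. If Latin runs get a Latin-script language instead, pick_font(SCRIPT_LATIN, SCRIPT_LATIN) yields B, giving A+B.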
Created attachment 31538: Fontset itemization patch

Here's a start at a patch; it's not quite working yet, because there needs to be a fontset->fontmap back link to make it work fully correctly. The reason for this is that the fontmap is needed to do shape engine selection, and shape engines get to vote on font selection.
I'm not sure whether the above patch is too complicated. I ran into a similar requirement/problem: the width ratio between a Japanese character in the ja font and a Latin glyph in the Latin font doesn't match 2:1 well. The problem occurs when the Monospace family is used, for example in gedit.

I have worked out one way to address it: check in itemize_state_init whether Monospace is in use; if it is, set derived_lang to the current language in itemize_state_update_new_run instead of the one obtained from computed_derived_lang. That way, the Japanese font is used for both Latin and Japanese characters in the ja_JP locale.

To let users configure this via fontconfig, code is added to itemize_state_init to check whether an unofficial flag, 'disable-pango-script', is set in the FcPattern. In other words, the old font selection algorithm is used only if the Monospace family is in use and 'disable-pango-script' is set in fonts.conf by the user.

I will post the patch on Oct 8th, once I'm back in the office.
My last patch is junk. I have a better patch around that makes pango_context_get_metrics() call pango_itemize() directly. If you want to submit a patch about some config option to use Japanese fonts for latin script, please file that *separately*, not on this bug report.
Created attachment 32745: Itemize in pango_context_get_metrics()
Sun Nov 21 10:52:03 2004  Owen Taylor  <otaylor@redhat.com>

        * pango/pango-context.c: Don't just call pango_fontset_get_metrics()
        to implement pango_context_get_metrics(), since that skips
        our normal font selection algorithm. Rather itemize the
        sample string and get the metrics from that.
        (#149438, Felipe Heidrich)
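The idea in the ChangeLog entry, deriving metrics from the itemized sample string rather than from the fontset directly, can be sketched as a standalone toy. This is not the committed code: ToyItem, ToyMetrics, and toy_metrics_from_items() are hypothetical, and combining the per-run fonts by taking the maximum ascent and descent is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: types and the max-combination rule are assumptions. */
typedef struct {
    int ascent;
    int descent;
} ToyMetrics;

/* One itemized run of the sample string, together with the metrics of
 * the font that the normal font selection picked for that run. */
typedef struct {
    const char *text;
    ToyMetrics font;
} ToyItem;

/* Combine the metrics of every font the itemizer actually selected, so
 * the result is at least as tall as any run that will be drawn. */
ToyMetrics toy_metrics_from_items(const ToyItem *items, size_t n_items)
{
    assert(items != NULL || n_items == 0);
    ToyMetrics result = { 0, 0 };
    for (size_t i = 0; i < n_items; i++) {
        if (items[i].font.ascent > result.ascent)
            result.ascent = items[i].font.ascent;
        if (items[i].font.descent > result.descent)
            result.descent = items[i].font.descent;
    }
    return result;
}
```

Because the runs come from the same selection path that drawing uses, the reported ascent+descent in this model can no longer be smaller than the font used for any run in the sample.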