Bug 325714 - (pango-languages) Pango should respect $LANGUAGE
Status: RESOLVED FIXED
Product: pango
Classification: Platform
Component: general
Version: unspecified
Hardware: Other Linux
Importance: Normal enhancement
Target Milestone: ---
Assigned To: pango-maint
QA Contact: pango-maint
Duplicates: 329402
Depends on:
Blocks: Persian
 
 
Reported: 2006-01-04 02:38 UTC by Behdad Esfahbod
Modified: 2007-05-30 04:23 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Screenshot of invoking pango-view with LANG=ja and LANGUAGE=C:ja (33.88 KB, image/png)
2007-05-29 05:43 UTC, Akira TAGOH
Screenshot of invoking pango-view with LANG=ja and LANGUAGE=C:zh (33.89 KB, image/png)
2007-05-29 05:43 UTC, Akira TAGOH
Screenshot of invoking pango-view with LANG=zh_CN and LANGUAGE=C:zh (35.10 KB, image/png)
2007-05-29 05:45 UTC, Akira TAGOH
Screenshot of invoking pango-view with LANG=zh_CN and LANGUAGE=C:ja (35.31 KB, image/png)
2007-05-29 05:46 UTC, Akira TAGOH

Description Behdad Esfahbod 2006-01-04 02:38:14 UTC
Pango should use $LANGUAGES to decide which language to use for each script.  That should be used for passing to fontconfig, and to choose the correct LanguageSystem for an OpenType font.

For example, if I set LANGUAGES=en,fa, then upon seeing text in Arabic script, it should ask fontconfig for fonts for 'fa', not 'ar'.  It should look up the Persian LanguageSystem in OpenType fonts, instead of the default LangSys.
Comment 1 Behdad Esfahbod 2006-01-29 01:31:12 UTC
First we need to decide whether we really want to only look for a language list in $LANGUAGES.  This simplifies things and can be overridden by putenv()ing LANGUAGES.  On the other hand, if we provide a function to override it, that function will have to set a global variable (which is messy), or take a PangoContext.  Having a per-PangoContext language list, and falling back to $LANGUAGES, does sound like a sound idea to me.
Comment 2 Behdad Esfahbod 2006-02-01 00:47:42 UTC
*** Bug 329402 has been marked as a duplicate of this bug. ***
Comment 3 Denis Jacquerye 2006-02-01 02:08:48 UTC
Here's the reference to the 'locl' feature in the MS OT specs: http://www.microsoft.com/typography/otspec/features_ko.htm#locl
It should be active by default.
Comment 4 Behdad Esfahbod 2006-02-01 03:11:53 UTC
Ok, I see. Unfortunately it seems the rest of the standard has not been updated to say where exactly this feature should be applied.  Is it before ccmp?  After?  After all GSUB features?  Etc.  All of these make some sense...

Anyway, just enabling the feature will not do much as long as we don't support the Language stuff in the OT shapers.
Comment 5 Denis Jacquerye 2006-03-09 07:55:17 UTC
I would have said locl should be applied before ccmp but the OpenType Tag Registry clearly specifies that ccmp "needs to be implemented prior to any other feature".

As for other features, it should probably be applied before them, but after is fine if it can override their results.
Comment 6 Denis Jacquerye 2006-05-27 14:13:05 UTC
Actually OpenType features should be applied according to the font's order. Meaning fontmakers should probably order locl before ccmp.
Comment 7 Behdad Esfahbod 2006-05-27 18:05:53 UTC
There's been some discussion around this recently on the OpenType list saying that some OT features should be applied at the same time, but other than that, I don't agree with you.  The Arabic OT spec for example specifies the order the features should be applied.
Comment 8 Denis Jacquerye 2006-05-27 18:32:37 UTC
> The Arabic OT spec for example specifies the order the features should be applied.

But that could mean the order specified in the specs has to be defined in the font by the font maker.

Either way, it's probably safer to have the order specified in the specs in Pango rather than in fonts. Some font makers might not realize they can (or have to) set the order of features. 

There are some discrepancies in the current specs. Hopefully the next update will clear those out. They specify 'aalt' should be applied first every time, but it would be pretty much unusable if 'locl' is applied afterwards for some glyphs.

What does 'applying them at the same time' mean?
Comment 9 Owen Taylor 2006-05-27 18:58:57 UTC
Pango used to follow the font order; that produced incorrect rendering
for Indic scripts with many available fonts, and the code had to be
reworked to allow the shaper to specify the ordering. It's conceivable 
that following the font order is right for latn, though...

Comment 10 Behdad Esfahbod 2006-05-27 19:00:29 UTC
It's a pity that MS and Adobe have not publicized their latest spec yet :(.  Apparently they have changed a lot, including a lot of stuff in the Indic spec.

Applying at the same time means as if they were just one feature, instead of one coming after the other.
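The distinction in comment 10 can be illustrated with a minimal sketch. It assumes that when features are applied together, their lookups are merged and run in lookup-index order; the feature-to-lookup mapping passed in is made up for illustration and does not come from any real font.

```python
def lookups_sequential(features, feature_lookups):
    """One feature after the other: each feature's lookups run as a group."""
    order = []
    for tag in features:
        order.extend(sorted(feature_lookups[tag]))
    return order


def lookups_simultaneous(features, feature_lookups):
    """As if the features were one: merge all lookups, run by lookup index."""
    merged = set()
    for tag in features:
        merged.update(feature_lookups[tag])
    return sorted(merged)
```

With a hypothetical font where 'ccmp' uses lookups 0 and 3 and 'locl' uses lookup 1, the sequential order runs lookup 3 before lookup 1, while the simultaneous order interleaves them by index.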
Comment 11 Maciej Katafiasz 2006-05-29 00:19:49 UTC
One use case for having more than just one language list is solving the (difficult) issue of correctly rendering unified Han characters which are in both Chinese and Japanese for example. If IME were able to provide a hint which language they're inputting, it'd allow a Japanese user (thus with general preference for Japanese glyphs) to write Chinese and have them rendered with Chinese font, or the other way around. AFAIK, no system today gets that quite right, and it'd be nice to have it solved, it's sort of a touchy matter for the users.
Comment 12 Behdad Esfahbod 2006-05-29 00:27:26 UTC
In your use case, you still need the higher level to add markup/attrs to set the language when rendering later (where IME is not available anymore).  When doing that, the current context language is enough; no need for multiple languages.
Comment 14 Akira TAGOH 2007-03-26 10:36:02 UTC
So how should we proceed on this? This behavior sounds helpful in some cases, making things better for at least one language, even if it doesn't solve all of the issues related to the current locale vs. the characters used.
Comment 15 Akira TAGOH 2007-05-10 05:03:45 UTC
Speaking of a possible problem with this feature: setting the LANGUAGE env var may result in the wrong localized strings being displayed. For example, someone is learning Japanese but wants to see translated strings in English because that's still easier to read, and just needs an input method etc. If they run the application with LANGUAGE=ja LC_CTYPE=ja_JP.UTF-8 LANG=en_US.UTF-8, it still displays the translated strings in Japanese.  Well, I'm sure this usage is wrong according to the original purpose of LANGUAGE, but there may be cases where one wants to prefer Japanese fonts in any case.

So should we have a different env var?
Comment 16 Behdad Esfahbod 2007-05-10 05:40:08 UTC
No, in that case they will use LANGUAGE=en,ja and will still get English messages.
Comment 17 Akira TAGOH 2007-05-10 09:52:54 UTC
Well, unfortunately even that doesn't work, because we don't usually have any po files for en. So gettext is going to fall back to the next language, and the application still shows ja text in menus, toolbars etc.
Comment 18 Behdad Esfahbod 2007-05-10 17:49:13 UTC
Humm, right.

Ok, what about LANGUAGE=C,ja?  Not pretty, but works.
Comment 19 Akira TAGOH 2007-05-11 03:32:32 UTC
Well, for only en_US or just en? Hmm, yeah, it should work.
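The fallback problem discussed in comments 16 to 19 can be modelled with a small sketch. This is a simplified model of message-catalog selection, not gettext's actual code: it assumes a colon-separated list (the form used later in this thread), that the first entry with an installed catalog wins, and that a "C" entry stops the search and leaves messages untranslated. The function name and the catalog set are hypothetical.

```python
def pick_message_language(language_list, available_catalogs):
    """Return the language whose catalog would be used, or None for untranslated."""
    for tag in language_list.split(":"):
        if tag == "C":
            # "C" means "no translation": stop here, show the original strings.
            return None
        if tag in available_catalogs:
            return tag
    return None
```

Under this model, "en:ja" with only a ja catalog installed still yields Japanese messages, which is the problem raised in comment 17, while "C:ja" yields the untranslated (English) strings and leaves "ja" in the list for font selection.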
Comment 20 Behdad Esfahbod 2007-05-14 04:02:04 UTC
Getting near:

2007-05-13  Behdad Esfahbod  <behdad@gnome.org>

        Part of Bug 325714 – Pango should respect $LANGUAGE

        * pango/pango-ot.h:
        * pango/pango-ot-private.h:
        * pango/pango-ot-tag.c (pango_ot_tag_from_script),
        (pango_ot_tag_from_language):
        * pango/pango-ot-info.c (pango_ot_info_find_script),
        (pango_ot_info_find_language), (pango_ot_info_find_feature),
        (pango_ot_info_list_languages), (pango_ot_info_list_features):
        * pango/pango-ot-ruleset.c (pango_ot_ruleset_new),
        (pango_ot_ruleset_new_for), (pango_ot_ruleset_add_feature),
        (pango_ot_ruleset_maybe_add_feature),
        (pango_ot_ruleset_maybe_add_features):
        Add new engine API:

                PANGO_OT_NO_FEATURE
                PANGO_OT_NO_SCRIPT
                PANGO_OT_TAG_DEFAULT_SCRIPT
                PANGO_OT_TAG_DEFAULT_LANGUAGE
                pango_ot_ruleset_new_for()
                pango_ot_ruleset_maybe_add_feature()
                pango_ot_ruleset_maybe_add_features()

        Using pango_ot_ruleset_new_for() and
        pango_ot_ruleset_maybe_add_features() drastically simplifies ruleset
        building in modules, and does correct script and language selection
        too.  Modules need to be updated to use it though.

        * docs/pango-docs.sgml:
        * docs/pango-sections.txt:
        * docs/tmpl/opentype.sgml:
        Update.
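
The idea behind pango_ot_ruleset_new_for() and the maybe_add variants can be sketched as follows. This is an illustration of the logic only, not Pango's implementation: the Ruleset class, the nested dictionary describing a font, and the "dflt" language key are all assumptions made for the example.

```python
DEFAULT_LANGUAGE = "dflt"  # hypothetical key for the font's default language system


class Ruleset:
    def __init__(self, font_features, script, language):
        # font_features: {script tag: {language tag: [feature tags]}}
        languages = font_features.get(script, {})
        # Fall back to the default language system if the requested one is absent.
        self.language = language if language in languages else DEFAULT_LANGUAGE
        self._available = languages.get(self.language, [])
        self.features = []

    def maybe_add_feature(self, tag):
        """Add the feature only if the font has it; report whether it was added."""
        if tag in self._available:
            self.features.append(tag)
            return True
        return False

    def maybe_add_features(self, tags):
        """Try each feature in turn and return how many were actually added."""
        return sum(self.maybe_add_feature(tag) for tag in tags)
```

A module can then request every feature it cares about without checking the font first, and only the features present for the selected script and language end up in the ruleset.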

Comment 21 Behdad Esfahbod 2007-05-14 07:09:03 UTC
One more step:

2007-05-14  Behdad Esfahbod  <behdad@gnome.org>

        Part of Bug 325714 – Pango should respect $LANGUAGE

        * pango/pango-ot.h:
        * pango/pango-ot-ruleset.c (pango_ot_ruleset_get_for),
        (pango_ot_ruleset_description_hash),
        (pango_ot_ruleset_description_equal),
        (pango_ot_ruleset_description_copy),
        (pango_ot_ruleset_description_free):
        Add new engine API:

                PangoOTRulesetDescription
                pango_ot_ruleset_get_for()
                pango_ot_ruleset_description_hash()
                pango_ot_ruleset_description_equal()
                pango_ot_ruleset_description_copy()
                pango_ot_ruleset_description_free()

        The main addition is pango_ot_ruleset_get_for() that
        takes a ruleset description, ie. script/language and list
        of GSUB/GPOS features to apply, and returns a ruleset.
        It manages all the work to cache rulesets, so modules
        don't have to do that anymore.  Given that modules do not
        deal with just one ruleset anymore (because we want to
        respect language, and allow user-selected features), this
        makes their life way easier.

        * docs/pango-sections.txt:
        * docs/tmpl/opentype.sgml:
        Update.
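
The caching described for pango_ot_ruleset_get_for() amounts to a lookup table keyed by a hashable, comparable description. Below is a minimal sketch of that pattern; the class names, fields, and build_ruleset() helper are hypothetical and do not mirror the actual PangoOTRulesetDescription layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RulesetDescription:
    # frozen=True makes instances hashable and comparable by value,
    # playing the role of the description hash/equal functions.
    script: str
    language: str
    gsub_features: tuple = ()
    gpos_features: tuple = ()


def build_ruleset(desc):
    """Stand-in for the expensive work of building a ruleset."""
    return {"script": desc.script, "language": desc.language,
            "features": list(desc.gsub_features) + list(desc.gpos_features)}


class RulesetCache:
    def __init__(self, build=build_ruleset):
        self._build = build
        self._cache = {}
        self.builds = 0  # counts cache misses, for illustration

    def get_for(self, desc):
        ruleset = self._cache.get(desc)
        if ruleset is None:
            ruleset = self._build(desc)
            self.builds += 1
            self._cache[desc] = ruleset
        return ruleset
```

Two equal descriptions return the same ruleset object, so a module can ask for a ruleset per script/language/feature combination without managing its own cache.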

Comment 22 Behdad Esfahbod 2007-05-14 07:22:16 UTC
2007-05-14  Behdad Esfahbod  <behdad@gnome.org>

        Part of Bug 325714 – Pango should respect $LANGUAGE
        Bug 414264 – Pango vertical writing support is different with real
        CJK usage.

        * modules/arabic/arabic-fc.c (arabic_engine_shape):
        * modules/basic/basic-fc.c (basic_engine_shape):
        * modules/syriac/syriac-fc.c (syriac_engine_shape):
        Remove fallback_shape() paths.  Remove get_ruleset().
        Use pango_ot_ruleset_get_for(), that correctly works for multiple
        languages.  Also makes basic shaper apply the 'vert' feature for
        vertical text.  Removes a net 500 lines.

        Other OpenType modules need to be ported over time, however some
        extensions may be needed.  For example, the Hebrew shaper uses
        fallback code if no GPOS tables are available.  Currently using
        pango_ot_ruleset_get_for() one cannot see which features were
        found.

Comment 23 Behdad Esfahbod 2007-05-14 07:23:38 UTC
The fixed modules (basic, arabic, syriac) now apply the 'locl' feature too.  The order is hardcoded to do locl after ccmp.  I'm going to fix it such that for non-Indic modules the font order of features is respected.
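The two ordering policies mentioned here (a shaper-specified order versus the font's own order) can be contrasted with a short sketch. The function and its arguments are hypothetical; it only assumes that each policy is a different ordering of the same set of features found in the font.

```python
def order_features(requested, font_order, follow_font_order):
    """Return the features to apply, in shaper order or in font order.

    requested:  feature tags in the order the shaper lists them
    font_order: feature tags in the order the font lists them
    """
    # Only features the font actually provides can be applied.
    wanted = [tag for tag in requested if tag in font_order]
    if follow_font_order:
        return sorted(wanted, key=font_order.index)
    return wanted
```

For a shaper requesting ccmp, locl, liga from a font that lists locl before ccmp, the hardcoded policy keeps locl after ccmp, while the font-order policy applies locl first.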
Comment 24 Behdad Esfahbod 2007-05-14 08:51:50 UTC
2007-05-14  Behdad Esfahbod  <behdad@gnome.org>

        Bug 325714 – Pango should respect $LANGUAGE

        * pango/pango-language.c (pango_language_matches),
        (parse_default_languages), (_pango_script_get_default_language),
        (pango_script_get_sample_language):
        Make pango_script_get_sample_language() use the value of env var
        PANGO_LANGUAGE or LANGUAGE (checked in that order) to make better
        guesses.  The env var should be a list of language tags, like "en:fa"
        for example, which makes Pango choose Persian (fa) fonts instead of
        Arabic (ar) fonts...

Comment 25 Behdad Esfahbod 2007-05-14 08:53:13 UTC
So, I made it check PANGO_LANGUAGE first, and then LANGUAGE.

Setting to "C:ja" wasn't working because "C" is an unknown lang to pango and so it would choose that for every script queried.  Made a special case about "C" to skip it.  Anyway, fixed.
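The behavior described in comments 24 and 25 can be sketched as follows. This models the logic only and is not the code in pango-language.c: the script-to-language table and the built-in defaults are made-up stand-ins for Pango's own data, and the functions take the environment as a plain mapping so they can be exercised without touching the real one.

```python
import os

# Hypothetical data: languages written in each script, and a built-in default.
SCRIPT_LANGUAGES = {
    "Arab": {"ar", "fa", "ur"},
    "Hani": {"zh", "ja"},
    "Latn": {"en", "fr", "de"},
}
DEFAULT_LANGUAGE = {"Arab": "ar", "Hani": "zh", "Latn": "en"}


def parse_default_languages(environ=os.environ):
    """Return preferred language tags, checking PANGO_LANGUAGE before LANGUAGE."""
    value = environ.get("PANGO_LANGUAGE") or environ.get("LANGUAGE") or ""
    tags = []
    for tag in value.split(":"):
        tag = tag.strip()
        # "C" names no real language; skip it so it cannot match every script.
        if tag and tag != "C":
            tags.append(tag)
    return tags


def sample_language(script, environ=os.environ):
    """Pick the first preferred language that is written in the given script."""
    for tag in parse_default_languages(environ):
        base = tag.split("_")[0].split("-")[0].lower()
        if base in SCRIPT_LANGUAGES.get(script, set()):
            return base
    return DEFAULT_LANGUAGE.get(script)
```

With LANGUAGE set to "en:fa", Arabic-script text resolves to 'fa' rather than the default 'ar', and with "C:ja" the "C" entry is skipped so Han text resolves to 'ja'.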
Comment 26 Akira TAGOH 2007-05-29 05:40:37 UTC
Sorry for reopening, but it still does not look correct. ASCII characters now follow PANGO_LANGUAGE/LANGUAGE (thanks for that), but Chinese/Kanji characters are still rendered the same as with the previous Pango.
Comment 27 Akira TAGOH 2007-05-29 05:43:13 UTC
Created attachment 88984
Screenshot of invoking pango-view with LANG=ja and LANGUAGE=C:ja
Comment 28 Akira TAGOH 2007-05-29 05:43:46 UTC
Created attachment 88985
Screenshot of invoking pango-view with LANG=ja and LANGUAGE=C:zh
Comment 29 Akira TAGOH 2007-05-29 05:45:28 UTC
Created attachment 88986
Screenshot of invoking pango-view with LANG=zh_CN and LANGUAGE=C:zh
Comment 30 Akira TAGOH 2007-05-29 05:46:06 UTC
Created attachment 88987
Screenshot of invoking pango-view with LANG=zh_CN and LANGUAGE=C:ja
Comment 31 Akira TAGOH 2007-05-29 05:48:11 UTC
Of the above screenshots, the ones rendered as expected are LANG=ja with LANGUAGE=C:ja and LANG=zh_CN with LANGUAGE=C:zh.
Comment 32 Behdad Esfahbod 2007-05-29 06:13:37 UTC
I don't understand what the bug is.  However, your shots clearly show that the feature is working.

Please open a new bug and attach one right and one wrong shot, so I can see what you expect.  All the shots are expected as far as I understand.
Comment 33 Akira TAGOH 2007-05-29 07:14:16 UTC
Hmm, I may be confused. is both envvar evaluated after looking up LANG and the requested glyphs aren't available in the font that prefers for LANG?  Actually it works fine for LANGUAGE=blahblahblah and LANG=en_US. but I also expected to affect it first anyway, because it becomes ugly rendering easily for displaying Chinese text with LANG=ja as the above screenshot, because Japanese fonts are usually a subset of Chinese fonts you know. it may be still useful to display a text if it's obvious or which one would be rendered prior to.
Comment 34 Behdad Esfahbod 2007-05-30 04:23:50 UTC
(In reply to comment #33)
> Hmm, I may be confused. is both envvar evaluated after looking up LANG and the
> requested glyphs aren't available in the font that prefers for LANG?  Actually
> it works fine for LANGUAGE=blahblahblah and LANG=en_US. but I also expected to
> affect it first anyway, because it becomes ugly rendering easily for displaying
> Chinese text with LANG=ja as the above screenshot, because Japanese fonts are
> usually a subset of Chinese fonts you know. it may be still useful to display a
> text if it's obvious or which one would be rendered prior to.


Akira, again, it's really hard to understand what your expected behavior is without a good and bad screenshot.  Please file a new bug, with one good and one bad shot, and the command that produced each, and why you think the bad one is bad.  Thanks again.