GNOME Bugzilla – Bug 325714
Pango should respect $LANGUAGE
Last modified: 2007-05-30 04:23:50 UTC
Pango should use $LANGUAGE to decide which language to use for each script. The chosen language should be passed to fontconfig and used to select the correct LanguageSystem for an OpenType font. For example, if I set LANGUAGE=en:fa, then upon seeing text in Arabic script, it should ask fontconfig for fonts for 'fa', not 'ar'. It should look up the Persian LanguageSystem in OpenType fonts, instead of the default LangSys.
First we need to decide whether we really want to only look for a language list in $LANGUAGE. That simplifies things, and it can be overridden by putenv()ing LANGUAGE. On the other hand, if we provide a function to override it, that function will have to either set a global variable (which is messy) or take a PangoContext. Having a per-PangoContext language list, and falling back to $LANGUAGE, does sound like a sound idea to me.
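The per-context fallback idea can be sketched roughly as follows. This is an illustration in Python rather than Pango's C. The `Context` class and `effective_languages` function are hypothetical names, not Pango API, and the sketch assumes the variable is the gettext-style, colon-separated LANGUAGE.

```python
import os


class Context:
    """Hypothetical stand-in for a PangoContext with an optional language list."""

    def __init__(self, languages=None):
        # None means no list was set on this context.
        self.languages = languages


def effective_languages(context, environ=None):
    """Return the context's own list if set, otherwise the list from LANGUAGE."""
    environ = os.environ if environ is None else environ
    if context.languages is not None:
        return list(context.languages)
    raw = environ.get("LANGUAGE", "")
    # Assumption: entries are colon-separated, as in gettext; empty entries are dropped.
    return [tag for tag in raw.split(":") if tag]
```

With this shape, no global variable is needed: an application that wants to override the environment sets the list on its own context, and every other context keeps following the environment.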
*** Bug 329402 has been marked as a duplicate of this bug. ***
Here's the reference to the 'locl' feature in the MS OT specs: http://www.microsoft.com/typography/otspec/features_ko.htm#locl It should be active by default.
Ok, I see. Unfortunately it seems the rest of the standard has not been updated to say where exactly this feature should be applied. Is it before ccmp? After? After all GSUB features? Etc. All of these make some sense... Anyway, just enabling the feature will not do much as long as we don't support the Language stuff in the OT shapers.
I would have said locl should be applied before ccmp, but the OpenType Tag Registry clearly specifies that ccmp "needs to be implemented prior to any other feature". As for the other features, locl should probably be applied before them, but after is fine if it can override their results.
Actually OpenType features should be applied according to the font's order. Meaning fontmakers should probably order locl before ccmp.
There's been some discussion around this recently on the OpenType list saying that some OT features should be applied at the same time, but other than that, I don't agree with you. The Arabic OT spec for example specifies the order the features should be applied.
> The Arabic OT spec for example specifies the order the features should be applied.
But that could mean the order specified in the specs has to be defined in the font by the font maker. Either way, it's probably safer to have the order specified in the specs in Pango rather than in fonts. Some font makers might not realize they can (or have to) set the order of features. There are some discrepancies in the current specs. Hopefully the next update will clear those up. They specify 'aalt' should be applied first every time, but it would be pretty much unusable if 'locl' is applied afterwards for some glyphs. What does 'applying them at the same time' mean?
Pango used to follow the font order; that produced incorrect rendering for Indic scripts with many available fonts, and the code had to be reworked to allow the shaper to specify the ordering. It's conceivable that following the font order is right for latn, though...
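The two policies under discussion (follow the font's feature order, or let the shaper dictate it) can be contrasted with a small sketch. It is in Python for brevity. The `feature_order` function and the sample tag lists are made up for illustration and do not reflect Pango's actual code.

```python
def feature_order(font_features, shaper_order=None):
    """Return feature tags in the order they would be applied.

    font_features: tags in the order the font lists them.
    shaper_order:  tags in the order the shaper mandates, or None to
                   follow the font's own order.
    """
    if shaper_order is None:
        # Font-order policy: trust whatever order the font maker chose.
        return list(font_features)
    # Shaper-order policy: keep the shaper's sequence, skipping features
    # the font does not provide.
    available = set(font_features)
    return [tag for tag in shaper_order if tag in available]
```

For a font listing `["locl", "ccmp", "liga"]`, the font-order policy applies locl first, while a shaper order of `["ccmp", "locl", "init", "liga"]` yields ccmp, locl, liga regardless of how the font was built.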
It's a pity that MS and Adobe have not publicized their latest spec yet :(. Apparently they have changed a lot, including a lot of stuff in the Indic spec. Applying at the same time means treating them as if they were just one feature, instead of one coming after the other.
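One possible reading of "as if they were just one feature" is sketched below, in Python for brevity. It assumes that the lookups belonging to a single feature run in lookup-list index order, so merging several features pools their lookups and sorts them together. The function names and the lookup indices are hypothetical.

```python
def sequential_lookups(features, lookups_by_feature):
    """Apply features one after the other: all lookups of the first
    feature, then all lookups of the second, and so on."""
    order = []
    for tag in features:
        order.extend(sorted(lookups_by_feature[tag]))
    return order


def simultaneous_lookups(features, lookups_by_feature):
    """Apply features "at the same time": pool their lookups and run
    them in lookup-list order, as if they belonged to one feature."""
    merged = set()
    for tag in features:
        merged.update(lookups_by_feature[tag])
    return sorted(merged)
```

With hypothetical lookups `{"init": [3], "medi": [1], "fina": [2]}`, applying init, medi, fina one after the other runs lookups 3, 1, 2, whereas applying them together runs 1, 2, 3.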
One use case for having more than just one language list is solving the (difficult) issue of correctly rendering unified Han characters, which are used in both Chinese and Japanese, for example. If the IME were able to provide a hint about which language is being input, a Japanese user (thus with a general preference for Japanese glyphs) could write Chinese and have it rendered with a Chinese font, or the other way around. AFAIK, no system today gets that quite right, and it'd be nice to have it solved; it's sort of a touchy matter for the users.
In your use case, you still need the higher level to add markup/attrs to set the language when rendering later (where the IME is not available anymore). When doing that, the current context language is enough; no need for multiple languages.
http://www.gnu.org/software/libc/manual/html_node/Using-gettextized-software.html
So how should we proceed on this? This behavior sounds somewhat helpful in some cases, making things better for at least one language, even if it doesn't solve all of the issues relating to the current locale vs. the characters used.
Speaking of a possible problem with this feature: if one sets the LANGUAGE env var, the properly localized strings may not be displayed. For example, someone is learning Japanese but wants to see translated strings in English, because that's still easier to read, and just needs an input method etc. If they run the application with LANGUAGE=ja LC_CTYPE=ja_JP.UTF-8 LANG=en_US.UTF-8, it still displays the translated strings in Japanese. Well, I'm sure this usage is wrong according to the original purpose of LANGUAGE, but there may be cases where one wants to prefer Japanese fonts in any case. So should we have a different env var, or?
No, in that case they will use LANGUAGE=en:ja and will still get English messages.
Well, unfortunately even that doesn't work, because we don't usually have any po files for en, so gettext is going to fall back to the next entry. The application then still shows ja text in menus, toolbars, etc.
Humm, right. Ok, what about LANGUAGE=C:ja? Not pretty, but works.
Well, for only en_US or just en? Hmm, yeah, it should work.
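The fallback behaviour discussed in this exchange can be modelled with a short sketch, in Python for brevity. It is a simplification: `pick_catalog` is a hypothetical helper, the set of installed catalogs is made up, entries are assumed to be colon-separated as gettext expects, and real gettext also tries variants of each entry (such as `en` for `en_US`), which is omitted here.

```python
def pick_catalog(language, available):
    """Return the first LANGUAGE entry that has a message catalog,
    or None when messages stay untranslated."""
    for tag in language.split(":"):
        if not tag:
            continue
        if tag in ("C", "POSIX"):
            # The C/POSIX entry ends the search: untranslated messages.
            return None
        if tag in available:
            return tag
    return None
```

With only a `ja` catalog installed, `pick_catalog("en:ja", {"ja"})` falls through to `"ja"`, which is the problem described above, while `pick_catalog("C:ja", {"ja"})` returns `None`, so the original (typically English) strings are shown.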
Getting near:

2007-05-13  Behdad Esfahbod  <behdad@gnome.org>

    Part of Bug 325714 – Pango should respect $LANGUAGE

    * pango/pango-ot.h:
    * pango/pango-ot-private.h:
    * pango/pango-ot-tag.c (pango_ot_tag_from_script),
    (pango_ot_tag_from_language):
    * pango/pango-ot-info.c (pango_ot_info_find_script),
    (pango_ot_info_find_language), (pango_ot_info_find_feature),
    (pango_ot_info_list_languages), (pango_ot_info_list_features):
    * pango/pango-ot-ruleset.c (pango_ot_ruleset_new),
    (pango_ot_ruleset_new_for), (pango_ot_ruleset_add_feature),
    (pango_ot_ruleset_maybe_add_feature),
    (pango_ot_ruleset_maybe_add_features):
    Add new engine API:

        PANGO_OT_NO_FEATURE
        PANGO_OT_NO_SCRIPT
        PANGO_OT_TAG_DEFAULT_SCRIPT
        PANGO_OT_TAG_DEFAULT_LANGUAGE
        pango_ot_ruleset_new_for()
        pango_ot_ruleset_maybe_add_feature()
        pango_ot_ruleset_maybe_add_features()

    Using pango_ot_ruleset_new_for() and
    pango_ot_ruleset_maybe_add_features() drastically simplifies ruleset
    building in modules, and does correct script and language selection
    too. Modules need to be updated to use it though.

    * docs/pango-docs.sgml:
    * docs/pango-sections.txt:
    * docs/tmpl/opentype.sgml:
    Update.
One more step:

2007-05-14  Behdad Esfahbod  <behdad@gnome.org>

    Part of Bug 325714 – Pango should respect $LANGUAGE

    * pango/pango-ot.h:
    * pango/pango-ot-ruleset.c (pango_ot_ruleset_get_for),
    (pango_ot_ruleset_description_hash),
    (pango_ot_ruleset_description_equal),
    (pango_ot_ruleset_description_copy),
    (pango_ot_ruleset_description_free):
    Add new engine API:

        PangoOTRulesetDescription
        pango_ot_ruleset_get_for()
        pango_ot_ruleset_description_hash()
        pango_ot_ruleset_description_equal()
        pango_ot_ruleset_description_copy()
        pango_ot_ruleset_description_free()

    The main addition is pango_ot_ruleset_get_for() that takes a ruleset
    description, ie. script/language and list of GSUB/GPOS features to
    apply, and returns a ruleset. It manages all the work to cache
    rulesets, so modules don't have to do that anymore. Given that
    modules do not deal with just one ruleset anymore (because we want
    to respect language, and allow user-selected features), this makes
    their life way easier.

    * docs/pango-sections.txt:
    * docs/tmpl/opentype.sgml:
    Update.
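The caching idea behind a description-keyed lookup can be illustrated with a rough Python sketch. This is not the Pango implementation: `RulesetDescription`, `RulesetCache` and the `build` callback are hypothetical stand-ins, and the frozen dataclass merely plays the role that the hash and equal functions play for the C structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RulesetDescription:
    """Hypothetical cache key: script, language and the features to apply."""
    script: str
    language: str
    features: tuple  # feature tags, in application order


class RulesetCache:
    """Build each distinct ruleset once and hand back the cached one after that."""

    def __init__(self, build):
        self._build = build   # callable that turns a description into a ruleset
        self._cache = {}

    def get_for(self, desc):
        # Equal descriptions hash alike, so they share one cached ruleset.
        if desc not in self._cache:
            self._cache[desc] = self._build(desc)
        return self._cache[desc]
```

A shaper module would then ask for a ruleset per run, passing the script, the language and its feature list, without keeping any cache of its own. Two runs that differ only in language (say 'fa' versus 'ar') get separate rulesets.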
2007-05-14  Behdad Esfahbod  <behdad@gnome.org>

    Part of Bug 325714 – Pango should respect $LANGUAGE
    Bug 414264 – Pango vertical writing support is different with real
    CJK usage.

    * modules/arabic/arabic-fc.c (arabic_engine_shape):
    * modules/basic/basic-fc.c (basic_engine_shape):
    * modules/syriac/syriac-fc.c (syriac_engine_shape):
    Remove fallback_shape() paths. Remove get_ruleset(). Use
    pango_ot_ruleset_get_for(), that correctly works for multiple
    languages. Also makes basic shaper apply the 'vert' feature for
    vertical text. Removes a net 500 lines.

Other OpenType modules need to be ported over time, however some extensions may be needed. For example, the Hebrew shaper uses fallback code if no GPOS tables are available. Currently using pango_ot_ruleset_get_for() one cannot see which features were found.
The fixed modules (basic, arabic, syriac) now apply the 'locl' feature too. The order is hardcoded to do locl after ccmp. I'm going to fix it such that for non-Indic modules the font order of features is respected.
2007-05-14  Behdad Esfahbod  <behdad@gnome.org>

    Bug 325714 – Pango should respect $LANGUAGE

    * pango/pango-language.c (pango_language_matches),
    (parse_default_languages), (_pango_script_get_default_language),
    (pango_script_get_sample_language):
    Make pango_script_get_sample_language() use the value of env var
    PANGO_LANGUAGE or LANGUAGE (checked in that order) to make better
    guesses. The env var should be a list of language tags, like "en:fa"
    for example, which makes Pango choose Persian (fa) fonts instead of
    Arabic (ar) fonts...
So, I made it check PANGO_LANGUAGE first, and then LANGUAGE. Setting it to "C:ja" wasn't working because "C" is an unknown lang to Pango, and so it would choose that for every script queried. Made a special case for "C" to skip it. Anyway, fixed.
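The lookup described here can be approximated as follows, in Python for brevity. It is only a sketch under stated assumptions: `default_languages`, `sample_language` and both tables are hypothetical and tiny, an empty PANGO_LANGUAGE is treated the same as an unset one, and canonicalization of full locale tags such as `zh_CN` is left out.

```python
# Hypothetical, deliberately tiny tables. Pango's real data is far larger.
SCRIPT_LANGUAGES = {
    "Arab": ["ar", "fa", "ur"],
    "Hani": ["zh", "ja"],
    "Latn": ["en", "fr", "de"],
}
DEFAULT_SAMPLE = {"Arab": "ar", "Hani": "zh", "Latn": "en"}


def default_languages(environ):
    """Read PANGO_LANGUAGE, else LANGUAGE, as a colon-separated list,
    skipping the special "C" entry."""
    raw = environ.get("PANGO_LANGUAGE") or environ.get("LANGUAGE") or ""
    return [tag for tag in raw.split(":") if tag and tag != "C"]


def sample_language(script, environ):
    """Pick the first listed language written in the script, falling back
    to a built-in default when none of the user's languages match."""
    for tag in default_languages(environ):
        if tag in SCRIPT_LANGUAGES.get(script, []):
            return tag
    return DEFAULT_SAMPLE.get(script)
```

Under these assumptions, `LANGUAGE=en:fa` yields 'fa' for Arabic script, `LANGUAGE=C:ja` yields 'ja' for Han, and with neither variable set the built-in default for the script is used.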
Sorry for reopening, but it still doesn't look correct. ASCII characters now follow PANGO_LANGUAGE/LANGUAGE (thanks for that), but Chinese characters/Kanji are still rendered the same as with the previous Pango.
Created attachment 88984 Screenshot of invoking pango-view with LANG=ja and LANGUAGE=C:ja
Created attachment 88985 Screenshot of invoking pango-view with LANG=ja and LANGUAGE=C:zh
Created attachment 88986 Screenshot of invoking pango-view with LANG=zh_CN and LANGUAGE=C:zh
Created attachment 88987 Screenshot of invoking pango-view with LANG=zh_CN and LANGUAGE=C:ja
Of the screenshots above, the ones rendered as expected are LANG=ja with LANGUAGE=C:ja and LANG=zh_CN with LANGUAGE=C:zh.
I don't understand what the bug is. However, your shots clearly show that the feature is working. Please open a new bug and attach one right and one wrong shot, so I can see what you expect. All the shots are as expected, as far as I understand.
Hmm, I may be confused. is both envvar evaluated after looking up LANG and the requested glyphs aren't available in the font that prefers for LANG? Actually it works fine for LANGUAGE=blahblahblah and LANG=en_US. but I also expected to affect it first anyway, because it becomes ugly rendering easily for displaying Chinese text with LANG=ja as the above screenshot, because Japanese fonts are usually a subset of Chinese fonts you know. it may be still useful to display a text if it's obvious or which one would be rendered prior to.
(In reply to comment #33)
> Hmm, I may be confused. is both envvar evaluated after looking up LANG and the
> requested glyphs aren't available in the font that prefers for LANG? Actually
> it works fine for LANGUAGE=blahblahblah and LANG=en_US. but I also expected to
> affect it first anyway, because it becomes ugly rendering easily for displaying
> Chinese text with LANG=ja as the above screenshot, because Japanese fonts are
> usually a subset of Chinese fonts you know. it may be still useful to display a
> text if it's obvious or which one would be rendered prior to.
Akira, again, it's really hard to understand what your expected behavior is without a good and bad screenshot. Please file a new bug, with one good and one bad shot, and the command that produced each, and why you think the bad one is bad. Thanks again.