Bug 761482 - SystemLanguage handler fail to meet the w3c standard about language fallback
Status: RESOLVED OBSOLETE
Product: librsvg
Classification: Core
Component: general
Version: git master
Hardware: Other All
Importance: Normal normal
Target Milestone: ---
Assigned To: librsvg maintainers
Depends on:
Blocks:
 
 
Reported: 2016-02-03 04:52 UTC by Philip Tzou
Modified: 2017-12-13 18:14 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
- Patch (6.37 KB, patch), 2016-02-03 04:52 UTC, Philip Tzou (status: none)
- test case (751 bytes, image/svg+xml), 2016-02-03 04:54 UTC, Philip Tzou
- Patch 2 (6.35 KB, patch), 2016-02-03 06:02 UTC, Philip Tzou (status: rejected)
- SVG spec rules for langtag matching (3.17 KB, patch), 2017-06-19 21:35 UTC, Gerald Roylance (status: none)
- Fix POSIX test (3.18 KB, patch), 2017-06-19 21:49 UTC, Gerald Roylance (status: none)

Description Philip Tzou 2016-02-03 04:52:48 UTC
Created attachment 320311
Patch

The latest version of librsvg does not fully respect the language fallback defined by the W3C. Take the sentence "As an example, users may assume that on selecting "en-GB", they will be served any kind of English document if British English is not available." (https://www.w3.org/TR/SVGTiny12/struct.html#SystemLanguageAttribute): librsvg honors "any kind of English document" but ignores "if British English is not available".

This patch also fixes a few small bugs:

- If the system language contained an underscore (_), as in "en_GB", the match result was incorrect.
- If the system language contained a dot (.), as in "en.UTF-8", the match result was also incorrect.
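
For illustration, here is a minimal sketch of the kind of normalization these two cases call for. It is not the code from the attached patch: `locale_to_langtag` is a hypothetical helper, and the assumption is that the codeset or modifier part (from the first '.' or '@') can simply be dropped and '_' mapped to '-'. A later comment in this bug questions whether locale strings should be edited into langtags at all.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not librsvg API: copy a POSIX-style locale name into
 * `out` as a hyphen-separated tag. Everything from the first '.' or '@'
 * (codeset, modifier) is dropped and '_' becomes '-'.
 * Returns 0 on success, -1 if `out` is too small. */
int
locale_to_langtag (const char *locale, char *out, size_t out_size)
{
    size_t len, i;

    assert (locale != NULL && out != NULL);

    len = strcspn (locale, ".@");   /* length before codeset or modifier */
    if (len + 1 > out_size)
        return -1;

    for (i = 0; i < len; i++)
        out[i] = (locale[i] == '_') ? '-' : locale[i];
    out[len] = '\0';
    return 0;
}
```

Under these assumptions "en_GB.UTF-8" becomes "en-GB" and "en.UTF-8" becomes "en".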

Proof (use the svg file I attached):

```bash
LANG=en-GB rsvg-convert systemLanguageTest.svg -o test_en-GB.png
LANG=en_GB rsvg-convert systemLanguageTest.svg -o test_en_GB.png
LANG=en.UTF-8 rsvg-convert systemLanguageTest.svg -o test_en.UTF-8.png
```

I have already written a patch for this (see the attachment), but it does not include any new test cases. I would appreciate it if anyone could provide some.
Comment 1 Philip Tzou 2016-02-03 04:54:28 UTC
Created attachment 320313
test case
Comment 2 Philip Tzou 2016-02-03 06:02:50 UTC
Created attachment 320316
Patch 2
Comment 3 André Klapper 2016-02-04 11:03:12 UTC
Downstream: https://phabricator.wikimedia.org/T125710
Comment 4 André Klapper 2016-03-08 20:53:13 UTC
Could the patch here get a review?
Comment 5 André Klapper 2016-10-27 19:21:21 UTC
Regarding the patch above, I'm forwarding a comment by another user from https://phabricator.wikimedia.org/T125710#2735004 who would prefer not to register on GNOME Bugzilla. (Note that I'm really just forwarding):


There are locale strings (with underscores) and langtags (with hyphens). RSVG seems to confuse the two notions. The patch that took locale strings and edited them into langtags seems ill advised. Using the name "locale" in RSVG is also ill advised when the comparisons are done on langtags.

RSVG should be using a library that already does langtag matching ala HTTP services. Java has such a library; I assume there is one for C++. (BTW, the Java library routines do convert langtags to locales.)

PhiLiP seems to have confused fallback matching, the SVG spec, and desired behavior. SVG 1.1 wants switch to use the first compatible systemLanguage clause. A fallback match is immediately in play; a fallback match is as good as an exact match; it should not be saved for later.

I'm basing above comments on this patch. https://bugzilla.gnome.org/show_bug.cgi?id=761482

I would deprecate that patch, but right now I'm struggling with phabricator. I searched for this bug report yesterday, but did not find it. Bawolff had to point it out to me.

The RSVG fallback matching algorithm is just wrong. That's immediately apparent because it only looks at one hyphen; a langtag can have several hyphens (another reason why using a library for the matching would be appropriate). But there's a more insidious bug.

The SVG matching algorithm requires that the user agent's langtag (userLang) be compared to the systemLanguage langtags; there's a match if userLang equals a systemLanguage langtag (e.g. userLang "en-GB" matches systemLanguage "en-GB") or if userLang equals a systemLanguage langtag broken at a hyphen (e.g. userLang "en" will fallback-match systemLanguage "en-GB").

The SVG 1.1 matching algorithm never uses a subset of the user's langtag. If the user's langtag is "en-GB", then that user langtag does not fallback match the systemLanguage "en". A user demanding British English is not served a default English. See the "Implementation Note" at

http://www.w3.org/TR/SVG/struct.html#ConditionalProcessingSystemLanguageAttribute

which states: 'Evaluates to "true" if one of the languages indicated by user preferences exactly equals one of the languages given in the value of this parameter, or if one of the languages indicated by user preferences exactly equals a prefix of one of the languages given in the value of this parameter such that the first tag character following the prefix is "-".'

That's not the behavior most people expect, but that is the behavior that SVG 1.1 demands. RSVG is not supposed to decide that if the user asks for en-GB and en-US is available, then I'll give him en-US. SVG 1.1 does not guess and does not weigh options.

(Note the spec uses the plural "user preferences". User agent allows multiple preferences ala AcceptLanguages.)

The insidious bug in RSVG's matching is that it will chop the user's langtag. Say the user's langtag is "en-GB" and systemLanguage is "en-US". RSVG compares the two strings, finds they are not equal, and then looks for a fallback match. It notices that "en-US" has a hyphen at position 2, so it compares the first two characters of the user's langtag and the systemLanguage:

g_ascii_strncasecmp(userLang, systemLang, 2)

The result is that the two are erroneously declared a fallback match. All characters of the user's langtag must match, but the comparison never looks at the "-GB".
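
The flaw can be reproduced with a small reconstruction. This is a sketch written from the description above, not the actual librsvg source: `flawed_fallback_match` and `ascii_prefix_equal` are hypothetical names, and the helper stands in for g_ascii_strncasecmp so that only the C standard library is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ASCII-only lowercase, independent of the current locale. */
static int
ascii_lower (int c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Returns 1 if the first n bytes of a and b are equal ignoring ASCII case.
 * b must have at least n bytes; a shorter a mismatches at its NUL. */
static int
ascii_prefix_equal (const char *a, const char *b, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (ascii_lower ((unsigned char) a[i]) != ascii_lower ((unsigned char) b[i]))
            return 0;
    return 1;
}

/* Reconstruction of the flawed test (hypothetical, for illustration only):
 * only the part of system_lang before its first hyphen is compared, so the
 * rest of user_lang is never examined. */
int
flawed_fallback_match (const char *user_lang, const char *system_lang)
{
    const char *hyphen;

    assert (user_lang != NULL && system_lang != NULL);

    hyphen = strchr (system_lang, '-');
    if (hyphen == NULL)
        return 0;
    return ascii_prefix_equal (user_lang, system_lang,
                               (size_t) (hyphen - system_lang));
}
```

With this reconstruction, a user langtag of "en-GB" is reported as a fallback match for "en-US", because only "en" is compared.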

An appropriate test is (where systemLang is a langtag from systemLanguage)

if (strlen(userLang) == strlen(systemLang) and stricmp(userLang, systemLang) == 0)
    then exact match.
else if (strlen(userLang) < strlen(systemLang) and
           g_ascii_strncasecmp(userLang, systemLang, strlen(userLang)) == 0 and
           systemLang[strlen(userLang)] == '-')
    then subcomponent/fallback match
else
    no match
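
The test above can be written as compilable C roughly as follows. This is a sketch under stated assumptions rather than the attached patch: `langtag_match`, the `LangtagMatch` enum, and `ascii_prefix_equal` are hypothetical names, and the helper replaces stricmp / g_ascii_strncasecmp so the example depends only on the C standard library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    LANGTAG_NO_MATCH,
    LANGTAG_EXACT_MATCH,
    LANGTAG_FALLBACK_MATCH
} LangtagMatch;

/* ASCII-only lowercase, independent of the current locale. */
static int
ascii_lower (int c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Returns 1 if the first n bytes of a and b are equal ignoring ASCII case.
 * The caller guarantees that both strings have at least n bytes. */
static int
ascii_prefix_equal (const char *a, const char *b, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (ascii_lower ((unsigned char) a[i]) != ascii_lower ((unsigned char) b[i]))
            return 0;
    return 1;
}

/* Compare the user's langtag with one langtag from systemLanguage.
 * All of user_lang must match; system_lang may continue only at a '-'. */
LangtagMatch
langtag_match (const char *user_lang, const char *system_lang)
{
    size_t ulen, slen;

    assert (user_lang != NULL && system_lang != NULL);

    ulen = strlen (user_lang);
    slen = strlen (system_lang);

    if (ulen == 0 || ulen > slen)
        return LANGTAG_NO_MATCH;
    if (!ascii_prefix_equal (user_lang, system_lang, ulen))
        return LANGTAG_NO_MATCH;
    if (ulen == slen)
        return LANGTAG_EXACT_MATCH;
    return (system_lang[ulen] == '-') ? LANGTAG_FALLBACK_MATCH
                                      : LANGTAG_NO_MATCH;
}
```

Here userLang "en" fallback-matches "en-GB", while userLang "en-GB" matches neither "en" nor "en-US".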

The RSVG fallback algorithm will even match user langtag "enx-VN" to "en-US".

SVG 2.0 apparently aims to use an acceptLanguages string ala SMIL allowReorder. A HTTP library would be even more appropriate for that exercise because weights must be calculated.
Comment 6 Gerald Roylance 2017-06-19 21:35:58 UTC
Created attachment 354062
SVG spec rules for langtag matching

Patch changes rsvg_locale_compare to rsvg_langtag_compare.

The compare logic is changed so that all of the user agent langtag must match. The previous hyphen-break test was flawed.

Previous logic for "zh-Hans" and "zh-Hant" would see the strings do not match. It would then find the hyphen at position 2 in "zh-Hant" and test if "zh" == "zh".

The SVG spec semantics require that all of the user agent langtag match b, and that b either ends at that point or has a hyphen there.

Consequently, "zh-Hans" will match "zh-Hans-CN".

I cannot push or build.
Comment 7 Gerald Roylance 2017-06-19 21:41:26 UTC
strcmp (locale, "POSIX")

should be

strcmp (locale, "POSIX") == 0

... should have left it alone.
Comment 8 Gerald Roylance 2017-06-19 21:49:46 UTC
Created attachment 354064
Fix POSIX test

see comments on previous patch.

put == 0 test on strlen
Comment 9 Gerald Roylance 2017-06-19 22:02:27 UTC
Review of attachment 320316:

The patch does not follow the specification for langtag matching.

SVG 2.0 will use the SMIL allowReorder technique.

The user agent (librsvg) should have access to a list of user preferences. Typically, this would be a list derived from an Accept-Language string such as "en,fr;q=0.8,zh-Hant;q=.5,zh-Hans;q=.3". The user agent can then look at each clause in a switch element, compute its preference, and choose the one that makes the best match.
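
As a rough sketch of that idea (not the SMIL or SVG 2 normative algorithm, and not librsvg code), the following assumes each switch clause carries a single langtag, takes a clause's preference to be the highest q value among the Accept-Language entries that match it under the prefix rule discussed earlier in this bug, and ignores the "*" wildcard. `clause_weight` and `choose_clause` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* ASCII-only lowercase, independent of the current locale. */
static int
ascii_lower (int c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* 1 if pref (pref_len bytes) equals tag, or equals a prefix of tag that is
 * immediately followed by '-'. */
static int
pref_matches (const char *pref, size_t pref_len, const char *tag)
{
    size_t i;

    if (pref_len == 0 || pref_len > strlen (tag))
        return 0;
    for (i = 0; i < pref_len; i++)
        if (ascii_lower ((unsigned char) pref[i]) != ascii_lower ((unsigned char) tag[i]))
            return 0;
    return tag[pref_len] == '\0' || tag[pref_len] == '-';
}

/* Highest q among the entries of `accept` that match `tag`, or 0.0. */
double
clause_weight (const char *accept, const char *tag)
{
    double best = 0.0;
    const char *p = accept;

    while (*p != '\0') {
        const char *end = p + strcspn (p, ",");   /* one "tag;q=w" entry */
        const char *semi;
        size_t pref_len;
        double q = 1.0;                           /* default when q is absent */

        while (p < end && *p == ' ')
            p++;
        semi = memchr (p, ';', (size_t) (end - p));
        pref_len = (size_t) ((semi != NULL ? semi : end) - p);
        while (pref_len > 0 && p[pref_len - 1] == ' ')
            pref_len--;
        if (semi != NULL) {
            const char *qs = semi + 1;
            while (qs < end && *qs == ' ')
                qs++;
            if (end - qs > 2 && qs[0] == 'q' && qs[1] == '=')
                q = strtod (qs + 2, NULL);        /* stops at ',' or NUL */
        }
        if (q > best && pref_matches (p, pref_len, tag))
            best = q;
        p = (*end == ',') ? end + 1 : end;
    }
    return best;
}

/* Index of the clause with the highest weight, or -1 if none matches.
 * Ties go to the earlier clause. */
int
choose_clause (const char *accept, const char *const *tags, size_t n_tags)
{
    int best_index = -1;
    double best_weight = 0.0;
    size_t i;

    assert (accept != NULL && tags != NULL);

    for (i = 0; i < n_tags; i++) {
        double w = clause_weight (accept, tags[i]);
        if (w > best_weight) {
            best_weight = w;
            best_index = (int) i;
        }
    }
    return best_index;
}
```

With the Accept-Language string above and clauses tagged "zh-Hans", "fr" and "de", this sketch picks the "fr" clause, since its weight of 0.8 beats 0.3 for "zh-Hans" and "de" does not match at all.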
Comment 10 GNOME Infrastructure Team 2017-12-13 18:14:55 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new issue through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/librsvg/issues/131.