GNOME Bugzilla – Bug 649165
Enumeration symbols are LC_CTYPE specific
Last modified: 2011-09-11 18:14:49 UTC
We recently got a bug report about PyGI programs failing in Turkish locales (https://launchpad.net/bugs/747796). Under tr_TR.UTF-8, exported symbols with an uppercase "I" come out as lowercase "i': $ python -c "import locale; from gi.repository import Gtk; locale.setlocale(locale.LC_ALL, 'tr_TR.UTF-8'); print dir(Gtk.IconSize)" ['BUTTON', 'DND', 'DiALOG', 'iNVALiD', ...] It is correct in C and e. g. de_DE.UTF-8: PYTHONPATH=. python -c "import locale; from gi.repository import Gtk; locale.setlocale(locale.LC_ALL, 'de_DE.UTF-8'); print dir(Gtk.IconSize)" ['BUTTON', 'DIALOG', 'DND', 'INVALID', ...] In tr_TR.UTF-8, you get an AttributeError when you try to access e. g. Gtk.IconSize.DIALOG. It has been demonstrated that this can be worked around by enclosing a GI call into a pair of locale.setlocale(locale.LC_CTYPE, 'C') / locale.setlocale(locale.LC_CTYPE, '') calls. In other words, the initial "from gi.repository import Gtk" import worked correctly, but the bug seems to happen for (un)marshalling the symbol names. At the symbol marshalling time there should not happen any LC_CTYPE conversion? Or do we need to consider programming languages which actually support non-ASCII identifiers?
A simple reproducer is to run "LANG=tr_TR.UTF-8 make check". This will fail in both test_gi and test_gdbus, due to this problem. Interestingly, calling "LANG=tr_TR.UTF-8 TEST_NAMES=test_gi make check" does NOT fail, neither does TEST_NAMES=test_gdbus. It seems that running some other test (like the Gtk ones?) "clutters" the marshalling in some way.
For the record, Turkish locales map two ASCII characters to non-ASCII characters with respect to case conversion: "I" (capital I) is mapped to "ı" (lowercase i-dotless) "i" (lowercase i) is mapped to "İ" (capital I-dotted) This seems to be related, and it has been demonstrated that hacking the tr_TR locale definition for "toupper" and "tolower" to map i -> I and I->i also fixes the problem. Indeed there is a case conversion going on here with enumeration values. E. g. if the gir specifies <enumeration name="IconSize" [...] <member name="invalid" then this is accessed in python as "INVALID". I think gi/module.py IntrospectionModule.__getattr__() does this conversion. If I add some debugging to the "for value_info in info.get_values():" loop, I get: IntrospectionModule __getattr_(IconSize): upping dialog == DiALOG Importing Gtk somehow triggers a setlocale(), which explains why the gdbus and gi tests work when being called on their own, but fail under tr_TR.UTF-8 when test_overrides also run: $ python -c 'import locale; print locale.getlocale(); from gi.repository import Gtk; print locale.getlocale();' (None, None) ('de_DE', 'UTF8') So I think I have found the root cause here. The question is now whether we should really use a locale specific upper() call here, or just switch to the C locale for this? I think we should do the latter. locale-specific constant names do sound a bit crazy to me.
This patch fixes it: --- a/gi/module.py +++ b/gi/module.py @@ -24,6 +24,7 @@ from __future__ import absolute_import import os import gobject +import locale import gi from .overrides import registry @@ -121,7 +122,13 @@ class IntrospectionModule(object): wrapper.__module__ = 'gi.repository.' + info.get_namespace() for value_info in info.get_values(): + # switch to C locale for upper() to avoid locale specific + # identifier conversion (e. g. in Turkish 'i'.upper() == 'i') + # see https://bugzilla.gnome.org/show_bug.cgi?id=649165 + old_ctype = locale.getlocale(locale.LC_CTYPE) + locale.setlocale(locale.LC_CTYPE, 'C') value_name = value_info.get_name().upper() + locale.setlocale(locale.LC_CTYPE, old_ctype) setattr(wrapper, value_name, wrapper(value_info.get_value())) I'm now working on a proper test case, and then submit a complete git-formatted patch as an attachment.
Created attachment 187038 [details] [review] proposed patch Proposed patch, including test suite integration. Unfortunately the standard Python library does not have an ASCII only variant of upper(). gi.repository.GLib provides g_ascii_toupper(), but this only works on a single char, and I'm not sure whether it's a good idea to use this in gi/module.py (feels so recursive). Also, performance-wise it would be a lot worse, I suppose.
A few comments. First str.upper() is defined to be locale-dependent for 8-bit strings. I don't know what that implies for unicode strings. Second, I agree that locale-dependent identifiers is crazy talk :). I think I'd actually implement the temporary switch in locales as a context manager so that 1) it works for the with-statement; 2) you can't accidentally *not* reset the locale should some exception occur during the work. E.g. (mostly untested): import locale from contextlib import contextmanager @contextmanager def c_locale(): old_ctype = locale.getlocale(locale.LC_CTYPE) locale.setlocale(locale.LC_CTYPE, 'C') try: yield finally: locale.setlocale(locale.LC_CTYPE, old_ctype) with c_locale(): for value_info in info.get_values(): value_name = value_info.get_name().upper() setattr(wrapper, value_name, wrapper(value_info.get_value())) If you don't like the call syntax in the with statement, you could instead create a class with __enter__() and __exit__() methods, then instantiate that class globally so everyone has access to it.
Created attachment 187099 [details] [review] proposed patch Thanks Barry for your review. This indeed looks a bit complicated, but I agree that this should be implemented in a more robust way. So I just used a try:/finally: around the loop now, which should achieve the same effect.
Created attachment 187100 [details] [review] proposed patch Meh, forgot to run format-patch, this is really the updated one.
Hmm, I'm not sure how this fares with threaded applications -- could this affect locale-dependent functionality in threads running concurrently with this? Alternatively to switching locales back and forth, we could just use a private, locale-unaware toupper() function, like this one: --- 8< --- from string import maketrans fromchars = 'abcdefgjhijklmnopqrstuvwxyz' tochars = 'ABCDEFGJHIJKLMNOPQRSTUVWXYZ' transtable = maketrans(fromchars, tochars) def toupper(s): return s.translate(transtable) --- >8 ---
Created attachment 187105 [details] [review] proposed patch Of course, thanks Nils! KISS for the win :-)
Comment on attachment 187105 [details] [review] proposed patch Looks good to me. The local switching was insane but this is much better and most likely better performing.
Pushed to master and 2-28 branches.
*** Bug 655528 has been marked as a duplicate of this bug. ***
The fix contains an error: the call string.maketrans(... should read str.maketrans(... As a consequence of this fix, all imports of Gtk fails. See https://bugzilla.redhat.com/show_bug.cgi?id=737375. Would really like to reopen bug, but I can't (dumb? not allowed?).
(In reply to comment #13) > The fix contains an error: the call string.maketrans(... should read > str.maketrans(... > > As a consequence of this fix, all imports of Gtk fails. See > https://bugzilla.redhat.com/show_bug.cgi?id=737375. Would really like to reopen > bug, but I can't (dumb? not allowed?). This was fixed some time ago, in the pygobject-2-28 branch: http://git.gnome.org/browse/pygobject/commit/?h=pygobject-2-28&id=667bec76ccbc85cc1d54a0e68977dbda241c028c and in the master branch: http://git.gnome.org/browse/pygobject/commit/?id=8f89ff24fcac627ce15ca93038711fded1a7c5ed