GNOME Bugzilla – Bug 681915
pygtk import changes default encoding, pygobject doesn't
Last modified: 2012-08-15 20:55:53 UTC
We are spotting issues with encodings when porting code from pygtk to pygi, still on Python 2.7.

>>> sys.getdefaultencoding()
'ascii'
>>> u'¡Hola %s!' % 'camión'
Traceback (most recent call last):
  ...

>>> import gtk
>>> sys.getdefaultencoding()
'utf-8'
>>> u'¡Hola %s!' % 'camión'
u'\xa1Hola cami\xf3n!'

But if we try in GTK+3 (in a new Python session):

>>> sys.getdefaultencoding()
'ascii'
>>> from gi.repository import Gtk
>>> sys.getdefaultencoding()
'ascii'
>>> u'¡Hola %s!' % 'camión'
Traceback (most recent call last):
  ...
The behaviour of Gtk3 is the correct one; it should never change the default encoding. You have another problem here: you're using non-ASCII bytes when entering a "sequence of bytes" that you want to be interpreted as a "sequence of characters". So *NEVER* write 'camión'; write u'camión'.
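A minimal sketch of the advice above, not taken from the original comment: keep text as unicode and decode byte strings explicitly where they enter the program, so the result no longer depends on sys.getdefaultencoding(). The helper name to_text and the choice of UTF-8 as the input encoding are assumptions made for illustration. The code is meant to run unchanged on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-


def to_text(value, encoding='utf-8'):
    """Return value as text, decoding byte strings explicitly.

    to_text is a hypothetical helper; UTF-8 is an assumed input encoding.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# UTF-8 encoded bytes for "camión", e.g. read from a file or a socket.
raw = b'cami\xc3\xb3n'

# Both operands are unicode, so no implicit ASCII decoding takes place.
greeting = u'¡Hola %s!' % to_text(raw)
```

With this approach the formatting line behaves the same whether or not gtk has been imported, because nothing is left to the implicit default encoding.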
Hey Facundo, that's what I wanted to know: whether this is intended behaviour or a regression. An import changing the default encoding doesn't sound good to me either. So this is an important change, and people porting code should be aware of it. Thanks.
(In reply to comment #2)
> Hey Facundo,
>
> that's what I wanted to know: whether this is intended behaviour or a
> regression. An import changing the default encoding doesn't sound good to
> me either. So this is an important change, and people porting code should
> be aware of it. Thanks.

Yeah, PyGObject is doing the right thing here imho. PyGTK should never have called sys.setdefaultencoding (this goes way back to 2004, see bug #132040).

From Python's docs (http://docs.python.org/library/sys.html#sys.setdefaultencoding):

"This function is only intended to be used by the site module implementation and, where needed, by sitecustomize. Once used by the site module, it is removed from the sys module’s namespace."

Note that libraries are not valid users of that function.

So PyGObject not setting the default encoding to utf8 is a good thing: setting it, as PyGTK used to, often leads to subtly broken code, as you can see here:

#!/usr/bin/env python
# -*- coding: utf8 -*-

import sys

print 'default encoding is', sys.getdefaultencoding()

# With the stock 'ascii' default, str() on non-ASCII unicode must fail.
try:
    id = u'éou'
    str(id)
except UnicodeEncodeError:
    print 'UnicodeEncodeError was raised'
else:
    print 'Oops, where did the UnicodeEncodeError go???'

print
print 'let\'s import gtk and see what happens:'
import gtk

print 'default encoding is', sys.getdefaultencoding()

# Same check again, now that gtk has been imported.
try:
    id = u'éou'
    str(id)
except UnicodeEncodeError:
    print 'UnicodeEncodeError was raised'
else:
    print 'Oops, where did the UnicodeEncodeError go???'
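For code being ported, one way to avoid depending on the default encoding at all is to encode explicitly instead of calling str() on a unicode object (under Python 2, str() on unicode encodes with the default encoding, which is why the script above changes behaviour after importing gtk). The following is only a sketch under that assumption: to_bytes is a hypothetical helper, and UTF-8 as the output encoding is an assumed choice. It should run on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-


def to_bytes(text, encoding='utf-8'):
    """Encode text explicitly; to_bytes is a hypothetical helper name."""
    return text.encode(encoding)


name = u'\xe9ou'  # the same u'éou' value used in the script above

# Explicit UTF-8 encoding gives the same bytes whatever the default is.
utf8_bytes = to_bytes(name)

# Explicit ASCII encoding fails in the same way whatever the default is.
try:
    to_bytes(name, 'ascii')
except UnicodeEncodeError:
    ascii_failed = True
else:
    ascii_failed = False
```

Because the encoding is named at each call, the outcome is the same before and after any import, which is the property the PyGTK behaviour silently broke.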
Thanks all for the clarification, I will close this ticket then. I think for things like this users deserve a porting guide.