Bug 681915 - pygtk import changes default encoding, pygobject doesn't
Status: RESOLVED NOTABUG
Product: pygobject
Classification: Bindings
Component: introspection
Version: Git master
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: Nobody's working on this now (help wanted and appreciated)
QA Contact: Python bindings maintainers
Reported: 2012-08-15 14:00 UTC by Manuel Quiñones
Modified: 2012-08-15 20:55 UTC



Description Manuel Quiñones 2012-08-15 14:00:34 UTC
We are running into encoding issues while porting code from PyGTK to PyGI (still on Python 2.7).

>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> u'¡Hola %s!' % 'camión'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
>>> import gtk
>>> sys.getdefaultencoding()
'utf-8'
>>> u'¡Hola %s!' % 'camión'
u'\xa1Hola cami\xf3n!'

But if we try the same with GTK+ 3 (in a new Python session):

>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> from gi.repository import Gtk
>>> sys.getdefaultencoding()
'ascii'
>>> u'¡Hola %s!' % 'camión'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)

Comment 1 Facundo Batista 2012-08-15 14:12:31 UTC
The behaviour of Gtk3 is the correct one: importing a library should never change the default encoding.

You have another problem here: you're putting non-ASCII bytes into a "sequence of bytes" that you then want to interpret as a "sequence of characters".

So, *NEVER* do 'camión', just do u'camión'.
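A minimal sketch of the advice above, assuming the non-ASCII text reaches your code as UTF-8 bytes (the `raw` variable is a hypothetical stand-in for data read from a file or socket). Decoding explicitly at the boundary means the result no longer depends on sys.getdefaultencoding(); it runs the same on Python 2.7 and Python 3.

```python
# Hypothetical input: the UTF-8 bytes of "camión", e.g. read from a file.
raw = b'cami\xc3\xb3n'

# Decode explicitly instead of letting Python 2 do an implicit
# decode with the default ('ascii') codec.
name = raw.decode('utf-8')

# Both the template and the argument are now text, so no implicit
# conversion happens during formatting.
greeting = u'\u00a1Hola %s!' % name
print(repr(greeting))
```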
Comment 2 Manuel Quiñones 2012-08-15 14:23:57 UTC
Hey Facundo,

that's what I wanted to know: whether this is intended behaviour or a regression. An import changing the default encoding doesn't sound good to me either. So this is an important change, and people porting code should be aware of it. Thanks.
Comment 3 Dieter Verfaillie 2012-08-15 20:26:31 UTC
(In reply to comment #2)
> Hey Facundo,
> 
> that's what I was wanting to know, if this is intended behaviour or a
> regression.  That an import change the default encoding doesn't sound good for
> me too.  So this is an important change and people porting code should be
> aware.  Thanks.

Yeah, PyGObject is doing the right thing here imho. PyGTK should
never have called sys.setdefaultencoding (this goes way back to 2004;
see bug #132040).

From Python's docs (http://docs.python.org/library/sys.html#sys.setdefaultencoding):

  This function is only intended to be used by the site module
  implementation and, where needed, by sitecustomize. Once used by the
  site module, it is removed from the sys module’s namespace.

Note that libraries are not valid users of that function.
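The removal described in the quoted documentation can be checked directly. This is a small sketch assuming a normal interpreter start-up (Python 2 started with `python -S` skips the site module, so the function would still be present there); Python 3 does not provide sys.setdefaultencoding at all.

```python
import sys

# On a normal Python 2 start-up, site.py deletes sys.setdefaultencoding;
# Python 3 never exposes it. Either way, library code cannot simply call it.
has_setter = hasattr(sys, 'setdefaultencoding')
print(has_setter)
```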

So PyGObject not setting the default encoding to UTF-8 is a good
thing: doing so, as PyGTK used to, often leads to subtly broken
code, as you can see here:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Python 2 + PyGTK only: shows that importing gtk changes the
# process-wide default encoding and hides a UnicodeEncodeError.

import sys


def check_implicit_encode():
    print 'default encoding is', sys.getdefaultencoding()
    try:
        text = u'éou'
        # In Python 2, str() on a unicode object encodes it with
        # the default encoding, which fails under 'ascii'.
        str(text)
    except UnicodeEncodeError:
        print 'UnicodeEncodeError was raised'
    else:
        print 'Oops, where did the UnicodeEncodeError go???'


check_implicit_encode()

print
print "let's import gtk and see what happens:"
import gtk  # imported only for its side effect on the default encoding
check_implicit_encode()
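A porting-friendly alternative, sketched under the assumption that the surrounding code wants UTF-8 bytes: encode explicitly wherever bytes are needed, instead of relying on the implicit str() conversion in the script above. The sketch runs on both Python 2.7 and Python 3, and its result cannot be changed by importing gtk.

```python
text = u'\u00e9ou'  # u'éou', the same string used in the script above

# Python 2's str(text) under the 'ascii' default is an implicit ASCII
# encode, which fails for any non-ASCII character.
try:
    text.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# An explicit encode yields the same bytes regardless of what
# sys.getdefaultencoding() returns.
data = text.encode('utf-8')
print(ascii_failed)
print(repr(data))
```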
Comment 4 Manuel Quiñones 2012-08-15 20:55:53 UTC
Thanks all for the clarification; I will close this ticket then. I think users deserve a porting guide for things like this.