Bug 620579 - Accept unicode objects in addition to strings
Status: RESOLVED FIXED
Product: pygobject
Classification: Bindings
Component: introspection
Version: unspecified
Hardware: Other Linux
Importance: Normal major
Target Milestone: ---
Assigned To: Nobody's working on this now (help wanted and appreciated)
QA Contact: Python bindings maintainers
Depends on:
Blocks: 619039
 
 
Reported: 2010-06-04 16:45 UTC by Toms Bauģis
Modified: 2010-12-17 17:25 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
when converting to UTF-8 accept Python Unicode objects as input (Python 2) (1.91 KB, patch)
2010-11-17 19:40 UTC, johnp
none
when converting to UTF-8 accept Python Unicode objects as input (Python 2) (3.41 KB, patch)
2010-11-18 18:30 UTC, johnp
committed
Properly handle unicode object in properties (4.00 KB, patch)
2010-12-16 20:46 UTC, johnp
committed

Description Toms Bauģis 2010-06-04 16:45:34 UTC
Right now, if a unicode object is passed in as a value, it is rejected instead of being encoded.
Comment 1 Martin Pitt 2010-11-17 09:24:45 UTC
Confirmed with current pygobject 2.27.0, still requires an explicit .encode('UTF-8') everywhere.
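
The workaround described here can be centralised instead of being repeated at every call site. A minimal sketch, assuming a hypothetical helper named `to_utf8` (it is not part of pygobject) that is written to run on both Python 2 and Python 3:

```python
import sys

# Text type: `unicode` on Python 2, `str` on Python 3.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821


def to_utf8(value):
    """Return value as UTF-8 encoded bytes (hypothetical helper)."""
    if isinstance(value, text_type):
        return value.encode("utf-8")
    # Already a byte string; assumed to be UTF-8 and passed through unchanged.
    return value


print(repr(to_utf8(u"foo")))
```

On Python 2, a call such as l.set_label(to_utf8(u"foo")) would then stand in for the inline .encode('UTF-8').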
Comment 2 johnp 2010-11-17 16:32:40 UTC
On Python 2?  Can you provide an example?  Python 3 handles unicode much better.
Comment 3 Martin Pitt 2010-11-17 17:15:44 UTC
Yes, it happens with python 2.6 and 2.7:

$ python -c 'from gi.repository import Gtk; Gtk.require_version("3.0"); l=Gtk.Label(); l.set_label(u"foo")'

(-c:22055): Gtk-WARNING **: Unable to locate theme engine in module_path: "pixmap",
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/pymodules/python2.6/gtk-2.0/gi/types.py", line 40, in function
    return info.invoke(*args)
TypeError: argument 1: Must be string, not unicode

Comment 4 johnp 2010-11-17 19:40:45 UTC
Created attachment 174712
when converting to UTF-8 accept Python Unicode objects as input (Python 2)
Comment 5 johnp 2010-11-17 19:41:27 UTC
Does this work for you?
Comment 6 Martin Pitt 2010-11-18 15:02:22 UTC
It does accept unicode objects now, and as long as the actual strings just have ASCII characters, it works.

However, now it crashes on non-ASCII characters with

$ python -c 'import locale; locale.setlocale(locale.LC_ALL, ""); print locale.getlocale(); from gi.repository import Gtk; Gtk.require_version("3.0");
l=Gtk.Label(); l.set_label(u"ä")'
('de_DE', 'UTF8')

Traceback (most recent call last):
  File "<string>", line 2, in <module>
  File "/usr/lib/pymodules/python2.6/gtk-2.0/gi/types.py", line 40, in function
    return info.invoke(*args)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)


Explicitly calling Gtk.set_locale() doesn't help either (although that shouldn't be required in the first place).

Thanks!
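
The traceback above points at a conversion through the ASCII codec. A small sketch that reproduces the same class of error without Gtk, using only the standard library; the exact message and positions may differ from the ones in the report:

```python
text = u"\u00e4"  # the character "ä" from the reproducer

try:
    # Encoding with the ASCII codec fails for any non-ASCII code point.
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print("ascii failed: %s" % exc.reason)

# Encoding explicitly as UTF-8 succeeds.
utf8_bytes = text.encode("utf-8")
print(len(utf8_bytes))
```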
Comment 7 johnp 2010-11-18 16:54:59 UTC
Martin, is this a regression or has it never worked with PyGObject/PyGtk?
Comment 8 Martin Pitt 2010-11-18 17:05:24 UTC
It is a regression from pygtk. It has always "just worked" with pygtk to supply unicode objects to pygtk objects, and strings were displayed normally. I am currently porting a pygtk (2.0) project of mine to pygi Gtk 3.0, and that's how I noticed that in the first place. My current workaround is to add .encode('UTF-8') everywhere.

I don't know whether it has ever worked with pygobject and GI, since I really just started using it a few days ago.
Comment 9 johnp 2010-11-18 18:30:38 UTC
Created attachment 174795
when converting to UTF-8 accept Python Unicode objects as input (Python 2)

ha, git-bz doesn't handle unicode either so I had to append this patch manually.  This should fix your issue.
Comment 10 johnp 2010-11-18 18:40:05 UTC
I should note that Python 3 makes these issues go away (i.e. it properly handles Unicode).
Comment 11 Martin Pitt 2010-11-18 18:59:41 UTC
Awesome, that does it. Thank you!
Comment 12 Philip Withnall 2010-11-18 21:09:38 UTC
With Python 2 and this pygobject master (i.e. with this patch), I'm getting TypeErrors in Totem with the patch in bug #619039 applied:

Traceback (most recent call last):
  File "/opt/gnome2/build/lib64/totem/plugins/opensubtitles/opensubtitles.py", line 319, in do_activate
    self.os_append_menu()
  File "/opt/gnome2/build/lib64/totem/plugins/opensubtitles/opensubtitles.py", line 427, in os_append_menu
    tooltip=_(u"Download movie subtitles from OpenSubtitles"))
TypeError: could not convert value for property `label'

The file has "# -*- coding: utf-8 -*-" specified at the top. The code in question:

        self.action = Gtk.Action(name='opensubtitles',
                                 label=_(u'_Download Movie Subtitles…'),
                                 tooltip=_(u"Download movie subtitles from OpenSubtitles"))

(Note the Unicode ellipsis in the "label" property value.)

Is this a problem at my end?
Comment 13 johnp 2010-11-18 23:55:47 UTC
No, we handle properties a bit differently.  To be clear, the patch didn't break you, it just didn't fix your issue - correct?
Comment 14 Philip Withnall 2010-11-19 00:23:17 UTC
(In reply to comment #13)
> No, we handle properties a bit different.  To be clear, the patch didn't break
> you, it just didn't fix your issue - correct?

Correct.
Comment 15 Martin Pitt 2010-11-19 08:17:43 UTC
In case it's helpful for testing, small standalone reproducer:

python -c 'from gi.repository import Gtk; Gtk.require_version("3.0"); Gtk.MessageDialog(message_format=u"hello ♥").run()'

It does accept the unicode argument just fine (u"hello" works). Explicitly setting the locale doesn't help.

With old pygtk2 this just worked:

python -c 'import gtk; gtk.MessageDialog(message_format="hello ♥").run()'
Comment 16 Johan (not receiving bugmail) Dahlin 2010-11-19 13:03:44 UTC
For the old static bindings, pango (which is imported by gtk) calls

  PyUnicode_SetDefaultEncoding("utf-8");

However, the oldest open PyGTK bug asks to avoid calling that, see
https://bugzilla.gnome.org/show_bug.cgi?id=132040

I guess that should be done for at least python 2.x for compatibility. For Python 3.x there's a chance to avoid doing that.

All strings in GLib/Gtk are UTF-8, and relevant Linux distributions also use UTF-8 everywhere; see http://fedoraproject.org/wiki/Features/PythonEncodingUsesSystemLocale for instance.

Either way, pygobject should do the right thing by default; it should not be necessary to set the locale manually.
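
The default encoding discussed here is what Python uses for implicit conversions between byte strings and Unicode objects. A short sketch to inspect it, next to an explicit encode that does not depend on it. On Python 2 the default is normally ascii unless something changes it, while Python 3 fixes it to utf-8:

```python
import sys

# Encoding used for implicit str/unicode conversions.
default_encoding = sys.getdefaultencoding()
print(default_encoding)

# An explicit encode gives the same bytes regardless of the default.
explicit = u"hello \u2665".encode("utf-8")
print(repr(explicit))
```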
Comment 17 Philip Withnall 2010-12-13 19:26:38 UTC
John, any news?
Comment 18 johnp 2010-12-14 02:31:45 UTC
Sorry, no.  I haven't looked at it.  I'll look a bit more tomorrow.
Comment 19 johnp 2010-12-16 20:46:58 UTC
Created attachment 176550
Properly handle unicode object in properties

 There are still some caveats in Python 2:
      - properties are returned as String objects with the unicode code points
      - you must add # coding=utf-8 to the top of your Python file or Python
        will error out if it sees embedded Unicode characters (such as when
        supporting Python 3 and Python 2 from the same source)
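
The two caveats can be illustrated together. A minimal sketch, assuming the source file is saved as UTF-8 and reading the first caveat as "values come back as UTF-8 encoded byte strings", which is an interpretation rather than something the patch states. The variable `returned` is a stand-in and is not produced by pygobject here:

```python
# coding=utf-8
# The declaration above lets Python 2 accept the non-ASCII literal below.
label = u"_Download Movie Subtitles…"

# Stand-in for a property value handed back as UTF-8 bytes (assumption).
returned = label.encode("utf-8")

# Decoding explicitly recovers the original Unicode object.
decoded = returned.decode("utf-8")
print(decoded == label)
```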
Comment 20 johnp 2010-12-16 20:50:21 UTC
This passes Martin's example in comment 15, except that when run from the command line my terminal will convert the heart to its Unicode code points.  When run from a file with # coding=utf8 at the top, this works flawlessly.  Note that Python 2 still has some drawbacks when working with UTF-8 that aren't going to be fixed, but at least now we accept unicode objects.
Comment 21 johnp 2010-12-16 20:51:03 UTC
Please test out the patch and let me know if it fixes your issue, and I can commit it.
Comment 22 Philip Withnall 2010-12-17 13:17:03 UTC
It fixes the issue for Totem, thanks!