Bug 426185 - gobject.markup_escape_text mishandles python unicode type
Status: RESOLVED FIXED
Product: pygobject
Classification: Bindings
Component: general
Version: 3.2.x
Hardware: Other All
Importance: Normal minor
Target Milestone: ---
Assigned To: Nobody's working on this now (help wanted and appreciated)
QA Contact: Python bindings maintainers
Depends on:
Blocks:
 
 
Reported: 2007-04-04 09:47 UTC by Dima Tisnek
Modified: 2013-02-27 14:31 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Dima Tisnek 2007-04-04 09:47:17 UTC
I would like to be able to do this:
something.set_markup(_("<b>Total: %s</b>") % gobject.markup_escape_text(mydata))

At least the data types 'str' and 'unicode' should be handled correctly.
Preferably, any data convertible to a string should be accepted, that is, anything that defines __str__().

gobject.markup_escape_text() should accept input of type 'unicode' and return the same type as its input. If a unicode string is escaped, it should first be encoded to UTF-8, run through g_markup_escape_text() and decoded back to a unicode string. Alternatively, a pure Python replacement could be written.

g_markup_escape_text() expects a UTF-8 string. gobject.markup_escape_text() accepts both str and unicode arguments, but it does not handle the unicode type correctly.
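
The "pure Python replacement" idea from the description could look roughly like the sketch below. It is an illustration only, not the pygobject implementation. It is written for Python 3, where str is the text type and bytes is the byte string (on the Python 2 of this report the corresponding types are unicode and str). The function name is reused here as a hypothetical stand-in, byte strings are assumed to hold UTF-8, and only the five characters with predefined XML entities are escaped; any control-character handling done by GLib is left out.

```python
# Hypothetical pure-Python stand-in for gobject.markup_escape_text().
# Covers only the five characters that have predefined XML entities.
_ENTITIES = (
    ("&", "&amp;"),   # must come first so later entities are not re-escaped
    ("<", "&lt;"),
    (">", "&gt;"),
    ("'", "&apos;"),
    ('"', "&quot;"),
)


def markup_escape_text(value):
    """Escape markup characters, returning the same type as the input."""
    if isinstance(value, bytes):
        # Assumption: byte strings hold UTF-8, as the GLib function expects.
        return markup_escape_text(value.decode("utf-8")).encode("utf-8")
    if not isinstance(value, str):
        # Anything else is converted through __str__().
        value = str(value)
    for char, entity in _ENTITIES:
        value = value.replace(char, entity)
    return value
```

With this sketch, text input comes back as text and byte input comes back as bytes, which is the "return same type as input" behaviour requested above.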

Other information:
# test case:
>>> gobject.markup_escape_text(u"\u20ac")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 0: ordinal not in range(128)
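
The traceback above is consistent with the unicode argument being implicitly encoded with Python 2's default ASCII codec, which cannot represent U+20AC. A minimal sketch of the same failure, written for Python 3 (where str is the text type) and not using gobject at all:

```python
euro = "\u20ac"  # EURO SIGN, the character from the test case

# Explicit UTF-8 encoding succeeds and yields a three-byte sequence.
utf8_bytes = euro.encode("utf-8")

# Encoding with the ASCII codec fails, because U+20AC is outside range(128).
try:
    euro.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

Encoding explicitly to UTF-8 before the call, and decoding the result afterwards, avoids relying on the interpreter's default encoding.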

Comment 1 Paul Pogonyshev 2007-04-07 14:05:33 UTC
If you call sys.setdefaultencoding('utf-8'), it will at least be possible to pass unicode objects into gobject.markup_escape_text() (though it will still return str objects). This is what gtk does anyway (in later versions, the pango package does it), so it will not break anything.
Comment 2 Paul Pogonyshev 2007-04-07 14:07:54 UTC
See bug #328031.
Comment 3 ulrik sverdrup 2011-04-02 15:09:55 UTC
Paul: sys.setdefaultencoding('utf-8') only made it harder for those of us using pygtk to catch encoding bugs. This was very frustrating. It also did not help that pygtk mixed up "UTF-8" (a byte string in a particular encoding) and "Unicode" (the abstract text type) in both the documentation and the implementation.
Comment 4 Martin Pitt 2012-04-21 11:53:08 UTC
Confirmed with 3.2.
Comment 5 Simon Feltman 2013-02-06 05:37:53 UTC
For reference, this talk has some good points as to why we should not be using setdefaultencoding:
http://nedbatchelder.com/text/unipain.html
Comment 6 Martin Pitt 2013-02-27 14:31:12 UTC
The static bindings for these went away a while ago, so with the 3.7.x versions this should be fixed.