GNOME Bugzilla – Bug 426185
gobject.markup_escape_text mishandles python unicode type
Last modified: 2013-02-27 14:31:12 UTC
I would like to be able to do this:

    something.set_markup(_("<b>Total: %s</b>") % gobject.markup_escape_text(mydata))

At least the 'str' and 'unicode' data types should be handled correctly. Preferably, any data convertible to a string should be accepted, that is, anything that defines __str__().

gobject.markup_escape_text() should accept 'unicode' input and return the same type as its input. If a unicode string is escaped, it should first be encoded to UTF-8, run through g_markup_escape_text(), and decoded back to a unicode string. Alternatively, a pure Python replacement could be written.

g_markup_escape_text() expects a UTF-8 string. gobject.markup_escape_text() accepts both str and unicode, but does not handle the unicode type correctly.

Other information:

# test case:
>>> gobject.markup_escape_text(u"\u20ac")
Traceback (most recent call last):
+ Trace 124895
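For illustration, the "pure Python replacement" suggested above might look roughly like the sketch below. It is not the pygobject implementation: the name markup_escape_text is reused only for readability, the _ESCAPES table and _TEXT_TYPE helper are hypothetical, and only the five markup characters are handled (any escaping of control characters done by the C function is not reproduced). Byte strings are assumed to be UTF-8.

```python
# Hypothetical pure-Python stand-in for gobject.markup_escape_text().
# A sketch under the assumptions stated above, not the binding's real code.

_ESCAPES = (
    (u"&", u"&amp;"),   # must be replaced first, or later entities get re-escaped
    (u"<", u"&lt;"),
    (u">", u"&gt;"),
    (u"'", u"&apos;"),
    (u'"', u"&quot;"),
)

_TEXT_TYPE = type(u"")  # 'unicode' on Python 2, 'str' on Python 3


def markup_escape_text(data):
    """Escape markup characters and return the same type as the input."""
    is_text = isinstance(data, _TEXT_TYPE)
    # Byte strings are assumed to hold UTF-8; decode them for processing.
    text = data if is_text else data.decode("utf-8")
    for char, entity in _ESCAPES:
        text = text.replace(char, entity)
    # Hand back text for text input, UTF-8 bytes for byte-string input.
    return text if is_text else text.encode("utf-8")
```

With this sketch, markup_escape_text(u"\u20ac") returns u"\u20ac" unchanged instead of raising, and a byte string comes back as a byte string. Objects that merely define __str__() are not covered; a caller would have to convert them to text first.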
If you call sys.setdefaultencoding('utf-8'), it will at least be possible to pass unicode objects into gobject.markup_escape_text() (though it will still return str objects). gtk does this anyway (in later versions, the pango package does), so it will not break anything.
See bug #328031.
Paul: sys.setdefaultencoding('utf-8') only made it harder for those of us using pygtk to catch encoding bugs. This was very frustrating. It also did not help that pygtk mixed up "UTF-8" (a byte string in a particular encoding) and "Unicode" (the abstract text type) in both the documentation and the implementation.
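To make the distinction concrete, here is a small sketch using the same character as the test case. With Python 2's default ASCII encoding, an implicit conversion between the two types fails loudly on non-ASCII data, which is how encoding bugs get noticed; changing the default to UTF-8 lets the same conversion succeed silently. The variable names are illustrative only.

```python
text = u"\u20ac"              # Unicode: abstract text, one code point (EURO SIGN)
data = text.encode("utf-8")   # UTF-8: that text as a concrete byte string

assert len(text) == 1
assert len(data) == 3
assert data == b"\xe2\x82\xac"
assert data.decode("utf-8") == text   # explicit conversion round-trips

# Decoding as ASCII (what Python 2 does implicitly by default) fails,
# which is what exposes a missing explicit encode/decode step.
try:
    data.decode("ascii")
except UnicodeDecodeError:
    ascii_failed = True
else:
    ascii_failed = False
```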
Confirmed with 3.2.
For reference, this talk has some good points as to why we should not be using setdefaultencoding: http://nedbatchelder.com/text/unipain.html
The static bindings for these went away a while ago, so with the 3.7.x versions this should be fixed.