GNOME Bugzilla – Bug 683667
gnome-clock crashes when an alarm name contains non-ascii characters.
Last modified: 2012-09-10 17:16:32 UTC
Created attachment 223847: the sample alarm data

gnome-clocks crashes when an alarm name contains non-ASCII characters.

Steps to reproduce:
1. Run gnome-clocks.
2. Click the Alarm button.
3. Click the New button to add a new alarm.
4. Enter any non-ASCII characters (e.g. Japanese "あ", whose code point is U+3042) in the Name entry.
5. Click the Save button to save the alarm.
6. Click the Quit button to exit gnome-clocks.
7. Run gnome-clocks again; gnome-clocks crashes.

Traceback (Trace 230827):
Traceback (most recent call last):
  self.win = Window(self)
  self.alarm = Alarm()
  self.load_alarms()
  self.add_alarm_widget(alarm)
  label = GLib.markup_escape_text(alarm.name)
Sample alarm data (attached):
$ cat ~/.local/share/gnome-clocks/alarms.json
[{"minute": "56", "name": "\u3042", "hour": "21", "days": [0, 1, 2, 3, 4, 5, 6]}]
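The sample file can be reproduced with Python's standard json module. The sketch below is written for Python 3 (where str corresponds to Python 2's unicode type; the reporter used Python 2.7.3) and assumes only the default json.dumps settings: the non-ASCII name is stored as the escape \u3042, and json.loads turns it back into a text object rather than UTF-8 bytes.

```python
import json

# One alarm shaped like the sample alarms.json (hour/minute are stored as strings).
alarm = {"minute": "56", "name": "\u3042", "hour": "21", "days": [0, 1, 2, 3, 4, 5, 6]}

# Default json.dumps (ensure_ascii=True) escapes every non-ASCII character.
serialized = json.dumps([alarm])
print(serialized)

# Loading gives back a text object, not UTF-8 bytes.
loaded = json.loads(serialized)
name = loaded[0]["name"]
print(type(name).__name__, name == "\u3042")
```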
Created attachment 223848: patch for this bug

Handle Unicode encoding/decoding. Please review it.
The text retrieved from a GTK entry is always encoded as UTF-8, so there should be no need to decode it... How did you trigger the bug? Is your system not using UTF-8 by default? Maybe the problem is the JSON serialization and we must force it to save as UTF-8.
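Forcing the file to be saved as UTF-8 can be checked quickly. This minimal sketch (Python 3, standard json module only; not code from gnome-clocks) compares the default output with ensure_ascii=False. The bytes on disk differ, but both forms load back to the same text object, so the file encoding alone does not change the type that json.loads returns.

```python
import json

name = "\u3042"  # Japanese "あ"

# Default: non-ASCII is written as a \uXXXX escape, so the output is pure ASCII.
escaped = json.dumps({"name": name})

# ensure_ascii=False keeps the literal character; encoding gives the UTF-8 file bytes.
literal = json.dumps({"name": name}, ensure_ascii=False)
utf8_bytes = literal.encode("utf-8")

# Either way, loading yields the same text object.
assert json.loads(escaped) == json.loads(literal) == {"name": name}

print(escaped)
print(utf8_bytes)
```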
(In reply to comment #2)
> The text retrieved from a gtk entry is always encoded as utf8, there should be
> no need to decode it... how did you trigger the bug? Is your system not using
> utf8 by default? Maybe the problem is the json serialization and we must force
> it it save as utf8

I've already described how to reproduce the bug; please see the steps to reproduce above. If you enter a non-ASCII string as the alarm name, it is dumped to alarms.json as a Unicode escape (e.g. "name": "\u3042") and read back as a unicode string. You must encode that unicode string from alarms.json as UTF-8, because GLib.markup_escape_text() requires a UTF-8 string. That is why my patch encodes the string that is passed to GLib.markup_escape_text(). But, as you said, the GTK entry returns a UTF-8 encoded string, which is also passed to GLib.markup_escape_text() at the same location. So you must decode the UTF-8 string that the GTK entry returned, so that it is a unicode string like the one loaded from alarms.json, before it is passed to GLib.markup_escape_text().
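The reporter's suggestion amounts to keeping alarm names as text internally and encoding to UTF-8 only at the call that needs bytes. Below is a minimal sketch in Python 3 syntax, where bytes stands in for a Python 2 UTF-8 str and str for Python 2 unicode. The helpers to_text, to_utf8 and escape_markup_utf8 are hypothetical; escape_markup_utf8 is only a stand-in that, as described in this report for GLib.markup_escape_text(), accepts UTF-8 bytes only. It escapes the five XML special characters, which may not match the real GLib function in every detail.

```python
def to_text(value):
    """Return value as text, decoding UTF-8 bytes (e.g. what the GTK entry returned)."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return value


def to_utf8(value):
    """Return value as UTF-8 bytes, encoding text (e.g. what json.load returns)."""
    if isinstance(value, bytes):
        return value
    return value.encode("utf-8")


def escape_markup_utf8(data):
    """Hypothetical stand-in for an escaper that only accepts UTF-8 bytes."""
    if not isinstance(data, bytes):
        raise TypeError("expected UTF-8 bytes, got %s" % type(data).__name__)
    text = data.decode("utf-8")  # also rejects invalid UTF-8
    # "&" must be replaced first so the other entities are not double-escaped.
    for raw, entity in (("&", "&amp;"), ("<", "&lt;"), (">", "&gt;"),
                        ("'", "&apos;"), ('"', "&quot;")):
        text = text.replace(raw, entity)
    return text.encode("utf-8")


from_json = "\u3042 <wake up>"                   # text, as loaded from alarms.json
from_entry = "\u3042 <wake up>".encode("utf-8")  # UTF-8 bytes, as from the entry

# Normalise to text in the model, then encode once right before escaping.
for raw_name in (from_json, from_entry):
    name = to_text(raw_name)
    label = escape_markup_utf8(to_utf8(name))
    print(label)
```

Both inputs produce the same escaped label, which is the point of decoding the entry text early and encoding only at the boundary.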
Review of attachment 223848:

Sorry, but I still do not understand how this happens (I cannot test right now). That's why I was asking whether your system uses a different encoding. If we need to do encoding conversion, it should happen in the AlarmStorage class when loading/saving the JSON file, but according to http://docs.python.org/library/json.html, they should already be UTF-8 by default...
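If the conversion were done in AlarmStorage, as suggested here, the storage layer could read and write the file as UTF-8 and always hand text objects to the rest of the program. This is a minimal Python 3 sketch; the class body, the save/load method names and the temporary file path are hypothetical, not the gnome-clocks implementation. Note that loading still yields text, so any API that needs UTF-8 bytes would still need an explicit encode.

```python
import io
import json
import os
import tempfile


class AlarmStorage:
    """Hypothetical sketch: alarm names stay text in memory; the file is UTF-8."""

    def __init__(self, path):
        self.path = path

    def save(self, alarms):
        # ensure_ascii=False writes "あ" literally; the file itself is encoded as UTF-8.
        with io.open(self.path, "w", encoding="utf-8") as f:
            f.write(json.dumps(alarms, ensure_ascii=False))

    def load(self):
        # Assumption: a missing file simply means "no alarms yet".
        if not os.path.exists(self.path):
            return []
        with io.open(self.path, "r", encoding="utf-8") as f:
            return json.loads(f.read())


path = os.path.join(tempfile.mkdtemp(), "alarms.json")
storage = AlarmStorage(path)
storage.save([{"minute": "56", "name": "\u3042", "hour": "21", "days": [0, 1, 2, 3, 4, 5, 6]}])
alarms = storage.load()
print(alarms[0]["name"] == "\u3042")
```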
(In reply to comment #4)
> Review of attachment 223848:
>
> Sorry, but I still do not understand how this happens (I cannot test right
> now). That's why I was asking if your system is using a different encoding.

I'm sorry for the late reply. My system is Ubuntu 12.10 (Quantal Quetzal), which uses UTF-8 as the default encoding. Here is information about my system:

----------------
$ uname -a
Linux nurigabe 3.5.0-14-generic #15-Ubuntu SMP Thu Sep 6 22:57:58 UTC 2012 i686 i686 i686 GNU/Linux
$ cat /etc/issue
Ubuntu quantal (development branch) \n \l
$ locale
LANG=ja_JP.UTF-8
LANGUAGE=ja:en
LC_CTYPE="ja_JP.UTF-8"
LC_NUMERIC="ja_JP.UTF-8"
LC_TIME="ja_JP.UTF-8"
LC_COLLATE="ja_JP.UTF-8"
LC_MONETARY="ja_JP.UTF-8"
LC_MESSAGES="ja_JP.UTF-8"
LC_PAPER="ja_JP.UTF-8"
LC_NAME="ja_JP.UTF-8"
LC_ADDRESS="ja_JP.UTF-8"
LC_TELEPHONE="ja_JP.UTF-8"
LC_MEASUREMENT="ja_JP.UTF-8"
LC_IDENTIFICATION="ja_JP.UTF-8"
LC_ALL=
$ python --version
Python 2.7.3
----------------

> If we need to do encoding conversion, that should happen in the AlarmStorage
> class when loading/saving the json file, but according to
> http://docs.python.org/library/json.html they should already be utf8 by
> default...

It doesn't matter what encoding json uses. And I don't convert between encodings; I just encode and decode with UTF-8. As the traceback above shows, the direct cause of the crash is that a unicode string, which is NOT encoded as a UTF-8 string, is passed to GLib.markup_escape_text(). GLib.markup_escape_text() requires UTF-8 strings (NOT unicode strings), but json.load() returns unicode strings: the json documentation says that strings returned by json.load() are "simply decoded to a unicode object" [1]. That is why we must encode the unicode string that json.load() returns as UTF-8 before passing it to GLib.markup_escape_text(). (The source of caribou may be helpful [2].)

[1] http://docs.python.org/library/json.html#json.load
[2] http://git.gnome.org/browse/caribou/tree/caribou/antler/keyboard_view.py#n74
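The difference between a unicode object and a UTF-8 string can be seen on the sample data itself. This is a short Python 3 sketch using only the standard library (Python 3's str and bytes play the roles of Python 2's unicode and UTF-8 str): json.load returns a one-code-point text object for the name, and encoding it as UTF-8 yields three bytes.

```python
import io
import json

# The sample alarms.json content from this report, as a file-like object.
raw = io.StringIO('[{"minute": "56", "name": "\\u3042", "hour": "21", "days": [0, 1, 2, 3, 4, 5, 6]}]')

alarms = json.load(raw)
name = alarms[0]["name"]          # text object: one code point, U+3042
name_utf8 = name.encode("utf-8")  # what an API expecting UTF-8 byte strings needs

print(len(name), hex(ord(name)))  # 1 0x3042
print(len(name_utf8), name_utf8)  # 3 b'\xe3\x81\x82'
```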
Thank you for the clarification! I committed the following patch, which should fix the problem:
http://git.gnome.org/browse/gnome-clocks/commit/?id=fd612ef60b528c875693c7aac906df082eaa7f4a