GNOME Bugzilla – Bug 168760
Exception with non-UTF8 filenames
Last modified: 2013-04-06 20:36:51 UTC
From http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=296553:

When I run:

    $ meld xxxx xxxx-
    Traceback (most recent call last):
        ret = task()
        misc.run_dialog(
    position 20-22: invalid data

Or the other:

    $ meld yyyy yyyy.old
    (meld:4031): Pango-WARNING **: Invalid UTF-8 string passed to pango_layout_set_text()
    (meld:4031): Pango-WARNING **: Invalid UTF-8 string passed to pango_layout_set_text()
    Traceback (most recent call last):
      File "/usr/lib/meld/task.py", line 131, in iteration
        ret = task()
      File "/usr/lib/meld/filediff.py", line 565, in _set_files_internal
        yield _("[%s] Set num panes") % self.label_text
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xb3 in position 9: unexpected code byte
What is the expected behaviour here? How do we find the correct encoding? Do we always assume filenames are encoded with the locale value? On a practical note, it's quite error-prone to have to convert encodings and pass the original encoding around everywhere, so this is unlikely to get fixed.
I believe the GLib standard is to use an environment variable, "G_FILENAME_ENCODING". I was sent this code for Sound Juicer:

    realpath = g_filename_from_utf8 (path, -1, NULL, NULL, &error);
    if (error) {
      const char *charset = g_getenv("G_FILENAME_ENCODING");
      if (!charset) {
        g_get_charset (&charset);
      }
      realpath = g_convert_with_fallback (path, -1, charset, "UTF-8", "_", NULL, NULL, NULL);
      g_error_free (error);
    }

There may be more elegant ways of doing this, but I think the relevant magic is in the g_filename_* functions.
The problem is not so much getting the path as propagating the fact that it was converted to every place that uses filenames, so that the file can still be opened, moved, etc. That's a nightmare, methinks.
Shouldn't be. When the file is loaded, check for G_FILENAME_ENCODING and, if it's set, transform from that encoding to UTF-8, and store the filename everywhere inside Meld as UTF-8. Then, when saving, check for G_FILENAME_ENCODING again and, if it's set, transform from UTF-8 back to that encoding. I believe the secret is still using UTF-8 internally everywhere, but being aware that filenames may need to be converted before hitting the file system. I have a feeling that if you use gnome-vfs to load/save files, this happens for you (as you work with URIs).
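For illustration, the convert-on-load, convert-on-save idea could be sketched roughly as below. This is not Meld's code: the helper names (filename_encoding, to_internal, to_disk) are hypothetical, it is written for Python 3, it only looks at the first charset listed in G_FILENAME_ENCODING, and it assumes UTF-8 when the variable is unset.

```python
import locale
import os


def filename_encoding(environ=None):
    """Pick an on-disk filename encoding from G_FILENAME_ENCODING.

    Hypothetical helper: only the first entry of the comma-separated
    list is honoured, and UTF-8 is assumed when the variable is unset.
    """
    environ = os.environ if environ is None else environ
    first = environ.get("G_FILENAME_ENCODING", "").split(",")[0].strip()
    if not first:
        return "utf-8"
    if first == "@locale":
        # GLib's special token meaning "use the locale's charset".
        return locale.getpreferredencoding(False)
    return first


def to_internal(raw, encoding):
    """On load: on-disk bytes -> text used everywhere inside the program."""
    return raw.decode(encoding)


def to_disk(name, encoding):
    """On save: internal text -> bytes to hand to the file system."""
    return name.encode(encoding)


if __name__ == "__main__":
    enc = filename_encoding({"G_FILENAME_ENCODING": "ISO-8859-2"})
    raw = b"\xb3\xf3d\xbc.txt"        # not valid UTF-8
    name = to_internal(raw, enc)      # text, safe to display
    assert to_disk(name, enc) == raw  # round-trips back to the same bytes
```

The round trip only works if every filename really is in the configured encoding; a name in some other encoding would either fail to decode or come back as different bytes.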
Well, that's fine if ALL files are encoded with G_FILENAME_ENCODING, but I'll bet that in most cases there will be a mixture of UTF-8 and whatever. I'll leave the bug open for the record, but I am probably not going to implement this, as I still think the implementation and testing cost (especially testing) would not be worth the effort.
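One possible way to cope with such a mixture, sketched here purely as an assumption and not as anything Meld does, is to keep the exact on-disk bytes for every file operation and derive a separate display string per file: try UTF-8 first, then fall back to a legacy encoding. The FileName type, decode_filename function and the ISO-8859-1 default are all made up for this example (Python 3).

```python
from typing import NamedTuple


class FileName(NamedTuple):
    raw: bytes    # exact bytes on disk; use these for open/rename/etc.
    display: str  # text for labels and messages only


def decode_filename(raw, fallback="iso-8859-1"):
    """Decode one filename, trying strict UTF-8 before the fallback.

    Hypothetical helper: the fallback encoding is a guess supplied by
    the caller, so the display text may be wrong, but `raw` is kept
    untouched and can still be used to reach the file.
    """
    try:
        display = raw.decode("utf-8")
    except UnicodeDecodeError:
        display = raw.decode(fallback, errors="replace")
    return FileName(raw, display)


if __name__ == "__main__":
    for raw in (b"caf\xc3\xa9.txt", b"\xb3\xf3d\xbc.txt"):
        print(decode_filename(raw, fallback="iso-8859-2"))
```

Because the original bytes travel with the display text, nothing has to remember which encoding a given name was converted from. The cost is that every call site has to pick the right field.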
*** Bug 351627 has been marked as a duplicate of this bug. ***
I'd like to take a look at this, but I have no idea how to create problematic filenames. If anyone can provide guidance as to how one gets a variety of oddly encoded filenames on disk, it would be much appreciated.
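One way to get such files, as a rough sketch: on Linux, filenames are just bytes, so a short Python 3 script can write the same kind of name in several encodings by passing bytes paths to open(). The function name and the sample names and encodings below are arbitrary choices for illustration. File systems that enforce UTF-8 names (some macOS and ZFS setups, for instance) may reject these with an OSError.

```python
import os
import tempfile


def make_odd_filenames(directory):
    """Create small files whose names use different byte encodings.

    Returns the raw byte names that were created. Sample names and
    encodings are arbitrary illustrations.
    """
    samples = [
        ("łódź-utf8.txt", "utf-8"),
        ("łódź-latin2.txt", "iso-8859-2"),  # invalid as UTF-8
        ("файл-koi8.txt", "koi8-r"),        # invalid as UTF-8
    ]
    base = os.fsencode(directory)
    created = []
    for name, encoding in samples:
        raw = name.encode(encoding)
        # A bytes path is passed to the OS unchanged, with no re-encoding.
        with open(os.path.join(base, raw), "wb") as handle:
            handle.write(b"test\n")
        created.append(raw)
    return created


if __name__ == "__main__":
    target = tempfile.mkdtemp(prefix="odd-names-")
    for raw in make_odd_filenames(target):
        print(repr(raw))
    print("created in", target)
```

Pointing meld at two of the resulting files should then exercise the code paths from the original report.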
This problem has (probably) been fixed in the development version (I hope). The fix will be available in the next major software release (assuming I didn't break anything). Thank you for your bug report.