Bug 168760 - Exception with non-UTF8 filenames
Status: RESOLVED FIXED
Product: meld
Classification: Other
Component: general
Version: 0.9.x
Hardware: Other Linux
Importance: Low normal
Target Milestone: ---
Assigned To: Stephen Kennedy
QA Contact: Stephen Kennedy
Duplicates: 351627
Depends on: 350801
Blocks:
Reported: 2005-02-28 14:27 UTC by Ross Burton
Modified: 2013-04-06 20:36 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Ross Burton 2005-02-28 14:27:02 UTC
From http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=296553:

When I run:
$ meld xxxx xxxx-
Traceback (most recent call last):
  File "/usr/lib/meld/task.py", line 131, in iteration
    ret = task()
  File "/usr/lib/meld/filediff.py", line 587, in _set_files_internal
    misc.run_dialog(
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 20-22: invalid data

Or the other:
$ meld yyyy yyyy.old

(meld:4031): Pango-WARNING **: Invalid UTF-8 string passed to pango_layout_set_text()

(meld:4031): Pango-WARNING **: Invalid UTF-8 string passed to pango_layout_set_text()
Traceback (most recent call last):
  File "/usr/lib/meld/task.py", line 131, in iteration
    ret = task()
  File "/usr/lib/meld/filediff.py", line 565, in _set_files_internal
    yield _("[%s] Set num panes") % self.label_text
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb3 in position 9: unexpected code byte
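
For illustration, the second failure can be reproduced in isolation with a byte string that has 0xb3 at offset 9. The file name below is hypothetical (the report elides the real names), and ISO-8859-2 is only an assumed source encoding. The snippet targets Python 3, whereas Meld 0.9.x ran on Python 2, so the exact error wording differs.

```python
# Hypothetical file name: nine ASCII bytes followed by 0xb3, which is not
# a valid UTF-8 start byte. The real names are elided in the report.
raw_name = b"plik_test\xb3.txt"


def try_utf8(raw):
    """Return (text, None) on success, or (None, error) if raw is not UTF-8."""
    try:
        return raw.decode("utf-8"), None
    except UnicodeDecodeError as exc:
        return None, exc


text, error = try_utf8(raw_name)
print(error)  # the message names byte 0xb3 and position 9

# Assumption for illustration only: the name was written as ISO-8859-2.
decoded = raw_name.decode("iso-8859-2")
```

Decoding succeeds once the right encoding is known, which is the question the following comments discuss.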
Comment 1 Stephen Kennedy 2005-04-06 19:13:10 UTC
What is the expected behaviour here? How do we find the correct encoding?
Do we always assume filenames are encoded with the locale value?

On a practical note, it's quite error prone to have to convert encodings
and pass around the original encoding everywhere, so this is unlikely
to get fixed.
Comment 2 Ross Burton 2005-04-06 20:20:59 UTC
I believe the GLib standard is to use an environment variable
"G_FILENAME_ENCODING".  I was sent this code for Sound Juicer:

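  /* Convert the UTF-8 path to the on-disk filename encoding. */
  realpath = g_filename_from_utf8 (path, -1, NULL, NULL, &error);
  if (error) {
    /* Fall back to G_FILENAME_ENCODING, or the locale charset if unset. */
    const char *charset = g_getenv ("G_FILENAME_ENCODING");
    if (!charset) {
      g_get_charset (&charset);
    }
    /* Characters missing from the target charset are replaced with "_". */
    realpath = g_convert_with_fallback (path, -1, charset, "UTF-8", "_",
                                        NULL, NULL, NULL);
    g_error_free (error);
  }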

There may be more elegant ways of doing this, but I think the relevant magic is
in the g_filename_* functions.
Comment 3 Stephen Kennedy 2005-04-06 20:35:19 UTC
The problem is not so much getting the path as propagating the fact that it was
converted to every place that uses filenames, so that the file can be opened,
moved, etc. That's a nightmare, methinks.
Comment 4 Ross Burton 2005-04-06 21:06:24 UTC
Shouldn't be.  When the file is loaded, check for G_FILENAME_ENCODING and if it's
set, transform from that encoding to UTF-8, and store the filename everywhere
inside Meld as UTF-8.  Then when saving, check for G_FILENAME_ENCODING again,
and if set, transform from UTF-8 back to that encoding.

I believe the secret is still using UTF-8 internally everywhere, but being aware
that filenames may need to be converted before hitting the file system.

I have a feeling that if you use gnome-vfs to load/save files, this happens for
you (as you work with URIs).
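
The scheme described in this comment can be sketched in Python roughly as follows. The helper names are hypothetical and not part of Meld. The environment lookup is a simplification of GLib's rules: only the first entry of G_FILENAME_ENCODING is used, and the G_BROKEN_FILENAMES variable is ignored.

```python
import locale
import os


def filename_encoding(environ=None):
    """Pick the on-disk filename encoding (simplified GLib-style lookup)."""
    environ = os.environ if environ is None else environ
    first = environ.get("G_FILENAME_ENCODING", "").split(",")[0].strip()
    if not first:
        return "utf-8"  # GLib's default when the variable is unset
    if first == "@locale":
        return locale.getpreferredencoding(False)
    return first


def name_from_disk(raw, encoding):
    """Bytes read from the file system -> text used inside the application."""
    return raw.decode(encoding)


def name_to_disk(text, encoding):
    """Text used inside the application -> bytes handed to the file system."""
    return text.encode(encoding)


# Example with an assumed ISO-8859-2 file system encoding.
encoding = filename_encoding({"G_FILENAME_ENCODING": "ISO-8859-2"})
internal = name_from_disk(b"plik\xb3.txt", encoding)
on_disk = name_to_disk(internal, encoding)
```

Conversion happens only at the file system boundary; everything in between works with ordinary text.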
Comment 5 Stephen Kennedy 2005-04-06 21:25:53 UTC
Well, that's fine if ALL files are encoded with G_FILENAME_ENCODING, but
I'll bet that in most cases there will be a mixture of utf8 and whatever.

I'll leave the bug open for the record, but am probably not going to implement
this as I still think the implementation/testing (especially testing) cost
would not be worth the effort.
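
One way around the mixed-encoding concern, sketched under the assumption of Python 3 (not available to Meld at the time of this comment), is the surrogateescape error handler from PEP 383. Undecodable bytes are carried inside the text string and restored on encoding, so no per-file encoding has to be passed around. This is a different technique from the G_FILENAME_ENCODING scheme above, and the helper names are hypothetical.

```python
def name_to_text(raw):
    """Decode a file name; bytes that are not UTF-8 become lone surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


def text_to_name(text):
    """Encode back to the exact original bytes, whatever their encoding."""
    return text.encode("utf-8", errors="surrogateescape")


def display_name(text):
    """Text that is safe to show in a UI: escaped bytes are shown as '?'."""
    return text.encode("utf-8", errors="replace").decode("utf-8")


# A directory with a mixture of UTF-8 and non-UTF-8 names (made-up examples).
mixed = ["zażółć.txt".encode("utf-8"), b"plik\xb3.txt"]
round_tripped = [text_to_name(name_to_text(raw)) for raw in mixed]
labels = [display_name(name_to_text(raw)) for raw in mixed]
```

The trade-off is that the displayed label for a non-UTF-8 name is lossy, while the name used for opening or moving the file stays byte-exact.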
Comment 6 Stephen Kennedy 2006-08-17 20:09:44 UTC
*** Bug 351627 has been marked as a duplicate of this bug. ***
Comment 7 Kai Willadsen 2009-10-11 00:43:11 UTC
I'd like to take a look at this, but I have no idea how to create problematic filenames. If anyone can provide guidance as to how one gets a variety of oddly encoded filenames on disk, it would be much appreciated.
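
On Linux, file names are plain byte strings, so such names can be created by passing bytes paths to the usual file APIs. A minimal sketch in Python 3, assuming a POSIX file system that accepts arbitrary bytes (some file systems, for example on macOS, may reject or normalize them). The names and helper functions are made up for illustration.

```python
import os
import tempfile

# The same kind of name in several encodings; only the first is valid UTF-8.
RAW_NAMES = [
    "zażółć.txt".encode("utf-8"),
    "zażółć.txt".encode("iso-8859-2"),
    b"caf\xe9.txt",  # "café.txt" as ISO-8859-1
]


def create_test_files(directory):
    """Create one small file per raw name and return the bytes paths."""
    base = os.fsencode(directory)
    created = []
    for raw in RAW_NAMES:
        path = os.path.join(base, raw)
        with open(path, "wb") as handle:
            handle.write(b"test\n")
        created.append(path)
    return created


def listing_after_create():
    """Create the files in a scratch directory and list it as bytes."""
    with tempfile.TemporaryDirectory() as directory:
        create_test_files(directory)
        return sorted(os.listdir(os.fsencode(directory)))
```

Pointing the comparison tool at two such files should then exercise the code paths from the original tracebacks.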
Comment 8 Kai Willadsen 2013-04-06 20:36:51 UTC
This problem has (probably) been fixed in the development version (I hope). The fix will be available in the next major software release (assuming I didn't break anything). Thank you for your bug report.