GNOME Bugzilla – Bug 93751
Problems with file names encoding
Last modified: 2004-12-22 21:47:04 UTC
When I get a file name using g_dir_read_name (), if the name contains characters above 127, g_print complains with [Invalid UTF-8]. I then tried g_filename_to_utf8, but it returns NULL with the same error: invalid UTF-8. The documentation talks about "the encoding for filenames". I also created a GString by adding the characters of the file name with g_string_append_unichar; when I g_print it there are no complaints, but the > 127 character is not displayed correctly (I tried different LC_ALL values such as C and es_ES, and also setting the terminal to UTF-8 mode).
Created attachment 11186: contains test case, makefile, and a file whose name causes trouble
Ah, I forgot to say that I'm using glib 2.0.6 (Debian sid).
Have you tried setting the G_BROKEN_FILENAMES environment variable? See http://developer.gnome.org/doc/API/2.0/glib/glib-running.html
g_print only accepts UTF-8, so you must ensure anything passed to it is UTF-8. If you have non-UTF-8 filenames, you have to set G_BROKEN_FILENAMES and run in a locale that reflects the encoding of your filenames (such as en_US.ISO-8859-1). Then g_filename_to_utf8() will convert the filenames. g_string_append_unichar() works because the Latin-1 characters have the same numeric values as the first 256 Unicode code points, so you are converting from Latin-1 to Unicode when you append each Latin-1 char as if it were a Unicode char. But this won't work for filenames in any encoding other than Latin-1.
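To illustrate why appending each Latin-1 byte as a Unicode code point happens to work, here is a minimal sketch in plain C, without GLib. The helper name latin1_to_utf8 is hypothetical and is not part of any library; it assumes the input really is Latin-1, which is exactly the assumption that fails for other encodings.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a GLib function): convert a NUL-terminated
 * Latin-1 string to a newly allocated UTF-8 string.
 * Each Latin-1 byte value equals its Unicode code point, so bytes
 * below 0x80 are copied as-is and bytes 0x80..0xFF become two UTF-8 bytes.
 * The caller frees the result with free(). Returns NULL if malloc fails. */
char *latin1_to_utf8(const char *latin1)
{
    assert(latin1 != NULL);

    size_t len = strlen(latin1);
    /* Worst case: every input byte expands to two output bytes. */
    char *out = malloc(2 * len + 1);
    if (out == NULL)
        return NULL;

    char *p = out;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char) latin1[i];
        if (c < 0x80) {
            *p++ = (char) c;                    /* ASCII: one byte */
        } else {
            *p++ = (char) (0xC0 | (c >> 6));    /* lead byte */
            *p++ = (char) (0x80 | (c & 0x3F));  /* continuation byte */
        }
    }
    *p = '\0';
    return out;
}
```

For example, the Latin-1 byte 0xED (the "í" in "Calcetín") becomes the two bytes 0xC3 0xAD. If the file name were in some other 8-bit encoding, the same code would run without error but produce the wrong characters.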
(My aim is to use g_utf8_to_utf16, which is why I need the file name to be UTF-8. Using g_print is just a way to show what's happening.) I use Linux and have not done anything to change the encoding of the file names.
gpanjav@lalo2:~/gnome2/testfile$ echo $LC_ALL
es_ES
gpanjav@lalo2:~/gnome2/testfile$ echo $G_BROKEN_FILENAMES
gpanjav@lalo2:~/gnome2/testfile$ ./testfile
[Invalid UTF-8] GOMAESPUMA - El Calcetín.mp3 (null)
Now using es_ES.ISO-8859-1 and setting G_BROKEN_FILENAMES:
gpanjav@lalo2:~/gnome2/testfile$ export LC_ALL=es_ES.ISO-8859-1
gpanjav@lalo2:~/gnome2/testfile$ export G_BROKEN_FILENAMES=1
gpanjav@lalo2:~/gnome2/testfile$ ./testfile
[Invalid UTF-8] GOMAESPUMA - El Calcetín.mp3 (null)
And the same happens when using en_US.ISO-8859-1. Am I doing anything wrong?
Does your test app call setlocale()? (I'll look at it later)
Nope! But I have just tried calling setlocale (""); and it doesn't work either. Thanks for the quick replies!
Shouldn't that be setlocale (LC_ALL, "")? You should verify that g_get_charset () does indeed report ISO-8859-1 after the setlocale call.
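The check suggested above can be sketched as follows. This is only a sketch: to stay free of GLib it uses the POSIX call nl_langinfo (CODESET) as a stand-in for g_get_charset (), and the function name init_locale_and_get_codeset is hypothetical.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: adopt the locale from the environment
 * (LC_ALL, LC_CTYPE, LANG, ...) and return the name of its character set.
 * Without the setlocale call the program stays in the "C" locale,
 * whatever LC_ALL is set to in the shell. */
const char *init_locale_and_get_codeset(void)
{
    /* Note the two arguments: the category, then "" for "use the environment". */
    if (setlocale(LC_ALL, "") == NULL) {
        /* The requested locale is not installed; the "C" locale stays active. */
        fprintf(stderr, "warning: locale from environment is not available\n");
    }

    /* POSIX analogue of asking GLib for the locale charset. */
    const char *codeset = nl_langinfo(CODESET);
    assert(codeset != NULL);
    return codeset;
}
```

Printing the returned string at startup shows which charset the program actually ended up with, which is the thing to compare against the encoding of the file names.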
Yeah! Now it works when I set G_BROKEN_FILENAMES (and use the proper call to setlocale, as Matthias pointed out). So I guess I will first try g_filename_to_utf8 and, if it fails, setenv G_BROKEN_FILENAMES (if not already set) and try again. Or is there any other way to do it? Thanks! (I'm leaving the bug open so you can both see this, in case you want to answer my question before closing it :-)
Oh, crap. setenv does not work because 'broken' is only set on the first call to have_broken_filenames! Any hints?
If your filenames are broken, that is a property of your local environment. Thus you should put G_BROKEN_FILENAMES=1 in your environment. Apps should not set it, only read it.
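The caching behaviour mentioned in the thread, where the value is read only on the first call, can be sketched like this. It is an illustration of the pattern, not GLib's actual source; the function name cached_broken_filenames is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a check that reads G_BROKEN_FILENAMES once.
 * The environment is consulted only on the first call; later calls
 * return the cached answer. Not thread-safe; illustration only. */
bool cached_broken_filenames(void)
{
    static bool initialized = false;
    static bool broken = false;

    if (!initialized) {
        broken = (getenv("G_BROKEN_FILENAMES") != NULL);
        initialized = true;
    }
    return broken;
}
```

With this pattern, calling setenv after the first conversion attempt has no effect, because the cached value is never refreshed. That is why the variable belongs in the user's environment before the program starts.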