GNOME Bugzilla – Bug 547254
g_file_new_for_commandline_arg docs should clarify codeset/encoding issues
Last modified: 2014-01-12 17:50:24 UTC
Would adding this text to the doc comment for g_file_new_for_commandline_arg() match its intended semantics? That is, is the intent that the function can be used directly on argv[] elements?

 * On POSIX, @arg is in the GLib file name encoding. That is, in case
 * it is a file name, it is treated as a byte string identical to the
 * file name on disk. No codeset or encoding is enforced or
 * assumed. On Windows, @arg should be as in the argument
 * vector passed to main(), that is, in the C library's locale
 * encoding.

(And then, to match what it says, for G_OS_WIN32 the argument should, in the file name case, be converted from the locale encoding to the GLib file name encoding using g_locale_to_utf8().)

Or is the intent that this function should be passed file name arguments from the command line that have already been passed through GOption processing? On POSIX that makes no difference, but on Windows such arguments have by then already been converted from the locale encoding to the GLib file name encoding, i.e. UTF-8.

I guess in general much of the GIO API that deals with file names could use clarified documentation about encoding issues. I assume this API has been developed by people on modern Linux systems where UTF-8 locales and file names are strongly recommended (even if not necessarily absolutely enforced), so it has been easy to forget legacy POSIX systems, and sites with lots of legacy file systems on large file servers etc., where file names on disk can be in random encodings, locales use for instance EUC-JP, etc.
Yeah, I guess that is true (the need for clarification, and the desire to forget icky encoding issues...)
In general the GIO filenames are byte strings that correspond to the on-disk filenames and should not be assumed to be in any specific encoding, or even to be valid in any encoding. This is the only way to handle filenames, because they can be (on Unix) almost anything, and we still need to be able to reference those files. If you need to display a filename you can either use the display name as per query_info(), which is guaranteed to be in UTF-8, or use the URI.

However, the question still stands with respect to g_file_new_for_commandline_arg(), as that has a special use and we need to make sure it works. I don't know which is the best solution: require passing the argument through g_option/g_locale_to_utf8(), or require it to be as in the main() arg vector. What is your opinion?
See bug 722025 for some more recent discussion on this topic (which should resolve the issue entirely).

*** This bug has been marked as a duplicate of bug 722025 ***