GNOME Bugzilla – Bug 96531
Function to guess display form of a filename
Last modified: 2011-02-18 16:13:45 UTC
I cannot stop thinking that G_BROKEN_FILENAMES is broken, for the reasons below.

- If you start gedit once with G_BROKEN_FILENAMES=1 and then start a new gedit without it, in a non-UTF-8 locale such as ja_JP.eucJP, you get GTK warnings such as:
  Gtk-Message: The filename "No_NAME 1" couldn't be converted to UTF-8 (try setting the environment variable G_BROKEN_FILENAMES): (followed by a Japanese message)
- This is not limited to gedit; Nautilus has a similar problem.
- glib, gedit, eel and nautilus each have their own copy of has_broken_filenames().
- I'm uneasy about the GNOME project calling a component of another project "broken".

I'd like to propose an alternative solution for handling filename encoding variants in glib's API, so that other components can use it. Also, for locale variants of file contents, why doesn't gedit provide some APIs? Then gedit and gnome-terminal could both offer an encoding-aware Load feature without duplicating code between them.
I'm not really sure what this bug is asking for:

- G_BROKEN_FILENAMES is a runtime configuration option, not an API. The API in GLib is:
  - There is an on-disk encoding of filenames.
  - There are functions to convert between this encoding and UTF-8.
  If you _don't_ have a consistent encoding for the filenames on disk, you are in a broken situation, even more broken than the situation that G_BROKEN_FILENAMES refers to. :-)
  Guessing, for display purposes, the encoding of a filename that fails g_filename_to_utf8() is not unreasonable. Nautilus does this. But it is mostly a question of structuring your app correctly: keep the filename in filename form as long as possible, then convert to UTF-8 only for display.
  I suppose we could add a g_filename_to_display_utf8() call that first tried g_filename_to_utf8() and then, if that failed, g_locale_to_utf8(), but I'm not sure that would be very useful.
- File content encoding handling is basically a totally separate issue. It certainly would be nice to have encoding-guessing functionality for files and maybe IO channels, but that is a reasonably hard problem. (I thought there was an open bug on this; I don't see it, however.)
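To make the fallback order suggested above concrete, here is a minimal sketch in Python rather than C. It is not GLib code: `filename_to_display` is a hypothetical name standing in for the proposed g_filename_to_display_utf8(), and the two `decode` attempts only approximate what g_filename_to_utf8() and g_locale_to_utf8() would do. The final lossy fallback is an assumption added so the function always returns something displayable.

```python
import locale


def filename_to_display(raw, filename_encoding="utf-8", locale_encoding=None):
    """Return a displayable string for the on-disk filename bytes `raw`.

    Hypothetical analogue of the proposed g_filename_to_display_utf8():
    try the filename encoding first, then the locale encoding.
    """
    if locale_encoding is None:
        # Assumption: the interpreter's preferred encoding stands in for
        # the charset of the current locale.
        locale_encoding = locale.getpreferredencoding(False)

    for encoding in (filename_encoding, locale_encoding):
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Undecodable bytes or unknown charset name: try the next one.
            continue

    # Last resort (an assumption of this sketch): a lossy conversion that
    # is only suitable for display, never for reopening the file.
    return raw.decode("utf-8", errors="replace")
```

The returned string is for display only; the original bytes have to be kept around to open the file, which matches the advice to keep the filename in filename form as long as possible.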
Calling G_BROKEN_FILENAMES an API may be wrong, sure. I'm okay with calling it a run-time configuration option; that is not really the point I want to argue. :-)

Basically, I'm asking for two things:

1) Let's stop relying on G_BROKEN_FILENAMES.
2) Let's start thinking of a better way to handle filename encodings for gedit, nautilus and other GNOME apps.

It may not be an issue for glib but for some other module; gnome-vfs could carry encoding information for a file, for example. I understand the purpose of G_BROKEN_FILENAMES, but it is insufficient to rescue the really broken situation where filenames in two or more different encodings are found in the same directory.

[Agreed that file contents are a separate issue. I did not know whether it really belongs to glib, so I did not file a separate bug. Do you have any idea which module would be the right one? What I'm proposing is common code in one module, so that gedit and gnome-terminal avoid duplicated code as well as duplicated bugs.]
Reopening...

My claim was basically based on the fact that I have had to find and fix many similar bugs. They are caused by missing calls to g_filename_from_utf8()/g_filename_to_utf8() to convert between the UTF-8 strings used by the gtk_file_selection widget and the filename strings. There were indeed lots of them, and a few are still left, like bug 114296 in libgnomeui. To some extent this has to do with the dependency on G_BROKEN_FILENAMES=TRUE, so I'm wondering whether there can be a better way to handle filenames in GTK+ and GNOME land than G_BROKEN_FILENAMES and g_filename_to/from_utf8(). So I'd like to discuss here what a resolution could be, if any.
The plan is that the GTK+-2.4 file selector will separate out "display name" from "file encoding", so you'll be able to select files with broken encodings in the GUI. I don't think there are any other places in GTK+ that interpret filenames. Certainly none in GLib/ATK/Pango. I suppose a "make a best effort to create a display form of this filename" function could be useful, though guessing the encoding of a string as short as a filename is almost impossible.
Some related comments in bug 114068
Thanks for the heads-up. I read the discussion in bug 114068, but I prefer the plan mentioned above, in which the file selection separates out the "display name". Has this already been implemented in recent GTK+? If so, I'm fine with closing this bug as fixed (and I'd second not making the change in bug 114068, either).
GLib now interprets the variable G_FILENAME_ENCODING, which can be set to a comma-separated list of encodings (with @locale recognized as the charset of the current locale). This list of encodings could be used for iterative guessing in a g_filename_get_display_name() function.
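The iterative guessing described above could look roughly like the following Python sketch. It is an illustration, not GLib's implementation: `filename_charsets` and `guess_display_name` are hypothetical names, the UTF-8 default for an unset variable is a simplification, and the lossy last step is an assumption.

```python
import locale
import os


def filename_charsets(environ=None):
    """Parse G_FILENAME_ENCODING into a list of charset names.

    "@locale" is replaced by the charset of the current locale.
    Defaulting to UTF-8 when the variable is unset or empty is a
    simplification made for this sketch.
    """
    env = os.environ if environ is None else environ
    charsets = []
    for name in env.get("G_FILENAME_ENCODING", "").split(","):
        name = name.strip()
        if not name:
            continue
        if name == "@locale":
            name = locale.getpreferredencoding(False)
        charsets.append(name)
    return charsets or ["utf-8"]


def guess_display_name(raw, charsets):
    """Try each charset in order; return the first successful decoding."""
    for charset in charsets:
        try:
            return raw.decode(charset)
        except (UnicodeDecodeError, LookupError):
            continue
    # No candidate worked: fall back to a lossy, display-only string.
    return raw.decode("utf-8", errors="replace")
```

The order of the list matters. A single-byte charset such as ISO-8859-1 accepts any byte sequence, so every entry after it would never be tried; strict multi-byte encodings belong at the front.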
About comment #6, the new GtkFileChooser is in place and it does the right thing with respect to filenames / display names: it may not be able to display a filename, but it does let you select it and otherwise work with it. Indeed, as Matthias says, we still don't handle a list of encodings in G_FILENAME_ENCODING; only the first element in that list is used. I don't know if we still need a function that guesses display versions of filenames based on such a list. See the recommendations here: http://developer.gnome.org/doc/API/2.0/glib/glib-Character-Set-Conversion.html#file-name-encodings and here: http://primates.ximian.com/~federico/news-2004-06.html#15
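The separation between filename and display name described above can be sketched as a small data structure. This is a Python illustration under stated assumptions, not GtkFileChooser's actual design: `FileEntry`, `make_entry` and `select` are hypothetical names, and the lossy decoding stands in for whatever display form the chooser produces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    raw: bytes     # on-disk name, kept byte-for-byte and used for all I/O
    display: str   # best-effort text, used only for showing in the UI


def make_entry(raw, filename_encoding="utf-8"):
    """Build an entry whose display form may be lossy but never fails."""
    display = raw.decode(filename_encoding, errors="replace")
    return FileEntry(raw=raw, display=display)


def select(entries, index):
    """Return the real filename of the chosen row, not its display text."""
    return entries[index].raw
```

Because selection returns `raw`, a file whose name cannot be displayed correctly can still be opened, renamed or deleted.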
We now have a function to guess display versions of filenames based on a list of encodings.