GNOME Bugzilla – Bug 321144
gnome_file_entry_get_full_path() should return glib encoded filename
Last modified: 2005-12-28 10:17:29 UTC
I compiled libgnomeui from CVS, and when I ran libgnomeui/test-gnome/test-gnome I found the following bug. Steps: 1) Run test-gnome from the test-gnome directory of libgnomeui. 2) Click 'pixmap entry'; the 'Pixmap Entry' dialog pops up. 3) Click the 'Browse' button and select a picture file whose filename is not UTF-8 encoded. Result: the picture is not displayed in the preview area. I ran these steps in the zh_CN.GB18030 locale (a popular locale in China) on Solaris.
refresh_preview() uses gnome_file_entry_get_full_path() to get the full path of the picture file and then passes it to gdk_pixbuf_new_from_file() to create the pixbuf. However, gnome_file_entry_get_full_path() returns the path in UTF-8, while the filename on disk is GB18030 encoded. As a result, gdk_pixbuf_new_from_file() cannot find the file under its UTF-8 name, so the picture cannot be previewed.
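To make the mismatch concrete, here is a minimal sketch in plain C (no GLib) that compares the bytes of one character, U+4E2D, in the two encodings. The names `NAME_UTF8`, `NAME_GB18030` and `same_filename_bytes` are hypothetical and exist only for this illustration. On Unix a file is looked up by the exact bytes of its name, so a UTF-8 string does not match a GB18030 name on disk.

```c
#include <stddef.h>
#include <string.h>

/* The single character U+4E2D as it is stored in each encoding. */
const unsigned char NAME_UTF8[]    = { 0xE4, 0xB8, 0xAD };  /* UTF-8 bytes   */
const unsigned char NAME_GB18030[] = { 0xD6, 0xD0 };        /* GB18030 bytes */

/* Hypothetical helper: returns 1 when the two names are the same byte
 * sequence, which is the only comparison a byte-oriented lookup performs. */
int same_filename_bytes(const unsigned char *a, size_t a_len,
                        const unsigned char *b, size_t b_len)
{
    return a_len == b_len && memcmp(a, b, a_len) == 0;
}
```

Comparing `NAME_UTF8` with `NAME_GB18030` yields 0: the user sees the same character, but the lookup sees two different names. In the real code path, a conversion such as GLib's `g_filename_from_utf8()` would be applied to the UTF-8 string before it is handed to `gdk_pixbuf_new_from_file()`.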
Yandong, do you know how to create a patch? Just fix the problem in your sources (they were downloaded from CVS, right?) and then run the command "cvs diff -upr > patch.diff". That command creates the file patch.diff with the required changes. Attach the patch to this bug (there is a "Create a New Attachment" link below). Of course, if it's hard for you I can do it myself, but I thought you were interested in GNOME hacking :)
Yes, I am interested in GNOME hacking. I will try to provide a patch for this bug.
Created attachment 54723 [details] [review] convert filename encoding from utf8 to glib encoding
Thanks, I've committed this patch to both branches. The only minor issue was the spacing inside the parentheses around g_utf8_get_filename (we use some_func (arg1, arg2), not some_func ( arg1, arg2 )). You are welcome to fix other problems :)
Created attachment 56425 [details] [review] patch Sometimes the filename on disk is itself UTF-8 encoded. In that case there is no need to call g_filename_from_utf8(), even when the GLib filename encoding is a non-UTF-8 locale encoding. Otherwise the filename may be displayed properly in the file chooser dialog, but the file cannot be opened.
Created attachment 56426 [details] [review] patch
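The idea behind this patch can be sketched roughly as "use the UTF-8 bytes if a file with exactly that name exists, otherwise convert". The following is only an illustration in plain C, not the attached patch: `FAKE_DISK`, `fake_exists`, `fake_utf8_to_gb18030` and `choose_on_disk_path` are all hypothetical stand-ins. In particular, the converter handles only the single character U+4E2D and takes the place of a real call such as `g_filename_from_utf8()`.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the file system: raw byte names that "exist on disk". */
const char *const FAKE_DISK[] = {
    "\xE4\xB8\xAD.png",   /* name stored as UTF-8   */
    "\xD6\xD0.jpg",       /* name stored as GB18030 */
};

/* Returns 1 if a file with exactly these bytes is on the fake disk. */
int fake_exists(const char *path)
{
    size_t i;
    for (i = 0; i < sizeof FAKE_DISK / sizeof FAKE_DISK[0]; i++)
        if (strcmp(FAKE_DISK[i], path) == 0)
            return 1;
    return 0;
}

/* Toy converter (assumption: only U+4E2D is translated, every other
 * byte is copied unchanged). Output is truncated to fit out_size. */
void fake_utf8_to_gb18030(const char *utf8, char *out, size_t out_size)
{
    size_t o = 0;
    if (out_size == 0)
        return;
    while (*utf8 != '\0') {
        if (strncmp(utf8, "\xE4\xB8\xAD", 3) == 0) {
            if (o + 2 >= out_size)
                break;
            out[o++] = '\xD6';
            out[o++] = '\xD0';
            utf8 += 3;
        } else {
            if (o + 1 >= out_size)
                break;
            out[o++] = *utf8++;
        }
    }
    out[o] = '\0';
}

/* Prefer the UTF-8 bytes when they name an existing file; otherwise
 * fall back to the converted name written into buf. */
const char *choose_on_disk_path(const char *utf8_path, char *buf, size_t buf_size)
{
    if (fake_exists(utf8_path))
        return utf8_path;
    fake_utf8_to_gb18030(utf8_path, buf, buf_size);
    return buf;
}
```

Note that the result depends on what happens to be on disk, not only on the argument, so a caller cannot tell from the API alone which encoding comes back.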
Zheng, the behaviour you suggest is actually quite unclear; a programmer reading the API docs has no way to understand it correctly. So I wonder whether it's needed at all.
Hmm, looking in gnome-file-entry.c, I notice that gnome_file_entry_get_full_path() is also used in fentry_get_property() to return the value of the PROP_FILENAME property. So, this property, when queried, will then be returned in the GLib file name encoding. OK, that might be what we want, but it should probably be documented. However, setting the PROP_FILENAME property in fentry_set_property() directly passes the value without any conversion to gtk_entry_set_text(). I.e. when you set the property it should be in UTF-8. To me this inconsistency seems rather unfortunate. Is it really what we want? At least, it should be well documented. The patch also didn't add any documentation that gnome_file_entry_get_full_path() returns the string in GLib file name encoding. Are we sure there aren't dozens of other functions in various GNOME platform libraries that take or return pathnames, where there is confusion whether the pathnames are in UTF-8 always, or in the GLib file name encoding (= on-disk encoding on Unix)? They should all be searched for and notes about the encoding added to their doc comments....
Hi Nickolay, Simon Zheng's situation appears when you have both UTF-8 encoded and non-UTF-8 encoded filenames on disk. For example, you log in under a UTF-8 locale and create a file with a multibyte name, which is UTF-8 encoded; then you log out, log in under a non-UTF-8 locale such as GB18030, and create another file with a multibyte name, which is GB18030 encoded. GtkFileChooser can display both filenames correctly in the GB18030 locale (in a UTF-8 locale it cannot; the GB18030-encoded name is displayed as ???). This is because GtkFileChooser checks whether a filename is UTF-8: if it is, the name is displayed directly; if not, GTK converts it from GB18030 to UTF-8 and then displays it. This behaviour gives a correct display in such mixed cases (very rare for end users, but common for people who switch locales frequently, such as QA) and reduces garbled filenames, but it requires a lot of extra code in GNOME, like this patch. I am not sure whether this behaviour is needed, but it is what GTK currently does.
Note that in general it is not possible to "determine" whether a file name is (intended to be) UTF-8 or not. Sure, the bytes in a file name might be valid UTF-8, but that doesn't mean it can't also be some other encoding; on the contrary. There is in general no way to know what file name encoding the user who originally created a file used. All UTF-8 strings are also valid ISO-8859-x strings, for instance, so in locales where ISO-8859-x is/was a common legacy encoding, I don't see how some software could "correctly" display a file name that is in ISO-8859-x if it happens to also be legal UTF-8, when the software chooses to display anything that is valid UTF-8 as UTF-8. (The software would need some advanced AI and a dictionary to guesstimate which interpretation makes most sense...) Maybe the situation is different for UTF-8 vs. GB18030.
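A small sketch of this ambiguity in plain C, with hypothetical helper names `is_valid_utf8` and `latin1_to_utf8`: the two bytes C3 A9 pass UTF-8 validation (they decode to U+00E9), yet read as ISO-8859-1 they are the two characters U+00C3 and U+00A9. A validity check alone therefore cannot tell which reading was intended.

```c
#include <stddef.h>

/* Returns 1 if s[0..len) is well-formed UTF-8 (no overlong forms,
 * no surrogates, nothing above U+10FFFF), else 0. */
int is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0, k, need;
    unsigned long cp, min;

    while (i < len) {
        unsigned char c = s[i];
        if (c < 0x80) { i++; continue; }
        else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; min = 0x10000; }
        else return 0;                       /* stray continuation or bad lead */

        if (len - i - 1 < need) return 0;    /* sequence is cut off */
        for (k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
        i += need + 1;
    }
    return 1;
}

/* Reads the bytes as ISO-8859-1 and writes the UTF-8 form to out,
 * which must hold at least 2 * len bytes. Returns bytes written. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t i, o = 0;
    for (i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            out[o++] = in[i];
        } else {
            out[o++] = (unsigned char)(0xC0 | (in[i] >> 6));
            out[o++] = (unsigned char)(0x80 | (in[i] & 0x3F));
        }
    }
    return o;
}
```

For the input C3 A9, `is_valid_utf8` returns 1, while `latin1_to_utf8` produces the four bytes C3 83 C2 A9, a different string on screen. By contrast, the GB18030 bytes D6 D0 fail the validity check, which is why a "valid UTF-8?" test often appears to work for such names even though nothing guarantees it in general.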
Tor, the example you mention is the same for UTF-8 vs. GB18030. I looked into g_filename_display_name() and found that the GLib encoding is the conversion standard. If the GLib encoding is UTF-8, the filename is displayed as legal UTF-8. Similarly, if the GLib encoding is ISO-8859-x, the filename is displayed as legal ISO-8859-x. Please note that such a filename can be displayed, but we don't guarantee that it is displayed properly; it is perhaps not the one you expect. Do we consider this kind of display invalid? If so, it's unnecessary to support operations on files whose encoding differs from the GLib encoding, and the patch can be cancelled.