GNOME Bugzilla – Bug 309947
Converting filenames doesn't work as expected
Last modified: 2018-05-24 10:44:25 UTC
I have a filename with this basename: Massive%20T%F7ne%20-%20Cruisen.mp3 Rhythmbox says that the filename is broken (maybe because it is not UTF-8). Nautilus can display the filename though.
Created attachment 48899: Proposed Patch. This patch uses g_locale_to_utf8 instead of g_filename_to_utf8 to produce the same results as Nautilus does. It also checks whether the filename is already a valid UTF-8 string and does no conversion in that case.
Replacing g_filename_to_utf8 with g_locale_to_utf8 seems broken to me; it doesn't do the same thing at all. g_filename_to_utf8 uses the G_FILENAME_ENCODING environment variable to get the charset to convert from, while g_locale_to_utf8 arbitrarily uses your locale charset (which will be UTF-8 most of the time on modern distros). Your filename seems to be 7-bit ASCII, so it's valid UTF-8. The call tmp = gnome_vfs_unescape_string_for_display (basename); seems weird though. I guess it unescapes the string and replaces the %F7 with a byte that is invalid in UTF-8, which explains the failure.
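To make the guess above concrete, here is a minimal Python sketch of what unescaping does to this basename. It is an illustration only, not Rhythmbox or gnome-vfs code: `urllib.parse.unquote_to_bytes` stands in for the unescape step, and `is_valid_utf8` is a hypothetical helper written for this example.

```python
from urllib.parse import unquote_to_bytes

escaped = "Massive%20T%F7ne%20-%20Cruisen.mp3"

# Unescaping turns each %XX sequence into one raw byte; %F7 becomes 0xF7.
raw = unquote_to_bytes(escaped)
print(raw)  # b'Massive T\xf7ne - Cruisen.mp3'


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The escaped form is 7-bit ASCII, hence valid UTF-8.
print(is_valid_utf8(escaped.encode("ascii")))  # True
# The unescaped form contains a lone 0xF7 byte, which UTF-8 does not allow.
print(is_valid_utf8(raw))  # False
```

So a check that passes on the escaped string can still fail once the string has been unescaped for display.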
Created attachment 48900: patch. Okay, this patch changes the following:
1. Don't use g_filename_to_utf8 if the filename is already a UTF-8 string.
2. If g_filename_to_utf8 fails, try with g_locale_to_utf8.
So if the string is already UTF-8, the patch skips the g_filename_to_utf8 conversion, and if that conversion fails, it tries again with g_locale_to_utf8. I think this patch could make both of us happy.
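The fallback order described in this comment could be sketched as follows. This is not the attached patch, which is C code against GLib; it is a Python illustration in which `to_display_name` and its two encoding parameters are hypothetical stand-ins for the charsets that g_filename_to_utf8 and g_locale_to_utf8 would use.

```python
from typing import Optional


def to_display_name(raw: bytes, filename_encoding: str,
                    locale_encoding: str) -> Optional[str]:
    """Hypothetical sketch of the fallback chain described above."""
    # Step 1: if the bytes are already valid UTF-8, do no conversion.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    # Step 2: try the filename encoding first, then the locale encoding.
    for encoding in (filename_encoding, locale_encoding):
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue
    # Nothing worked; the caller has to show a placeholder instead.
    return None


# With a Latin-1 locale, the 0xF7 byte can be converted on the second attempt.
name = to_display_name(b"T\xf7ne.mp3", "utf-8", "iso-8859-1")
print(ascii(name))  # 'T\xf7ne.mp3'

# With a UTF-8 locale, both attempts fail and there is nothing to display.
print(to_display_name(b"T\xf7ne.mp3", "utf-8", "utf-8"))  # None
```

Note that the second attempt only helps when the locale charset differs from the filename charset, which is the situation the following comments discuss.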
What's the point of 2 when you can get the same behaviour if you set G_FILENAME_ENCODING to whatever your locale is? This has the advantage of giving you consistent behaviour across all gtk+ apps. However, I'm a bit concerned by your bug report. Is the real file name Massive%20T%F7ne%20-%20Cruisen.mp3, or did you paste an escaped file name as used in an http url, for example?
It is the content of the basename variable.
Several recent changes in CVS (to fix other bugs) have potentially made things worse with that kind of file name. Rhythmbox now un-escapes and then re-escapes URIs to get a "canonical URI", which it needs for the database. There are some cases where it is given a file name or URI by an external program or file, and it has to guess whether it is already escaped or not. If it's given an unescaped file name which contains "percent digit digit", it might get it wrong.
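The ambiguity described here can be shown with a small Python sketch. It is not Rhythmbox's code: `looks_escaped` and `canonicalize` are hypothetical names, the regular expression is an assumed heuristic (it matches a percent sign followed by two hex digits), and `100%20off.mp3` is an invented file name.

```python
import re
from urllib.parse import quote, unquote_to_bytes

_PERCENT_HEX = re.compile(r"%[0-9A-Fa-f]{2}")


def looks_escaped(name: str) -> bool:
    """Hypothetical heuristic: guess 'escaped' if any %XX sequence appears."""
    return _PERCENT_HEX.search(name) is not None


def canonicalize(name: str) -> str:
    """Un-escape, then re-escape, to get one canonical escaped form."""
    return quote(unquote_to_bytes(name))


# An escaped name round-trips unchanged, which is the intended case.
escaped = "Massive%20T%F7ne%20-%20Cruisen.mp3"
print(canonicalize(escaped) == escaped)  # True

# A file whose real, unescaped name contains "%20" is indistinguishable
# from the escaped form of "100 off.mp3".
literal = "100%20off.mp3"
print(looks_escaped(literal))  # True (a wrong guess for this file)
print(canonicalize(literal))   # 100%20off.mp3
print(quote(literal))          # 100%2520off.mp3 (the correct escaping)
```

Two different files end up with the same canonical form, so no heuristic applied to the string alone can always get this right.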
URIs generated internally should be handled correctly, and Rhythmbox does about as well as it can with ones passed from external programs, so I'm closing this.
This is anything but fixed. These errors still happen on my system with the latest HEAD. If you don't want to fix this, mark it as WONTFIX, but not as fixed.
Just to check: the name of the file actually has "%F7" and the like in it, and they aren't just the escaped form of its real name? Also, how are you trying to add the file to Rhythmbox (Import Folder, Import File, drag-and-drop, etc.)?
Everything works correctly with a filename with '%F7' in it. Invalid utf-8 filenames (such as ones that would contain '%F7' when escaped) also work everywhere except the song info window, which says "Unknown file name" rather than attempting to display the filename. This is exactly what the glib documentation says we should do. With G_FILENAME_ENCODING set appropriately, filenames valid in some other encoding (not valid utf-8) are also displayed correctly. What this patch does is add another attempt at converting filenames to something we can display if g_filename_to_utf8 fails. This will help in situations where the user has mostly utf-8 filenames (so they don't want to set G_FILENAME_ENCODING), but also has some in their chosen non-utf8 LC_CTYPE encoding. Since most users would be using utf-8 locales now, I don't think this really helps much.
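For context, the way GLib picks the filename charset on Unix can be sketched like this. It is a simplified Python paraphrase of the rule documented for g_get_filename_charsets, not GLib's implementation; `filename_charset` is a hypothetical name and only the first charset in the list is modelled.

```python
from typing import Mapping


def filename_charset(env: Mapping[str, str], locale_charset: str) -> str:
    """Hypothetical paraphrase of the documented GLib rule on Unix."""
    value = env.get("G_FILENAME_ENCODING")
    if value:
        # Comma-separated list; filenames are assumed to be in the first one.
        first = value.split(",")[0]
        # "@locale" is a special token meaning the current locale's charset.
        return locale_charset if first == "@locale" else first
    if "G_BROKEN_FILENAMES" in env:
        # Older switch: use the locale charset for filenames.
        return locale_charset
    # Neither variable set: filenames are assumed to be UTF-8.
    return "UTF-8"


print(filename_charset({}, "ISO-8859-1"))  # UTF-8
print(filename_charset({"G_FILENAME_ENCODING": "@locale"}, "ISO-8859-1"))  # ISO-8859-1
print(filename_charset({"G_FILENAME_ENCODING": "ISO-8859-15,UTF-8"}, "UTF-8"))  # ISO-8859-15
```

With the variable unset and a non-UTF-8 locale, the filename charset stays UTF-8, which is the mixed case where an extra locale-based attempt would make a difference.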
(In reply to comment #10)
> Everything works correctly with a filename with '%F7' in it. Invalid utf-8
> filenames (such as ones that would contain '%F7' when escaped) also work
> everywhere except the song info window, which says "Unknown file name" rather
> than attempting to display the filename. This is exactly what the glib
> documentation says we should do. With G_FILENAME_ENCODING set appropriately,
> filenames valid in some other encoding (not valid utf-8) are also displayed
> correctly.
Where does the glib documentation say this?
I think I was talking about this: http://library.gnome.org/api/glib/2.8/glib-Character-Set-Conversion.html.en#file-name-encodings-checklist
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/rhythmbox/issues/64.