GNOME Bugzilla – Bug 318280
Adding "(invalid encoding)" to file names with unknown encodings confuses me
Last modified: 2008-10-13 00:29:54 UTC
When encountering file names with characters unknown to the current encoding, nautilus adds the string "(invalid encoding)" to the file name. When I rename the file (e.g. replace the question marks with the right umlauts), the string is added to the new file name, garbling the file extension if I do not remove it manually.
Suggestions:
a) As soon as I click on it or press F2 to rename, remove the "(invalid encoding)". If one leaves invalid chars in there, re-add it after finishing the renaming process.
b) Instead of the string, blend a warning icon over the file name at the rightmost position.
c) "Guess" the right encoding, e.g. by a configurable priority table (for me, it could look like "first try UTF-8, then ISO8859-1, then ISO8859-15, then bail out"). If it bails out, there could still be suggestion a) or b) as fallback. Also, please inform the user by an icon or a string that the name is not in the system encoding.
Thanks for your bug report!
> As soon as I click on it or press F2 to rename, remove the "(invalid encoding)".
Sounds like a worthwhile enhancement.
> Instead of the string, blend a warning icon over the file name at the rightmost position
How would that icon look?
> "Guess" the right encoding (e.g. by a configurable priority table (for me, it could look like "first try UTF-8, then ISO8859-1, then ISO8859-15, then bail out")
How can we know that a particular encoding is appropriate or not?
> How would that icon look?
I just played around with gimp and clicked together an icon using existing icons I found in /usr/share/pixmaps: http://www.emmes-world.de/encwarning.png A little red exclamation mark in the corner could also do the job.
> How can we know that a particular encoding is appropriate or not?
Initially, the "priority table" would contain only one encoding, the default one (on my Ubuntu system, it's UTF-8; AFAIK this is the default encoding for all GNOME installations). So, the behaviour would not change compared to the current behaviour. But advanced users could add the "legacy encoding" (e.g. ISO8859-1 or -2 or KOI8, ...) they used on their old files to the list. Now, if a file name is not valid UTF-8, the other encoding(s) are tried in order until one yields a filename that contains no non-printable characters. Then, it is likely that this filename encoding is the right one. A right-click on the warning icon or the filename could then offer some "Convert to system default encoding" option, which converts from the detected encoding to the default one, e.g. UTF-8. If no useful encoding is found (none in the table would result in an "invalid encoding"-free string), the current behaviour (replace unprintable characters with ?s and warn) could stay in place, leaving it to the user to rename the file to a valid name.
Ciao Martin
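The priority-table idea described above can be sketched roughly as follows. This is only an illustration in Python, not nautilus code: the function name `decode_with_priority_table`, the default table, and the use of `str.isprintable()` as the "no non-printable characters" check are all assumptions made for the example.

```python
def decode_with_priority_table(raw, table=("utf-8", "iso-8859-1", "iso-8859-15")):
    """Try each encoding in order (hypothetical helper, not a nautilus API).

    Returns (text, encoding) for the first encoding that decodes `raw`
    without error and yields only printable characters, or (None, None)
    if every entry in the table fails ("bail out").
    """
    for encoding in table:
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
        if text.isprintable():
            return text, encoding
    return None, None


# A name stored as UTF-8 is accepted by the first table entry.
print(decode_with_priority_table("Müller.txt".encode("utf-8")))
# A legacy ISO8859-1 name is not valid UTF-8, so the second entry is used.
print(decode_with_priority_table("Müller.txt".encode("iso-8859-1")))
# A stray control byte is rejected by every entry, so the caller would
# fall back to the current "(invalid encoding)" behaviour.
print(decode_with_priority_table(b"\x80abc"))
```

With a table that contains only the default encoding, this behaves like the current approach: anything that is not valid UTF-8 falls through to the bail-out case.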
> I just played around with gimp and clicked together an icon (...): http://www.emmes-world.de/encwarning.png
I don't think that this icon is appropriate. It rather suggests that something is not allowed. Translating "Invalid Encoding" into an icon is IMHO almost impossible, feel free to prove me wrong though.
> Now, if a file name is not valid UTF-8, the other encoding(s) are tried
> until the filename contains no non-printable characters.
> Then, it is likely that this filename encoding is the right one.
Let's investigate codepages: Isn't it true that the interpretation of all characters between 128 and 255 completely depends on the codepage used? So how do we know whether cp437 or cp850 is appropriate? I think the same is true for ISO8859-1/15.
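The ambiguity raised here is easy to reproduce. The sketch below (Python, with a made-up file name and a hypothetical helper `plausible_decodings`) decodes the same bytes with cp437 and cp850; both succeed and both give printable text, so the bytes alone cannot tell the two codepages apart.

```python
def plausible_decodings(raw, encodings):
    """Return {encoding: text} for every encoding that decodes `raw`
    to printable text. Hypothetical helper for illustration only."""
    results = {}
    for encoding in encodings:
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue
        if text.isprintable():
            results[encoding] = text
    return results


# Made-up file name containing one byte above 127.
raw_name = b"caf\x9b.txt"
candidates = plausible_decodings(raw_name, ["cp437", "cp850"])
for encoding, text in candidates.items():
    print(encoding, repr(text))

# More than one candidate means the printable-characters check
# cannot decide which codepage was really used.
print("ambiguous:", len(set(candidates.values())) > 1)
```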
> Translating "Invalid Encoding" into an icon is IMHO
> almost impossible, feel free to prove me wrong though.
I did a second try: http://www.emmes-world.de/encwarning2.png But I actually don't care if it's an icon or some text.
> Let's investigate codepages: Isn't it true that the interpretation of all
> characters between 128 and 255 completely depends on the codepage used? So how
> do we know whether cp437 or cp850 is appropriate?
By looking at which is the topmost of the ones I selected. If I configured my nautilus to first try UTF-8, then ISO8859-1, then ISO8859-15, for most German file names it does not matter. But if I used a euro sign in a name saved with ISO8859-15 (e.g. "500€-Schein.xcf"), I will get ¤ instead of € ("500¤-Schein.xcf"), because ISO8859-1 is tried first and maps the same byte to ¤. As I configured the encoding table myself (I know that I should take a look at it), it's still better than "500?-Schein.xcf (invalid encoding)". Don't get me wrong, nautilus should not try every possible encoding, only the one or few I configured it to try (could look like http://www.emmes-world.de/encpanel.png with some GNOME-usability-specs compliant polish).
Ciao Martin
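The euro example can be checked with a few lines of Python. This is a sketch under the assumption that the first encoding in the user's table that decodes cleanly wins; `first_match` is a hypothetical helper, not part of nautilus. Byte 0xA4 is the currency sign ¤ in ISO8859-1 and the euro sign € in ISO8859-15, so the order of the table decides what is displayed.

```python
def first_match(raw, table):
    """Decode `raw` with the first encoding in `table` that succeeds
    and yields printable text; return None if none does.
    Hypothetical helper for illustration only."""
    for encoding in table:
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue
        if text.isprintable():
            return text
    return None


# File name as it would be stored on disk by an ISO8859-15 system.
raw_name = "500€-Schein.xcf".encode("iso-8859-15")

# ISO8859-1 listed before ISO8859-15: the euro sign shows up as ¤.
print(first_match(raw_name, ["utf-8", "iso-8859-1", "iso-8859-15"]))
# ISO8859-15 listed first: the euro sign is shown correctly.
print(first_match(raw_name, ["utf-8", "iso-8859-15", "iso-8859-1"]))
```

Either result is a readable name, which is the point made above: a slightly wrong glyph chosen by the user's own table is still more useful than a question mark plus a warning suffix.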
> a) As soon as I click on it or press F2 to rename, remove the "(invalid encoding)". Please, yes. Was just gonna open my own bug with this exact request. :)
Thanks for taking the time to report this bug. This particular bug has already been reported into our bug tracking system, but we are happy to tell you that the problem has already been fixed. It should be solved in the next software version. You may want to check for a software upgrade. *** This bug has been marked as a duplicate of 326747 ***