GNOME Bugzilla – Bug 92276
Nautilus fails to display non-ASCII filenames in remote FTP directories
Last modified: 2005-03-01 00:23:42 UTC
Nautilus cannot be used to browse FTP servers and Windows NetBIOS shares despite its built-in functionality. These servers _never_ use UTF-8 for filenames, and Nautilus stupidly displays "Illegal Unicode Sequence". This bug makes it impossible to use Nautilus for browsing anything outside my local HDD. There must be some way for the user to explicitly choose the remote server's codepage, !NOT! hidden somewhere in the GConf registry. Different servers may use different non-UTF encodings. If this is solely a gnome-vfs bug, then please push it to the vfs people as something that must be fixed before the "real" release.
Dave: is this gnome-vfs or nautilus? Kjartan: any idea how serious this problem really is? sdiconov: you'd be more persuasive if (1) every bug you filed were more detailed and (2) you weren't convinced that every single bug you file is the most important bug in all of GNOME. We do definitely appreciate your feedback; it's important that we know those bugs are there, so please do keep filing them. But perspective is important: we have something like 6,000 bugs, so yours are just a few of many.
What kind of feedback would you like to have? I'll send anything you ask for. This problem is actually a lack of functionality and can easily be reproduced. To do so, you need working FTP and Samba servers with some Russian (or other non-ASCII, non-UTF-8) filenames. The servers should use the microsoft-cp1251 or koi8 8-bit (or some other) codepages for presenting the filenames, just like they do in real life. To solve it, Nautilus has to provide the "iocharset=<codepage>" parameter to smbmount and do something analogous for FTP. The parameter may differ, depending on the server's setup and the user's locale settings. This is important. Trust me. It is as important as the ability to browse Samba shares at all: not being able to read filenames makes using file shares impossible.
-Just Talk------------------------------------------------------------
GNOME developers often tend to forget about non-English languages and produce crippled software as a result. The Nautilus 1.xx "Notes" sidebar, for instance, never displayed anything other than numbers and useless Latin letters. As a result I could not use it. The current Nautilus text view cannot be used to read texts because it has no codepage switching ability (iconv) and sometimes shows scrambled text. A polite feature request was sent long ago (bug #64214), but nothing happened. I wrote all this as a reminder about the legacy i18n compatibility issues that will plague users outside of the US. No offence was implied.
I guess this is fairly important the same way that i18n in gnome-terminal is important and so on. It's time we start paying attention to the localization bugs for non-Latin languages, in my opinion. It's not like this is a big surprise to us; we've had these kinds of reports all the way through the 1.x.x development cycle. 2.x was supposed to be the one where pango was going to help fix this once and for all. I know that we can't fix 'em all at once, but maybe Sun or someone could do a review (same as for security) for at least their primary languages?
I currently have Nautilus 2.2.4. Whenever I try to use it to access any local FTP servers or SMB shares, I see no filenames.
Problem #1: Nautilus 2.2.4 displays only "Illegal Unicode sequence", repeated for every entry, instead of file and directory names.
Problem #2: The Samba share names in my LAN are not ASCII or UTF either, so I cannot enter URLs in Nautilus. When I enter a Russian share name "smb://.../кино" (smb://.../kino) I get a message that "smb://.../%d0%ba%d0%b8%d0%bd%d0%be" cannot be found.
My locale is ru_RU.CP1251. The local FTP servers return filenames in the same (cp1251 or windows-1251) codepage (which does not help), while SMB shares give the filenames in the legacy DOS cp866 codepage. This bug is almost a year old... :( I wish I could point to the exact lines of code, but I'm not C literate enough.
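The share-name failure in Problem #2 can be illustrated outside Nautilus. The sketch below is not gnome-vfs code; it only shows that the same share name percent-escapes to different byte sequences depending on the encoding chosen, assuming (as the reporter suggests) that the server compares names as CP866 bytes. The helper name `escape_name` is made up for this illustration.

```python
from urllib.parse import quote


def escape_name(name: str, encoding: str) -> str:
    """Percent-escape a share or file name using the given byte encoding."""
    # safe="" so that every non-unreserved byte is escaped.
    return quote(name, safe="", encoding=encoding)


share = "кино"
# The escaped forms below are pure ASCII, so they print on any terminal.
for enc in ("utf-8", "cp1251", "cp866"):
    print(enc, escape_name(share, enc))
```

The UTF-8 form is the one quoted in the error message above (modulo letter case of the hex digits). A server that stores the name in CP866 would need the CP866 form, or a client that converts before sending.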
Upping the pri on this as this problem has been around for some time now.
Does this still break in the same way?
Yes, it does break. It is still impossible to use nautilus-2.4.0 for browsing FTP or SMB. I attach a screenshot showing this bug in action. The "?????? ?????? (Illegal Unicode sequence)" string is shown instead of each and every non-Latin filename. My locale is ru_RU.CP1251. The server's locale is most probably ru_RU.KOI8-R. gFTP works right without any tweaking. The same happens with Samba shares, although I can mount them normally with smbmount or LinNeighborhood if smb.conf contains "client code page = 866" and "character set = 1251", and I supply the iocharset=cp1251 option to smbmount.
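A small sketch of why an 8-bit listing ends up as "Illegal Unicode sequence" or as garbage: KOI8-R bytes for Cyrillic text are not valid UTF-8, and decoding them with the wrong 8-bit codepage succeeds but yields different letters. The function names are hypothetical, and the encodings are the ones guessed in the comment above (server KOI8-R, client CP1251).

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def misread(name: str, server_encoding: str, client_encoding: str) -> str:
    """Encode a name as the server would, then decode it as the client would."""
    return name.encode(server_encoding).decode(client_encoding, errors="replace")


raw = "кино".encode("koi8_r")
print(is_valid_utf8(raw))                            # strict UTF-8 check on the raw bytes
print(misread("кино", "koi8_r", "cp1251") == "кино")  # does the round trip survive?
```

Both checks report failure for this name: the strict UTF-8 decode raises, and the CP1251 reading is a different string from the original.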
Created attachment 21103 [details] Nautilus (buggy) and gftp (works) showing same FTP directory with Russian filenames in an 8-bit encoding.
See http://mail.gnome.org/archives/nautilus-list/2003-October/msg00089.html for a thread about the same issue
It means that Nautilus MUST have a "View/Encoding/..." menu and some shortcuts to force any encoding manually. WWW browsers have this. All text viewers need this too. Not being a C developer, I would suggest the following algorithm:
---------------------------------------------------------------------
For remote filesystems (FTP, SMB):
1) Show Unicode if it is Unicode OR try to autodetect (I believe that the ENCA library, http://trific.ath.cx//software/enca/, can be used to accomplish this). Do the autodetection once to speed things up.
2) If autodetection fails, show everything using the current system locale encoding.
3) ALWAYS have the manual encoding selector available, allowing the user to correct wrong guesses.
For the local filesystem:
1) Show everything non-UTF using the current system locale encoding.
2) ALWAYS have the manual encoding selector available, allowing the user to correct wrong guesses.
--------------------------------------------------------------------
The reason is the legacy encodings zoo. Some form of quick manual codepage selector is necessary because filenames written in a different encoding do occur even on the local hard drive. Two users who speak one language but use different locales cannot exchange files. A common example: Linux archivers fail to extract files compressed by Windows tools, and vice versa, because the filenames seem to contain illegal characters. Another example: after mounting your hard drive from a different OS (a CD-based rescue system or another copy of Linux) you cannot read the names of your own files because of the different locale. One and the same folder can contain files created by both systems and become unmanageable because some filenames will always be unreadable. Ignoring such problems makes Nautilus, and GNOME as a whole, unusable.
For SMB this is solved in gnome-vfs 2.5.x. It always uses UTF-8 URIs, and then converts as needed to/from the remote machine.
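The decoding order proposed in this comment can be sketched roughly as follows. This is an illustration of the suggestion, not Nautilus or gnome-vfs code: `decode_filename`, `forced` and `detector` are hypothetical names, and `detector` stands in for whatever autodetection library (such as ENCA) would be plugged in.

```python
import locale
from typing import Callable, Optional


def decode_filename(
    raw: bytes,
    forced: Optional[str] = None,
    detector: Optional[Callable[[bytes], Optional[str]]] = None,
    locale_encoding: Optional[str] = None,
) -> str:
    """Turn raw filename bytes into displayable text."""
    # Manual selector: a user-chosen encoding always wins.
    if forced is not None:
        return raw.decode(forced, errors="replace")
    # Step 1a: show Unicode if the bytes are valid UTF-8.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    # Step 1b: optional autodetection (hypothetical callable).
    if detector is not None:
        guess = detector(raw)
        if guess:
            try:
                return raw.decode(guess)
            except (UnicodeDecodeError, LookupError):
                pass
    # Step 2: fall back to the system locale encoding; undecodable
    # bytes are replaced rather than raising.
    fallback = locale_encoding or locale.getpreferredencoding(False)
    return raw.decode(fallback, errors="replace")
```

For example, `decode_filename(raw, locale_encoding="cp1251")` would display a CP1251 name correctly under the reporter's locale, while `decode_filename(raw, forced="koi8_r")` models the user correcting a wrong guess through the menu. A real implementation would presumably cache the detected encoding per server, as the comment suggests.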
Retitling then. I presume this is still a problem with ftp?
No, there is a change, but nothing was fixed. Both the SMB and FTP problems remained in GNOME 2.6, though SMB did get better. It is still impossible to manage files in GNOME when filenames are not ASCII and use different encodings. Such cases include:
- archive files coming from Windows and other systems
- remote shares (FTP, SMB)
- files written by other users or systems with different locale settings
- files on mounted volumes with no or a wrong iocharset= option.
FTP: the problem persists and nothing has changed in Nautilus 2.6.1
SMB: There is an obvious change because I no longer see "Illegal Unicode sequence" messages. Instead I see GARBAGE (screenshot attached) and Nautilus is unable to mount and enter any shared folder. It gets the Russian share names wrong and then fails to find them, as I understand from its error messages. My /etc/samba/smb.conf contains this:

# Enabling internationalization:
# you can match a Windows code page with a UNIX character set.
# Windows: CP437 (US, default), CP737 (GREEK), CP850 (Latin1 - Western European),
# CP852 (Eastern Eu.), CP861 (Icelandic), CP866 (Cyrillic - Russian),
# CP932 (Japanese - Shift-JIS), CP936 (Simpl. Chinese), CP949 (Korean Hangul),
# CP950 (Trad. Chin.).
# UNIX: ISO8859-1 (Western European), ISO8859-2 (Eastern Eu.),
# ISO8859-5 (Russian Cyrillic), KOI8-R (Alt-Russ. Cyril.),
# CP1251 (Belarusian/Bulgarian), KOI8-U (Ukrainian)
# Basically, all charsets, supported by iconv(3) are permitted here
# See iconv -l for complete list of encodings
# Note that UTF8 is also supported and Samba3 defaults to it in unix and display charsets.
# This is an example for Russian users:
dos charset = CP866
unix charset = CP1251
display charset = CP1251

It makes samba-client work right. I can access local shares using LinNeighborhood as a Samba frontend. Nautilus still fails to show and find the names of shares. The screenshot shows:
1) A Nautilus window with scrambled share names
2) Another Nautilus window with an error message after an attempt to enter a share
3) LinNeighborhood (an old Samba frontend), which successfully shows all shares and is able to mount them correctly.
Created attachment 28112 [details] A screenshot of Nautilus 2.6.1 showing this bug and an error message. This is GNOME 2.6.0
# This is an example for Russian users:
dos charset = CP866
unix charset = CP1251
display charset = CP1251

You need to configure Samba so that it uses UTF-8 share names if you want Nautilus to work properly, I think.
OK. I changed the standard and proven setup

unix charset = CP1251
display charset = CP1251

into

unix charset = UTF8
display charset = UTF8

It broke everything and fixed nothing :(. Nautilus still shows the same nonsense and smbmount no longer works correctly. My locale is ru_RU.CP1251:

$ locale
LANG=ru_RU.CP1251
LC_CTYPE="ru_RU.CP1251"
LC_NUMERIC="ru_RU.CP1251"
LC_TIME=ru_RU.CP1251
LC_COLLATE="ru_RU.CP1251"
LC_MONETARY="ru_RU.CP1251"
LC_MESSAGES="ru_RU.CP1251"
LC_PAPER="ru_RU.CP1251"
LC_NAME="ru_RU.CP1251"
LC_ADDRESS="ru_RU.CP1251"
LC_TELEPHONE="ru_RU.CP1251"
LC_MEASUREMENT="ru_RU.CP1251"
LC_IDENTIFICATION="ru_RU.CP1251"

* I cannot switch to ru_RU.UTF8 because it means a lot of work and extra load. I want compatibility with some software (e.g. Midnight Commander) and loads of Windows text files I still want to read without recoding. Besides, such a change of locale will make most filenames on my hard drive and removable storage inaccessible (they are CP1251 too!). I'll have to rename 10,000s of files and I do not have tools for automatic conversion of filesystems from one locale into another. In other words switching to UTF8
Created attachment 36420 [details] A screenshot of Nautilus 2.8.2 unable to show filenames on an FTP
Nautilus 0.0.1 - 2.8.2 was and is a useless pile of bits when it comes to browsing remote sites, be it SMB or FTP. Will Nautilus 2.10 just work for a change? Mozilla does FTP browsing easily and gFTP works like a breeze in the same situation. (This remote FTP server has filenames in CP1251.)
Can you provide a URL to a public ftp server that can be used as a testing ground for this problem?
From re-reading, I'm marking this a dup of bug 136613; it appears to be the same root cause. I'd leave this one open, but bug 136613 has a patch (albeit one marked needs-work). *** This bug has been marked as a duplicate of 136613 ***
I created a test FTP dir with some non-English filenames. Try this link: ftp://ftp.altlinux.ru/pub/people/slava/Filename_test There are 3 dummy files and one directory there. All have filenames in Russian. You need the CP1251 encoding to see the first-level file and directory. The directory contains two additional files. One has a filename in UTF-8 and the other in cp1251. Mozilla and gftp do show them correctly (locale ru_RU.CP1251). Mozilla also has a nice codepage selector that makes it possible to see any filenames. I can enter the directory and download any of the files. Nautilus only shows the UTF-8 filename, and "Illegal Unicode sequence" instead of the other filenames. It cannot enter the directory and cannot download files with 8-bit cp1251 names. See screenshot.
Created attachment 36502 [details] Screenshot of Nautilus, Mozilla and gftp showing my test FTP directory. Mozilla and gftp show files correctly but Nautilus misinterprets the filenames.
One more note: This problem also plagues names of SMB shares.
Thanks for setting up this test for us. I'm not able to test this though, since my distro's libc doesn't support the ru_RU.CP1251 locale at all, it seems. Testing using ru_RU.UTF-8 shows the garbled filenames as well, and if I use that I can't see the correct filenames in gftp either. Mozilla seems to be able to make more sense of this when I choose Cyrillic Windows-1251 as the encoding, but I'm not really sure there either since my Russian is really rusty ;-)
OK. I think that it is important to test with an 8-bit encoding. I added another file and a directory with KOI8-R names. Even really antique distros supported ru_RU.KOI8-R. Even if you cannot read ;) you can just compare the picture with the screenshots.
Created attachment 36516 [details] A reference card showing results in different locales
Yes, it doesn't show the right names using KOI8-R either. Didn't try G_BROKEN_FILENAMES though, should that matter?
Have you tried G_FILENAME_ENCODINGS with the development versions? That should let programs try more than just current locale and UTF-8 when trying to make sense of mixed encodings in directory listings.
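For readers unfamiliar with the mechanism: GLib documents this variable as G_FILENAME_ENCODING, a comma-separated list of character sets in which the token "@locale" stands for the current locale's charset. The sketch below only imitates the "try each candidate in order" idea in Python; it is not GLib's implementation, and both function names are made up for this illustration.

```python
import locale
from typing import List, Optional


def candidate_encodings(spec: str, locale_charset: Optional[str] = None) -> List[str]:
    """Expand a comma-separated charset list, resolving the "@locale" token."""
    charset = locale_charset or locale.getpreferredencoding(False)
    names = []
    for token in spec.split(","):
        token = token.strip()
        if token:
            names.append(charset if token == "@locale" else token)
    return names


def display_name(raw: bytes, encodings: List[str]) -> str:
    """Decode with the first candidate that accepts the bytes."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except (UnicodeDecodeError, LookupError):
            continue
    # Nothing matched: keep ASCII and mark the rest as replacement characters.
    return raw.decode("ascii", errors="replace")
```

With a spec such as "UTF-8,@locale" under a CP1251 locale, both the UTF-8 and the CP1251 names in the test directory would decode. Order matters, though: 8-bit codepages accept almost any byte sequence, so a KOI8-R name placed behind CP1251 in the list would decode "successfully" into the wrong letters, which is the case a manual selector or a detector has to cover.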
Not yet. I never install anything without rpm+apt, but I think that I'll have some beta 2.9 rpms available soon. BTW, can Nautilus optionally use a charset detection library? Here is one example: http://trific.ath.cx/software/enca/ /* There is also a problem of mixed encodings on the local FS. It is actually a kernel design problem. Filenames should be stored in UTF and recoded on demand by something as uniform as glibc or the kernel. Unfortunately kernel developers live on Mars and know nothing about Earth users' problems. Nautilus could provide at least a partial solution by using a web browser-like charset selector (Mozilla and even Epiphany have it). */
Concerning GnomeVFS - there is the same problem there (I have described it in bug 136613), with the only difference that Nautilus "likes" UTF-8, while GnomeVFS in 'Open/Save file' dialogs prefers system locale (and not UTF-8).