Bug 92276 - Nautilus fails to display non-ASCII filenames in remote FTP directories
Status: RESOLVED DUPLICATE of bug 136613
Product: nautilus
Classification: Core
Component: Internationalization (i18n)
Version: 2.9.x
Hardware: Other Linux
Importance: High major
Target Milestone: ---
Assigned To: Nautilus Maintainers
QA Contact: Nautilus Maintainers
Depends on:
Blocks:
 
 
Reported: 2002-09-01 20:03 UTC by sdiconov
Modified: 2005-03-01 00:23 UTC
See Also:
GNOME target: ---
GNOME version: 2.9/2.10


Attachments
Nautilus (buggy) and gftp (works) showing same FTP directory with Russian filenames in an 8-bit encoding. (109.32 KB, image/jpeg)
2003-10-31 20:59 UTC, sdiconov
A screenshot of Nautilus 2.6.1 showing this bug and an error message. (231.31 KB, image/jpeg)
2004-05-28 07:43 UTC, sdiconov
A screenshot of Nautilus 2.8.2 unable to show filenames on an FTP (171.34 KB, image/png)
2005-01-23 18:04 UTC, sdiconov
Screenshot of Nautilus, Mozilla and gftp showing my test ftp directory. (325.84 KB, image/jpeg)
2005-01-25 12:03 UTC, sdiconov
A reference card showing results in different locales (21.06 KB, image/png)
2005-01-25 19:09 UTC, sdiconov

Description sdiconov 2002-09-01 20:03:02 UTC
Nautilus cannot be used to browse FTP servers and Windows NetBIOS shares
despite its built-in functionality. All these _never_ use UTF-8 to provide
filenames and Nautilus stupidly displays "Illegal Unicode Sequence". This
bug makes it impossible to use Nautilus for browsing anything outside my
local HDD.

There must be some way for the user to explicitly choose the remote
server's codepage !NOT! hidden somewhere in the GConf registry. Different
servers may use different non-UTF encodings.

If this is solely a gnome-vfs bug, then please pass it to the gnome-vfs
people as something that must be fixed before the "real" release.
Comment 1 Luis Villa 2002-09-10 13:42:34 UTC
Dave: is this gnome-vfs or nautilus?
Kjartan: any idea how serious this problem really is?
sdiconov: you'd be more persuasive if (1) every bug you filed were
more detailed and (2) you weren't convinced that every single bug you
file is the most important bug in all of GNOME. We definitely
appreciate your feedback; it's important that we know those bugs are
there, so please do keep filing them. But perspective is important: we
have something like 6,000 bugs, so yours are just a few of many.
Comment 2 sdiconov 2002-09-10 14:55:18 UTC
What kind of feedback would you like to have? I'll send anything you
ask for. 

This problem is actually a lack of functionality and can easily be
reproduced. To do so you need some working ftp and samba servers with
some Russian (or some other non-ASCII and non-UTF8) filenames. The
servers should use microsoft-cp1251 or koi8 8-bit (or some other)
codepages for presenting the filenames (just like they do it in real
life). 

To solve it, Nautilus has to provide the "iocharset=<codepage>"
parameter to smbmount and do something analogous for FTP. The value
differs depending on the server's setup and the user's locale settings.


This is important. Trust me. It is as important as having the ability
to browse the samba shares at all. Not being able to read filenames
makes using file shares impossible.

-Just Talk------------------------------------------------------------
Gnome developers often tend to forget about non-English languages and
produce crippled software as a result. Nautilus 1.xx "Notes" sidebar,
for instance, never displayed anything other than numbers and useless
Latin letters. As a result I could not use it. The current Nautilus text
view cannot be used to read texts because it has no codepage switching
ability (iconv) and sometimes shows scrambled text. A polite feature
request was sent long ago (bug #64214), but nothing happened.

I wrote all this as a reminder of the legacy i18n compatibility
issues that will plague users outside the US. No offence intended.
Comment 3 Kjartan Maraas 2002-09-10 18:47:25 UTC
I guess this is fairly important the same way that i18n in
gnome-terminal is important and so on. It's time we start paying
attention to the localization bugs for non-Latin languages in my
opinion. It's not like this is a big surprise to us, we've had these
kinds of reports all the way through the 1.x.x development cycle. 2.x
was supposed to be the one where pango was going to help fix this once
and for all.

I know that we can't fix 'em all at once, but maybe Sun or someone
could do a review (same as for security) for at least their primary
languages?
Comment 4 sdiconov 2003-06-15 07:30:01 UTC
I currently have Nautilus 2.2.4. Whenever I try to access any local
FTP servers or smb shares with it, I see no filenames.

Problem #1 Nautilus 2.2.4 displays only "Illegal Unicode sequence"
"Illegal Unicode sequence" "Illegal Unicode sequence" "Illegal Unicode
sequence"... instead of file and directory names.

Problem #2 The samba shares' names in my LAN are not ASCII or UTF-8 either.
So, I cannot enter URLs in Nautilus. When I enter a Russian share name
"smb://.../кино" (smb://.../kino) I get a message that
"smb://.../%d0%ba%d0%b8%d0%bd%d0%be" cannot be found.

My locale is ru_RU.CP1251. The local ftp servers return filenames in
the same (cp1251 or windows-1251) codepage (does not help) while smb
shares give the filenames in the legacy DOS cp866 codepage. 
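
A small Python sketch may make the mismatch concrete. It encodes the share name from Problem #2 in UTF-8 and in the two legacy codepages mentioned above, then percent-escapes the bytes the way a URL would carry them. The helper name percent_encoded is made up for this illustration, and which byte form a given server actually expects is an assumption that depends on its setup.

```python
from urllib.parse import quote

NAME = "кино"  # the share name from Problem #2 ("kino")


def percent_encoded(name: str, codec: str) -> str:
    """Encode the name in the given charset, then percent-escape the bytes."""
    return quote(name.encode(codec))


# Same four letters, three different byte sequences on the wire.
for codec in ("utf-8", "cp1251", "cp866"):
    print(codec, NAME.encode(codec).hex(" "), percent_encoded(NAME, codec))
```

The UTF-8 form is the %D0%BA%D0%B8%D0%BD%D0%BE sequence quoted in the error message (there in lowercase hex). A server that stores the name in cp1251 or cp866 holds a different four-byte name, which would explain why the lookup fails.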

This bug is almost a year old... :( I wish I could tell the exact
lines of code, but I'm not C literate enough.
Comment 5 Kjartan Maraas 2003-07-08 23:22:02 UTC
Upping the pri on this as this problem has been around for some time now.
Comment 6 Kjartan Maraas 2003-10-31 16:27:23 UTC
Does this still break in the same way?
Comment 7 sdiconov 2003-10-31 20:57:24 UTC
Yes it does break. It is still impossible to use nautilus-2.4.0 for
browsing FTP or SMB. 

I attach a screenshot showing this bug in action. The "?????? ??????
(Illegal Unicode sequence)" string is shown instead of each and every
non-Latin filename. My locale is ru_RU.CP1251. Server's locale is most
probably ru_RU.KOI8-R. gFTP works right without any tweaking.

The same happens with SAMBA shares, although I can mount them normally
with smbmount or LinNeighborhood if the smb.conf contains

 client code page = 866
 character set = 1251

and I supply the iocharset=cp1251 option to smbmount.
Comment 8 sdiconov 2003-10-31 20:59:36 UTC
Created attachment 21103 [details]
Nautilus (buggy) and gftp (works) showing same FTP directory with Russian filenames in an 8-bit encoding.
Comment 9 Christophe Fergeau 2003-11-15 23:14:38 UTC
See
http://mail.gnome.org/archives/nautilus-list/2003-October/msg00089.html
for a thread about the same issue
Comment 10 sdiconov 2003-11-16 00:48:47 UTC
It means that Nautilus MUST have the "View/Encoding/..." menu and some
shortcuts to force any encoding manually. WWW browsers do have this.
All text viewers need this too. Not being a C developer I would
suggest the following algorithm:

---------------------------------------------------------------------

For remote filesystems (FTP, SMB)
1) Show Unicode if it is Unicode OR try to autodetect (I believe that
the ENCA library, http://trific.ath.cx/software/enca/, can be used to
accomplish this). Do the autodetect once to speed things up.

2) If autodetect fails, show everything using the current system locale
encoding

3) ALWAYS have the manual encoding selector available allowing the
user to correct wrong guesses.


For local filesystem:

1) Show everything non-UTF using the current system locale encoding

2) ALWAYS have the manual encoding selector available allowing the
user to correct wrong guesses.

--------------------------------------------------------------------
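
The remote-filesystem steps above can be sketched in Python roughly as follows. This is a simplified illustration, not Nautilus or gnome-vfs code: strict UTF-8 validation stands in for step 1, ENCA-style autodetection is left out entirely, and the function name decode_remote_name and its parameters are hypothetical.

```python
import locale
from typing import Optional


def decode_remote_name(raw: bytes,
                       forced: Optional[str] = None,
                       fallback: Optional[str] = None) -> str:
    """Turn raw filename bytes from a server into text for display."""
    # Step 3: an encoding picked by the user always wins.
    if forced:
        return raw.decode(forced, errors="replace")
    # Step 1: show the name as Unicode if the bytes are valid UTF-8.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    # Step 2: otherwise fall back to the locale encoding (or a given one).
    fallback = fallback or locale.getpreferredencoding(False)
    return raw.decode(fallback, errors="replace")


cp1251_name = "кино".encode("cp1251")
print(decode_remote_name(cp1251_name, fallback="cp1251"))
```

Passing forced="koi8-r" models the manual selector: the same bytes are simply reinterpreted, which is why a wrong guess can always be corrected afterwards.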

The reason is the legacy encodings zoo. Some form of a quick manual
codepage selector is necessary because filenames written in a
different encoding do occur even on the local harddrive. Two users
that speak one language but use different locales cannot exchange files. 

Common example: Linux archivers fail to extract files compressed by
Windows tools and vice versa because the filenames seem to contain
illegal characters.

Another example: after mounting your harddrive from a different OS (a
CD-based rescue system or another copy of Linux) you cannot read names
of your own files because of different locale. One and the same folder
can have files created by both systems and become unmanageable because
some filenames will always be unreadable.

Ignoring such problems makes Nautilus and Gnome as a whole unusable.
Comment 11 Alexander Larsson 2004-02-09 15:13:53 UTC
For smb this is solved in gnome-vfs 2.5.x. It always uses utf8 uris,
and then converts as needed to/from the remote machine.
Comment 12 Luis Villa 2004-05-05 14:13:39 UTC
Retitling then. I presume this is still a problem with ftp?
Comment 13 sdiconov 2004-05-28 06:34:55 UTC
No, there is a change, but nothing was fixed. Both the SMB and FTP problems remain
in GNOME 2.6, though SMB did get better.

It is still impossible to manage files in Gnome when filenames are not ASCII and
use different encodings.

Such cases include
- archive files coming from Windows and other systems
- remote shares (FTP, SMB)
- files written by other users or systems with different locale settings
- files on mounted volumes with no or wrong iocharset= option.
Comment 14 sdiconov 2004-05-28 06:35:45 UTC
FTP: the problem persists and nothing has changed in Nautilus 2.6.1
Comment 15 sdiconov 2004-05-28 07:42:17 UTC
SMB: There is an obvious change because I no longer see "Illegal Unicode
sequence" messages. Instead I see GARBAGE (screenshot attached) and Nautilus is
unable to mount and enter any shared folder. It gets the Russian share names
wrong and then fails to find them, as I understand from its error messages.

My /etc/samba/smb.conf contains this:

# Enabling internationalization:
# you can match a Windows code page with a UNIX character set.
# Windows: CP437 (US, default), CP737 (GREEK), CP850 (Latin1 - Western European),
# CP852 (Eastern Eu.), CP861 (Icelandic), CP866 (Cyrillic - Russian),
# CP932 (Japanese - Shift-JIS), CP936 (Simpl. Chinese), CP949 (Korean Hangul),
# CP950 (Trad. Chin.).
# UNIX: ISO8859-1 (Western European), ISO8859-2 (Eastern Eu.),
# ISO8859-5 (Russian Cyrillic), KOI8-R (Alt-Russ. Cyril.),
# CP1251 (Belarusian/Bulgarian), KOI8-U (Ukrainian)
# Basically, all charsets, supported by iconv(3) are permitted here
# See iconv -l for complete list of encodings
# Note that UTF8 is also supported and Samba3 defaults to it in unix and display charsets.
# This is an example for Russian users:
   dos charset = CP866
   unix charset = CP1251
   display charset = CP1251

It makes samba-client work right. I can access local shares using LinNeighborhood
as a samba frontend. Nautilus still fails to show and find names of shares.

The screenshot shows:
1) Nautilus window with scrambled sharenames
2) Another Nautilus window with error message after an attempt to enter a share
3) LinNeighborhood (an old samba frontend) which successfully shows all shares and
is able to mount them correctly.
Comment 16 sdiconov 2004-05-28 07:43:55 UTC
Created attachment 28112 [details]
A screenshot of Nautilus 2.6.1 showing this bug and an error message.

This is GNOME 2.6.0
Comment 17 Christophe Fergeau 2004-05-28 07:46:22 UTC
# This is an example for Russian users:
   dos charset = CP866
   unix charset = CP1251
   display charset = CP1251

You need to configure Samba so that it uses UTF-8 share names if you want
Nautilus to work properly I think.
Comment 18 sdiconov 2004-05-28 08:01:48 UTC
Ok. I changed the standard and proven setup

   unix charset = CP1251
   display charset = CP1251

into

   unix charset = UTF8
   display charset = UTF8

It broke everything and fixed nothing :(. Nautilus still shows the same nonsense and
smbmount no longer works correctly. My locale is ru_RU.CP1251

$ locale
LANG=ru_RU.CP1251
LC_CTYPE="ru_RU.CP1251"
LC_NUMERIC="ru_RU.CP1251"
LC_TIME=ru_RU.CP1251
LC_COLLATE="ru_RU.CP1251"
LC_MONETARY="ru_RU.CP1251"
LC_MESSAGES="ru_RU.CP1251"
LC_PAPER="ru_RU.CP1251"
LC_NAME="ru_RU.CP1251"
LC_ADDRESS="ru_RU.CP1251"
LC_TELEPHONE="ru_RU.CP1251"
LC_MEASUREMENT="ru_RU.CP1251"
LC_IDENTIFICATION="ru_RU.CP1251"

* I cannot switch to ru_RU.UTF8 because it means a lot of work and extra load. I
want compatibility with some software (e.g. Midnight Commander) and loads of
Windows text files I still want to read without recoding. Besides, such a change
of locale will make most filenames on my harddrive and removable storage
inaccessible (they are CP1251 too!). I'll have to rename tens of thousands of
files and I do not have tools for automatic conversion of filesystems from one
locale into another.

In other words switching to UTF8  
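
For a sense of what the bulk rename mentioned above would involve, here is a minimal dry-run sketch in Python. It assumes every non-UTF-8 name in a directory is CP1251, which is an assumption about this particular setup; the function names convert_name and plan_renames are hypothetical, and subdirectories are not handled.

```python
import os
from typing import List, Optional, Tuple


def convert_name(raw: bytes, source: str = "cp1251") -> Optional[bytes]:
    """Return the UTF-8 form of a legacy name, or None if no rename is needed."""
    try:
        raw.decode("utf-8")  # ASCII and UTF-8 names are left alone
        return None
    except UnicodeDecodeError:
        return raw.decode(source).encode("utf-8")


def plan_renames(directory: bytes, source: str = "cp1251") -> List[Tuple[bytes, bytes]]:
    """List (old, new) byte names without touching the disk (a dry run)."""
    plan = []
    for raw in os.listdir(directory):  # bytes argument gives bytes names
        new = convert_name(raw, source)
        if new is not None:
            plan.append((raw, new))
    return plan
```

Working on bytes rather than str keeps the names exactly as they are stored on disk. Applying the plan would mean calling os.rename on each pair, and only after checking that no two old names map to the same new one.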
   
Comment 19 sdiconov 2005-01-23 18:04:55 UTC
Created attachment 36420 [details]
A screenshot of Nautilus 2.8.2 unable to show filenames on an FTP

Nautilus 0.0.1 - 2.8.2 was and is a useless pile of bits when it comes to
browsing remote sites, be it SMB or FTP. Will Nautilus 2.10 just work for a
change? Mozilla does FTP browsing easily and GFTP works like a breeze in the
same situation.

(This remote FTP server has filenames in CP1251)
Comment 20 Kjartan Maraas 2005-01-24 14:15:31 UTC
Can you provide a URL to a public ftp server that can be used as a testing
ground for this problem?
Comment 21 Luis Villa 2005-01-24 22:49:09 UTC
From re-reading, I'm marking this a dup of bug 136613; it appears to be the same
root cause. I'd leave this one open, but bug 136613 has a patch (albeit one
marked needs-work.)

*** This bug has been marked as a duplicate of 136613 ***
Comment 22 sdiconov 2005-01-25 11:49:35 UTC
I created a test FTP dir with some non-English filenames. Try this link:

ftp://ftp.altlinux.ru/pub/people/slava/Filename_test

There are 3 dummy files and one directory there. All have filenames in Russian.
You need CP1251 encoding to see the first level file and directory. The
directory contains two additional files. One has filename in UTF-8 and the other
in cp1251. 

Mozilla and gftp do show them correctly (locale ru_RU.CP1251). Mozilla also has
a nice codepage selector that makes it possible to see any filenames. I can enter
the directory and download any of the files.

Nautilus only shows the UTF-8 filename and "Illegal Unicode sequence" instead of
other filenames. It cannot enter the directory and cannot download files with
8-bit cp1251 names.

See screenshot.
Comment 23 sdiconov 2005-01-25 12:03:28 UTC
Created attachment 36502 [details]
Screenshot of Nautilus, Mozilla and gftp showing my test ftp directory.

Mozilla and gftp show files correctly but Nautilus misinterprets the filenames.
Comment 24 sdiconov 2005-01-25 12:12:52 UTC
One more note: This problem also plagues names of SMB shares.
Comment 25 Kjartan Maraas 2005-01-25 13:11:53 UTC
Thanks for setting up this test for us. I'm not able to test this though, since
my distro libc doesn't support the ru_RU.CP1251 locale at all it seems. Testing
using ru_RU.UTF-8 shows the garbled filenames as well, and if I use that I can't
see the correct filenames in gftp either. Mozilla seems to be able to make more
sense of this when I choose Cyrillic Windows-1251 as the encoding, but I'm not
really sure there either since my Russian is really rusty ;-)
Comment 26 sdiconov 2005-01-25 18:57:42 UTC
OK. I think that it is important to test with an 8-bit encoding. I added another
file and a directory with a KOI8-R name. Even really antique distros supported
ru_RU.KOI8-R.

Even if you cannot read ;) you can just compare the picture with screenshots.
Comment 27 sdiconov 2005-01-25 19:09:03 UTC
Created attachment 36516 [details]
A reference card showing results in different locales
Comment 28 Kjartan Maraas 2005-01-26 01:06:18 UTC
Yes, it doesn't show the right names using KOI8-R either. Didn't try
G_BROKEN_FILENAMES though, should that matter?
Comment 29 Kjartan Maraas 2005-01-31 17:33:09 UTC
Have you tried G_FILENAME_ENCODING with the development versions? That should
let programs try more than just current locale and UTF-8 when trying to make
sense of mixed encodings in directory listings.
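
The lookup order described here can be imitated in a few lines of Python. This sketch follows the GLib documentation, which names the variable G_FILENAME_ENCODING: a comma-separated charset list in which "@locale" stands for the locale charset, with G_BROKEN_FILENAMES as the older switch. It is not GLib's code, and filename_charsets and display_name are hypothetical names.

```python
import locale
from typing import List, Mapping


def filename_charsets(env: Mapping[str, str]) -> List[str]:
    """Build the list of charsets to try, in order of preference."""
    locale_cs = locale.getpreferredencoding(False)
    spec = env.get("G_FILENAME_ENCODING")
    if spec:
        names = [cs.strip() for cs in spec.split(",") if cs.strip()]
        return [locale_cs if cs == "@locale" else cs for cs in names]
    if "G_BROKEN_FILENAMES" in env:
        return [locale_cs]
    return ["UTF-8", locale_cs]


def display_name(raw: bytes, charsets: List[str]) -> str:
    """Decode with the first charset that accepts the bytes."""
    for cs in charsets:
        try:
            return raw.decode(cs)
        except (UnicodeDecodeError, LookupError):
            continue
    # Last resort: mark undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace")


charsets = filename_charsets({"G_FILENAME_ENCODING": "UTF-8,CP1251,KOI8-R"})
print(display_name("кино".encode("cp1251"), charsets))
```

Note that an 8-bit codec such as CP1251 accepts almost any byte sequence, so in a list like the one above KOI8-R would rarely be reached. Trying encodings in order cannot tell two 8-bit charsets apart, which is where a detection library or a manual selector would still be needed.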
Comment 30 sdiconov 2005-01-31 18:53:09 UTC
Not yet. I never install anything without rpm+apt, but I think that I'll have
some beta 2.9 rpms available soon.

BTW, can Nautilus optionally use a charset detection library? Here is one example:
http://trific.ath.cx/software/enca/

/* There is also a problem of mixed encodings on the local FS. It is actually a
kernel design problem. Filenames should be stored in UTF and recoded on demand
by something as uniform as glibc or the kernel. Unfortunately kernel developers
live on Mars and know nothing about Earth users' problems.

Nautilus could provide at least a partial solution by using a web browser-like
charset selector (Mozilla and even Epiphany have it). */
Comment 31 Alexey Rusakov 2005-03-01 00:23:42 UTC
Concerning GnomeVFS - there is the same problem there (I have described it in
bug 136613), with the only difference that Nautilus "likes" UTF-8, while
GnomeVFS in 'Open/Save file' dialogs prefers system locale (and not UTF-8).