GNOME Bugzilla – Bug 166973
gnome-vfs just assume that filename encoding is utf-8 if there is no information
Last modified: 2008-09-06 19:07:51 UTC
Please describe the problem:
If the protocol carries no information about the character set encoding of filenames, gnome-vfs simply assumes that filenames are UTF-8 encoded. But the world is not all UTF-8. I would like an option to set the charset encoding, at least "current locale's or UTF-8" (like glib's G_FILENAME_ENCODING). In many environments we cannot take advantage of gnome-vfs at all just because of this problem. This is a very sad situation.

Steps to reproduce:
1. Use ssh, ftp, ... via gnome-vfs
2. Look at a file whose name is not UTF-8 encoded
3. amen

Actual results: broken filename labeled "incorrect unicode"

Expected results: converted filename

Does this happen every time? yes

Other information: example screenshot will be attached
Created attachment 37308: filename encoding example screenshot
This Nautilus screenshot was taken over a gnome-vfs ssh connection.
I was looking at the code, and gnome-vfs already supports the G_FILENAME_ENCODING environment variable. Did you try setting it before running nautilus?
Yes, I always set G_FILENAME_ENCODING=@locale in my GNOME environment, and I verified it through the /proc/[pid of nautilus]/environ file. Nautilus creates and reads local filenames in EUC-KR (the locale encoding). But over the ftp/sftp protocols it creates filenames as UTF-8 and reads filenames as UTF-8. (Or does the CVS code perhaps already fix this problem?)
Ohh so this does not happen while browsing the local filesystem but only over sftp? (I guess so therefore I am moving this over to the sftp Component)
Oh, I'm sorry I did not specify the problem clearly. This problem is about remote filenames over network protocols that do not carry encoding information, for example ftp, ssh/sftp, ... (samba does not have this problem because the protocol provides encoding information).
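To make the failure mode concrete, here is a minimal Python sketch (not gnome-vfs code) of what happens when an EUC-KR filename is treated as UTF-8. The function names and the placeholder label are assumptions chosen for illustration only.

```python
# Illustration only: a filename as an EUC-KR server would send it on the wire.
raw_name = "한글.txt".encode("euc_kr")


def decode_as_utf8_only(raw: bytes) -> str:
    """Mimic a client that assumes every remote filename is UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Hypothetical placeholder, standing in for the label the UI shows.
        return "(invalid Unicode)"


def decode_with_known_encoding(raw: bytes, encoding: str) -> str:
    """Decode the same bytes once the remote encoding is known."""
    return raw.decode(encoding)


if __name__ == "__main__":
    print(decode_as_utf8_only(raw_name))
    print(decode_with_known_encoding(raw_name, "euc_kr"))
```

The bytes are identical in both calls; only the knowledge of the source encoding differs, and that knowledge is exactly what ftp and sftp do not supply.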
In the long run, gnome-vfs should provide an API to set encoding information on the connection object. It is the only way to ensure interoperability over the network. Transfers with an implicit charset sometimes break applications, for example when pango segfaults on illegal byte sequences. The W3C has published a statement (I can't find it at the moment) encouraging people to set encodings properly over the network in this era of mixed charsets.
What does it effectively mean if a protocol does not have encoding information? GnomeVFS just transfers escaped URIs.
nautilus-connect-server creates a connection to a remote system. Its dialog does not provide an entry for encoding information. I had a quick look and couldn't find any place to set the encoding. GnomeVFSHandle or a similar object could provide an encoding property and promote use of that information during transfers. E.g. when sending a filename from an EUC-KR system to a UTF-8 system, the filename conversion should be done by the transfer layer, and vice versa.
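As a rough illustration of conversion in the transfer layer, the following Python sketch re-encodes a percent-escaped filename from a known remote encoding to UTF-8. It is not the gnome-vfs API; `reencode_escaped_name` and its parameters are hypothetical, and it assumes the remote encoding has already been supplied by the user or by configuration.

```python
from urllib.parse import quote, unquote, unquote_to_bytes


def reencode_escaped_name(escaped: str, remote_encoding: str) -> str:
    """Convert a percent-escaped name from remote_encoding to escaped UTF-8.

    Hypothetical helper: the escaped form is undone to raw bytes, the bytes
    are interpreted in the remote encoding, and the text is escaped again
    as UTF-8.
    """
    raw = unquote_to_bytes(escaped)       # "%C7%D1..." -> raw bytes
    text = raw.decode(remote_encoding)    # raises UnicodeDecodeError if wrong
    return quote(text.encode("utf-8"))    # escape the UTF-8 bytes again


if __name__ == "__main__":
    # Build the escaped EUC-KR form of a Korean filename, then convert it.
    escaped_euckr = quote("한글.txt".encode("euc_kr"))
    escaped_utf8 = reencode_escaped_name(escaped_euckr, "euc_kr")
    print(escaped_euckr, "->", escaped_utf8)
    print(unquote(escaped_utf8))          # unquote() decodes as UTF-8 by default
```

The reverse direction (UTF-8 to the remote encoding) follows the same pattern with the two encodings swapped.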
It would probably be best to expose a combobox/radio buttons in the connect server dialog where you can choose the encoding when the selected protocol is ftp[/sftp?]. gnome-vfs could then convert the filenames back and forth between that encoding and UTF-8. Not the most beautiful thing in the world, but a "good" workaround (the user shouldn't need to select the encoding, but there is no sane way around that ... sigh).

Note that the encoding mess is just that, a mess. Nobody guarantees that one does not cross filesystem borders with the ftp protocol, and then the encoding could be yet another one. And there is no way to find out whether a given encoding is "correct". But I guess providing some means to select the encoding is better than none. Best of all will be when the whole world has switched to Unicode, of course :)

I'd suggest maintaining a file with locale -> legacy encoding mappings and providing only that *one entry* (or at most 3) in the GUI. For example:

LANG="de"
Encoding
[*] Unicode
[ ] Latin 1 (this one is used most)
[ ] Other [______]

At least for German, there is "just" one legacy encoding in use. Other languages have multiple ones (Japanese comes to mind, having Shift-JIS, EUC-JP, Windows Codepage 932, ISO-2022-JP):

LANG="ja"
Encoding
[*] Unicode
[ ] Shift Jis
[ ] EUC Jp (this one is used most)
[ ] Windows 932
[ ] Other [______]

As you see, here the mess starts. Note that Shift-JIS even reads weird on paper, but hey, it's still used. Sigh.

Kang Jeong-Hee, how is it with Korean? Are there multiple legacy encodings in use?
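The locale -> legacy encoding mapping suggested above could be sketched as follows in Python. The table contents are taken from the two mockups; the names `LEGACY_ENCODINGS` and `encoding_choices`, the use of Python codec names, and the cap of three legacy entries are assumptions for illustration, not an existing gnome-vfs or Nautilus facility.

```python
import codecs
from typing import Dict, List

# Hypothetical mapping file contents, keyed by language code.
# Values are Python codec names for the encodings named in the mockups.
LEGACY_ENCODINGS: Dict[str, List[str]] = {
    "de": ["latin_1"],
    "ja": ["shift_jis", "euc_jp", "cp932"],
}


def encoding_choices(lang: str, max_legacy: int = 3) -> List[str]:
    """Return the choices a dialog could offer for a LANG value.

    Unicode (UTF-8) always comes first, followed by at most max_legacy
    legacy encodings for the language. A free-form "Other" entry would be
    added by the GUI and is not modelled here.
    """
    # Reduce values such as "de_DE.UTF-8" or "ja_JP" to the language code.
    language = lang.split(".")[0].split("_")[0].lower()
    legacy = LEGACY_ENCODINGS.get(language, [])[:max_legacy]
    return ["utf-8"] + legacy


if __name__ == "__main__":
    for value in ("de", "ja_JP.UTF-8", "fr_FR"):
        choices = encoding_choices(value)
        # Every offered name must be a codec this Python build knows.
        for name in choices:
            codecs.lookup(name)
        print(value, choices)
```

Languages missing from the table fall back to offering only Unicode plus whatever the user types into "Other".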
A connected server is in no way a "connection" in the protocol sense. It's more like a bookmark or a shortcut. It's not visible at the I/O level in gnome-vfs.
danny_milo: Korean has EUC-KR, CP949 and some ISO blah thing in regular use. And I strongly disagree with providing radio buttons. Just a combo box.
alexl: Yes, gnome-vfs may have no responsibility for encoding conversion. But there are convenient 'display' APIs, and they need a source and a target encoding. BTW, I've heard that one of the RFCs defines FTP with encoding information. :)
> BTW, I've heard that one of RFC define FTP with encoding information. :)
But most ftp servers don't implement that feature. :(
To all: the world of encodings is in chaos. Most ftp clients, ftp servers, blabla servers, and blabla clients take no responsibility for the encoding problem if the protocol says nothing about encoding. But the problem EXISTS. If everyone in the world used Unicode, this problem would be fixed automatically. Until that utopia arrives, however, something or someone has to provide a workaround. (I am used to converting the encoding of transferred filenames manually.) gnome-vfs is a very good place to insert a workaround.
Where does the encoding information belong? NautilusFile provides a display name. For the local file system, g_filename_display_basename() works, and in that routine the G_FILENAME_ENCODING environment variable certainly takes effect. For remote files, GnomeVFSURI comes into play, but no encoding conversion occurs. Only the last piece of the URI (the short name) is extracted from the full string. That string should be converted, but encoding conversion requires both a @from and a @to encoding name. @to is always UTF-8. @from may be determined from G_FILENAME_ENCODING, implicitly. By "implicitly" I mean that G_FILENAME_ENCODING is just a list of fallback encodings. The actual encoding might be one from outside the list, so it has to be specified at runtime, explicitly, and an environment variable cannot be changed at runtime. The G_FILENAME_ENCODING approach can therefore only be a workaround.
A gnome-keyring item type such as GNOME_KEYRING_ITEM_ENCODING, for example, might address this problem in a simple way.
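The distinction between an implicit fallback list and an explicit runtime encoding can be sketched in Python as follows. This is not how glib or gnome-vfs implement it; the function names, the order in which encodings are tried, and the handling of "@locale" are assumptions made for the sake of the example.

```python
import locale
from typing import List, Optional


def candidate_encodings(env_value: str) -> List[str]:
    """Parse a G_FILENAME_ENCODING-style comma-separated list."""
    names = []
    for item in env_value.split(","):
        item = item.strip()
        if not item:
            continue
        if item == "@locale":
            # Assumption: substitute the current locale's preferred encoding.
            item = locale.getpreferredencoding(False)
        names.append(item)
    return names


def display_name(raw: bytes, explicit: Optional[str] = None,
                 env_value: str = "") -> str:
    """Convert a raw short name to text for display (@to is always Unicode).

    An explicit per-connection encoding, if given, is tried first; then
    UTF-8; then each fallback from the environment-style list.
    """
    tried = ([explicit] if explicit else []) + ["utf-8"]
    tried += candidate_encodings(env_value)
    for name in tried:
        try:
            return raw.decode(name)
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown encoding, try the next one
    # Nothing fit: keep what can be kept, mark the rest as U+FFFD.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    raw = "한글.txt".encode("euc_kr")
    print(display_name(raw, env_value="EUC-KR"))
    print(display_name(raw, explicit="euc_kr"))
    print(display_name(raw, env_value="ISO-8859-1,EUC-KR"))
```

The last call shows why a fallback list is only a workaround: ISO-8859-1 accepts any byte sequence, so it "succeeds" with the wrong text and EUC-KR is never tried. Only an encoding given explicitly for the connection avoids that guesswork.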
gnome-vfs has been deprecated and superseded by gio/gvfs since GNOME 2.22, hence mass-closing many of the gnome-vfs requests/bug reports. This means that gnome-vfs is NOT actively maintained anymore, however patches are still welcome. If your reported issue is still valid for gio/gvfs, please feel free to file a bug report against glib/gio or gvfs. @Bugzilla mail recipients: query for gnome-vfs-mass-close to get rid of these notification emails all together. General further information: http://en.wikipedia.org/wiki/GVFS Reasons behind this decision are listed at http://www.mail-archive.com/gnome-vfs-list@gnome.org/msg00899.html