GNOME Bugzilla – Bug 309493
Automatic character code recognition not working for Japanese text
Last modified: 2005-08-26 13:07:22 UTC
Please describe the problem:
When opening a text file in a Japanese encoding (e.g. JIS, S-JIS, EUC), the automatic character code recognition fails and the text is not displayed properly. Opening UTF-8 documents works correctly. Choosing the _correct_ encoding from the menu also displays the text correctly. However, the user then has to know in advance which of 4 or 5 encodings the document uses in order to open it properly. Emacs, for example, can correctly decode the text.

Steps to reproduce:
1. Open a text document with S-JIS or another Japanese encoding
2. Character coding should be set to "Automatic" (default)
3. Click "Open"

Actual results:
Garbage is displayed on screen.

Expected results:
Document encoding is correctly chosen.

Does this happen every time?
Yes.

Other information:
Perhaps this functionality has not been implemented yet, in which case this is a feature request!
The encoding autodetection code uses the list of encodings defined in the "/apps/gedit-2/preferences/encodings/auto_detected" gconf key. Please try to update that list using gconf-editor and let me know whether gedit is then able to auto-detect Japanese-encoded text. IMO, the Japanese translators should translate the default value of the above gconf key so that it matches the needs of Japanese users.
Thanks, I didn't know about the hidden key for auto-detection. However, it doesn't seem to make any difference. I will try to attach some test cases for you.
Created attachment 49571 [details] gedit encoding screenshot
Created attachment 49572 [details] jap_euc_emacs.txt
Created attachment 49573 [details] jap_jis_emacs.txt
Created attachment 49574 [details] jap_sjis_emacs.txt
Dear Paolo Maggi, I absolutely for the life of me cannot get bugzilla to attach files (firewall problem?), so here they are anyway. 1. Some test cases for 3 different types of Japanese text. 2. A screenshot of emacs and gedit, so you know what kind of output you can expect. The file opened is jap_sjis_emacs.txt. SJIS is a common Japanese character code found on Windows platforms. All these files were created in emacs on the same computer (FreeBSD 5.4, Gnome 2.10). Thanks for your interest and help in fixing this longstanding bug! Lachlan
Reopening the bug
Lachlan: could you please run another test for me? Please remove the ~/.gnome2/gedit-metadata.xml file. Then, with the gconf key modified as shown in your screenshot, try to open the file again. It should work. At least, it works for me. I think you are still seeing the problem because of another known bug that I'm going to fix: since you successfully opened the file using the UTF-8 encoding, gedit now thinks the file is UTF-8 encoded. Removing the metadata file resets gedit's memory, so it should now use the gconf key to autodetect the encoding. Please let me know if this tip solves your problem.
Yes! This works fine. Thanks for your hard work, Paolo. ;-) It also seems that the order of the elements in the auto-detection gconf key may be important. At first I had SHIFT-JIS,ISO-2022-JP,EUC-JP,UTF-8 as the order in the gconf key, and the ISO-2022-JP file (i.e. the JIS-encoded file) was not detected correctly. Then I changed the order to ISO-2022-JP,SHIFT-JIS,EUC-JP,UTF-8, and all files now seem to open correctly when testing on the various test cases below. Again, let me congratulate you on finding the source of the problem.
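A note on why the order could matter. The sketch below is not gedit's code; it is a minimal Python illustration that assumes the auto-detection simply tries each listed encoding in turn and keeps the first one that decodes without error. The function name `detect_encoding`, the sample string, and the Python codec names (`iso2022_jp`, `shift_jis`, `euc_jp`, `utf_8`) are assumptions chosen for the example. Under that assumption, an ISO-2022-JP file contains only 7-bit bytes plus escape sequences, so a Shift-JIS decoder accepts it without complaint and wins if it is tried first.

```python
from typing import List, Optional


def detect_encoding(data: bytes, candidates: List[str]) -> Optional[str]:
    """Return the first candidate that decodes `data` without error.

    Hypothetical helper: a first-match strategy, assumed here only to
    illustrate why the order of the candidate list can matter.
    """
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # this candidate rejects the bytes, try the next one
        return name
    return None


sample = "日本語"  # arbitrary sample text for the illustration
jis_bytes = sample.encode("iso2022_jp")   # 7-bit bytes with escape sequences
sjis_bytes = sample.encode("shift_jis")   # contains bytes >= 0x80

# The two orders reported in this bug, written with Python codec names.
first_order = ["shift_jis", "iso2022_jp", "euc_jp", "utf_8"]
second_order = ["iso2022_jp", "shift_jis", "euc_jp", "utf_8"]

print(detect_encoding(jis_bytes, first_order))    # Shift-JIS tried first
print(detect_encoding(jis_bytes, second_order))   # ISO-2022-JP tried first
print(detect_encoding(sjis_bytes, second_order))  # Shift-JIS file
```

With the first order, the ISO-2022-JP bytes are claimed by `shift_jis` and would be shown as garbage; with the second order they are claimed by `iso2022_jp`, and the Shift-JIS file still falls through to `shift_jis` because the ISO-2022-JP decoder rejects bytes above 0x7F. That matches the behaviour described above, so putting the strictest encoding first looks like the safer choice.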
Thanks for confirming. Closing as NOTABUG.