Bug 309493 - Automatic character code recognition not working for Japanese text
Status: RESOLVED NOTABUG
Product: gedit
Classification: Applications
Component: general
Version: 2.10.x
Hardware: Other All
Importance: Normal minor
Target Milestone: 2.12.0
Assigned To: Gedit maintainers
QA Contact: gedit QA volunteers
Depends on:
Blocks:
 
 
Reported: 2005-07-05 02:18 UTC by Lachlan
Modified: 2005-08-26 13:07 UTC
See Also:
GNOME target: ---
GNOME version: 2.9/2.10


Attachments
gedit encoding screenshot (135.86 KB, image/png)
2005-07-22 15:04 UTC, Paolo Maggi
jap_euc_emacs.txt (58 bytes, text/plain)
2005-07-22 15:05 UTC, Paolo Maggi
jap_jis_emacs.txt (70 bytes, text/plain)
2005-07-22 15:05 UTC, Paolo Maggi
jap_sjis_emacs.txt (58 bytes, text/plain)
2005-07-22 15:06 UTC, Paolo Maggi

Description Lachlan 2005-07-05 02:18:25 UTC
Please describe the problem:
When opening a text file with Japanese-encoded text (e.g. JIS, S-JIS, EUC), the
automatic character code recognition fails to display the text properly.

Opening UTF-8 documents works correctly.

Choosing the _correct_ encoding from the menu also displays the text correctly.

However, the user is forced to know in advance which of 4 or 5 encodings the
document is in to open it properly. Emacs, for example, can correctly
decode the text.

Steps to reproduce:
1. Open a text document with S-JIS or other Japanese encoding
2. Character coding should be set to "Automatic" (default)
3. Click "Open"

Actual results:
Garbage is displayed on screen.

Expected results:
Document encoding is correctly chosen.

Does this happen every time?
Yes.

Other information:
Perhaps this functionality has not been implemented yet, in which case this is a
feature request!
Comment 1 Paolo Maggi 2005-07-21 17:11:18 UTC
The encoding autodetection code uses the list of encodings defined in the
"/apps/gedit-2/preferences/encodings/auto_detected" gconf key.
Please try updating that list with gconf-editor and let me know whether gedit
is then able to auto-detect Japanese-encoded text.
IMO, the Japanese translators should translate the default value of the above
gconf key so that it matches the needs of Japanese users.
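
To illustrate why the contents of that list matter, here is a minimal Python sketch of list-based detection. It is not gedit's code: the function name detect_encoding and the candidate list western_only are hypothetical, and the "first encoding that decodes without error wins" rule is an assumption about how such a list is typically consumed.

```python
from typing import Optional, Sequence


def detect_encoding(data: bytes, candidates: Sequence[str]) -> Optional[str]:
    """Return the first candidate that decodes `data` without error, else None."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # this candidate cannot represent the bytes; try the next
        return name
    return None


# The Japanese word "nihongo" (three kanji), written with escapes.
sjis_bytes = "\u65e5\u672c\u8a9e".encode("shift_jis")

# Hypothetical auto-detect list that contains no Japanese encodings.
western_only = ["utf_8", "iso8859_15"]

print(detect_encoding(sjis_bytes, western_only))
print(detect_encoding(sjis_bytes, ["utf_8", "shift_jis"]))
```

In this sketch a single-byte encoding such as ISO-8859-15 accepts any byte sequence, so a list without Japanese encodings still "succeeds" and the text is shown as garbage rather than rejected.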
Comment 2 Lachlan 2005-07-22 05:34:28 UTC
Thanks, I didn't know about the hidden key for auto-detection. However, it
doesn't seem to make any difference. I will try to attach some test cases for you.
Comment 3 Paolo Maggi 2005-07-22 15:04:21 UTC
Created attachment 49571 [details]
gedit encoding screenshot
Comment 4 Paolo Maggi 2005-07-22 15:05:15 UTC
Created attachment 49572 [details]
jap_euc_emacs.txt
Comment 5 Paolo Maggi 2005-07-22 15:05:38 UTC
Created attachment 49573 [details]
jap_jis_emacs.txt
Comment 6 Paolo Maggi 2005-07-22 15:06:06 UTC
Created attachment 49574 [details]
jap_sjis_emacs.txt
Comment 7 Paolo Maggi 2005-07-22 15:06:29 UTC
Dear Paolo Maggi

I absolutely for the life of me cannot get bugzilla to attach files
(firewall problem?), so here they are anyway.

1. Some test cases for 3 different types of Japanese text
2. A screenshot of emacs and gedit, so you know what kind of output you
can expect. The file opened is jap_sjis_emacs.txt. SJIS is a common
Japanese character code found on Windows platforms. All these files were
created in emacs on the same computer (FreeBSD 5.4, GNOME 2.10).

Thanks for your interest and help in fixing this longstanding bug!

Lachlan
Comment 8 Paolo Maggi 2005-07-22 15:07:27 UTC
Reopening the bug
Comment 9 Paolo Maggi 2005-08-26 10:11:18 UTC
Lachlan: could you please run another test for me?
Please remove the ~/.gnome2/gedit-metadata.xml file.
Then, after modifying the gconf keys as you showed in the screenshot, try to
open the file. It should work. At least, it works for me.

I think you are still seeing the problem because of another known bug that I am
going to fix: since you successfully opened the file using the UTF-8 encoding,
gedit now thinks the file is UTF-8 encoded. Removing the metadata file resets
gedit's memory, so it should now use the gconf key to autodetect the encoding.
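
The precedence described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function choose_encoding, the remembered dictionary (standing in for the per-file metadata), and the file name are all hypothetical and do not reflect gedit's actual data structures.

```python
from typing import Dict, Optional, Sequence


def choose_encoding(
    path: str,
    data: bytes,
    remembered: Dict[str, str],
    candidates: Sequence[str],
) -> Optional[str]:
    """Prefer a remembered per-file encoding; otherwise try the candidate list."""
    if path in remembered:
        # A stale entry short-circuits auto-detection entirely.
        return remembered[path]
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


candidates = ["iso2022_jp", "shift_jis", "euc_jp", "utf_8"]
sjis_bytes = "\u65e5\u672c\u8a9e".encode("shift_jis")

# Hypothetical stale entry left over from an earlier open as UTF-8.
remembered = {"jap_sjis_emacs.txt": "utf_8"}
stale = choose_encoding("jap_sjis_emacs.txt", sjis_bytes, remembered, candidates)

# Clearing the remembered entries lets the candidate list take effect again.
remembered.clear()
fresh = choose_encoding("jap_sjis_emacs.txt", sjis_bytes, remembered, candidates)

print(stale, fresh)
```

Under these assumptions, changing the candidate list alone has no visible effect until the remembered entry is gone.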

Please let me know if this tip solves your problem.
Comment 10 Lachlan 2005-08-26 12:57:43 UTC
Yes! This works fine. Thanks for your hard work, Paolo. ;-)

It also seems that the order of the elements in the gconf key for auto-detection
may be important. At first I had SHIFT-JIS,ISO-2022-JP,EUC-JP,UTF-8 as the order
in the gconf key, and the ISO-2022-JP file (i.e. the JIS-encoded file) was not
detected correctly.

Then I changed the order to ISO-2022-JP, SHIFT-JIS,EUC-JP,UTF-8 and all files
seem to be correctly opened for me when testing on the various test cases below.
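
One plausible explanation for the ordering effect, offered as an assumption rather than something confirmed in this report: ISO-2022-JP uses only 7-bit bytes plus escape sequences, so a Shift-JIS decoder can accept such a file without error and "win" if it is tried first. The Python sketch below reproduces that behaviour with Python's own codecs; first_decodable is a hypothetical helper, not gedit's implementation.

```python
from typing import Optional, Sequence


def first_decodable(data: bytes, candidates: Sequence[str]) -> Optional[str]:
    """Return the first candidate that decodes `data` without error, else None."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


text = "\u65e5\u672c\u8a9e"  # the Japanese word "nihongo"
jis_bytes = text.encode("iso2022_jp")   # 7-bit bytes with escape sequences
sjis_bytes = text.encode("shift_jis")
euc_bytes = text.encode("euc_jp")

# The two orders mentioned in the comment, using Python codec names.
original_order = ["shift_jis", "iso2022_jp", "euc_jp", "utf_8"]
revised_order = ["iso2022_jp", "shift_jis", "euc_jp", "utf_8"]

print(first_decodable(jis_bytes, original_order))
print(first_decodable(jis_bytes, revised_order))
for sample in (sjis_bytes, euc_bytes):
    print(first_decodable(sample, revised_order))
```

Trying the strictest encoding first (ISO-2022-JP rejects any byte with the high bit set) keeps it from being shadowed by a more permissive one.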

Again, let me congratulate you on finding the source of the problem.
Comment 11 Paolo Maggi 2005-08-26 13:07:22 UTC
Thanks for confirming. 

Closing as NOTABUG.