GNOME Bugzilla – Bug 77564
'binary warning' on loading of euc-kr
Last modified: 2010-05-05 10:59:20 UTC
Package: gedit
Severity: enhancement
Version: 1.116.0
Synopsis: 'binary warning' on loading of euc-kr
Bugzilla-Product: gedit
Bugzilla-Component: general
Description:
"""
Could not open the file "/home/Keizi/src/xchat-20020314/po/ko.po" because it contains invalid UTF-8 data. Probably, you are trying to open a binary file.
"""
xchat 1.9 is being ported to gtk2, and .po files for gtk2 have to contain UTF-8 rather than the native locale encoding. So I tried to convert the file with gedit2: load it as EUC-KR and save it as UTF-8. I believed gedit2 could do this, but it failed to load the ko.po file. I think ignoring the user's command is unexpected behavior. How about adding a 'load anyway' feature?
------- Bug moved to this database by unknown@bugzilla.gnome.org 2002-04-03 21:17 -------
Reassigning to the default owner of the component, maggi@athena.polito.it.
The problem here is that GtkTextView can only display valid UTF-8 text. If the text file is not in UTF-8 format, gedit tries to convert it, assuming that the file was written in the encoding of the current user locale. I really cannot imagine another reliable way to solve this problem, even though I know it is not perfect.
Yeah, I understand the complexity of converting between native encodings and UTF-8. This is a really big problem for i18n, I think. By the way, the file that I failed to load really is in EUC-KR only, and my locale is ko_KR.eucKR. It's odd that gedit2 failed.
Really strange. Please, attach the file you cannot open and your locale configuration.
Created attachment 7547 [details] ko.po in euc-kr.
Well, the ko.po has a broken character. EUC-KR uses 2-byte wide characters, so a stray 1-byte broken character can sometimes appear by accident. So gedit's warning is right, but I think gedit had better support 'load anyway'.
Hi, since yesterday's build (25/6/2002) from anoncvs, I have been using gnome2 as dogfood. The problem is that I have seen this bug with two or three files. gedit shows the same warning that "less" shows on the command line. And when you read the file with less, it shows <E1> <ED> where it should show accented characters. Do you want some of these files?
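A 'load anyway' mode of the kind requested here could look roughly like the following Python sketch. It is only an illustration, not gedit's code: the helper names `find_first_invalid` and `load_anyway` are hypothetical, and the sample bytes are made up rather than taken from the attached ko.po. The idea is to report where the first undecodable byte sits, then decode the rest and substitute U+FFFD for anything broken.

```python
from typing import Optional, Tuple


def find_first_invalid(data: bytes, encoding: str = "euc_kr") -> Optional[Tuple[int, bytes]]:
    """Return (offset, offending bytes) for the first decode error, or None."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start:exc.end]
    return None


def load_anyway(data: bytes, encoding: str = "euc_kr") -> str:
    """Decode leniently: undecodable bytes become U+FFFD instead of aborting."""
    return data.decode(encoding, errors="replace")


if __name__ == "__main__":
    # Two valid Hangul syllables, then a lone lead byte with no trail byte.
    sample = "\ud55c\uae00".encode("euc_kr") + b"\xb0\n"
    print(find_first_invalid(sample))
    print(repr(load_anyway(sample)))
```

An editor using this approach would probably also want to warn that saving the buffer writes the replacement characters back, so the original broken bytes are lost.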
Manuel: yes, please attach the files to this report. Which locale are you using?
llanero@llanero:~/gnome2$ echo $LOCALE
llanero@llanero:~/gnome2$ echo $LANG
es_ES.ISO-8859-1
llanero@llanero:~/gnome2$ file louie.txt
louie.txt: ISO-8859 English text
I tried: export LOCALE=es but it still happens. Please note that after I hit the OK button, gedit does *not* disable all the icons in the toolbar, menus, ... (assertion `active_child != NULL' failed).
Created attachment 9462 [details] chat conversation saved from xchat.
More info:
llanero@llanero:~/gnome2$ cat louie.txt
--> Estás hablando ahora en #bugs
Can you read "Est'as" ?? Here it shows correctly, but in gnome-terminal I don't see the "a" with the accent.
See the screenshot at: http://llanero.eresmas.net/bugs/Screenshot-Gnome-terminal.png Kang: what distro are you using? I think it is not gedit's fault, but some library or a misconfiguration.
gedit 0.9.7 loads the file but does not show the bad character: http://llanero.eresmas.net/bugs/Screenshot-gedit.png Note that I can write "á" in gedit2, save it and load the file again with no problems. The same file (I don't think mcopy changes the file :) on a Red Hat box with Ximian gnome2 displays correctly both with the text component in nautilus and with gedit 1.121.1. Back at my computer, the text view component in nautilus shows it correctly! (gedit still shows the message about UTF-8). I will upgrade the Red Hat box to gedit2 shortly. Please test whether the text component in nautilus works for you.
Hi, I updated gnome2 on the RH7.2 box. It displays louie.txt correctly both with the text component and gedit. I also updated gedit on my box; gedit still tells me that the file is not UTF-8, but the text component shows it. What is the text component doing that gedit does not?
Please, update your gedit to current CVS HEAD and let me know.
I've been waiting a day so anoncvs is updated for sure. It still happens here.
Hi! I updated gedit from cvs today and the bug is gone. I also tried with the other bug attachment and it also works. Can the reporter please test this?
attachment 9462 [details] loads fine on the 0714 snapshot of Ximian Red Carpet on rh72.
And what about attachment 7547 [details]?
7547 is broken. Each EUC-KR multibyte character appears as separate single bytes. I think gedit failed to validate this file as EUC-KR and fell back to ISO-8859-1. I have no idea how to deal with broken wide characters like the ones 7547 contains.
Created attachment 9870 [details] broken EUC-KR text loaded in ISO-8859 encoding.
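Why a single-byte fallback turns EUC-KR text into one character per byte can be reproduced with a short Python sketch. This is an illustration with made-up sample text, not the contents of the attachment. Every byte of the two-byte sequences is a valid character in a single-byte charset, so the decode succeeds but the result is unreadable.

```python
# Two Hangul syllables; in EUC-KR each one occupies two bytes.
raw = "\ud55c\uae00".encode("euc_kr")

# A single-byte charset maps every byte to its own character,
# so decoding "works" but yields twice as many (wrong) characters.
as_single_byte = raw.decode("iso8859_15")

print(len(raw), "bytes ->", len(as_single_byte), "characters")
print(as_single_byte)

# The damage is reversible as long as nothing edits the text:
# re-encoding with the same single-byte charset restores the bytes.
assert as_single_byte.encode("iso8859_15") == raw
```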
The algorithm used by gedit is:
1. Try to load the file as UTF-8.
2. If that fails, load it using the current locale encoding.
3. If that fails, load it using ISO 8859-15.
4. If that fails, display an error message.
Do you have a better idea?
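The fallback order described in this comment can be sketched in Python as follows. This is a rough model of the logic only, not gedit's actual implementation; the function name `decode_with_fallback` and its `locale_encoding` parameter are hypothetical, and Python's codec names stand in for whatever conversion routines gedit calls.

```python
import locale
from typing import Optional, Tuple


def decode_with_fallback(data: bytes, locale_encoding: Optional[str] = None) -> Tuple[str, str]:
    """Try UTF-8, then the locale encoding, then ISO 8859-15.

    Returns (text, encoding_used); raises ValueError if every attempt fails.
    """
    if locale_encoding is None:
        # Assumption: the "current locale encoding" is what Python reports here.
        locale_encoding = locale.getpreferredencoding(False)

    for encoding in ("utf-8", locale_encoding, "iso8859_15"):
        try:
            return data.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            continue  # this candidate failed, move on to the next one
    raise ValueError("file does not look like text in any candidate encoding")


if __name__ == "__main__":
    latin = "Est\u00e1s hablando".encode("iso8859_15")
    print(decode_with_fallback(latin, "euc_kr"))
```

Note that in this sketch the last step is effectively unreachable: as far as I know, Python's `iso8859_15` codec assigns a character to every byte value, so step 3 accepts any input. That fits the behavior reported in this thread, where a broken EUC-KR file ends up loaded as ISO-8859 text instead of being rejected.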
I can't load this file (ko.po) in 2.1 when I add "Korean" to the input filters using preferences.
Andrew: could you please attach the file you are referring to? If you are referring to the ko.po file already attached, it was broken, and there is no way for gedit to display broken files (or binary files).
I was referring to the ko.po already attached. Is this bug fixed now, with the new input encoding preferences etc?
I think it is fixed. Closing
This still happens with .doc files and several other extensions. The dropdown menu for encoding does not help at all. Reported at https://bugs.edge.launchpad.net/gedit/+bug/575500