GNOME Bugzilla – Bug 156199
Can't open files with unknown encoding (binary file)
Last modified: 2015-11-27 00:51:53 UTC
Please describe the problem:
It's not possible to open a file with undetectable character encoding.
Steps to reproduce:
1. Try to open this file: http://navi.cx/~mike/inkscape/0.39/inkscape-0.39.package
Actual results:
An error dialog...
Expected results:
That the file was opened with the desktop default encoding.
Does this happen every time?
yes
Other information:
Note that inkscape-0.39.package contains binary data. For this reason gedit is not able to open it. Why do you expect the file to be opened with the desktop default encoding?
In this case it contained text that I was curious about and opening it at all is better than not being able to see what it looks like. Even if only to see that the file is unreadable. Maybe default to ascii or hex codes instead, but that's not the main issue.
Perhaps gedit should suggest that you might want to open it with ghex2. For example, when Microsoft Notepad encounters a file that is too large for its silly brain, it offers to launch WordPad instead. How does this sound?
*** Bug 119426 has been marked as a duplicate of this bug. ***
Trying to open a binary file in 2.10.2 results in the following message: "Could not open the file "/home/testuser/test.bin" gedit was not able to automatically detect the character coding. Please, check that you are not trying to open a binary file and try again selecting a character coding in the 'Open File...' (or 'Open Location') dialog." I'm going to confirm this bug as a feature enhancement; please correct me if I'm wrong here.
*** Bug 305809 has been marked as a duplicate of this bug. ***
Setting this to unversioned enhancement for the GNOME version.
Sometimes being allowed to open the file with an arbitrary encoding would make it easier for me to determine which encoding it actually uses. Imagine that I usually work with only a couple of encodings, and I have some ability to recognize what they look like when opened in the wrong encoding. gedit doesn't open the file at all, so I have to go to gnome-terminal, "head" the file, and then go back to gedit to choose the right encoding. I remember that some older gedit version did not prevent me from opening the file, and gave me a file opened with strange characters. Hence, I think this is a regression.
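A minimal sketch of the workflow described in this comment, written in Python rather than gedit's own C code: decode the start of a file under a few candidate encodings, substituting U+FFFD for anything undecodable, so the previews can be compared by eye. The function name `preview_encodings` and the candidate list are hypothetical, chosen only for illustration. Cutting the input at `limit` bytes can split a multi-byte character, which may add one spurious replacement at the end.

```python
def preview_encodings(data: bytes, candidates=("utf-8", "big5", "latin-1"), limit=200):
    """Decode the first `limit` bytes under each candidate encoding.

    Undecodable bytes become U+FFFD instead of raising an error, so
    every candidate yields some preview text.
    """
    head = data[:limit]
    previews = {}
    for name in candidates:
        text = head.decode(name, errors="replace")
        # The replacement count is a rough signal of how badly the encoding fits.
        previews[name] = (text, text.count("\ufffd"))
    return previews


sample = "caf\u00e9".encode("latin-1")  # b"caf\xe9", which is not valid UTF-8
for name, (text, bad) in preview_encodings(sample).items():
    print(f"{name}: {text!r} ({bad} replaced)")
```

An editor could show such previews alongside a warning and let the user pick, instead of refusing to open the file.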
Maybe gedit could use the universal charset detector from mozilla to autodetect the encoding. (You can use it from XPCOM, or build a standalone copy.)
The point is: if the file cannot be opened by any encoding gedit supports, gedit refuses to open the file. However, we think that there are legitimate reasons to want to open the file, even though we know that some characters in the file cannot be represented in that encoding. Maybe you can give me a warning, but I still want to do it. This is not related to whether gedit is smart enough to find me a suitable encoding.
I have this problem with gedit all the time. It could at least display it as a [] character or something that signifies that it is a binary character. Using ghex2 is unacceptable for viewing files that are mostly text, with binary sparsely distributed. They lose the text formatting that is actually there and are tough to read. Sure I can use vim, but to not have a feature embedded in a Desktop when Windows has had it as the default is yet another reason for some less savvy folks to want to use MS Windows. GNOME is a great desktop, but stuff like this is really annoying. The point is it should warn warn warn, open the file read-only (most people usually don't wanna edit the binary, just see it), display the binary characters halfassed, or not at all, but it should not freaking fail.
Ubuntu bug about that: https://launchpad.net/distros/ubuntu/+source/gedit/+bug/75151
I have just tried to open the output of my C application in gedit. This is how I check its contents:
$ cat entities8.txt
thetasym 0x03D1
As you can see, it has only one line (including \n at the end), with NO SPECIAL CHARACTERS. Of course, it's not a binary file. When I try to open it in gedit, it says that it can't recognize the encoding, but it's just plain ASCII! Why can't I open a file WITH NO ENCODING? Ah yeah, the MIME type is text/plain.
SORRY! It's my fault: I forgot that I had put a \0 character into this file... Anyway, gedit should just inform the user about the unrecognized encoding, and there should be something like an "ignore" button to open the file anyway.
Is this something that is going to happen? It seems like it would be an easy bug to fix and for some of us it is a huge usability issue. I would attach my example file of a corrupted text document, but unfortunately, it is a list of friends' phone numbers - the point is, ghex is not at all the right application for opening it. It is fundamentally a text file, but it has some weird characters in it, but Microsoft Notepad opens it without an issue...
4 years and nothing? With gedit set as the default text editor on Ubuntu, I run into this issue annoyingly often. There are lots of files out there that are mostly text but happen to have a few unprintable characters. To have gedit at the very least *show* these files would be very useful, perhaps with a warning on save.
Created attachment 120044: NUL-replacement patch
This patch replaces NUL bytes with the '?' character. This should allow you to at least view every type of file. There is no warning that the NUL bytes have been replaced, so be careful not to save over the original.
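The idea behind the patch can be illustrated with a short Python sketch. This is not the attached patch (which targets gedit's C code); the function name `replace_nul_bytes` and the sample bytes are hypothetical. Unlike the patch as described, the sketch also returns how many bytes were replaced, so a caller could warn before saving.

```python
from typing import Tuple


def replace_nul_bytes(data: bytes, placeholder: bytes = b"?") -> Tuple[bytes, int]:
    """Return a copy of `data` with each NUL byte replaced, plus the number replaced."""
    count = data.count(b"\x00")
    return data.replace(b"\x00", placeholder), count


# Hypothetical sample: ASCII text with one stray NUL byte in the middle.
raw = b"thetasym\x000x03D1\n"
cleaned, replaced = replace_nul_bytes(raw)
print(cleaned.decode("ascii"), end="")
if replaced:
    print(f"warning: {replaced} NUL byte(s) replaced; saving would not restore them")
```

The replacement is lossy: once the buffer is saved, a '?' that stood for a NUL byte cannot be told apart from a real question mark.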
Still not fixed in 2.28.0? Is there any reason that everybody ignores this issue? It seems easy to fix, or at least to create some temporary workaround for. It's very annoying to install a second editor just to view one file because gedit refuses to open it. ␀␁␂␃␄␅␆␇␈␉␊␋␌␍␎␏␐␑␒␓␔␕␖␗␘␙␚␛␜␝␞␟␠␡
What is the problem with coda's patch? Most editors have their own way of handling null characters. Is there a gedit policy which blocks the adoption of coda's patch? Are there other plans/ideas/requirements?
In this cycle we reworked all the loading and saving to always use GIO streams and filter streams, so it would now be possible to implement this. We still need some API in GIO to mark the regions with invalid chars, so we can convert them to a visible char (e.g. '?') and convert them back to the invalid chars when saving. If someone wants to work on it, ask and we'll provide more info.
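The round trip described here (show invalid bytes as a visible character, restore them on save) can be sketched in Python using the standard `surrogateescape` error handler. This is only an illustration of the concept, not the GIO API the comment asks for, and the names `load_for_display` and `save` are hypothetical. It covers bytes that are invalid in the chosen encoding; a NUL byte is valid in UTF-8 and passes through unchanged.

```python
def load_for_display(data: bytes, encoding: str = "utf-8"):
    """Decode `data`, keeping undecodable bytes as lone surrogates.

    Returns (internal_text, display_text): the first round-trips back to
    the original bytes, the second shows '?' where invalid bytes were.
    """
    internal = data.decode(encoding, errors="surrogateescape")
    # surrogateescape maps each invalid byte 0x80..0xFF to U+DC80..U+DCFF.
    display = "".join(
        "?" if 0xDC80 <= ord(ch) <= 0xDCFF else ch for ch in internal
    )
    return internal, display


def save(internal: str, encoding: str = "utf-8") -> bytes:
    """Encode the text back, turning escaped surrogates into their original bytes."""
    return internal.encode(encoding, errors="surrogateescape")


raw = b"name: Bob\xff\xfe\nphone: 555\n"
internal, display = load_for_display(raw)
print(display, end="")
print(save(internal) == raw)
```

Keeping the marked positions in the internal text, and substituting '?' only for display, is what lets the save step restore the original bytes.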
*** Bug 488801 has been marked as a duplicate of this bug. ***
Binary files can now be opened, but it can be very slow, due to performance problems in GtkTextView with very long lines. See bug #721632 and bug #727777 for potential solutions to the slowness (and of course the bug in GtkTextView about improving performance with very long lines).