Bug 156199 – Can't open files with unknown encoding (binary file)




Summary:	Can't open files with unknown encoding (binary file)


Status:	RESOLVED FIXED

Product:	gedit
Classification:	Applications
Component:	general
Version:	2.28.x
Hardware:	Other All

Importance:	Normal enhancement
Target Milestone:	---
Assigned To:	Gedit maintainers
QA Contact:	Gedit maintainers

URL:
Whiteboard:

Duplicates:	119426 305809 488801
Depends on:
Blocks:

Reported:	2004-10-23 00:04 UTC by Johan Ersvik
Modified:	2015-11-27 00:51 UTC

See Also:
GNOME target:	---
GNOME version:	Unversioned Enhancement

Attachments
NUL-replacement patch (570 bytes, patch) 2008-10-06 19:25 UTC, coda

Description Johan Ersvik 2004-10-23 00:04:19 UTC

Please describe the problem:
It's not possible to open a file with undetectable character encoding

Steps to reproduce:
1. Try to open this file: http://navi.cx/~mike/inkscape/0.39/inkscape-0.39.package


Actual results:
An error dialog...

Expected results:
That the file was opened with the desktop default encoding.

Does this happen every time?
yes

Other information:

Comment 1 Paolo Maggi 2004-11-10 18:38:25 UTC

Note that inkscape-0.39.package contains binary data.
For this reason gedit is not able to open it.

Why do you expect that the file was opened with the desktop default encoding?

Comment 2 Johan Ersvik 2004-11-10 19:55:02 UTC

In this case it contained text that I was curious about and opening it at all is
better than not being able to see what it looks like. Even if only to see that
the file is unreadable. Maybe default to ascii or hex codes instead, but that's
not the main issue.

Comment 3 Josh Lee 2005-01-13 21:37:11 UTC

Perhaps gedit should suggest that you might want to open it with ghex2.

For example, when Microsoft Notepad encounters a file that is too large for its
silly brain, it offers to launch WordPad instead. How does this sound?

Comment 4 Christian Kirbach 2005-03-30 16:31:53 UTC

*** Bug 119426 has been marked as a duplicate of this bug. ***

Comment 5 Brent Smith (smitten) 2005-07-21 17:09:10 UTC

trying to open a binary file in 2.10.2 results in the following message:

"Could not open the file "/home/testuser/test.bin"

gedit was not able to automatically detect the character coding. Please, check
that you are not trying to open a binary file and try again selecting a
character coding in the 'Open File...' (or 'Open Location') dialog."

I'm going to confirm this bug as a feature enhancement, please correct me if I'm
wrong here.

Comment 6 Brent Smith (smitten) 2005-07-21 17:11:33 UTC

*** Bug 305809 has been marked as a duplicate of this bug. ***

Comment 7 Brent Smith (smitten) 2005-07-21 17:14:40 UTC

Setting this to Unversioned Enhancement for the GNOME version.

Comment 8 Alan Siu-Lung Tam 2006-04-05 03:27:05 UTC

Sometimes letting me open the file with an arbitrary encoding makes it easier for me to determine which encoding it actually uses. Imagine I usually work with only a couple of encodings, and I can recognize what they look like when opened in the wrong one. As it is, gedit doesn't open the file at all, so I have to go to gnome-terminal, "head" the file, and then go back to gedit to choose the right encoding.

I remember that some old version of gedit did not prevent me from opening such a file; it just gave me a file full of strange characters. Hence, I think this is a regression.

Comment 9 Christian Persch 2006-04-08 08:29:00 UTC

Maybe gedit could use the universal charset detector from mozilla to autodetect the encoding. (You can use it from XPCOM, or build a standalone copy.)

Comment 10 Alan Siu-Lung Tam 2006-04-08 08:49:51 UTC

The point is: if the file cannot be opened by any encoding gedit supports, gedit refuses to open the file.

However, we think that there are legitimate reasons to want to open the file, even though we know that some characters in the file cannot be represented in that encoding. Maybe you can give me a warning, but I still want to do it.

This is not related to whether gedit is smart enough to find me a suitable encoding.

Comment 11 Brian 2006-05-26 15:51:09 UTC

I have this problem with gedit all the time.  It could at least display it as a [] character or something that signifies that it is a binary character.

Using ghex2 is unacceptable for viewing files that are mostly text, with binary sparsely distributed.  They lose the text formatting that is actually there and are tough to read.  Sure, I can use vim, but not having a feature in the desktop that Windows has had by default is yet another reason for some less savvy folks to want to use MS Windows.  Gnome is a great desktop, but stuff like this is really annoying.

The point is it should warn warn warn, open the file read-only (most people usually don't wanna edit the binary, just see it), display the binary characters halfassed, or not at all, but it should not freaking fail.

Comment 12 Sebastien Bacher 2006-12-11 09:05:43 UTC

Ubuntu bug about that: https://launchpad.net/distros/ubuntu/+source/gedit/+bug/75151

Comment 13 Fluxid 2007-05-12 19:29:24 UTC

I have just tried to open output of my C application in gedit. This is how i check its contents:

$ cat entities8.txt
thetasym        0x03D1

As you see, it has only one line (including \n at the end), with NO SPECIAL CHARACTERS.
Of course, it's not a binary file.

When I try to open it in gedit, it says that it can't recognize the encoding,
but it's just plain ASCII!
Why can't I open a file WITH NO ENCODING?

ah yeah, mime-type is text/plain

Comment 14 Fluxid 2007-05-12 19:38:53 UTC

SORRY!
it's my fault
i forgot i put to this file \0 character...

anyway, it should just inform about unrecognized encoding, and there should be something like "ignore" button to open it anyway.
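
The behaviour requested in this comment (report the failure, but let the user open the file anyway) could look roughly like the Python sketch below. It is not gedit code. The names `detect_and_decode` and `open_anyway` are hypothetical, and treating any NUL byte as a detection failure is an assumption made to mimic what the comments above describe.

```python
from typing import Optional, Sequence, Tuple


def detect_and_decode(
    data: bytes, candidates: Sequence[str] = ("ascii", "utf-8")
) -> Optional[Tuple[str, str]]:
    """Try each candidate encoding in order; return (text, encoding) or None."""
    # Assumption: a NUL byte is treated as a sign of binary data, so
    # detection is reported as failed even though the bytes are valid ASCII.
    if b"\x00" in data:
        return None
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    return None


def open_anyway(data: bytes, fallback: str = "utf-8") -> str:
    """The 'ignore' path: decode leniently so the file can at least be viewed."""
    # Undecodable bytes become U+FFFD; NUL becomes the visible symbol U+2400.
    text = data.decode(fallback, errors="replace")
    return text.replace("\x00", "\u2400")


raw = b"thetasym\x000x03D1\n"
result = detect_and_decode(raw)
# Detection fails here, so a caller would warn and fall back to the lenient view.
view = result[0] if result is not None else open_anyway(raw)
```

The lenient view is lossy, so a real editor would need to open such a buffer read-only or warn before saving.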

Comment 15 nul.all 2008-08-31 19:48:34 UTC

Is this something that is going to happen? It seems like it would be an easy bug to fix and for some of us it is a huge usability issue.
I would attach my example file of a corrupted text document, but unfortunately, it is a list of friends' phone numbers - the point is, ghex is not at all the right application for opening it. It is fundamentally a text file, but it has some weird characters in it, but Microsoft Notepad opens it without an issue...

Comment 16 Lars Gaarden 2008-09-13 12:02:30 UTC

4 years and nothing?

With gedit set as the default text editor on Ubuntu, I run into this issue annoyingly often. There are lots of files out there that are mostly text but happen to have a few unprintable characters. To have gedit at the very least *show* these files would be very useful, perhaps with a warning on save.

Comment 17 coda 2008-10-06 19:25:01 UTC

Created attachment 120044
NUL-replacement patch

This patch replaces NUL bytes with '?' character. This should allow you to at least view every type of file. There is no warning that the NUL bytes have been replaced, so be careful not to save over the original.
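
For readers who cannot view the attachment, the idea can be illustrated with a minimal Python sketch. This is not the attached patch; the function name `replace_nul_bytes` is hypothetical.

```python
def replace_nul_bytes(data: bytes, placeholder: bytes = b"?") -> bytes:
    """Return a copy of data with every NUL byte replaced by the placeholder."""
    return data.replace(b"\x00", placeholder)


# A mostly-text buffer with one stray NUL byte, like the file in comment 14.
sample = b"thetasym\x000x03D1\n"
viewable = replace_nul_bytes(sample)
```

As the comment warns, the replacement is one-way: once the buffer is saved, the original NUL bytes cannot be told apart from real '?' characters.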

Comment 18 stlevo 2009-11-12 10:58:38 UTC

Still not fixed in 2.28.0? Is there any reason that everybody ignores this issue?
It seems easy to fix, or at least to work around temporarily. It's very annoying to install a second editor just to view one file because gedit refuses to open it.

␀␁␂␃␄␅␆␇␈␉␊␋␌␍␎␏␐␑␒␓␔␕␖␗␘␙␚␛␜␝␞␟␠␡

Comment 19 logari81 2010-02-19 23:44:03 UTC

What is the problem with coda's patch? Most editors have their own way of handling null characters. Is there a gedit policy that blocks the adoption of coda's patch? Are there other plans/ideas/requirements?

Comment 20 Ignacio Casal Quinteiro (nacho) 2010-02-20 10:18:40 UTC

In this cycle we reworked all loading and saving to always use GIO streams and filter streams, so it would now be possible to implement this. We still need some API in GIO to mark the regions with invalid characters, so that we can convert them to a visible character (e.g. '?') and convert them back to the original invalid bytes when saving. If someone wants to work on it, just ask and we'll provide more info.
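
The round trip described here (show a placeholder for each invalid byte, restore the byte on save) can be sketched in Python. This is not the GIO API, which the comment says does not exist yet. The names `load_with_placeholders` and `save_with_placeholders` are hypothetical, and the sketch assumes UTF-8 input and a buffer that is not edited between load and save, since the recorded positions are plain character indexes.

```python
from typing import Dict, Tuple


def load_with_placeholders(
    data: bytes, placeholder: str = "?"
) -> Tuple[str, Dict[int, int]]:
    """Decode UTF-8, showing a placeholder for each invalid byte.

    Returns the display text and a map from character index to the
    original byte value that the placeholder stands for.
    """
    # "surrogateescape" maps each undecodable byte 0xNN to U+DCNN.
    escaped = data.decode("utf-8", errors="surrogateescape")
    invalid: Dict[int, int] = {}
    chars = []
    for index, char in enumerate(escaped):
        code = ord(char)
        if 0xDC80 <= code <= 0xDCFF:
            invalid[index] = code - 0xDC00  # recover the original byte
            chars.append(placeholder)
        else:
            chars.append(char)
    return "".join(chars), invalid


def save_with_placeholders(text: str, invalid: Dict[int, int]) -> bytes:
    """Re-encode the text, restoring the original byte at each marked index."""
    out = bytearray()
    for index, char in enumerate(text):
        if index in invalid:
            out.append(invalid[index])
        else:
            out += char.encode("utf-8")
    return bytes(out)


raw = b"ok \xff\xfe end\n"
text, marks = load_with_placeholders(raw)
restored = save_with_placeholders(text, marks)
```

An editor would have to attach the marks to the buffer itself (text marks or tags, for example) so that they move with the text as the user edits; a fixed index map is only enough to show the principle.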

Comment 21 André Klapper 2010-03-03 23:00:51 UTC

*** Bug 488801 has been marked as a duplicate of this bug. ***

Comment 22 Sébastien Wilmet 2014-08-11 20:32:29 UTC

Binary files can now be opened, but opening them can be very slow because of performance problems in GtkTextView with very long lines.

See bug #721632 and bug #727777 for potential solutions to the slowness (and, of course, the GtkTextView bug about improving performance with very long lines).