After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 755795 - 2.46 considers empty files as octet-stream rather than text (leads to gedit not opening those)
2.46 considers empty files as octet-stream rather than text (leads to gedit n...
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: gio
2.46.x
Other Linux
: Normal normal
: ---
Assigned To: gtkdev
gtkdev
Depends on:
Blocks: 770017
 
 
Reported: 2015-09-29 15:11 UTC by Sebastien Bacher
Modified: 2016-08-19 10:15 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
GLocalFile: return text/plain for empty files (2.50 KB, patch)
2015-09-29 15:22 UTC, Allison Karlitskaya (desrt)
committed Details | Review
Only fallback to text/plain for 0-length files if we don't have an idea what it is (4.10 KB, patch)
2015-09-30 10:16 UTC, Iain Lane
none Details | Review
Only fallback to text/plain for 0-length files if we don't have an idea what it is (4.17 KB, patch)
2015-09-30 10:20 UTC, Iain Lane
none Details | Review

Description Sebastien Bacher 2015-09-29 15:11:07 UTC
That has been reported on https://bugs.launchpad.net/ubuntu/+source/xdg-utils/+bug/1497170 and is likely due to https://git.gnome.org/browse/glib/commit/?id=b6fc1df022a0326e7c36122b1416061bf796c98f
"GLocalFileInfo: don't content-sniff zero-length files"

in those case would it make sense to return "text/plain" so editors can open those?
Comment 1 Allison Karlitskaya (desrt) 2015-09-29 15:22:35 UTC
Created attachment 312365 [details] [review]
GLocalFile: return text/plain for empty files

Previously, GLib returned text/plain for empty files.

This is important because people may want to open empty (eg:
just-created) text files with the text editor.

An unintended side-effect of b6fc1df022a0326e7c36122b1416061bf796c98f
caused GLib to start returning application/octet-stream instead of
text/plain for these files.

This commit is essentially a revert of that commit, with a different
solution: we move the special-case up a bit in the function and
hard-code it to text/plain.

This change does not exactly maintain the old behaviour: previously, a
"fast" lookup would have returned application/octet-stream on an empty
file and now it will return text/plain.  I consider this to be an
improvement (since we're returning better data) and don't expect it to
cause problems.
Comment 2 Colin Walters 2015-09-29 15:26:16 UTC
Review of attachment 312365 [details] [review]:

LGTM, just one comment comment.

::: gio/glocalfileinfo.c
@@ +1246,3 @@
+       * that appear normal but are not (eg: files in /proc and /sys)
+       */
+      return g_content_type_from_mime_type ("text/plain");

I think it's worth maintaining a link to this bug report in the comment
here, and/or:

/* We have historically returned text/plain for 0 length files */
Comment 3 Allison Karlitskaya (desrt) 2015-09-29 16:30:43 UTC
Attachment 312365 [details] pushed as 202a9c3 - GLocalFile: return text/plain for empty files
Comment 4 Allison Karlitskaya (desrt) 2015-09-29 16:31:55 UTC
(and backported to 2.46, since this is a compatibility fix)
Comment 5 Iain Lane 2015-09-29 16:49:05 UTC
It seems that historically we have trusted the extension and only returned text/plain for 0-length files with no extension.

With glib 2.44

  -rw-rw-r-- 1 laney laney 0 Sep 29 17:23 /home/laney/temp/a.ini

  standard::content-type: application/x-wine-extension-ini
  standard::fast-content-type: application/x-wine-extension-ini

and with master

  standard::content-type: text/plain
  standard::fast-content-type: text/plain

Maybe move it after g_content_type_guess and do this if result_uncertain? That seems to get the old behaviour back for me at least.
Comment 6 Colin Walters 2015-09-29 17:47:21 UTC
Reopening based on comment #5
Comment 7 Matthias Clasen 2015-09-29 17:58:27 UTC
I'm not convinced the old behavior is worth preserving, really. Or is it ?
Comment 8 Colin Walters 2015-09-29 19:05:01 UTC
I'd flip that around - what's the value of changing behavior?
Comment 9 Iain Lane 2015-09-30 10:16:52 UTC
Created attachment 312415 [details] [review]
Only fallback to text/plain for 0-length files if we don't have an idea what it is

For example, 0-length foo.exe has historically been
application/x-ms-dos-executable
Comment 10 Iain Lane 2015-09-30 10:20:05 UTC
Created attachment 312416 [details] [review]
Only fallback to text/plain for 0-length files if we don't have an idea what it is

For example, 0-length foo.exe has historically been
application/x-ms-dos-executable
Comment 11 Matthias Clasen 2015-09-30 13:27:15 UTC
I don't know. An empty file is rarely a valid anything, other than a text file.

For instance, GKeyFile will complain if you feed it an empty .ini file.
Comment 12 Matthias Clasen 2015-09-30 13:27:39 UTC
and ms dos would complain if you tried to make it run an 0-length .exe file
Comment 13 Iain Lane 2015-09-30 14:28:30 UTC
I think it would be better to do behaviour changes on purpose instead of accidentally.

Don't know what might rely on this previous behaviour but I wouldn't at all be surprised if something did, and we shouldn't randomly change it without being sure it's the right thing to do.
Comment 14 Iain Lane 2015-10-06 13:40:21 UTC
Would someone decide what to do please? If we are going to change the behaviour back then we should do it before .1 if possible.
Comment 15 Matthias Clasen 2015-10-06 14:03:06 UTC
Get your colleague to comment - afair, desrt considered this change an improvement.
Comment 16 Allison Karlitskaya (desrt) 2015-10-09 13:40:25 UTC
kind of annoying that "Create New Text Document" won't work if I happen to choose a bad name....
Comment 17 Allison Karlitskaya (desrt) 2015-10-09 13:41:47 UTC
if you can find some actual real case this is broken by this I say let's go with your change.  Otherwise, I'm still fairly convinced that this is an improvement.
Comment 18 Iain Lane 2015-10-10 19:05:01 UTC
No I don't have one right now. I'm just saying that it won't be surprising to me to find something that expects the old behaviour.

If you don't mind this chance then just close the bug and then when something comes up you get to tell people to fix their code.
Comment 19 Allison Karlitskaya (desrt) 2015-10-14 08:41:20 UTC
I think we're convinced that the behaviour, as it stands, is the best it can be.
Comment 20 Bastien Nocera 2015-10-14 09:16:22 UTC
The correct mime-type for empty files is application/x-zerosize
GEdit can then assign itself that.
Comment 21 Michi Henning 2015-10-21 01:36:24 UTC
This change has broken our code, unfortunately.

Previously, for a file called "empty.mp3",
g_file_query_info(..., G_FILE_ATTRIBUTE_STANDARD_FAST_CONTENT_TYPE, ...)
returned "audio/mpeg". Now, for the same file, we get "text/plain".

Would it be better to pay attention to the file extension first and use that as the initial guess, and then do the zero-size check after that? This would be more compatible with the old behavior.
Comment 22 Allison Karlitskaya (desrt) 2015-10-21 08:42:15 UTC
Why is there an empty file called "empty.mp3"?  This is, objectively, not an mp3 file...
Comment 23 Allison Karlitskaya (desrt) 2015-10-21 08:45:12 UTC
Also: can you please describe the problem that this causes in terms of an actual piece of software that used to work and no longer works correctly?
Comment 24 Michi Henning 2015-10-21 12:24:42 UTC
It broke one of our unit tests. The tests make sure that we deal with corrupt MP3 files correctly, so one test uses an empty file called "empty.mp3" to check this. Because of the changed content type, the code took an unexpected path that triggered the unit test failure.

In terms of fixing it, it's not such a big deal. The code now takes both old and new behavior into account. But it was a needless break.

The real question here is how to best deal with empty files. What is the content type of empty files that are called empty.txt, empty.tiff, empty.mp3, etc? I think it's reasonable to just take the file extension and base the guess on that. Looking whether the file is empty before doing anything else and concluding that it is a text file is reasonable too, IMO (although it seems a little strange to say that a file called x.mp3 is a text file just because it's empty). It's a borderline case, I understand.

What broke our code is that the content type *changed* from the previous version. If it had been "text/plain" all along, we could have handled that too.

One final thought: what should the content type of a file called "myprog.cpp" be if that file is empty? (An empty file is a legal C++ source file that compiles just fine.) I guess one could come up with other semi-plausible cases along the same lines.
Comment 25 Bastien Nocera 2015-10-21 12:26:40 UTC
application/x-zerosize! It should be application/x-zerosize (unless you only have the filename obviously, no way to know it's empty by looking at just the name).
Comment 26 Michi Henning 2015-10-21 12:34:24 UTC
application/x-zerosize works for me. As long as what is returned for empty files is well-defined and doesn't change without warning :-)

(As an aside, our code uses G_FILE_ATTRIBUTE_STANDARD_FAST_CONTENT_TYPE first and, if it comes up with a file type that we are interested in, we believe the answer. Otherwise, if it says "application/octet-stream", we do a deep inspection of the file to find out what's inside.)

Maybe doing a stat of the file and checking for zero-size should be done only if the file name doesn't produce a guess?
Comment 27 Michi Henning 2015-10-21 13:04:57 UTC
As an aside, when I call with G_FILE_ATTRIBUTE_STANDARD_FAST_CONTENT_TYPE, it would be ultra-nice to state in the doc what will be returned if no guess can be made. Right now, it just says:

"A key in the "standard" namespace for getting the fast content type. The fast content type isn't as reliable as the regular one, as it only uses the filename to guess it, but it is faster to calculate than the regular content type. Corresponding GFileAttributeType is G_FILE_ATTRIBUTE_TYPE_STRING."
Comment 28 Matthias Clasen 2015-10-21 14:23:26 UTC
(In reply to Bastien Nocera from comment #25)
> application/x-zerosize! It should be application/x-zerosize (unless you only
> have the filename obviously, no way to know it's empty by looking at just
> the name).

application/x-zerosize fails the most important use case - can I create an empty file and open it in my text editor
Comment 29 Bastien Nocera 2015-10-21 14:32:10 UTC
(In reply to Matthias Clasen from comment #28)
> (In reply to Bastien Nocera from comment #25)
> > application/x-zerosize! It should be application/x-zerosize (unless you only
> > have the filename obviously, no way to know it's empty by looking at just
> > the name).
> 
> application/x-zerosize fails the most important use case - can I create an
> empty file and open it in my text editor

Add application/x-zerosize to your text editor's mime-types?
Comment 30 Allison Karlitskaya (desrt) 2015-10-27 14:22:29 UTC
I'm sorry I broke your testsuite, Michi, and thanks for the feedback on the change.

Given that you've already fixed the problem there, and that I continue to believe that the new behaviour is more logical, I'm going to close this again until someone comes with new information or better ideas.

I don't consider application/x-zerosize to be reasonable since this would be an even larger deviation from existing behaviour and it wouldn't improve the situation with regards to text editors (today).
Comment 31 Michi Henning 2015-10-27 22:49:09 UTC
It wasn't a big deal to fix, so no harm done. On reflection, I agree with rejecting application/x-zerosize, given the backward compatibility concern.

My one comment would be that the following would deal a little better with the borderline case:

For empty files only:
- report "text/plain" if the file does not have an extension
- report the file type implied by the extension if the extension is recognized
- report "text/plain" if the extension is not recognized

I think this would be less surprising.
Comment 32 Carlos Soriano 2016-08-19 09:50:24 UTC
Users of Nautilus reported about this.

I think saying that it's text/plain is wrong because seems people use templates like empty files with extensions, so opening gedit for test.odt goes against the expectations.

I would say either do the mime type guess here https://git.gnome.org/browse/glib/tree/gio/glocalfileinfo.c#n1239 with g_content_type_guess so it reports the application appropriate for that extension, or return application/x-zerosize and let the clients like Nautilus figure out what to do.
Comment 33 Emmanuele Bassi (:ebassi) 2016-08-19 10:00:14 UTC
(In reply to Carlos Soriano from comment #32)
> Users of Nautilus reported about this.
> 
> I think saying that it's text/plain is wrong because seems people use
> templates like empty files with extensions, so opening gedit for test.odt
> goes against the expectations.

To be fair, the appropriate way to create a template file is to create an actual "blank" file with the application and save it into Templates, not use `touch somerandomefile.somerandomextension`.

> I would say either do the mime type guess here
> https://git.gnome.org/browse/glib/tree/gio/glocalfileinfo.c#n1239 with
> g_content_type_guess so it reports the application appropriate for that
> extension, or return application/x-zerosize and let the clients like
> Nautilus figure out what to do.

Both strategies have issues with backward compatibility.
Comment 34 Carlos Soriano 2016-08-19 10:15:07 UTC
(In reply to Emmanuele Bassi (:ebassi) from comment #33)
> (In reply to Carlos Soriano from comment #32)
> > Users of Nautilus reported about this.
> > 
> > I think saying that it's text/plain is wrong because seems people use
> > templates like empty files with extensions, so opening gedit for test.odt
> > goes against the expectations.
> 
> To be fair, the appropriate way to create a template file is to create an
> actual "blank" file with the application and save it into Templates, not use
> `touch somerandomefile.somerandomextension`.

I completely agree, kinda makes me angry when it's not done in that way :) but we can try to still do our best.

> 
> > I would say either do the mime type guess here
> > https://git.gnome.org/browse/glib/tree/gio/glocalfileinfo.c#n1239 with
> > g_content_type_guess so it reports the application appropriate for that
> > extension, or return application/x-zerosize and let the clients like
> > Nautilus figure out what to do.
> 
> Both strategies have issues with backward compatibility.

Right, although it looks more like a bug than anything else.