After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 648846 - content sniffing should not be based on file extension
content sniffing should not be based on file extension
Status: RESOLVED FIXED
Product: libsoup
Classification: Core
Component: Misc
unspecified
Other Linux
: Normal normal
: ---
Assigned To: libsoup-maint@gnome.bugs
libsoup-maint@gnome.bugs
Depends on:
Blocks: 648849
 
 
Reported: 2011-04-28 07:18 UTC by arno
Modified: 2011-09-22 17:43 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
patch v1 (6.04 KB, patch)
2011-04-28 07:23 UTC, arno
reviewed Details | Review
patch v1.1 (5.99 KB, patch)
2011-04-29 14:41 UTC, arno
needs-work Details | Review
patch v1.2 (5.85 KB, patch)
2011-05-03 05:50 UTC, arno
committed Details | Review

Description arno 2011-04-28 07:18:23 UTC
Hi,
currently, in sniff_unknown, when no matching row for types_table has been found, libsoup uses gio function "g_content_type_guess".
That functions returns a content-type based on filename[1]. But media-type-sniffing specification tells: "It is essential that file extensions are not used for determining the media type for octets fetched over HTTP". So, if I understand correctly, both libsoup and the spec, libsoup behavior should be changed here.

Here is a page returning an empty content-type. If I understand correctly, it should be understood as text/plain. Actually, some browsers I tested display it as text file (firefox, chromium).
http://renevier.net/misc/content_type.css
But with a browser using libsoup, it's understood as text/css, and browser does not display it as text

[1]: http://git.gnome.org/browse/glib/tree/gio/gcontenttype.c?id=549d895fa4e9c4b5a35c5d9da4db39ca11dc71cc#n910
[2] http://tools.ietf.org/html/draft-abarth-mime-sniff-06#page-5
Comment 1 arno 2011-04-28 07:23:59 UTC
Created attachment 186798 [details] [review]
patch v1

patch proposal
Comment 2 Gustavo Noronha (kov) 2011-04-28 15:05:22 UTC
Review of attachment 186798 [details] [review]:

I am not strongly opposed, though we do lose a bit of functionality unfortunately. Notice that we are protecting against the worst problem in the current code, that would be upgrading it to something the browser would try to render as a script or html. Perhaps we need a g_content_type_guess that acts conservatively =P.

::: libsoup/soup-content-sniffer.c
@@ +307,3 @@
+	for (i = 0; i < resource_length; i++) {
+		if (byte_looks_binary[resource[i]]) {
+            return g_strdup("application/octet-stream");

This is indented by spaces instead of tabs like the rest.
Comment 3 arno 2011-04-29 14:41:30 UTC
Created attachment 186889 [details] [review]
patch v1.1

same patch with some style fixup
Comment 4 arno 2011-04-29 16:41:40 UTC
(In reply to comment #2)
> Review of attachment 186798 [details] [review]:
> 
> I am not strongly opposed, though we do lose a bit of functionality
> unfortunately. Notice that we are protecting against the worst problem in the
> current code, that would be upgrading it to something the browser would try to
> render as a script or html. Perhaps we need a g_content_type_guess that acts
> conservatively =P.

I think I still see a problem with current implementation:

Spec warns about following scenario[1]:
> in some cases, file extensions can be supplied by malicious
> parties.  For example, most PHP installations let the attacker
> append arbitrary path information to URLs (e.g.,
> http://example.com/foo.php/bar.html) and thereby determine the
> file extension.

And also about this one:
> where, e.g., a server uses the above table to determine that
> content is not HTML and thus safe from cross-site scripting
> attacks, but then a user agent detects it as HTML anyway and
> allows script to execute

Consider worse case when server exhibits those two behaviors:

If server receives input:
- if it matches a pattern of the spec, sanitize it to remove scripts, attributes etc
- otherwise, do not sanitize anything and send it as unknown/unknown relying on the browser to not display it as html

Imagine server also allow user to specify file extension.

Then, user can send some html content (not matching a pattern of the spec), set the extension to html, and that content will be executed as html.

Of course, problem should be fixed on the server, but the point of the spec is to protect user even when the server is misconfigured.

[1]: http://tools.ietf.org/html/draft-abarth-mime-sniff-06
Comment 5 Dan Winship 2011-05-02 21:08:48 UTC
Comment on attachment 186889 [details] [review]
patch v1.1

Looks right. can you remove fixup_xdg_dirs() from sniffing-test too? It's only needed to make gio sniffing work

Do you have a GNOME git account or do you need me to commit this?
Comment 6 arno 2011-05-03 05:50:48 UTC
Created attachment 187098 [details] [review]
patch v1.2

diff with previous patch:
removes fixup_xdg_dirs()
also removes a useless trailing whitespace.

And I don't have a GNOME git account.

(In reply to comment #5)
> Do you have a GNOME git account or do you need me to commit this?

No, I don't have a GNOME git account.
Comment 7 Dan Winship 2011-08-08 22:03:49 UTC
belatedly, pushed (with a slightly-rewritten commit message).
Thanks for the patch
Comment 8 Alejandro G. Castro 2011-09-18 21:39:43 UTC
Apparently this change broke one of the webkit unit tests:

/webkit/WebKitBuild/Release/Programs/unittests(svn-trunk)$ ./testmimehandling 
/webkit/mime/remote-PDF: No bp log location saved, using default.
OK
/webkit/mime/remote-HTML: OK
/webkit/mime/remote-TEXT: OK
/webkit/mime/remote-OGG: **
ERROR:../../Source/WebKit/gtk/tests/testmimehandling.c:128:mime_type_policy_decision_requested_cb: assertion failed (mime_type == "audio/x-vorbis+ogg"): ("application/octet-stream" == "audio/x-vorbis+ogg")
Aborted (core dumped)

Should we add support again for this in soup now that we are not using gio?
Comment 9 Gustavo Noronha (kov) 2011-09-19 03:54:03 UTC
I vote for changing the test. That's not part of the spec, and it should not be a problem to not support it, I guess.
Comment 10 Dan Winship 2011-09-19 11:30:51 UTC
it's also possible that bug 648849 fixes that case
Comment 11 Gustavo Noronha (kov) 2011-09-19 11:56:19 UTC
Yeah, I think bug 648849 should only add support for application/ogg only, though, I haven't seen any mention of audio/x-vorbis+ogg, I think that's a peculiarity of gio that we were testing for.
Comment 12 Alejandro G. Castro 2011-09-22 17:43:53 UTC
FYI we removed the test in the last release, thanks for the comments.