GNOME Bugzilla – Bug 648846
content sniffing should not be based on file extension
Last modified: 2011-09-22 17:43:53 UTC
Hi, currently, in sniff_unknown, when no matching row for types_table has been found, libsoup uses gio function "g_content_type_guess". That functions returns a content-type based on filename[1]. But media-type-sniffing specification tells: "It is essential that file extensions are not used for determining the media type for octets fetched over HTTP". So, if I understand correctly, both libsoup and the spec, libsoup behavior should be changed here. Here is a page returning an empty content-type. If I understand correctly, it should be understood as text/plain. Actually, some browsers I tested display it as text file (firefox, chromium). http://renevier.net/misc/content_type.css But with a browser using libsoup, it's understood as text/css, and browser does not display it as text [1]: http://git.gnome.org/browse/glib/tree/gio/gcontenttype.c?id=549d895fa4e9c4b5a35c5d9da4db39ca11dc71cc#n910 [2] http://tools.ietf.org/html/draft-abarth-mime-sniff-06#page-5
Created attachment 186798 [details] [review] patch v1 patch proposal
Review of attachment 186798 [details] [review]: I am not strongly opposed, though we do lose a bit of functionality unfortunately. Notice that we are protecting against the worst problem in the current code, that would be upgrading it to something the browser would try to render as a script or html. Perhaps we need a g_content_type_guess that acts conservatively =P. ::: libsoup/soup-content-sniffer.c @@ +307,3 @@ + for (i = 0; i < resource_length; i++) { + if (byte_looks_binary[resource[i]]) { + return g_strdup("application/octet-stream"); This is indented by spaces instead of tabs like the rest.
Created attachment 186889 [details] [review] patch v1.1 same patch with some style fixup
(In reply to comment #2) > Review of attachment 186798 [details] [review]: > > I am not strongly opposed, though we do lose a bit of functionality > unfortunately. Notice that we are protecting against the worst problem in the > current code, that would be upgrading it to something the browser would try to > render as a script or html. Perhaps we need a g_content_type_guess that acts > conservatively =P. I think I still see a problem with current implementation: Spec warns about following scenario[1]: > in some cases, file extensions can be supplied by malicious > parties. For example, most PHP installations let the attacker > append arbitrary path information to URLs (e.g., > http://example.com/foo.php/bar.html) and thereby determine the > file extension. And also about this one: > where, e.g., a server uses the above table to determine that > content is not HTML and thus safe from cross-site scripting > attacks, but then a user agent detects it as HTML anyway and > allows script to execute Consider worse case when server exhibits those two behaviors: If server receives input: - if it matches a pattern of the spec, sanitize it to remove scripts, attributes etc - otherwise, do not sanitize anything and send it as unknown/unknown relying on the browser to not display it as html Imagine server also allow user to specify file extension. Then, user can send some html content (not matching a pattern of the spec), set the extension to html, and that content will be executed as html. Of course, problem should be fixed on the server, but the point of the spec is to protect user even when the server is misconfigured. [1]: http://tools.ietf.org/html/draft-abarth-mime-sniff-06
Comment on attachment 186889 [details] [review] patch v1.1 Looks right. can you remove fixup_xdg_dirs() from sniffing-test too? It's only needed to make gio sniffing work Do you have a GNOME git account or do you need me to commit this?
Created attachment 187098 [details] [review] patch v1.2 diff with previous patch: removes fixup_xdg_dirs() also removes a useless trailing whitespace. And I don't have a GNOME git account. (In reply to comment #5) > Do you have a GNOME git account or do you need me to commit this? No, I don't have a GNOME git account.
belatedly, pushed (with a slightly-rewritten commit message). Thanks for the patch
Apparently this change broke one of the webkit unit tests: /webkit/WebKitBuild/Release/Programs/unittests(svn-trunk)$ ./testmimehandling /webkit/mime/remote-PDF: No bp log location saved, using default. OK /webkit/mime/remote-HTML: OK /webkit/mime/remote-TEXT: OK /webkit/mime/remote-OGG: ** ERROR:../../Source/WebKit/gtk/tests/testmimehandling.c:128:mime_type_policy_decision_requested_cb: assertion failed (mime_type == "audio/x-vorbis+ogg"): ("application/octet-stream" == "audio/x-vorbis+ogg") Aborted (core dumped) Should we add support again for this in soup now that we are not using gio?
I vote for changing the test. That's not part of the spec, and it should not be a problem to not support it, I guess.
it's also possible that bug 648849 fixes that case
Yeah, I think bug 648849 should only add support for application/ogg only, though, I haven't seen any mention of audio/x-vorbis+ogg, I think that's a peculiarity of gio that we were testing for.
FYI we removed the test in the last release, thanks for the comments.