GNOME Bugzilla – Bug 547020
HEAD request not allowed/working breaks query_info (e.g. Amazon S3, IMDB)
Last modified: 2015-03-01 17:12:34 UTC
+++ This bug was initially created as a clone of Bug #545000 +++

Amazon S3 allows generation of links with an expiration date. An example of such a link is:

https://d134w4tst3t.s3.amazonaws.com:443/a?Signature=6VJ9%2BAdPVZ4Z7NnPShRvtDsLofc%3D&Expires=1249330377&AWSAccessKeyId=0EYZF4DV8A7WM0H73602

gvfs is not able to access this link:

gvfs-info "https://d134w4tst3t.s3.amazonaws.com:443/a?Signature=6VJ9%2BAdPVZ4Z7NnPShRvtDsLofc%3D&Expires=1249330377&AWSAccessKeyId=0EYZF4DV8A7WM0H73602"
Error getting info: HTTP Client Error: Forbidden

Identical results when doing the same through (py)gio:

import gio
f = gio.File("https://d134w4tst3t.s3.amazonaws.com:443/a?Signature=6VJ9%2BAdPVZ4Z7NnPShRvtDsLofc%3D&Expires=1249330377&AWSAccessKeyId=0EYZF4DV8A7WM0H73602")
f.query_info("standard::*")
<class 'gio.Error'>: HTTP Client Error: Forbidden

The link works fine when clicked in Firefox.
amazon.com does not support HEAD requests.
Has anyone looked at this?
The tests/get program in libsoup seems to work against this URL (though it would be easier to check if the file contained some data), so this is probably a gvfs issue.
A bit of debug:

$ GVFS_HTTP_DEBUG=all /usr/libexec/gvfsd-http uri="https://d134w4tst3t.s3.amazonaws.com/a?Signature=6VJ9%2BAdPVZ4Z7NnPShRvtDsLofc%3D&Expires=1249330377&AWSAccessKeyId=0EYZF4DV8A7WM0H73602"
setting 'uri' to 'https://d134w4tst3t.s3.amazonaws.com/a?Signature=6VJ9%2BAdPVZ4Z7NnPShRvtDsLofc%3D&Expires=1249330377&AWSAccessKeyId=0EYZF4DV8A7WM0H73602'
Added new job source 0x1ad9020 (GVfsBackendHttp)
Queued new job 0x1ad90a0 (GVfsJobMount)
+ try_mount: https://d134w4tst3t.s3.amazonaws.com/a?Signature=6VJ9%2BAdPVZ4Z7NnPShRvtDsLofc%3D&Expires=1249330377&AWSAccessKeyId=0EYZF4DV8A7WM0H73602
send_reply, failed: 0
register_mount_callback, mount_reply: 0x1ad1500, error: (nil)
backend_dbus_handler org.gtk.vfs.Mount:QueryInfo
Queued new job 0x1adc8d0 (GVfsJobQueryInfo)
> HEAD /a?Signature=6VJ9%2BAdPVZ4Z7NnPShRvtDsLofc%3D&Expires=1249330377&AWSAccessKeyId=0EYZF4DV8A7WM0H73602 HTTP/1.1
> Soup-Debug-Timestamp: 1219406742
> Soup-Debug: SoupSessionAsync 1 (0x1adc830), SoupMessage 1 (0x1adc970), SoupSocket 1 (0x1add0b0)
> Host: d134w4tst3t.s3.amazonaws.com
> User-Agent: gvfs/0.2.5
< HTTP/1.1 403 Forbidden
< Soup-Debug-Timestamp: 1219406743
< Soup-Debug: SoupMessage 1 (0x1adc970)
< x-amz-request-id: 30B1C4D3A24E62A9
< x-amz-id-2: JTRKS1c8jZtxFJ0NmqV/YHWbc+54AmHWuxDWTWebv/4LI8A+7RJR46saCwTYNWaF
< Content-Type: application/xml
< Transfer-Encoding: chunked
< Date: Fri, 22 Aug 2008 12:05:42 GMT
< Server: AmazonS3
send_reply(0x1adc8d0), failed=1 (HTTP Client Error: Forbidden)

As Alex mentioned, you can't do a gvfs-info on it, as S3 doesn't seem to support HEAD here.
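For anyone who wants to poke at this outside of gvfs, here is a minimal Python 3 sketch (standard library only) that sends a HEAD and a GET for the same signed URL and prints the status codes. HOST and PATH are placeholders; the links above have long since expired, so substitute a freshly generated one:

import http.client

HOST = "d134w4tst3t.s3.amazonaws.com"
PATH = "/a?Signature=...&Expires=...&AWSAccessKeyId=..."   # placeholder query string

def status_of(method):
    # One connection per request, so a 403 on HEAD can't affect the GET.
    conn = http.client.HTTPSConnection(HOST)
    conn.request(method, PATH)
    resp = conn.getresponse()
    body = resp.read()          # empty for HEAD, object data (or error XML) for GET
    conn.close()
    return resp.status, resp.reason, len(body)

print("HEAD:", status_of("HEAD"))   # 403 Forbidden in the trace above
print("GET: ", status_of("GET"))    # 200 OK when the expiring link is valid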
(In reply to comment #3)
> The tests/get in libsoup seems to work (though it'd be easier to check if the
> file contained some data), so this would probably be a gvfs issue.

Here's a link to a file which contains data (4 bytes, 'asd\n'):

https://d134w4tst3t.s3.amazonaws.com:443/b?Signature=ttakwvULWoghvDf5WfaeaJmOHlw%3D&Expires=1250516229&AWSAccessKeyId=0EYZF4DV8A7WM0H73602
(In reply to comment #1)
> amazon.com does not support HEAD requests.

Doesn't appear to be true: http://docs.amazonwebservices.com/AmazonS3/2006-03-01/index.html?RESTObjectHEAD.html

Googling for a random s3.amazonaws.com URL turns up http://s3.amazonaws.com/apache.3cdn.net/3e5b3bfa1c1718d07f_6rm6bhyc4.pdf which responds to HEAD just fine.

However, the documentation for expiring URLs (http://docs.amazonwebservices.com/AmazonS3/2006-03-01/index.html?RESTAuthentication.html) says that they are "suitable only for simple object GET requests"... So I'm guessing that this is a bug in S3, and they're forgetting to allow HEAD as well. Or, more likely, they're checking the Signature and finding that it doesn't match, because the StringToSign used "GET" as the method while the actual request used "HEAD".

If you create a new signed URL using "HEAD" instead of "GET" in the StringToSign, does that make it possible to do HEAD but not GET? (You can do a HEAD request to a URL by using "curl -I URL".)
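For reference, here is a rough sketch of how the 2006-03-01 query-string authentication seems to build the signature, going by the REST docs linked above (the access key, secret and bucket/object names are placeholders, and this is illustrative rather than a tested client). The HTTP verb is the first component of StringToSign, which would explain why a URL signed for GET comes back Forbidden on HEAD:

import base64, hashlib, hmac, time
from urllib.parse import quote_plus

ACCESS_KEY = "AKIA-PLACEHOLDER"
SECRET_KEY = "secret-placeholder"
BUCKET, OBJECT = "my-bucket", "a"

def signed_url(method, expires_in=3600):
    expires = int(time.time()) + expires_in
    # The HTTP verb is the first line of StringToSign, so a URL signed with
    # "GET" should fail signature verification when used for a HEAD request.
    string_to_sign = "%s\n\n\n%d\n/%s/%s" % (method, expires, BUCKET, OBJECT)
    signature = base64.b64encode(
        hmac.new(SECRET_KEY.encode(), string_to_sign.encode(), hashlib.sha1).digest()
    ).decode()
    return ("https://%s.s3.amazonaws.com/%s?Signature=%s&Expires=%d&AWSAccessKeyId=%s"
            % (BUCKET, OBJECT, quote_plus(signature), expires, ACCESS_KEY))

print(signed_url("GET"))    # the kind of link in the original report
print(signed_url("HEAD"))   # does this one answer "curl -I" but not plain curl?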
> If you create a new signed URL using "HEAD" instead
> of "GET" in the StringToSign, does that make it possible to do HEAD but not
> GET? (You can do a HEAD request to a URL by using "curl -I URL".)

I have written an e-mail to Amazon describing the problem. I do not have time at the moment to construct the request manually.
Is there any chance that this might be fixed before GNOME 2.24? I would like to be able to support Amazon S3 in Conduit this cycle.
John Stowers: You might want to ping the #nautilus irc channel too.
BTW, I have posted this issue to AWS Developer Connection[1] -- unfortunately no reaction from an AWS engineer so far!

[1] http://developer.amazonwebservices.com/connect/thread.jspa?threadID=24304
(In reply to comment #8)
> Is there any chance that this might be fixed before GNOME 2.24, I would like to
> be able to support Amazon S3 in Conduit this cycle.

There's not a lot gvfs can do here. You're trying to do a query_info, and at the moment, S3 returns nonsensical information in response to query_info. The only way this could be "fixed" at the gvfs level would be to have it check the URI to see if it looked like an S3 expiring URI, and then do a GET instead of a HEAD in that case, but throw away the response body. Which would suck.

AFAICT, the bug is on Amazon's side, and the right fix is for them to fix it, at which point gvfs will automatically start working correctly without needing any changes.

(Note that other operations, eg "gvfs-cat", work fine; it's only query_info, and anything that depends on it, that's broken.)
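To spell out the kind of fallback described above, here is a client-side Python 3 sketch (this is not what gvfs actually does, and the URL is a placeholder): try HEAD first, and if the server rejects it, fall back to a GET but read only the headers and drop the body:

import http.client
from urllib.parse import urlsplit

def fetch_headers(url):
    parts = urlsplit(url)
    path = parts.path + ("?" + parts.query if parts.query else "")
    for method in ("HEAD", "GET"):
        conn = http.client.HTTPSConnection(parts.netloc)
        conn.request(method, path)
        resp = conn.getresponse()
        if resp.status == 200:
            headers = dict(resp.getheaders())
            conn.close()            # for GET: close before downloading the body
            return headers
        conn.close()                # HEAD rejected (e.g. 403), try the next method
    raise IOError("neither HEAD nor GET returned 200 for %s" % url)

headers = fetch_headers("https://example-bucket.s3.amazonaws.com/a?Signature=...")  # placeholder
print(headers.get("Content-Length"), headers.get("Last-Modified"))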
> (Note that other operations, eg "gvfs-cat", work fine; it's only query_info,
> and anything that depends on it, that's broken.)

Interesting. So I could theoretically work around this by using python urllib to get the size and mtime, and just using g_file_copy to copy the file?
(In reply to comment #12)
> Interesting. So I could theoretically work around this by using python urllib
> to get the size and mtime, and just using g_file_copy to copy the file?

Yes, it looks like that would work; call urlopen() to open the URL, then call info() on the returned file, check the size and mtime, and if you don't want to download the rest of the message body, you can close the file. You can't do this asynchronously though...

Another possibility would be to just not check the mtime in this case; in the S3 backend, if g_file_query_info() returns "forbidden", then just re-download the file unconditionally.
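Roughly, that urllib workaround would look like this; the sketch below uses Python 3's urllib.request and a placeholder URL:

from urllib.request import urlopen
from email.utils import parsedate_to_datetime

url = "https://example-bucket.s3.amazonaws.com/b?Signature=...&Expires=...&AWSAccessKeyId=..."  # placeholder

resp = urlopen(url)                 # issues a GET, but we only look at the headers
info = resp.info()
size = int(info["Content-Length"])
mtime = parsedate_to_datetime(info["Last-Modified"])
resp.close()                        # close before the body is downloaded

print(size, mtime)
# If size/mtime show the cached copy is stale, hand the URL to gio's copy
# (g_file_copy) to do the actual transfer, as suggested in comment #12.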
Bug 596615 shows another example of a file server allowing GET but not HEAD. Bug 598505 has a patch to let you do g_file_input_stream_query_info() after starting a g_file_read(), although that wouldn't really be great here since you don't want to start a read until after seeing the info...
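For what it's worth, here is roughly what that stream-based approach looks like from Python, assuming a gvfs http backend with the bug 598505 patch applied (this sketch uses the introspected gi.repository bindings rather than the old pygio module, the URL is a placeholder, and which attributes the backend actually fills in on the stream info isn't guaranteed):

from gi.repository import Gio

f = Gio.File.new_for_uri("https://example-bucket.s3.amazonaws.com/a?Signature=...")  # placeholder
stream = f.read(None)                      # g_file_read(): the backend sends a GET
info = stream.query_info("standard::size,time::modified", None)
print(info.get_size())

# The catch mentioned above: the GET is already in flight at this point, so the
# info can't be used to decide whether to start the transfer in the first place.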
There is another report of a broken server sending 404 on HEAD but working fine on GET: bug #547020
Argh, I mean bug 601776.
*** Bug 634335 has been marked as a duplicate of this bug. ***
From the analysis of bug 634335: IMDB (http://www.imdb.com) is another instance of a server that responds with NOT_ALLOWED to HEAD but works perfectly with GET. (The other thing mentioned in that report, i.e. that gvfs-open doesn't work because of this bug, seems to work here.)
*** Bug 601776 has been marked as a duplicate of this bug. ***
If HEAD is not allowed, then query_info is not going to work...