GNOME Bugzilla – Bug 738168
HTTP redirection broken
Last modified: 2016-09-20 08:15:55 UTC
Seems HTTP redirection was broken when we started to download files in chunks. I'll copy&paste my chat with Dan about this to save time: <zeenix> does libsoup take care of direction for us if we download in chunks? <zeenix> and have soup_message_body_set_accumulate (msg, FALSE) <danw> what do you mean direction? <zeenix> http://people.gnome.org/~zeeshanak/logos/opensuse.svg< <zeenix> http://people.gnome.org/~zeeshanak/logos/opensuse.svg <zeenix> this will redirect you to HTTPS page <danw> oh, REdirection <danw> yes, although you'll get the chunks of both the redirect and the final response <danw> you can call soup_session_would_redirect() on the message to figure out if the session is planning to redirect you, and ignore the chunks of the original message in that case <danw> note that with the stream-based APIs, it skips over the redirect body for you automatically * Notify: mclasen is online (GimpNet). <zeenix> danw: how do i know how many bytes to skip? and what stream-based APIs? <danw> stream-based APIs as in soup_session_send() / soup_session_send_async() <danw> zeenix: just call soup_session_would_redirect(), and if it returns TRUE, ignore that chunk, because it's part of the redirection, not part of the final response <danw> or alternatively, don't connect to got-chunk until you get got-headers, and check would_redirect() from there to decide whether to connect the got-chunk handler or not <zeenix> ah <zeenix> yeah, i like the "new" stream api <zeenix> its easier to use in vala even
I did a manual HTTP request to understand the type of redirection. It's an HTTP 302 response: http://www.wikiwand.com/en/HTTP_302 (Didnt know this one before.)
Created attachment 288529 [details] [review] downloader: Correct documentation comment
Created attachment 288530 [details] [review] downloader: Move downloading to download class The Downloader got rather complex lately. It is time that the Download object takes the part of downloading things while the Downloader only manages several downloads. This widely extends the (now abstract) Download object so that it handles construction of temporary files and the await_download functionality. The actual download is done by the subclasses. This approach allows extending this download system later with other download methods (e.g. samba file shares among others) without changing anything outside the download.vala file.
Review of attachment 288530 [details] [review]: ah sorry... uploaded too early
Created attachment 288531 [details] [review] downloader: Move downloading to download class The Downloader got rather complex lately. It is time that the Download object takes the part of downloading things while the Downloader only manages several downloads. This widely extends the (now abstract) Download object so that it handles construction of temporary files and the await_download functionality. The actual download is done by the subclasses. This approach allows extending this download system later with other download methods (e.g. samba file shares among others) without changing anything outside the download.vala file.
Review of attachment 288531 [details] [review]: ::: src/download.vala @@ +15,3 @@ + public signal void download_failed (GLib.Error error); + + protected Mutex mutex = Mutex (); Why is a mutex needed all of a sudden. I think the first patch should be only about moving neccessary code from Downloader to Download. Also moving Download to another file should be done afterwards in a separate patch. That way it will also be very clear from the patch, what exactly you are moving from Downloader to Download. @@ +147,3 @@ +} + +private class Boxes.FilesystemDownload : Boxes.Download { same applies to having Download as abstract class and having subclasses. You are doing a lot in one patch. @@ +212,3 @@ + + int64 current_num_bytes = 0; + // FIXME: Reduce lambda nesting by splitting out downloading to Download class I guess its now time to get rid of this? :)
Review of attachment 288531 [details] [review]: ::: src/download.vala @@ +15,3 @@ + public signal void download_failed (GLib.Error error); + + protected Mutex mutex = Mutex (); Why is a mutex needed all of a sudden. I think the first patch should be only about moving neccessary code from Downloader to Download. Also moving Download to another file should be done afterwards in a separate patch. That way it will also be very clear from the patch, what exactly you are moving from Downloader to Download. @@ +31,3 @@ + return new FilesystemDownload (remote_file, local_file, cache_file, progress, cancellable); + } + } I don't think this should be here. This should be handled by Downloader. As manager of downloads, it can and should have knowledge of different implementations rather. @@ +147,3 @@ +} + +private class Boxes.FilesystemDownload : Boxes.Download { same applies to having Download as abstract class and having subclasses. You are doing a lot in one patch. @@ +212,3 @@ + + int64 current_num_bytes = 0; + // FIXME: Reduce lambda nesting by splitting out downloading to Download class I guess its now time to get rid of this? :)
Review of attachment 288529 [details] [review]: ::: src/downloader.vala @@ +64,3 @@ * @param remote_file The remote file to download. + * @param cached_paths Array of possible files. The file will be saved as the first given name. The other paths will + * only be used for searching for cached files. "as the first given name" -> "at the first path in the array" I'd rather phrase it as: "Array of possible cached paths. A cached copy is searched in these paths and if one is not found, file is downloaded at the first path in the given array."
Review of attachment 288531 [details] [review]: ::: src/download.vala @@ +31,3 @@ + return new FilesystemDownload (remote_file, local_file, cache_file, progress, cancellable); + } + } I disagree. The problem is probably about the name "Downloader" and "Download". I am currently transforming this into a DownloadManager and a Downloader. The DownloadManager (which is named Downloader here) manages multiple downloads and provides few abstractions like getting os logos and so on while it shouldnt have any knowledge at all about the different implementations of the Downloader (which is the different download classes). GLib.File does the same thing, you have those static methods for creating new File objects for some URIs and I bet they create different implementations then. @@ +147,3 @@ +} + +private class Boxes.FilesystemDownload : Boxes.Download { Working on splitting this up right now. Though it's hard because this was rather a rewrite with the old code in mind than a refactoring. @@ +212,3 @@ + + int64 current_num_bytes = 0; + // FIXME: Reduce lambda nesting by splitting out downloading to Download class I guess...
Review of attachment 288531 [details] [review]: ::: src/download.vala @@ +31,3 @@ + return new FilesystemDownload (remote_file, local_file, cache_file, progress, cancellable); + } + } GLib doesn't have a signletone that apps talk through but rather they directly create instances of File. That is not a good example here. To disagree with me, you need to provide counter arguments and I don't see any.
Due to lack of time, I'd suggest you first get this bug fixed and then do the refactoring. We'd want the former in 3.14 (since its a regression from 3.12) and latter can wait for 3.16.
commit: e4116c5cd0be7b8555e87f02e6dabd87f7b8d4bd downloader,HTTP: Ignore all chunks until redirected In case of HTTP redirection, first chunks we get are of the HTML document we recieve from the original URL we tried to download. Lets ignore those bytes and only save the chunks of the target resource.