GNOME Bugzilla – Bug 738016
Shouldn't use incomplete downloads
Last modified: 2016-03-31 13:22:07 UTC
Currently Boxes only checks if cached file exists to decide whether or not to download. This logic fails us in cases when download was not completed. The attached patches fixes this by only entertaining cached file if its size matches that of its remote copy.
Created attachment 287866 [details] [review] downloader: Separate out fetching of cached file Split out fetching of cached file code into a separate private method.
Created attachment 287867 [details] [review] util-app: Overwrite destination on copying Instead of ignoring 'file exists' error, overwrite destination file on copying.
Created attachment 287868 [details] [review] downloader: Add get_http_download_size() Add a private method to query the size of download using HTTP HEAD.
Created attachment 287869 [details] [review] downloader: Check size before declaring cached Instead of only checking for existance of cached/downloaded copy, compare the size of remote file with it's cached copy before declaring the remote file to be already cached. This fixes the issue of Boxes trying to use partially downloaded files.
Created attachment 287870 [details] [review] downloader: Remove redundant check for network availability Now this check is being done when we send out an HTTP HEAD to get the size of download before doing the actual download so no need to check this again at this point anymore.
Review of attachment 287866 [details] [review]: Description is redundant for me. ACK.
Review of attachment 287867 [details] [review]: What doesn't get clear from the log IMO: - why are you doing this? - would be nice to point out that the downloaded is the only user of this function thus we won't get unintentional side effects. Code: From a generic copy_file function I'd expect to get an error if the file already exists. At least a comment above the signature would be nice to avoid surprises in behaviour IMO.
Review of attachment 287867 [details] [review]: All good points, I'll update the commit log. This change is needed since in a later patch, we download even if cached (destination) already exists (if size isn't correct). I wasn't actually sure if this function should always do that. Maybe I should instead add a new FileCopyFlags param that defaults to OVERWRITE?
Review of attachment 287867 [details] [review]: > already exists (if size isn't correct) Ok. What you're saying is: if the user downloads some arbitrary file that just happens to have the same name as the file we want to download - you just go over it and overwrite it without asking. Not good IMO. Having the file_copy function overwriting any target file seems pretty unintuitive to me. This patch by the way lessens the abstraction provided by the file_copy function.
Review of attachment 287867 [details] [review]: > Ok. What you're saying is: if the user downloads some arbitrary file that just happens to have the same name as the file we want to download - you just go over it and overwrite it without asking. Not good IMO. Yes, when you say it in such a generic way, it sounds a lot worse than it really is. Media files are typically given unique names and actually I can't come up with a single example of a media file that doesn't have a unique enough name. Having said that, it would be safer to either confirm from user or simply append something to cached filename if it already exists. Since its usually better to not annoy users with dialogs, I'd go with latter.
Review of attachment 287867 [details] [review]: > Yes, when you say it in such a generic way, Well this is the only case when the overwrite attribute matters. Anyway since we seem to agree, we need a new solution. Something like appending " (1)" (consecutive numbers). Oh and when we speak of that it would be nice to inform the user about what we're doing! We're downloading ISOs and putting them into his Download folder without telling him. I think a sentence like "Downloading media to {path}" wouldnt do any harm.
Review of attachment 287867 [details] [review]: Shame on me I didnt inform the user in my original patches...
Review of attachment 287867 [details] [review]: No, downloaded location is a detail here and I don't think we need to inform the user.
Review of attachment 287868 [details] [review]: Code and log itself looks good. However there's a lot of code duplication with all the error handling and cancellable and so on. It would be way better for maintainability (, code size and probably readability) to have something more abstract for this, though I don't really know how this could be done right now. So ack, reluctantly.
Review of attachment 287867 [details] [review]: Well, I wouldnt take it for granted if I didn't write the code, that this program doesnt use some temporary file for downloading. (Which is my first intuitive thought.) So the major detail is the fact that the file is stored. The minor detail does matter IMO - at least if we begin to append things to the name because the user has an own one.
Review of attachment 287869 [details] [review]: In which cases does Boxes partially download files? And I don't see how this assures that the size of a newly downloaded file will be checked against information provided in the header. ::: src/downloader.vala @@ +130,3 @@ + var cached_file_stream = yield download.cached_file.replace_async (null, + false, + FileCreateFlags.REPLACE_DESTINATION); after our discussion, this line could remain unchanged and we need to check existence before.
Review of attachment 287870 [details] [review]: hm. Looks good per se. But. The network check is well hidden in get_http_download_size(), while the aim of this method does not really imply this network check (at least not there). If someone fiddles later with that thing it might easily happen that this network check vanishes, so I'd feel better having it in an dedicated function which is invoked in download() itself before yielding get_http_download_size() somewhere deep in get_cached_file(). This is way more clearer and cleaner to me.
Review of attachment 287867 [details] [review]: > Well, I wouldnt take it for granted if I didn't write the code, that this program doesnt use some temporary file for downloading. As user? You don't necessarily need to even know and its not temporary, a file is downloaded into your Downloads dir and if you like to do something else with the media, you can access it from there or delete it if you like. > So the major detail is the fact that the file is stored. The minor detail does matter IMO - at least if we begin to append things to the name because the user has an own one. I don't know what you mean here. Whats the minor detail? Anyway, I'm not changing anything w.r.t providing user input about these details here. You can file a separate bug for this after these patches are in (or even before that maybe) and we can discuss there.
Review of attachment 287868 [details] [review]: Well its not a lot of code and in a later patch I remove the network check part from download_from_http() so not so bad IMHO.
Review of attachment 287869 [details] [review]: > In which cases does Boxes partially download files? And I don't see how this assures that the size of a newly downloaded file will be checked against information provided in the header. E.g if you exit Boxes while downloading. Unfortunately cancellation of cancellable is an async process so its not so easy to guarantee all operations are cancelled before Boxes exit. While I have an idea of how that can be solved but there is still the matter of crashes etc.
Review of attachment 287870 [details] [review]: Hmm.. yeah sounds right.
How about downloading the ISO to a .{isoname}~ (or so) file? If download completes we rename it to the real file - thus we can delete (or even reuse?) partially downloaded ISOs and avoid having partially downloaded ISOs in the users filesystem.
(In reply to comment #22) > How about downloading the ISO to a .{isoname}~ (or so) file? If download > completes we rename it to the real file - thus we can delete (or even reuse?) > partially downloaded ISOs and avoid having partially downloaded ISOs in the > users filesystem. Hmm... sounds like a good idea. I'll give it more thought and implement it if I still think so. :)
Created attachment 288066 [details] [review] util-app: Add flags parameter to copy_file() Add a parameter to specify FileCopyFlags.
Created attachment 288067 [details] [review] downloader: Explicitly close output stream This is currently not required but in the following patches, we'll start to first write to a temporary file and for that its important that all output is written to the file, before it can be copied over to the actual (cached) file.
Created attachment 288068 [details] [review] downloader: Download to temporary location first Instead of directly downloading to the cached path, download files first to a temporary location. The filename has a '~' at the end so its obvious that its a not regular file. This fixes the issue of Boxes trying to use partially downloaded files. Currently Boxes doesn't ensure that partially downloaded file is deleted before exitting but even if it did, there will always been crashes.
Review of attachment 288066 [details] [review]: ::: src/util-app.vala @@ +360,3 @@ File dest_file, ActivityProgress progress, + FileCopyFlags flags, I would default it to FileCopyFlags.NONE. This makes the most usual invocation more simpler and the API better IMO.
Review of attachment 288066 [details] [review]: ::: src/util-app.vala @@ +360,3 @@ File dest_file, ActivityProgress progress, + FileCopyFlags flags, Since this simple function has now only one user, I'm not even sure it should exist anymore (at least in utils) so a default value won't help anyone. I might actually move/remove it later.
Review of attachment 288066 [details] [review]: Ok, I'd pledge for removal of the function.
Review of attachment 288067 [details] [review]: ack
Review of attachment 288068 [details] [review]: ack
Review of attachment 288066 [details] [review]: Sure thing. I was hoping to do it after these patches but there's no point in making changes to functions you'd be removing soon.
Review of attachment 288066 [details] [review]: well, the other patches here are good and independend from this one, you can push them then, not?
Created attachment 288122 [details] [review] util-app: Move copy_file() to Downlaoder This function is no longer used by any module other than Downloader so lets move it there as a private helper.
Created attachment 288123 [details] [review] downloader: copy_file -> download_from_filesystem Now that copy_file is a private method of Downloader, it makes sense to change its name and signature to be consistent with its brother, download_from_http().
Created attachment 288124 [details] [review] downloader: Separate out fetching of cached file Split out fetching of cached file code into a separate private method.
Created attachment 288125 [details] [review] downloader: Explicitly close output stream This is currently not required but in the following patches, we'll start to first write to a temporary file and for that its important that all output is written to the file, before it can be copied over to the actual (cached) file.
Created attachment 288126 [details] [review] downloader: Download to temporary location first Instead of directly downloading to the cached path, download files first to a temporary location. The filename has a '~' at the end so its obvious that its a not regular file. This fixes the issue of Boxes trying to use partially downloaded files. Currently Boxes doesn't ensure that partially downloaded file is deleted before exitting but even if it did, there will always been crashes.
Comment on attachment 288066 [details] [review] util-app: Add flags parameter to copy_file() forgot to obsolete this one?
Review of attachment 288122 [details] [review]: ack
Review of attachment 288123 [details] [review]: ack, though I'm tempted to make these download_from_somewhere functions member functions of the Download object. They use nothing but those member variables - and they dont use members from the Downloader so its technically a performance loss because the Downloader object is passed to them implicitly and that isnt needed.
Review of attachment 288123 [details] [review]: oh, forgot: shortlog isnt very clear to me. I'd at least add brackets so it gets clear that copy_file() and download_from_filesystem() are actually member functions.
Review of attachment 288124 [details] [review]: good one. download() is long enought. ack.
Review of attachment 288125 [details] [review]: ack
Review of attachment 288123 [details] [review]: The performance loss isn't significant at all but yeah, download should be handled by Download class, which we should then put in a separate file. I didn't add () because otherwise it doesn't fit in 50 chars and besides them being functions isn't so critical information that it must be presented in shortlog.
Review of attachment 288126 [details] [review]: I think the patch is good. However the downloader grows a bit more chaotic subce there's the whole temporary file thing around it and functions are rather lengthy. I think it would be better if the Download object manages the temporary file. (And there we are again - Download has to grow up so Downloader gets cleaner.) What do you think?
Review of attachment 288123 [details] [review]: > The performance loss isn't significant at all but yeah, download should be > handled by Download class, which we should then put in a separate file. Completely correct. I'm a bit idealistic sometimes about this but I think it also covers up design flaws, like in this case. I think there are actually quite some functions in Boxes that could be static and where it would be worth thinking about where they belong. Not that I could name examples though... > I didn't add () because otherwise it doesn't fit in 50 chars and besides them > being functions isn't so critical information that it must be presented in > shortlog. I thought so. Somehow they make it clearer to me. I had to read the long description until I got that you wanted to rename them but I think its ok then.
Review of attachment 288126 [details] [review]: When the downloading is moved to Download class, all the associated complexity can move with it but I don't think thats a requirement for this patch.
Review of attachment 288126 [details] [review]: in general I agree. I'm just not sure if this patch is good. But I think it is.