GNOME Bugzilla – Bug 706344
add support for resuming downloads
Last modified: 2013-08-28 19:06:21 UTC
When a download fails because of a network error or disconnect, because the user shuts down the system, or for any number of other reasons, ostree ought to be able to resume the download on the next pull request, using existing partial downloads as a starting point. This requires:

- Pulls to use a standard file location and naming mechanism for all requests, so downloads can be resumed.
- Pull to move the file into its final location only after it has been completely downloaded.

I've created a patch to do the above, but the test included has an issue:

- The first run correctly fails, since it only gets half of the repo config file.
- The second run finishes downloading the config file, then fails because the ref is only partially downloaded.
- The third run tries to download the config again; since the earlier download succeeded and its temporary file was moved into place, there is no partial file to resume from, so it again only grabs half of the config file and fails because the file is incomplete.

One way we could fix this is to have pull fetch every file, then only copy the temporary files into their final locations after all files have successfully downloaded. Thoughts?
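(For illustration only, a minimal sketch of the naming-and-move scheme described above, using plain GIO; the helper names temp_file_for_uri and complete_download are hypothetical and not part of the patch.)

#include <gio/gio.h>
#include <glib.h>

/* Hypothetical helpers, only to illustrate the idea of a stable
 * per-URI temporary file plus a move into place on completion. */

static GFile *
temp_file_for_uri (GFile *tmpdir, const char *uristring)
{
  /* Stable name derived from the URI, so a later pull can find the
   * partial download again. */
  char *name = g_compute_checksum_for_string (G_CHECKSUM_SHA256,
                                              uristring, -1);
  GFile *tmpfile = g_file_get_child (tmpdir, name);
  g_free (name);
  return tmpfile;
}

static gboolean
complete_download (GFile *tmpfile, GFile *destination,
                   GCancellable *cancellable, GError **error)
{
  /* Only move the file into its final location once the download
   * has fully completed. */
  return g_file_move (tmpfile, destination, G_FILE_COPY_OVERWRITE,
                      cancellable, NULL, NULL, error);
}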
Created attachment 252317 [details] [review] First attempt at the patch described in the bug description
Note that the changes to ot-trivial-httpd.c are needed because otherwise libsoup sets the Content-Length to the length of the data buffer we give it, rather than to the length of the file.
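(Roughly, the idea is to declare the real file size up front instead of letting soup infer it from the buffer. A hedged sketch against the libsoup 2.x server API, not the actual patch:)

#include <libsoup/soup.h>

/* Sketch only: force the declared Content-Length to the full file
 * size even though we append fewer bytes to the response body. */
static void
send_partial_body (SoupMessage *msg, gconstpointer buf, gsize buflen,
                   goffset file_size)
{
  soup_message_headers_set_encoding (msg->response_headers,
                                     SOUP_ENCODING_CONTENT_LENGTH);
  soup_message_headers_set_content_length (msg->response_headers,
                                           file_size);
  soup_message_body_append (msg->response_body, SOUP_MEMORY_COPY,
                            buf, buflen);
}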
Review of attachment 252317 [details] [review]:

::: src/libostree/ostree-fetcher.c

@@ +136,3 @@
+ */
+static gboolean
+open_tmpfile (GFile *tmpdir,

You don't need to copy all of this code to create a file with a well-known name; gio has several APIs for that. Just use...

@@ +319,3 @@
   OstreeFetcherPendingURI *pending;
   GError *local_error = NULL;
+  char *uristring = soup_uri_to_string (uri, FALSE);

You can prepend this with "gs_free ".

@@ +328,3 @@
   pending->uri = soup_uri_copy (uri);
+  printf("requesting uri %s\n", uristring);
+  pending->filename = g_compute_checksum_for_string (G_CHECKSUM_MD5, uristring, strlen (uristring));

We might as well just be consistent and use SHA256; this isn't security sensitive, but why introduce new uses of md5?

@@ +336,3 @@
   pending->refcount++;
+  /* TODO - make this async */
+  if (!open_tmpfile (pending->self->tmpdir, pending->filename, 0644,

So here is where we want to check whether we're resuming. Something like:

if (g_file_test (pending->filename, G_FILE_TEST_EXISTS))
  {
    /* Code to handle resume */
  }
else
  {
    if (!g_file_create (pending->filename, 0, cancellable, error))
      goto out;
    /* Code to handle newly created file */
  }

(Or if you're feeling particularly elite, just use g_file_create() and look whether it returns G_IO_ERROR_EXISTS, the equivalent of EEXIST, so we only do it in one system call.)

Oh, I see how your code works now; you changed the open call to have O_APPEND and dropped the O_EXCL. So this API call could just be g_file_append_to() then.

@@ +342,3 @@
+      g_simple_async_result_take_error (pending->result, local_error);
+      g_simple_async_result_complete (pending->result);
+      return;

If you were using the gs_free, then the compiler would also g_free (uristring) here for you. As is, it leaks.

@@ +368,3 @@
   soup_request_send_async (pending->request, cancellable, on_request_sent, pending);
+  g_free (uristring);

If you used gs_free above, then this line could be dropped.

::: src/ostree/ot-builtin-trivial-httpd.c

@@ +121,2 @@
 static void
+close_socket (SoupMessage *msg, gpointer user_data)

I'm assuming danw said this is the best way to do things? Seems fine if so. But can you break this out as a separate patch, like "trivial-httpd: Forcibly close connection for partial requests"?

::: tests/test-pull-resume.sh

@@ +1,3 @@
+#!/bin/bash
+#
+# Copyright (C) 2011,2013 Colin Walters <walters@verbum.org>

You should add the Collabora copyright here too for new files. You can keep my copyright if you want, or delete it (since it's not much code shared), or say "Based on test-pull by Colin Walters" as Javier did in https://bugzilla.gnome.org/show_bug.cgi?id=706370
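(A sketch of the single-system-call variant mentioned in the review, i.e. g_file_create() plus a check for G_IO_ERROR_EXISTS, falling back to g_file_append_to() for resuming. This is illustrative and not the code in the patch; the helper name open_for_resume is hypothetical.)

#include <gio/gio.h>

/* Sketch: try to create the file, and treat G_IO_ERROR_EXISTS as
 * "partial file present, resume by appending". */
static GFileOutputStream *
open_for_resume (GFile *file, gboolean *out_resuming,
                 GCancellable *cancellable, GError **error)
{
  GError *local_error = NULL;
  GFileOutputStream *out =
    g_file_create (file, G_FILE_CREATE_NONE, cancellable, &local_error);

  if (out != NULL)
    {
      *out_resuming = FALSE;   /* Newly created; download from scratch. */
      return out;
    }

  if (g_error_matches (local_error, G_IO_ERROR, G_IO_ERROR_EXISTS))
    {
      /* Partial file already there; append to it instead. */
      g_clear_error (&local_error);
      *out_resuming = TRUE;
      return g_file_append_to (file, G_FILE_CREATE_NONE, cancellable, error);
    }

  g_propagate_error (error, local_error);
  return NULL;
}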
Created attachment 252810 [details] [review] Updated patch with updated test case, still not quite there.
Created attachment 252970 [details] [review] Updated test case which usually passes This test mostly passes, but it sometimes hits a bug when pulling where two files are missed. It also sometimes hits a crash in libsoup which kills the trivial-httpd server, after which no more fetches work. I'll attach all three logs (working, pull failure, and server crash) in case anyone else can make more sense of them than I have so far.
Created attachment 252971 [details] Working test log
Created attachment 252972 [details] Test log when pull fails to fetch two files
Created attachment 252973 [details] Test log when the httpd server crashes in libsoup. I've added debug messages and compared this log to the working log: both show libsoup being given a buffer of length 40, but in one case it works, while in the other it crashes on an assertion about the buffer length being 0.
Created attachment 252981 [details] [review] Updated patch that fixes the libsoup server crash by checking range requests Since libsoup doesn't check for range requests that are out of bounds for the buffer it is given (until a recent commit that will be in libsoup 2.43.6, while ostree only depends on 2.39.1), I've added a check for out-of-range requests in trivial-httpd. Once the code is reviewed I'll split it into two patches, one for the trivial-httpd changes and another for the rest.
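(For reference, a hedged sketch of what such a bounds check could look like with the libsoup 2.x range API; the helper name and exact handling are illustrative, not the attached patch. A caller getting FALSE back would typically respond with SOUP_STATUS_REQUESTED_RANGE_NOT_SATISFIABLE.)

#include <libsoup/soup.h>

/* Sketch of a manual bounds check, since the libsoup version targeted
 * here does not reject out-of-range Range requests itself.  Returns
 * TRUE when every requested range fits inside the buffer. */
static gboolean
ranges_are_satisfiable (SoupMessage *msg, goffset buffer_length)
{
  SoupRange *ranges = NULL;
  int n_ranges = 0;
  int i;
  gboolean ok = TRUE;

  if (!soup_message_headers_get_ranges (msg->request_headers, buffer_length,
                                        &ranges, &n_ranges))
    return TRUE;   /* No usable Range header: nothing to validate. */

  for (i = 0; i < n_ranges; i++)
    {
      if (ranges[i].start >= buffer_length || ranges[i].end >= buffer_length)
        {
          ok = FALSE;
          break;
        }
    }

  soup_message_headers_free_ranges (msg->request_headers, ranges);
  return ok;
}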
Review of attachment 252981 [details] [review]:

::: src/libostree/ostree-fetcher.c

@@ +179,2 @@
 static void
+rename_part_file (OstreeFetcherPendingURI *pending)

Since this function calls API which can throw errors, it should also throw. That means it should return a gboolean and accept GCancellable/GError.

@@ +181,3 @@
+{
+  GError *local_error = NULL;
+  gchar *tempfilename = g_file_get_path (pending->tmpfile);

Needs gs_free(). Or alternatively, use my helper function in libgsystem: gs_file_get_path_cached().

@@ +184,3 @@
+
+  // Only rename files that end in .part
+  if (g_strrstr (tempfilename, partsuffix) != NULL)

I'm a bit confused why we do this; aren't callers of this function in a position to know whether or not it's a .part file?

@@ +186,3 @@
+  if (g_strrstr (tempfilename, partsuffix) != NULL)
+    {
+      gchar *finalname = g_strconcat (pending->filename, donesuffix, NULL);

Needs gs_free.

@@ +187,3 @@
+    {
+      gchar *finalname = g_strconcat (pending->filename, donesuffix, NULL);
+      GFile *donefile = g_file_get_child (pending->self->tmpdir, finalname);

I'd recommend a gs_unref_object here.

@@ +198,3 @@
+        {
+          g_object_unref (pending->tmpfile);
+          pending->tmpfile = donefile;

...and then make this: pending->tmpfile = g_object_ref (donefile);

@@ +199,3 @@
+          g_object_unref (pending->tmpfile);
+          pending->tmpfile = donefile;
+        }

We need to handle errors here; look at the rest of the code. This should read something like:

if (!g_file_move ())
  goto out;

ret = TRUE;
out:
  return ret;

Although, are we attempting to do this only if the file exists? In that case we may need to check for the specific error G_IO_ERROR_NOT_FOUND (or you can just g_file_query_exists() beforehand).

@@ +312,3 @@
+  gs_free char *uristring = soup_uri_to_string (uri, FALSE);
+  gs_free char *finalname;
+  gs_free char *downloadname;

These need to be initialized to NULL; otherwise, if we exit early, we'll be calling g_free() on random data from the stack.

@@ +347,3 @@
+  else
+    {
+      if (g_strrstr (uristring, "objects") )

We'll need to abstract this and make it a bit more precise; something like ostree_fetcher_uri_is_object(), which would have to break apart the string and get just the path component. In other words, we don't want this to behave differently if the domain name contains "objects", like www.ostreeobjects.com.

@@ +349,3 @@
+      if (g_strrstr (uristring, "objects") )
+        {
+          pending->tmpfile = g_file_get_child (pending->self->tmpdir, downloadname);

This leaks the previous tmpfile that was assigned in the struct above; we need to g_object_unref (pending->tmpfile) first.

::: src/libostree/ostree-repo-pull.c

@@ +404,3 @@
 out:
+//  if (tmpf)
+//    (void) unlink (gs_file_get_path_cached (tmpf));

Ultimately I think the cleanest fix is going to be pushing the behaviour difference between objects/ versus other things into the fetcher API itself, like:

ostree_fetcher_request_uri_with_partial_async () - Does the .part/.done thing
ostree_fetcher_request_uri_async () - Just tries to download in one go, fails hard if we only get partial content

Then we wouldn't need to look for "objects" in the URI string in the fetcher code, which is intended to be just a plain HTTP wrapper.

::: tests/test-pull-resume.sh

@@ +35,3 @@
+${CMD_PREFIX} ostree --repo=repo remote add origin $(cat httpd-address)/ostree/gnomerepo
+
+for i in {1..50}

Did it work to change the 50 here to something like: find ${repopath}/objects | wc -l ? Or is this working around the other bug?
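(To make the path-component suggestion concrete, one possible shape for the check, using the helper name from the comment above; this is a sketch, not code from any attachment.)

#include <libsoup/soup.h>
#include <string.h>

/* Hypothetical helper: decide whether a URI refers to a repo object
 * by looking only at its path component, so a hostname such as
 * www.ostreeobjects.com cannot confuse the check. */
static gboolean
ostree_fetcher_uri_is_object (SoupURI *uri)
{
  const char *path = soup_uri_get_path (uri);
  return path != NULL && strstr (path, "/objects/") != NULL;
}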
Created attachment 253174 [details] [review] Small changes to trivial-httpd so the Content-Length will be the expected file size
Created attachment 253175 [details] [review] Add support for pull resume
Created attachment 253176 [details] [review] New test to check pull resume functionality
Created attachment 253187 [details] [review] Remove temporary files once pull is done.
Review of attachment 253187 [details] [review]:

Can you make the commit message something like:

repo: Clean up temporaries after a transaction completes

Previously, we were just leaving temporary files there forever if a transaction was interrupted.

::: src/libostree/ostree-repo.c

@@ +998,3 @@
+
+  nlinks = g_file_info_get_attribute_uint32 (file_info, "unix::nlink");
+  if (nlinks == 1)

We don't need to check the number of links; that is a special trick for the uncompressed object cache garbage collector. So you should be able to remove this conditional.
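(A rough sketch of such a sweep without the nlink check, enumerating the directory with GIO; the function name and attribute set are illustrative, not the attachment itself.)

#include <gio/gio.h>

/* Illustrative only: delete every child of a temporary directory,
 * with no nlink check. */
static gboolean
cleanup_tmpdir (GFile *tmpdir, GCancellable *cancellable, GError **error)
{
  gboolean ret = FALSE;
  GFileEnumerator *enumerator =
    g_file_enumerate_children (tmpdir, "standard::name",
                               G_FILE_QUERY_INFO_NOFOLLOW_SYMLINKS,
                               cancellable, error);
  if (enumerator == NULL)
    goto out;

  while (TRUE)
    {
      GError *temp_error = NULL;
      GFileInfo *info =
        g_file_enumerator_next_file (enumerator, cancellable, &temp_error);
      GFile *child;

      if (info == NULL)
        {
          if (temp_error != NULL)
            {
              g_propagate_error (error, temp_error);
              goto out;
            }
          break;   /* End of directory. */
        }

      child = g_file_get_child (tmpdir, g_file_info_get_name (info));
      if (!g_file_delete (child, cancellable, error))
        {
          g_object_unref (child);
          g_object_unref (info);
          goto out;
        }
      g_object_unref (child);
      g_object_unref (info);
    }

  ret = TRUE;
 out:
  g_clear_object (&enumerator);
  return ret;
}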
Review of attachment 253175 [details] [review]:

Is this an old patch? I thought we'd decided to do the separate fetcher API? Also:

* The trivial-httpd changes already landed, no?
* There are various memory leaks and such that I commented on before.
* Leftover debug printfs.
Review of attachment 253176 [details] [review]:

Looks OK, except the commit message is all on one line. Something like:

tests: Add test for pull resume

To implement this, pass --force-range-requests to the trivial-httpd, which will only serve half of the objects to clients at a time.
Created attachment 253263 [details] [review] Updated patch after review.
Created attachment 253264 [details] [review] Updated patch after review.
Review of attachment 253264 [details] [review]:

Two quick comments before I head out for a bit... I will look at this more later.

::: src/libostree/ostree-fetcher.c

@@ +266,3 @@
+          // We already have the whole file, so just use it.
+          pending->state = OSTREE_FETCHER_STATE_COMPLETE;
+          rename_part_file (pending, NULL, &local_error);

This should be if (!rename_part_file ...) and handle errors, now that it returns a gboolean. Handling errors in async code is a pain for sure =/

@@ +318,3 @@
+  pending->refcount = 1;
+  pending->self = g_object_ref (self);
+  pending->uri = soup_uri_copy (uri);

Is there a way we can avoid duplicating the internals of these functions? Maybe: request_uri_internal (..., gboolean partial) ?
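(One possible shape for that deduplication, sketched within the context of ostree-fetcher.c: both public entry points funnel into one internal helper that takes a flag. OstreeFetcherPendingURI and its fields come from the patch; the is_partial field and the helper body are hypothetical.)

/* Sketch only; shared setup lives in one place. */
static void
request_uri_internal (OstreeFetcher       *self,
                      SoupURI             *uri,
                      gboolean             use_partial,
                      GCancellable        *cancellable,
                      GAsyncReadyCallback  callback,
                      gpointer             user_data)
{
  OstreeFetcherPendingURI *pending = g_new0 (OstreeFetcherPendingURI, 1);

  pending->refcount = 1;
  pending->self = g_object_ref (self);
  pending->uri = soup_uri_copy (uri);
  pending->is_partial = use_partial;   /* hypothetical field */

  /* ... common setup, then kick off the request ... */
}

void
ostree_fetcher_request_uri_async (OstreeFetcher *self, SoupURI *uri,
                                  GCancellable *cancellable,
                                  GAsyncReadyCallback callback,
                                  gpointer user_data)
{
  request_uri_internal (self, uri, FALSE, cancellable, callback, user_data);
}

void
ostree_fetcher_request_uri_with_partial_async (OstreeFetcher *self, SoupURI *uri,
                                               GCancellable *cancellable,
                                               GAsyncReadyCallback callback,
                                               gpointer user_data)
{
  request_uri_internal (self, uri, TRUE, cancellable, callback, user_data);
}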
Created attachment 253324 [details] [review] Updated again after review
Created attachment 253428 [details] [review] Updated patch with commented lines removed and commit message fixed
Review of attachment 253428 [details] [review]: Ok, let's go with this; it looks roughly correct to me. There are some things we could simplify in the future (like using one directory per pull and deleting it in one go), but that can come later.