GNOME Bugzilla – Bug 755224
Pulling with COMMIT_ONLY option ignores depth option
Last modified: 2015-09-22 19:27:51 UTC
I'm trying to recursively pull ancestor commit objects for a given refspec (not the content, just the metadata). I do this by passing the COMMIT_ONLY flag as well as depth: -1. But the recursion doesn't work. In scan_one_metadata_object_c(), when scanning a commit object with the "is_commit_only" flag set, the function just exits immediately and never calls scan_commit_object() regardless of the depth value. I guess this is a new use case. :)
Created attachment 311642 [details] [review] Proposed patch This fixed it for me.
Review of attachment 311642 [details] [review]: ::: src/libostree/ostree-repo-pull.c @@ +1006,3 @@ } + if (pull_data->gpg_verify && !pull_data->is_commit_only) Why turn off GPG verification? @@ +1078,3 @@ + g_variant_get_child (commit, 7, "@ay", &tree_meta_csum); + + queue_scan_one_metadata_object_c (pull_data, ostree_checksum_bytes_peek (tree_contents_csum), Hm, I just noticed that we need to verify the result of this is not NULL before passing it down. It'll just be a NULL deref but still. (Not a new bug in this patch, just noticed)
(In reply to Colin Walters from comment #2) > Review of attachment 311642 [details] [review] [review]: > > ::: src/libostree/ostree-repo-pull.c > @@ +1006,3 @@ > } > > + if (pull_data->gpg_verify && !pull_data->is_commit_only) > > Why turn off GPG verification? I was thinking the COMMIT_ONLY flag didn't fetch detached metadata, but now that I'm looking at the pull logic again I think I misread it. I'll fix that and add NULL checks to the DIR_TREE and DIR_META requests.
Created attachment 311652 [details] [review] Revised patch
Review of attachment 311652 [details] [review]: One comment, feel free to push after addressing one way or the other, thanks! ::: src/libostree/ostree-repo-pull.c @@ +1079,3 @@ + g_variant_get_child (commit, 7, "@ay", &tree_meta_csum); + + if (tree_contents_csum != NULL) Ah, sorry what I meant was that ostree_checksum_bytes_peek (tree_contents_csum) can return NULL when the length of the checksum is invalid. (Which mostly can only happen on disk corruption I can fix this as a separate patch if you prefer.
Got'cha. I'll fix it but I'll split it into separate commits.
Created attachment 311757 [details] [review] Cleanup patch I'm holding off on pushing this bug fix until I get a intermittent hang figured out, but what do you think about this patch for the followup NULL check? I noticed quite a few other places where we embed ostree_checksum_bytes_peek() in calls, so this seemed more foolproof.
If we're doing this right we should likely use the existing `ostree_checksum_bytes_peek_validate` so we can report an error - this is going to happen on disk/network corruption and it'd help to see that instead of silently ignoring it. That said this was an issue I had in the original code, doesn't need to block the other work here.
Created attachment 311809 [details] [review] Fetcher deadlock patch I figured out why my recursive pulls were getting hung up. It's kinda sorta related to this bug so I'm tacking it on here. Explanation in the commit message. See if you agree.
Review of attachment 311809 [details] [review]: The only concern I have is that now we're allowing more concurrency while we're processing the HTTP request body, right? We might be more conservative and just try to decrement the outstanding variable even in error paths?
Well, my first question is what are we trying to guard against by imposing a limit on concurrent requests? (I didn't see that mentioned in the code.) Also, how long is processing a request body going to take relative to waiting on an HTTP response? Certainly local I/O should be relatively instantaneous, which would mean the HTTP requests are the concurrency bottleneck. Or is this another instance of me forgetting to take NFS into account? Decrementing the variable in error paths would work, but it means we have to make sure we cover ALL possible error paths before finish_stream() or there's still potential for deadlock. Just decrementing immediately seemed safer to me.
(In reply to Matthew Barnes from comment #11) > Well, my first question is what are we trying to guard against by imposing a > limit on concurrent requests? (I didn't see that mentioned in the code.) I see now the limit is a function of "max-conns-per-host", presumably to not overwhelm the HTTP server.
Untested, WDYT? diff --git a/src/libostree/ostree-fetcher.c b/src/libostree/ostree-fetcher.c index 37e0f99..2f00095 100644 --- a/src/libostree/ostree-fetcher.c +++ b/src/libostree/ostree-fetcher.c @@ -48,7 +48,9 @@ typedef struct { SoupRequest *request; - gboolean is_stream; + gboolean is_outstanding : 1; + gboolean is_stream : 1; + gboolean unused : 30; GInputStream *request_body; char *out_tmpfile; GOutputStream *out_stream; @@ -81,6 +83,10 @@ pending_uri_free (OstreeFetcherPendingURI *pending) if (pending->refcount > 0) return; + /* Decrement the parent's count of outstanding requests */ + if (pending->is_outstanding) + pending->self->outstanding--; + soup_uri_free (pending->uri); g_clear_object (&pending->self); g_clear_object (&pending->request); @@ -289,6 +295,7 @@ ostree_fetcher_process_pending_queue (OstreeFetcher *self) OstreeFetcherPendingURI *next = g_queue_pop_head (&self->pending_queue); self->outstanding++; + next->is_outstanding = TRUE; soup_request_send_async (next->request, next->cancellable, on_request_sent, next); } @@ -322,7 +329,6 @@ finish_stream (OstreeFetcherPendingURI *pending, /* Now that we've finished downloading, continue with other queued * requests. */ - pending->self->outstanding--; ostree_fetcher_process_pending_queue (pending->self); if (stbuf.st_size < pending->content_length)
Looks like that would work. I think I'd be inclined to take it a step further and track it with a hash table so it's a little more debugging friendly. Since we're not really doing anything with the "message_to_request" table, we could ditch that AND the "outstanding" counter for a hash table of PendingURI structs that just tracks the things for their entire lifetime. Call the hash table "outstanding". Then getting the outstanding request count would be g_hash_table_size(), and the GAsyncResult destroy callback could be something like: static void request_complete (OstreeFetcherPendingURI *pending) { g_hash_table_remove (pending->self->outstanding, pending); pending_uri_free (pending); } Does that make sense?
(In reply to Matthew Barnes from comment #14) > Looks like that would work. I think I'd be inclined to take it a step > further and track it with a hash table so it's a little more debugging > friendly. Yep, sounds good to me!
This bug is getting a little messy. My bad for tacking on what I thought would be a trivial extra patch. So what I've done is I've pushed the first patch; the actual fix for this bug. https://git.gnome.org/browse/ostree/commit/?id=ed861609756202cc0a23a85a5fab3d85c880e1c2 The issue with ostree_checksum_bytes_peek() I'm going to let go for the moment, but I wrote a note to myself to follow up on that. And I'll submit a pull request for the OstreeFetcher changes soon. So closing this as FIXED.