Bug 755224 – Pulling with COMMIT_ONLY option ignores depth option

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 755224 - Pulling with COMMIT_ONLY option ignores depth option


Summary:	Pulling with COMMIT_ONLY option ignores depth option


Status:	RESOLVED FIXED

Product:	ostree
Classification:	Infrastructure
Component:	general
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	OSTree maintainer(s)
QA Contact:	OSTree maintainer(s)

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2015-09-18 14:54 UTC by Matthew Barnes
Modified:	2015-09-22 19:27 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Proposed patch (2.55 KB, patch) 2015-09-18 16:28 UTC, Matthew Barnes	none	Details \| Review
Revised patch (2.75 KB, patch) 2015-09-18 18:46 UTC, Matthew Barnes	committed	Details \| Review
Cleanup patch (1.66 KB, patch) 2015-09-21 14:03 UTC, Matthew Barnes	reviewed	Details \| Review
Fetcher deadlock patch (1.62 KB, patch) 2015-09-21 23:36 UTC, Matthew Barnes	reviewed	Details \| Review

Description Matthew Barnes 2015-09-18 14:54:54 UTC

I'm trying to recursively pull ancestor commit objects for a given refspec (not the content, just the metadata).  I do this by passing the COMMIT_ONLY flag as well as depth: -1.  But the recursion doesn't work.

In scan_one_metadata_object_c(), when scanning a commit object with the "is_commit_only" flag set, the function just exits immediately and never calls scan_commit_object() regardless of the depth value.

I guess this is a new use case.  :)

Comment 1 Matthew Barnes 2015-09-18 16:28:48 UTC

Created attachment 311642 [details] [review]
Proposed patch

This fixed it for me.

Comment 2 Colin Walters 2015-09-18 17:20:52 UTC

Review of attachment 311642 [details] [review]:

::: src/libostree/ostree-repo-pull.c
@@ +1006,3 @@
     }
 
+  if (pull_data->gpg_verify && !pull_data->is_commit_only)

Why turn off GPG verification?

@@ +1078,3 @@
+      g_variant_get_child (commit, 7, "@ay", &tree_meta_csum);
+
+      queue_scan_one_metadata_object_c (pull_data, ostree_checksum_bytes_peek (tree_contents_csum),

Hm, I just noticed that we need to verify the result of this is not NULL before passing it down.  It'll just be a NULL deref but still.

(Not a new bug in this patch, just noticed)

Comment 3 Matthew Barnes 2015-09-18 17:43:23 UTC

(In reply to Colin Walters from comment #2)
> Review of attachment 311642 [details] [review] [review]:
> 
> ::: src/libostree/ostree-repo-pull.c
> @@ +1006,3 @@
>      }
>  
> +  if (pull_data->gpg_verify && !pull_data->is_commit_only)
> 
> Why turn off GPG verification?

I was thinking the COMMIT_ONLY flag didn't fetch detached metadata, but now that I'm looking at the pull logic again I think I misread it.

I'll fix that and add NULL checks to the DIR_TREE and DIR_META requests.

Comment 4 Matthew Barnes 2015-09-18 18:46:15 UTC

Created attachment 311652 [details] [review]
Revised patch

Comment 5 Colin Walters 2015-09-18 19:34:43 UTC

Review of attachment 311652 [details] [review]:

One comment, feel free to push after addressing one way or the other, thanks!

::: src/libostree/ostree-repo-pull.c
@@ +1079,3 @@
+      g_variant_get_child (commit, 7, "@ay", &tree_meta_csum);
+
+      if (tree_contents_csum != NULL)

Ah, sorry what I meant was that ostree_checksum_bytes_peek (tree_contents_csum) can return NULL when the length of the checksum is invalid.  (Which mostly can only happen on disk corruption

I can fix this as a separate patch if you prefer.

Comment 6 Matthew Barnes 2015-09-18 19:52:31 UTC

Got'cha.  I'll fix it but I'll split it into separate commits.

Comment 7 Matthew Barnes 2015-09-21 14:03:43 UTC

Created attachment 311757 [details] [review]
Cleanup patch

I'm holding off on pushing this bug fix until I get a intermittent hang figured out, but what do you think about this patch for the followup NULL check?

I noticed quite a few other places where we embed ostree_checksum_bytes_peek() in calls, so this seemed more foolproof.

Comment 8 Colin Walters 2015-09-21 15:29:42 UTC

If we're doing this right we should likely use the existing `ostree_checksum_bytes_peek_validate` so we can report an error - this is going to happen on disk/network corruption and it'd help to see that instead of silently ignoring it.

That said this was an issue I had in the original code, doesn't need to block the other work here.

Comment 9 Matthew Barnes 2015-09-21 23:36:28 UTC

Created attachment 311809 [details] [review]
Fetcher deadlock patch

I figured out why my recursive pulls were getting hung up.

It's kinda sorta related to this bug so I'm tacking it on here.  Explanation in the commit message.  See if you agree.

Comment 10 Colin Walters 2015-09-22 13:45:44 UTC

Review of attachment 311809 [details] [review]:

The only concern I have is that now we're allowing more concurrency while we're processing the HTTP request body, right?  We might be more conservative and just try to decrement the outstanding variable even in error paths?

Comment 11 Matthew Barnes 2015-09-22 14:51:41 UTC

Well, my first question is what are we trying to guard against by imposing a limit on concurrent requests?  (I didn't see that mentioned in the code.)

Also, how long is processing a request body going to take relative to waiting on an HTTP response?  Certainly local I/O should be relatively instantaneous, which would mean the HTTP requests are the concurrency bottleneck.  Or is this another instance of me forgetting to take NFS into account?

Decrementing the variable in error paths would work, but it means we have to make sure we cover ALL possible error paths before finish_stream() or there's still potential for deadlock.  Just decrementing immediately seemed safer to me.

Comment 12 Matthew Barnes 2015-09-22 15:07:43 UTC

(In reply to Matthew Barnes from comment #11)
> Well, my first question is what are we trying to guard against by imposing a
> limit on concurrent requests?  (I didn't see that mentioned in the code.)

I see now the limit is a function of "max-conns-per-host", presumably to not overwhelm the HTTP server.

Comment 13 Colin Walters 2015-09-22 15:36:30 UTC

Untested, WDYT?

diff --git a/src/libostree/ostree-fetcher.c b/src/libostree/ostree-fetcher.c
index 37e0f99..2f00095 100644
--- a/src/libostree/ostree-fetcher.c
+++ b/src/libostree/ostree-fetcher.c
@@ -48,7 +48,9 @@ typedef struct {
 
   SoupRequest *request;
 
-  gboolean is_stream;
+  gboolean is_outstanding : 1;
+  gboolean is_stream : 1;
+  gboolean unused : 30;
   GInputStream *request_body;
   char *out_tmpfile;
   GOutputStream *out_stream;
@@ -81,6 +83,10 @@ pending_uri_free (OstreeFetcherPendingURI *pending)
   if (pending->refcount > 0)
     return;
 
+  /* Decrement the parent's count of outstanding requests */
+  if (pending->is_outstanding)
+    pending->self->outstanding--;
+
   soup_uri_free (pending->uri);
   g_clear_object (&pending->self);
   g_clear_object (&pending->request);
@@ -289,6 +295,7 @@ ostree_fetcher_process_pending_queue (OstreeFetcher *self)
       OstreeFetcherPendingURI *next = g_queue_pop_head (&self->pending_queue);
 
       self->outstanding++;
+      next->is_outstanding = TRUE;
       soup_request_send_async (next->request, next->cancellable,
                                on_request_sent, next);
     }
@@ -322,7 +329,6 @@ finish_stream (OstreeFetcherPendingURI *pending,
   /* Now that we've finished downloading, continue with other queued
    * requests.
    */
-  pending->self->outstanding--;
   ostree_fetcher_process_pending_queue (pending->self);
 
   if (stbuf.st_size < pending->content_length)

Comment 14 Matthew Barnes 2015-09-22 16:29:08 UTC

Looks like that would work.  I think I'd be inclined to take it a step further and track it with a hash table so it's a little more debugging friendly.

Since we're not really doing anything with the "message_to_request" table, we could ditch that AND the "outstanding" counter for a hash table of PendingURI structs that just tracks the things for their entire lifetime.  Call the hash table "outstanding".

Then getting the outstanding request count would be g_hash_table_size(), and the GAsyncResult destroy callback could be something like:

  static void
  request_complete (OstreeFetcherPendingURI *pending)
  {
    g_hash_table_remove (pending->self->outstanding, pending);
    pending_uri_free (pending);
  }

Does that make sense?

Comment 15 Colin Walters 2015-09-22 17:49:10 UTC

(In reply to Matthew Barnes from comment #14)
> Looks like that would work.  I think I'd be inclined to take it a step
> further and track it with a hash table so it's a little more debugging
> friendly.

Yep, sounds good to me!

Comment 16 Matthew Barnes 2015-09-22 19:27:04 UTC

This bug is getting a little messy.  My bad for tacking on what I thought would be a trivial extra patch.

So what I've done is I've pushed the first patch; the actual fix for this bug.

https://git.gnome.org/browse/ostree/commit/?id=ed861609756202cc0a23a85a5fab3d85c880e1c2

The issue with ostree_checksum_bytes_peek() I'm going to let go for the moment, but I wrote a note to myself to follow up on that.

And I'll submit a pull request for the OstreeFetcher changes soon.

So closing this as FIXED.