GNOME Bugzilla – Bug 762973
Faster access to summary file
Last modified: 2018-08-17 18:59:32 UTC
I have a lot of branches in xdg-app repos (mostly because each locale gets its own branch for the locale data). This makes the summary file pretty large. It would be pretty easy for ostree to look for a compressed version of the summary file, which would halp a lot: -rw-r--r--. 1 alex wheel 597250 27 feb 12.50 summary -rw-r--r--. 1 alex wheel 268152 2 mar 10.08 summary.xz This is nice, because *every* ostree operation currently starts with downloading the summary. Another nice thing would be to have a local copy of the summary file for each remote, so that you can start by just doing a etag comparison and avoid having to download the thing every time if it did not change.
When I was writing the summary code I thought about this...but making things fully generic I think would involve designing basically a metadata filesystem format. One could imagine using the ostree format itself to look up metadata for the target trees, i.e. the summary is itself a tree. The thing is most people doing stuff in this domain end up with a smart server, but I'm trying really hard to avoid that. Needs a bit of research. As far as etags, yes, we definitely need to do that.
Reminder that it's likely Endless wants to remove the summary for xdg-app (and definitely other distribution methods) as well, because it clearly won't scale if we use it for user-published data, like Steam / other centralized app stores. xz is likely too slow for real-time systems, as well.
I think there are two issues with the summary atm. First of all, ostree *always* uses it. If we're resolving a single known ref name, there is no need to download the entire summary, it could just read that ref file. Reading the summary should only be done one operations that list all references (such as a mirroring pull). Secondly, when you do need it, it is too large. For instance, in http://sdk.gnome.org/nightly/repo/ we can see the summary file is 2.3 meg. We need to not download this unnecessary, and do whatever we can to make it smaller. Why is it so big anyway? That repo has 2222 branches, which comes out to 1085 bytes per branch. The name and ref should not be *that* large. I guess it is the per-ref metadata? What is in there?
That's a lot more branches that I personally conceived of a single OSTree repository having originally. The extra data is all available static deltas. Which itself is a scaling problem, but not too much for the use cases I had envisioned originally for base OS trees. I am willing to work on scaling the summary file - but I need to do some research for it. My instinct is to have something very similar to the core OSTree format that we use for metadata (this time with object sizes). There *has* to be some prior art here. One that comes to mind is https://github.com/ipfs/ipfs which basically is a block store - then one can build a filesystem on top. The downside of something like this AIUI is storage complexity, e.g. implementing "prune". One alternative is to just do what git does and create a text file directory listing in refs/ (and I guess deltas/) to solve the enumerate problem, but not the other parts. (The problem with this is that for metalinks we still need a single-file target, so the summary file would have to cover the child directory listings)
The reason there are so many branches is that each app has broken out locales to separately installable parts.
while it won't solve the problem completely, what do you think of this optimization? It caches both summary and summary.sig (so for this to work gpg is required). The pull code will then use the cached summary if summary.sig is not changed, which is much smaller to download than the summary file itself. If you like it, I can clean it up and fix other places as well as repo_remote_fetch_summary. Is "refs/remotes/.%s.summary" the right place to cache the summary file? diff --git a/src/libostree/ostree-repo-pull.c b/src/libostree/ostree-repo-pull.c index 0be1b38..885144b 100644 --- a/src/libostree/ostree-repo-pull.c +++ b/src/libostree/ostree-repo-pull.c @@ -1706,6 +1706,31 @@ process_one_static_delta (OtPullData *pull_data, } static gboolean +openat_allow_noent (int dfd, + const char *path, + int *fd, + GCancellable *cancellable, + GError **error) +{ + GError *temp_error = NULL; + + if (!gs_file_openat_noatime (dfd, path, fd, + cancellable, &temp_error)) + { + if (g_error_matches (temp_error, G_IO_ERROR, G_IO_ERROR_NOT_FOUND)) + { + *fd = -1; + g_clear_error (&temp_error); + } + else + { + g_propagate_error (error, temp_error); + return FALSE; + } + } + return TRUE; +} +static gboolean validate_variant_is_csum (GVariant *csum, GError **error) { @@ -1995,15 +2020,77 @@ ostree_repo_pull_with_options (OstreeRepo *self, g_autoptr(GVariant) refs = NULL; g_autoptr(GVariant) deltas = NULL; g_autoptr(GVariant) additional_metadata = NULL; - - if (!pull_data->summary) + + if (!pull_data->summary_data_sig) + { + uri = suburi_new (pull_data->base_uri, "summary.sig", NULL); + if (!fetch_uri_contents_membuf_sync (pull_data, uri, FALSE, TRUE, + &bytes_sig, cancellable, error)) + goto out; + soup_uri_free (uri); + } + + if (bytes_sig) + { + g_autofree char *summary_sig_ref = g_strdup_printf ("refs/remotes/.%s.summary.sig", pull_data->remote_name); + glnx_fd_close int prev_fd = -1; + g_autoptr(GBytes) old_contents = NULL; + + if (!openat_allow_noent (self->repo_dir_fd, summary_sig_ref, &prev_fd, cancellable, error)) + goto out; + + if (prev_fd >= 0) + { + old_contents = glnx_fd_readall_bytes (prev_fd, cancellable, error); + if (!old_contents) + goto out; + } + + if (!glnx_file_replace_contents_at (self->repo_dir_fd, + summary_sig_ref, + g_bytes_get_data (bytes_sig, NULL), + g_bytes_get_size (bytes_sig), + self->disable_fsync ? GLNX_FILE_REPLACE_NODATASYNC : GLNX_FILE_REPLACE_DATASYNC_NEW, + cancellable, error)) + goto out; + + if (old_contents && g_bytes_compare (old_contents, bytes_sig) == 0) + { + glnx_fd_close int summary_fd = -1; + g_autofree char *summary_file = g_strdup_printf ("refs/remotes/.%s.summary", pull_data->remote_name); + if (!openat_allow_noent (self->repo_dir_fd, summary_file, &summary_fd, cancellable, error)) + goto out; + + if (summary_fd >= 0) + { + bytes_summary = glnx_fd_readall_bytes (summary_fd, cancellable, error); + if (!bytes_summary) + goto out; + } + } + } + + if (!pull_data->summary && !bytes_summary) { uri = suburi_new (pull_data->base_uri, "summary", NULL); if (!fetch_uri_contents_membuf_sync (pull_data, uri, FALSE, TRUE, &bytes_summary, cancellable, error)) goto out; soup_uri_free (uri); - } + + if (bytes_summary) + { + g_autofree char *summary_ref = g_strdup_printf ("refs/remotes/.%s.summary", pull_data->remote_name); + + if (!glnx_file_replace_contents_at (self->repo_dir_fd, + summary_ref, + g_bytes_get_data (bytes_summary, NULL), + g_bytes_get_size (bytes_summary), + self->disable_fsync ? GLNX_FILE_REPLACE_NODATASYNC : GLNX_FILE_REPLACE_DATASYNC_NEW, + cancellable, error)) + goto out; + } + } if (!bytes_summary && pull_data->gpg_verify_summary) { @@ -2012,6 +2099,8 @@ ostree_repo_pull_with_options (OstreeRepo *self, goto out; } + if (bytes_summary) + if (!bytes_summary && require_static_deltas) { g_set_error (error, G_IO_ERROR, G_IO_ERROR_FAILED, @@ -2019,15 +2108,6 @@ ostree_repo_pull_with_options (OstreeRepo *self, goto out; } - if (bytes_summary) - { - uri = suburi_new (pull_data->base_uri, "summary.sig", NULL); - if (!fetch_uri_contents_membuf_sync (pull_data, uri, FALSE, TRUE, - &bytes_sig, cancellable, error)) - goto out; - soup_uri_free (uri); - } - if (!bytes_sig && pull_data->gpg_verify_summary) { g_set_error (error, G_IO_ERROR, G_IO_ERROR_FAILED,
Caching in refs/remotes/ is OK by me. The mildly ugly thing about this is that the only way to delete it would be by hand, unless we added a command line/API. That issue might argue a bit more for keeping it in repo/tmp/summaries/${remotename} or something? (We do have code to prune repo/tmp which should get enhanced)
yes, that is surely a better place to store the cached summary file
PR proposed here: https://github.com/GNOME/ostree/pull/203
OSTree has moved to Github a while ago. Furthermore, GNOME Bugzilla will be shut down and replaced by gitlab.gnome.org. If the problem reported in this Bugzilla ticket is still valid, please report it to https://github.com/ostreedev/ostree/issues instead. Thank you! Closing this report as WONTFIX as part of Bugzilla Housekeeping.