Bug 721799 - update deltas
Status: RESOLVED FIXED
Product: ostree
Classification: Infrastructure
Component: general
Version: unspecified
OS: Other Linux
Priority: Normal
Severity: normal
Target Milestone: ---
Assigned To: OSTree maintainer(s)
QA Contact: OSTree maintainer(s)
Depends on:
Blocks:
Reported: 2014-01-08 16:04 UTC by Colin Walters
Modified: 2016-06-23 22:32 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
WIP: Static deltas (115.57 KB, patch) - 2014-06-23 12:50 UTC, Colin Walters (no status)
Fix ostree_repo_list_static_delta_names (3.16 KB, patch) - 2015-01-13 20:39 UTC, Alexander Larsson (committed)
Allow creating and pulling static deltas starting from "empty" (13.61 KB, patch) - 2015-01-13 20:39 UTC, Alexander Larsson (committed)
deltas: Do not traverse all objects, only both commits (1.34 KB, patch) - 2015-01-14 02:33 UTC, Colin Walters (committed)
Fix ostree_repo_list_static_delta_names (3.03 KB, patch) - 2015-01-14 07:38 UTC, Alexander Larsson (committed)
Allow creating and pulling static deltas starting from "empty" (13.11 KB, patch) - 2015-01-14 07:38 UTC, Alexander Larsson (accepted-commit_now)
static delta generation: Separate max chunk size from min fallback size (5.83 KB, patch) - 2015-01-20 13:21 UTC, Alexander Larsson (committed)

Description Colin Walters 2014-01-08 16:04:27 UTC
See: https://mail.gnome.org/archives/ostree-list/2013-July/msg00005.html 

Some work lives here:

https://git.gnome.org/browse/ostree/commit/?h=wip/delta

Prior art:

Android: http://stackoverflow.com/questions/12860938/smart-app-updates-on-google-play-store-how-does-it-work

(Basically: Google proprietary, not sure if there are any real docs on it)

Steam:
https://developer.valvesoftware.com/wiki/SteamPipe
https://developer.valvesoftware.com/wiki/GCF

(File-containing-a-filesystem; not a good match for OSTree which is based on immutable objects + Unix hardlinks)

RPM:
http://freecode.com/projects/deltarpm

(My comment: bsdiff is too CPU intensive)
Comment 1 Jasper St. Pierre (not reading bugmail) 2014-01-08 18:27:15 UTC
(In reply to comment #0)
> Steam:
> https://developer.valvesoftware.com/wiki/SteamPipe
> https://developer.valvesoftware.com/wiki/GCF
>
> (File-containing-a-filesystem; not a good match for OSTree which is based on
> immutable objects + Unix hardlinks)

As someone who's reverse engineered Steam a bit (and wrote a library for parsing GCF files at one point), no, SteamPipe is an entirely new system. The disadvantage of GCF files is that games often have their own packfiles, so putting packfiles in a packfile isn't great. They also caused fragmentation since they were a filesystem with chunks and clusters and so on.

Details of the new system are a bit scarce, and Valve is still updating it, but here's a relevant post about it (and about trying to minimize the size of patches):

  http://the-witness.net/news/2012/08/fun-with-package-sorting/

It splits all files submitted into 1MB chunks, gives each of those chunks an ID, and only downloads chunks that you don't have, and reassembles new pieces on the client. It can obviously go very wrong if we end up caring about bundles/packfiles, unless we give gresource files the ability to align file starts to 1MB boundaries.

But the fundamental idea of the new system isn't bad. It's basically a centralized BitTorrent.
Comment 2 Colin Walters 2014-01-08 20:53:36 UTC
(In reply to comment #1)

> It splits all files submitted into 1MB chunks, gives each of those chunks an
> ID, and only downloads chunks that you don't have, and reassembles new pieces
> on the client. It can obviously go very wrong if we end up caring about
> bundles/packfiles, unless we give gresource files the ability to align file
> starts to 1MB boundaries.

Interesting, thanks for the link.  So the app author here ended up having to optimize for the implementation details of Steam, which is the ugly part.

Any implementation of deltas OSTree side is going to be less than optimal for these sorts of containers, unless it gains explicit awareness.

It's not out of the question for OSTree's static delta compiler to explicitly unpack ELF binaries and look for GResource.  But it feels to me like it'd be better for apps that have large and/or frequently changing resources to split them out into separate files.
Comment 3 Colin Walters 2014-01-21 11:54:36 UTC
Something else nice to support here would be "null deltas".  The use case here is installing an entirely new OS: 10000 HTTP requests for even a Fedora "@core" tree is pretty painful.  We could do a lot better with a null delta because it would at least compress similar objects together.
Comment 4 Colin Walters 2014-02-05 09:15:16 UTC
I landed an initial version:

https://git.gnome.org/browse/ostree/commit/?id=2d6374822b481e885aa5f74ef9e6f993d5c792c5

Still TODO:

* Fetch over HTTP
 - GPG signing
* More optimized versions
  - Use bup's rollsum to get closer to rsync prepared deltas
  - Investigate ELF-specific compression
* Recursive deltas
Comment 5 Colin Walters 2014-04-20 21:50:30 UTC
Latest branch:

https://git.gnome.org/browse/ostree/log/?h=wip/delta2

Now supports HTTP fetching, GPG signing.

Also, we now support a hybrid of deltas with plain HTTP object fetches.  This is useful for large files, particularly ones that are already compressed and won't deltify.  For example, the initramfs.

Still TODO:

* Move new commit object into superblock, so we can do version comparisons
* Freeze file format
* Remove debug prints
* Allow tuning delta size (32MB might be a good default, particularly with XZ)
* Fix pull code to interpret compressed sizes and show them
* "Useless delta" detection, when the client might as well just fetch objects individually (but is the total size estimation still useful?)

* Attempt to group "similar" objects which will compress well together
* Implement XZ compression
* Implement rollsum for large files
Comment 6 Colin Walters 2014-04-27 18:05:08 UTC
Work that landed:

* XZ compression (appears to be 30% better than gzip on average)
* Pack metadata objects together first - this can be a large win: even though they don't compress well, we end up with more efficient packing since they're small
* --max-usize optional parameter
Comment 7 Colin Walters 2014-04-27 18:06:16 UTC
Oh and most importantly:

* "fallback" fetching - we detect large objects, and automatically fall back to fetching them from the repo.

We should do this too for large already-compressed objects.  The main pain point here is the initramfs.
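The fallback policy described in the last two comments can be sketched as a size/compressibility test. The threshold, field names, and `shouldFallback` helper below are invented for illustration; they are not OSTree's actual API:

```javascript
// Sketch of the "fallback" policy described above: large objects, and
// already-compressed (hence incompressible) ones, are served as plain
// repo fetches instead of being embedded in the delta payload.
// The threshold and all names here are illustrative assumptions.
const FALLBACK_MIN_SIZE = 4 * 1024 * 1024; // e.g. 4 MiB

function shouldFallback(obj) {
  // obj: { size: bytes, compressedSize: bytes }
  if (obj.size >= FALLBACK_MIN_SIZE)
    return true; // too big to be worth packing into a delta part
  // Already-compressed content (e.g. an initramfs) barely shrinks further.
  return obj.compressedSize / obj.size > 0.95;
}

function partitionObjects(objects) {
  const packed = [], fallback = [];
  for (const o of objects)
    (shouldFallback(o) ? fallback : packed).push(o);
  return { packed, fallback };
}
```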
Comment 8 Colin Walters 2014-04-27 21:03:28 UTC
Major TODO that could require a format change:

* Stream objects into repo rather than doing write tmpfile + close + read + write again =/

We may need to include the size of each individual object in order to efficiently stream it.
Comment 9 Colin Walters 2014-05-03 14:44:59 UTC
TODO:

* rollsum for modified files
* Put deltas in a prefixed subdir like we do with objects, e.g. af/492a88
* prune should also prune deltas
* fsck should verify them?
Comment 10 Colin Walters 2014-05-12 21:36:25 UTC
Another TODO is add expected size to delta packs.  See https://bugzilla.gnome.org/show_bug.cgi?id=725921
Comment 11 Colin Walters 2014-05-24 21:49:50 UTC
* Look more closely at deltarpm - does it have any intelligence for ELF files say?  Internet reports claim it uses bsdiff, but I can't tell from the source.
Comment 13 Jasper St. Pierre (not reading bugmail) 2014-05-25 00:09:46 UTC
I think we could spend a huge amount of research chasing the best possible binary delta compression algorithm, and maybe implement our own, because it's fun, but just getting *something* out the door is better than waiting years for the perfect code.

If a large update currently takes 1 hour, and if we use bsdiff and it gets down to 10 minutes, that's a lot better. If we then get it down to 5 minutes with Courgette or bup or Chromium Updates or something else, then that's cool, but the first is a massive improvement over everything else.
Comment 14 Colin Walters 2014-05-25 13:15:46 UTC
The problem is that once delta formats are out, it becomes hard to add new things because old clients will only understand old formats.  The format will need to be versioned, and OS distributors will need to be careful not to generate deltas that existing clients can't understand.
Comment 15 Colin Walters 2014-05-25 13:17:00 UTC
So I'm looking for one inter-file delta algorithm to start to credibly handle cases like what we have in GNOME apps with GResource, where you have potentially large static objects embedded in one single file.

The current ostree delta format is just "tar of new files".
Comment 16 Colin Walters 2014-06-04 00:17:12 UTC
https://github.com/xtraeme/bsdiff-chromium
Comment 17 Colin Walters 2014-06-23 12:50:02 UTC
Created attachment 278997 [details] [review]
WIP: Static deltas

This is a large code drop to support static deltas.  To be rebased.
Comment 18 Colin Walters 2014-06-23 14:28:42 UTC
New git branch for this is: https://git.gnome.org/browse/ostree/log/?h=wip/staticdeltas
Comment 19 Giuseppe Scrivano 2014-11-03 13:08:38 UTC
What about adding a version at the beginning of the delta file for now (or it can be part of the filename), so clients can check whether the version is supported before downloading the file?  If we go for a version in the file itself, clients will need a range request to fetch the first part of the file; with the second solution, they can just ask for the file and see if it exists on the server.

With the latter approach, it will be possible to have different delta files on the server, so newer clients can use the most recent version (and as fallback re-try with an older version).

In the first version we can go for no inter-file delta support, and if we come out with something good, we can make a version 2 of the delta file.

Rebased version of staticdeltas:

https://github.com/giuseppe/ostree/tree/giuseppe/staticdeltas

What do you think?
Comment 20 Colin Walters 2014-11-03 20:52:16 UTC
(In reply to comment #19)
> what about adding a version at the beginning of the delta file for now (or it
> can be part of the filename), so clients can check if the version is supported
> before downloading the file.  

metadata for delta is already separate from the payload, right?

> With the latter approach, it will be possible to have different delta files on
> the server, so newer clients can use the most recent version (and as fallback
> re-try with an older version).

Having multiple delta versions would get quite expensive I'd think.  It'd have to be *really* worth it.

It's tricky as you'd have to know when all clients have upgraded far enough to be able to read the new deltas.

> In the first version we can go for no inter-file delta support, and if we come
> out with something good, we can make a version 2 of the delta file.

What makes me uncertain about the whole thing is - subtrees would be almost as efficient as static deltas, but a lot more flexible.  For example it would support downgrades as well.

Right?
Comment 21 Colin Walters 2014-11-03 21:27:49 UTC
Of course the static deltas code mostly works now.  We know it solves real problems.  I'm just debating.

Do you get what I'm proposing with subtrees?  It'd basically have almost the same tradeoffs as packages, without deltarpms.

Except that with packages, if someone changes %description, you end up redownloading the whole RPM.  With subtrees that wouldn't happen, you'd just download a new rpmdb in /usr/share/rpm.

The challenge with subtrees is that build scripts can't just "ostree commit" on a whole tree, they'll need to explicitly tell ostree how to construct the subtrees.

Which shouldn't be hard for projects like rpm-ostree which already have a chunking.
Comment 22 Jasper St. Pierre (not reading bugmail) 2014-11-03 21:53:00 UTC
What would we do for the EndlessOS case where we ship an entire filesystem root with OSTree?
Comment 23 Colin Walters 2014-11-03 21:55:54 UTC
Incidentally, very interesting discussion:

https://groups.google.com/forum/#!msg/docker-user/CWc5HB6kANA/Jxi9jG9tUSQJ
Comment 24 Colin Walters 2014-11-03 22:00:53 UTC
(In reply to comment #22)
> What would we do for the EndlessOS case where we ship an entire filesystem root
> with OSTree?

First, obviously the current exploded archive-z2 would stay around, and you could continue to use that.

Second it'd be an API, so a build system could choose to chunk however it chose (and change that over time).  Presumably your build system *does* take as input multiple components.
Comment 25 Colin Walters 2014-11-03 22:45:51 UTC
Slightly more fleshed out subtree proposal:

 - We keep the .commit object, except we define a new metadata key "subtreemeta" which has a checksum
 - Introduce a new repo/subtrees/${checksum:0-2}/${checksum:3-64}.subtreemeta

A .subtreemeta file would be:
  - array of <sha256 of subtree>
  - metadata a{sv} for extensibility

A .subtree file would be:
  - array of dirtree/dirmeta/file objects, compressed as XZ

Or hm, that introduces tricky ordering constraints as a package chunking would cross multiple directories (and thus dirtrees).  Well, you know we could just put all of the metadata objects in the front, and eat the cost of storing/downloading them per commit.  For a current Fedora tree, a gzip compressed tarball is 656k:

$ find repo/objects -name '*.dirtree' -o -name '*.dirmeta' | xargs tar czf meta.tar.gz
$ du -sh meta.tar.gz 
656K	meta.tar.gz

Client would also store the .subtree files.  The pull process would skip downloading any subtrees it already has, just like it skips downloading objects it has.

So, concrete example, let's say the kernel RPM changes, which on Fedora would be 3000 new objects, and in exploded archive-z2 3000 HTTP requests.  If we've chunked the kernel into its own subtree, we'd have:

 - A new ostree commit object
 - A new subtreemeta file, with at least two changes in the subtree array: One for /usr/share/rpm, one for the kernel content in /usr/lib/ostree-boot, /lib/modules, etc.
Comment 26 Colin Walters 2014-11-03 22:52:40 UTC
And thus, the client only downloads the 656k of new metadata objects, and the two subtrees it doesn't have.

Again at a high level, this is basically what package systems do - you download the packages you don't have already installed.


(And if we were willing to do more work here, we could ditch the current ostree Merkle tree with dirtree/dirmeta and have the subtree format be canonical, and probably trim a substantial portion of the 656k)
Comment 27 Colin Walters 2014-11-03 23:12:10 UTC
I was arguing that deltas are only really interesting when one is using Courgette, but I realized I failed to link to it.  Here it is:  http://lwn.net/Articles/359939/
Comment 28 Colin Walters 2014-11-04 00:15:43 UTC
Or, we could punt on subtrees in ostree, and patch e.g. rpm-ostree to support:

 - On a compose server, taking a set of .rpms and postprocessing them to a delta on top of the RPMs.  
 - On the client, support downloading the rpms, stream unpacking them to regenerate the binary level state

Would be kind of a hack...but OTOH would allow RPM distributions to reuse all of their infrastructure around management of a set of RPMs.

Concretely in this model, SELinux labeling would still happen on the server, and we would *not* run %posts on the client.
Comment 29 Giuseppe Scrivano 2014-12-02 15:36:50 UTC
If I understand it correctly, the most common use case will be upgrading to a newer version, and when it comes to security issues this should be done as quickly as possible, so having to download a smaller amount of data helps with this.

The cost of handling subtrees for ostree users (rpm-ostree in this case) seems quite high compared to the flexibility it introduces.  Do you think scenarios other than moving to $CURRENT_VERSION+1 will be common?
Comment 30 Colin Walters 2014-12-02 19:45:44 UTC
(In reply to comment #29)

> the cost of handling subtrees for ostree users (rpm-ostree in this case) seems
> quite high compared to the flexibility that it introduces.  Do you think
> scenarios other than moving to $CURRENT_VERSION+1 will be common?

One thing to note here - there's no reason that deltas have to be forwards-only; the current static deltas design would allow for downgrade deltas too.

Of course, doing this starts to quickly consume storage.  Let's do a bit of back-of-the-envelope calculation.  Say:

 - Content size 600MB
 - Update releases once a month
 - Releases change 15% of the content on average
 - Product lifecycle is 4 years

In that scenario, unidirectional deltas are ~5GB.  Which is quite acceptable.  Bidirectional would take it to ~10GB.  Which is still OK.

Implicit in this of course is that we only have "individual deltas".  Let me attach some "back of the envelope" code here:

#!/usr/bin/env gjs

const PARAMS = {size: 600,              // content size in MB
		releasesPerYear: 12,
		avgContentDelta: 0.15,  // fraction of content changed per release
		lifecycleYears: 4};

// One delta per release: clients always hop from release N to N+1.
function unidirectionalSingleForwardSize(params) {
  return params.size + (params.size * params.avgContentDelta) * params.releasesPerYear * params.lifecycleYears;
}

// A direct delta from every older release to the latest one.
function unidirectionalAnyForwardSize(params) {
    let delta = (params.size * params.avgContentDelta);
    let releases = params.releasesPerYear * params.lifecycleYears;
    let totalDeltaSize = 0;
    for (let i = 0; i < releases; i++)
	totalDeltaSize += delta * i;
    return params.size + totalDeltaSize;
}

// Logarithmic scheme: each release keeps ~log2(n) deltas reaching back
// over power-of-two distances.
function unidirectionalLogForwardSize(params) {
    let delta = (params.size * params.avgContentDelta);
    let releases = params.releasesPerYear * params.lifecycleYears;
    let totalDeltaSize = 0;
    for (let i = 1; i < releases; i++)
	totalDeltaSize += delta * Math.ceil(Math.log(i) / Math.LN2);
    return params.size + totalDeltaSize;
}

print("unidirectionalSingleForward: " + unidirectionalSingleForwardSize(PARAMS));
print("unidirectionalAnyForwardSize: " + unidirectionalAnyForwardSize(PARAMS));
print("unidirectionalLogForwardSize: " + unidirectionalLogForwardSize(PARAMS));
Comment 31 Colin Walters 2014-12-02 19:49:14 UTC
Here are some numbers from that:

$ gjs ~/tmp/contentsize.js
unidirectionalSingleForward: 4920
unidirectionalAnyForwardSize: 102120
unidirectionalLogForwardSize: 20310
$

So what we're seeing here is if we supported deltas between *any* revision going forwards, the repo would be 100GB, which is not trivial.

The current OSTree static deltas design supports recursion - deltas can refer to other deltas that must be applied first.  That means it's possible to do "rolling deltas": to get from A to D in a history A -> B -> C -> D, the delta for D can refer you to, say, A->B and then B->D.  This skips C entirely; the server side decides which of these intermediate deltas to pre-compute.
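The client side of that recursion might look like the sketch below. The data model (`deltas` keyed by target revision, each entry naming its `from` revision, with from-empty deltas keyed by target alone) is invented for illustration and assumes an acyclic delta graph:

```javascript
// Sketch of resolving a "rolling delta" chain: walk back from the target
// revision through each delta's prerequisite until we reach the revision
// we already have. Returns the deltas in apply order, or null if there is
// no path (in which case the client falls back to plain object fetches).
// Data model and names are illustrative assumptions.
function resolveChain(deltas, from, to) {
  const chain = [];
  let cur = to;
  while (cur !== from) {
    const d = deltas[cur]; // the delta whose "to" revision is cur
    if (!d) return null;
    chain.unshift(d);      // prerequisites must be applied first
    cur = d.from;
  }
  return chain;
}
```

For the A -> B -> C -> D example above, publishing only A->B and B->D yields the chain [A->B, B->D] for a client at A, never touching C.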
Comment 32 Giuseppe Scrivano 2014-12-09 16:00:54 UTC
I have experimented a bit with Courgette trying to add support for x86_64, it doesn't seem like a trivial effort (need to cleanup courgette a bit to support 64 bits in a cleaner way, add support for more ELF64 relocations).

my current WIP is here (it is not really anything useful, I've used it to make some experiments to get a clue of what is needed): https://github.com/giuseppe/courgette

If we want something quicker, my suggestion is to start using bsdiff, and add a new metadata field that specifies which compression algorithm is used, so maybe we can re-evaluate this choice in the future and use Courgette.

Old clients that don't support the specified algorithm can fall back to retrieving each file separately (as done now), so it will be easier to change the algorithm in the future.
Comment 33 Colin Walters 2014-12-09 17:13:44 UTC
Thanks a ton for investigating this!  I'm OK with bsdiff as an intermediate solution.  My main issue with it is it's *very* CPU intensive relative to the benefit (I mentioned this in https://bugzilla.gnome.org/show_bug.cgi?id=721799#c0 )

But it does have the benefit that it works generically across all uncompressed data.
Comment 34 Colin Walters 2014-12-09 19:09:22 UTC
I see a commit "Port to x64 Android", and that's presumably ELF, right?

https://codereview.chromium.org/222293002

Hm, probably from the commit they were just getting it to *build* but weren't actually using it yet.
Comment 35 Giuseppe Scrivano 2014-12-09 19:17:42 UTC
I am not sure why it says x64 Android, but all I can see is ELF-32 here, while x86_64 is using ELF-64 and that is what I was trying to add.

bsdiff should not be more CPU intensive than Courgette, since Courgette is basically executable preprocessing + bsdiff ("should" because the preprocessing phase might trim the executable size a bit).
Comment 36 Colin Walters 2014-12-09 19:33:12 UTC
Ah, I see.  Indeed, shrinking the input size provided to bsdiff probably makes a huge difference.  From http://www.daemonology.net/bsdiff/ :

    "bsdiff is quite memory-hungry. It requires max(17*n,9*n+m)+O(1) bytes of memory, where n is the size of the old file and m is the size of the new file. bspatch requires n+m+O(1) bytes.

    bsdiff runs in O((n+m) log n) time; on a 200MHz Pentium Pro, building a binary patch for a 4MB file takes about 90 seconds. bspatch runs in O(n+m) time; on the same machine, applying that patch takes about two seconds."
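Plugging the quoted formulas into concrete numbers shows why whole-tree bsdiff is painful (the helper names below are invented; the formulas are from the bsdiff page quoted above, dropping the O(1) terms):

```javascript
// bsdiff cost formulas from the quote above:
// diff memory = max(17*n, 9*n + m), patch memory = n + m,
// where n = old file size, m = new file size (bytes).
function bsdiffMemBytes(n, m) {
  return Math.max(17 * n, 9 * n + m);
}

function bspatchMemBytes(n, m) {
  return n + m;
}

// Diffing a 600 MB tree against a similar-sized new one needs roughly
// 17 * 600 MB, about 10 GB of RAM on the server. That illustrates why
// running bsdiff over one huge uncompressed stream (as deltarpm does over
// the cpio archive) hurts, while per-file bsdiff on changed files is cheap.
```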


My bad experience with bsdiff is from deltarpm, which uses it in a quite naive way - runs bsdiff over the complete uncompressed cpio stream.  A smarter use seems like it would be to only use bsdiff *individually* on nontrivial files that we know changed.

That would preclude optimizing the case where a large file got split in two or more pieces, but that seems relatively unlikely.  Well, I guess it'd happen if golang learned to not statically link, and we did an upgrade where all those 15MB+ statically linked binaries started sharing code in /usr/lib64.  Eh.  I'd still lean towards avoiding inter-object bsdiff unless we can think of another case.

BTW, I think the static deltas code today has "rsync-style" support where it dedups common object data on the wire at least.  That might kind of conflict with bsdiff.  I'm OK unwinding it if so.
Comment 37 Alexander Larsson 2014-12-18 19:05:55 UTC
A lot of the discussion here is about deltas.  But a really bad problem with ostree over HTTP right now is the initial fetch, where all the HTTP requests kill you.

It would be nice if we (in addition to the exploded files) could store files in combined files + an index. Then one could use byte range request to download multiple files in single requests.

A very simple approach would be that each new commit puts all objects that are new in a single file; then we could make a new index file for that and put a reference to that index file in another list-of-indexes object (similar to a dir object).
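The combined-file-plus-index idea can be sketched as an index of byte spans turned into a single HTTP multi-range request (the `bytes=` syntax is standard HTTP Range; the index shape and helper name here are invented for illustration):

```javascript
// Sketch of the index idea above: map each object checksum to an
// (offset, length) span inside one big pack file, then turn the spans a
// client is missing into one multi-range request header.
// Index shape and names are illustrative assumptions.
function buildRangeHeader(index, wantedIds) {
  const ranges = wantedIds
    .map(id => index[id])
    .filter(Boolean) // skip ids this pack doesn't contain
    .map(({ offset, length }) => `${offset}-${offset + length - 1}`);
  return ranges.length ? `bytes=${ranges.join(',')}` : null;
}
```

A client missing two objects would then issue one GET with a header like `Range: bytes=0-1024,10000-12000` instead of two separate requests.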
Comment 38 Colin Walters 2014-12-19 03:22:42 UTC
(In reply to comment #37)
> A lot of discussions here are about deltas. But a really bad problem with
> ostree over http right now is the initial fetch where all the http request
> kills you.

See comment #3

The reason the "initial fetch" problem wasn't a high priority for me is for the OS itself, one uses an install media/cloud image to get the initial content.  But I definitely see how it's a lot worse for your use case.

And even for the base OS case, it's nice to be able to use ostree's parallel install capabilities.

> It would be nice if we (in addition to the exploded files) could store files in
> combined files + an index. Then one could use byte range request to download
> multiple files in single requests.

This is possible, but proxy support is iffy.  See:
https://moz.com/devblog/how-to-cache-http-range-requests/
And if range request fails, all of a sudden you're downloading a *lot* more data.
 
> A very simple approach would be that each new commit would put all object that
> are new in a single file, then we could make a new index file for that and put
> a reference to that index file in another list-of-indexes object (similar to a
> dir object).

Where that gets ugly is say you're upgrading from an initial release 0 to N commits later, and the latest tree happens to need objects from the commits between 0-N.  Say the kernel package adds a /usr/share/doc/kernel/README.blah file that changes at N-2 and N-1, you will download 3 full kernels (plus everything else that changed in those commits!)


Basically I'm worried about approaches that have unpredictable performance.

Now if you're saying that (let's call it "newblobs") would be in *addition* to the exploded model, that would allow the server side to choose.

Which is getting closer to the static deltas proposal.
Comment 39 Alexander Larsson 2014-12-19 08:16:52 UTC
> Where that gets ugly is say you're upgrading from an initial release 0 to N
> commits later, and the latest tree happens to need objects from the commits
> between 0-N.  Say the kernel package adds a /usr/share/doc/kernel/README.blah
> file that changes at N-2 and N-1, you will download 3 full kernels (plus
> everything else that changed in those commits!)

Well, the idea would be that you *don't* need to download all of the 
data from the intermediate files. Instead it just looks at the index file and requests the right ranges from each of the files: "bytes=0-1024,10000-12000"

Of course, if byte ranges don't work, then that is a no-go.
Comment 40 Giuseppe Scrivano 2014-12-19 11:51:12 UTC
Would it make sense to consider a delta also for the first commit?

Now if I try to generate a delta for the first commit, I get the error "error: Commit $COMMIT_ID has no parent".  Could we instead extend it to be the delta between the empty set and any object added by the commit?
Comment 41 Colin Walters 2015-01-12 16:34:04 UTC
Yes, initial fetch deltas would make sense.

However to be usable, I think more is required.  For example, actually fetching it would presently require clients to fetch the empty parent explicitly, and then fetch the delta.

But supporting this more cleanly shouldn't be too hard, just teach static-delta to have an --empty, and the fetch code to look for deltas/$target instead of deltas/$from-$to.
Comment 42 Alexander Larsson 2015-01-13 20:39:27 UTC
Created attachment 294460 [details] [review]
Fix ostree_repo_list_static_delta_names

The current layout uses a prefix of two bytes as the initial dir
and a second directory inside that with the superblock. This
updates the list code to handle that.
Comment 43 Alexander Larsson 2015-01-13 20:39:34 UTC
Created attachment 294461 [details] [review]
Allow creating and pulling static deltas starting from "empty"

You create these with something like:
  ostree static-delta generate --empty --to=master

These will be automatically used during pull if no previous revision
exists in the target repo.

These work very much like the normal static deltas except they
are named just by the "to" revision. I.e:

deltas/94/f7d2dc23759dd21f9bd01e6705a8fdf98f90cad3e0109ba3f6c091c1a3774d

for a from-scratch to 94f7d2dc23759dd21f9bd01e6705a8fdf98f90cad3e0109ba3f6c091c1a3774d delta.
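The prefixed layout shown above can be derived mechanically from the delta name. This helper is an invented illustration (the real on-disk naming for two-revision deltas may differ in detail):

```javascript
// Derive the on-disk delta path from the naming scheme shown above:
// the first two characters of the name become a subdirectory, mirroring
// the objects/ layout. From-empty deltas are named by "to" alone.
// Helper name and the from-to case are illustrative assumptions.
function staticDeltaPath(fromRev, toRev) {
  const name = fromRev ? `${fromRev}-${toRev}` : toRev;
  return `deltas/${name.slice(0, 2)}/${name.slice(2)}`;
}
```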
Comment 44 Alexander Larsson 2015-01-13 20:46:12 UTC
Isn't there a problem with deltas in that they contain *all* objects in between two revisions?  This means that a limited pull depth won't do you any good.
Comment 45 Colin Walters 2015-01-13 21:08:11 UTC
(In reply to comment #44)
> Isn't there a problem with deltas that they contain *all* objects inbetween two
> revisions. This means that a limited pull depth won't do you any good.

The delta contains all objects that are in the new revision and not in the old - in other words, only new objects.

I'm not sure what you mean wrt the concern with pull depth - can you elaborate?
Comment 46 Alexander Larsson 2015-01-13 22:06:14 UTC
Basically it does:
ostree_repo_traverse_commit (repo, from, -1, &from_reachable_objects, cancellable, error):
ostree_repo_traverse_commit (repo, to, -1, &to_reachable_objects, cancellable, error);

and then it puts all to_reachable_objects that are not in from_reachable_objects in the delta.

So, consider a delta from a to c, where a->b added one large file, and b->c replaced the same file.
Since the above passed -1 for maxdepth, both the added file in b and the replaced file in c will end up in the delta.

However, if you're at a and do "ostree pull --depth=0 c" you don't expect to download the large file from rev b as it was replaced and the max depth does not include b.
However, it will be downloaded if we use the delta, no?
Comment 47 Colin Walters 2015-01-14 02:33:12 UTC
Created attachment 294484 [details] [review]
deltas: Do not traverse all objects, only both commits

That's the way they were designed.  We really don't want to include
all intervening objects.
Comment 48 Colin Walters 2015-01-14 02:43:43 UTC
Review of attachment 294460 [details] [review]:

LGTM after one fix.

::: src/libostree/ostree-repo-static-delta-core.c
@@ +71,3 @@
   gs_unref_ptrarray GPtrArray *ret_deltas = NULL;
   gs_unref_object GFileEnumerator *dir_enum = NULL;
+  gs_unref_object GFileEnumerator *dir_enum2 = NULL;

Shouldn't this declaration be inside the inner loop?  Otherwise we leak every instance except the last AFAICS.
Comment 49 Colin Walters 2015-01-14 02:50:31 UTC
Review of attachment 294461 [details] [review]:

Looks fairly straightforward, just one comment about the diff to ostree-diff.c.

::: src/libostree/ostree-diff.c
@@ +232,3 @@
   gs_unref_object GFileInfo *child_b_info = NULL;
 
+  child_b_info = g_file_query_info (b, OSTREE_GIO_FAST_QUERYINFO,

Is this related to null deltas or an unrelated fix?  Can you split this out as a separate patch with bug explanation?

::: src/libostree/ostree-repo-pull.c
@@ +1913,3 @@
         goto out;
 
+      if (from_revision == NULL || g_strcmp0 (from_revision, to_revision) != 0)

I'm thinking recently that it might be nicer if the summary file included deltas.  But no need to fix that in this bug.
Comment 50 Colin Walters 2015-01-14 02:51:46 UTC
(In reply to comment #44)
> Isn't there a problem with deltas that they contain *all* objects inbetween two
> revisions. This means that a limited pull depth won't do you any good.

Indeed, you found a rather bad bug...and I need to revisit some size data I was looking at for deltas with this knowledge now.  Can you review my patch for this?
Comment 51 Giuseppe Scrivano 2015-01-14 07:09:06 UTC
I had already sent a merge request on github to fix the problem with --list here:

https://github.com/GNOME/ostree/pull/33
Comment 52 Alexander Larsson 2015-01-14 07:37:32 UTC
Review of attachment 294461 [details] [review]:

::: src/libostree/ostree-diff.c
@@ +232,3 @@
   gs_unref_object GFileInfo *child_b_info = NULL;
 
+  child_b_info = g_file_query_info (b, OSTREE_GIO_FAST_QUERYINFO,

I just switched the order of the query_infos in an earlier attempt and it was left over.
Comment 53 Alexander Larsson 2015-01-14 07:38:51 UTC
Created attachment 294493 [details] [review]
Fix ostree_repo_list_static_delta_names

The current layout uses a prefix of two bytes as the initial dir
and a second directory inside that with the superblock. This
updates the list code to handle that.
Comment 54 Alexander Larsson 2015-01-14 07:38:57 UTC
Created attachment 294494 [details] [review]
Allow creating and pulling static deltas starting from "empty"

You create these with something like:
  ostree static-delta generate --empty --to=master

These will be automatically used during pull if no previous revision
exists in the target repo.

These work very much like the normal static deltas except they
are named just by the "to" revision. I.e:

deltas/94/f7d2dc23759dd21f9bd01e6705a8fdf98f90cad3e0109ba3f6c091c1a3774d

for a from-scratch to 94f7d2dc23759dd21f9bd01e6705a8fdf98f90cad3e0109ba3f6c091c1a3774d delta.
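For illustration, the naming scheme described above could be computed like this (a sketch, not ostree's actual code; the helper name is made up):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build the relative path of a from-empty delta: it is addressed purely
 * by its "to" checksum, with the first two hex characters split off as
 * a subdirectory.  Illustrative only; the exact on-disk layout is
 * defined by ostree itself. */
static void
empty_delta_path (const char *to_checksum, char *out, size_t outlen)
{
  snprintf (out, outlen, "deltas/%.2s/%s", to_checksum, to_checksum + 2);
}
```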
Comment 55 Alexander Larsson 2015-01-14 07:40:49 UTC
Updated patches from review comments.
Also I changed "scratch" to "empty" in a few places to make sure we're consistently using the same term (from empty, not from scratch).
Comment 56 Alexander Larsson 2015-01-14 09:04:44 UTC
Review of attachment 294484 [details] [review]:

This looks good to me. I verified that after the fix the delta between two "problematic" commits indeed got a lot smaller.
Comment 57 Colin Walters 2015-01-14 13:13:58 UTC
Comment on attachment 294484 [details] [review]
deltas: Do not traverse all objects, only both commits

Attachment 294484 [details] pushed as 97fbd87 - deltas: Do not traverse all objects, only both commits
Comment 58 Colin Walters 2015-01-14 13:20:07 UTC
(In reply to comment #51)
> I had already sent a merge request on github to fix the problem with --list
> here:
> 
> https://github.com/GNOME/ostree/pull/33

Indeed, sorry about that Giuseppe!  It looks like your test code is a bit better as it's verifying the actual output.  Alex, can you incorporate that change from https://github.com/giuseppe/ostree/commit/7db13bcf3be5805bba84e822088f100769a5529c and then go ahead and push?
Comment 59 Alexander Larsson 2015-01-14 13:44:19 UTC
Attachment 294460 [details] pushed as 82ed6c4 - Fix ostree_repo_list_static_delta_names
Attachment 294461 [details] pushed as 5b721a5 - Allow creating and pulling static deltas starting from "empty"
Attachment 294493 [details] pushed as 82ed6c4 - Fix ostree_repo_list_static_delta_names
Attachment 294494 [details] pushed as 5b721a5 - Allow creating and pulling static deltas starting from "empty"
Comment 60 Colin Walters 2015-01-14 13:59:24 UTC
One thing I'm thinking about here, WRT Courgette, is my concern about format versioning.  One approach to versioning the format would be to put any later extensions in secondary chunks.

Say we release the current state (without Courgette).  If ostree later gains it, we could move the delta chunks which use it to the end.  So the delta would be:

[v0] [v0] [v1 using Courgette just for the useful targets]

Clients would iterate through the delta until they hit a chunk version that was too new.  In that case, they just fall back to individual fetches for those objects.

I imagine this would work quite well, because doing an individual fetch for the 20MB+ golang binaries is a reasonable thing to do over HTTP.

And at the moment, deltas are an *addition* to the loose format, so the fallback fetch always works.

Eventually, I think we'll want to support the concept of delta-only repositories.  In that case, we'd have a --min-version argument to the delta generator, which would unlink loose objects referenced by delta chunks <= that version.
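The skip-and-fall-back rule sketched above might look roughly like this on the client side (hypothetical types and names; nothing here is real ostree API):

```c
#include <assert.h>
#include <stddef.h>

/* Highest chunk format version this client understands. */
#define SUPPORTED_PART_VERSION 0

typedef struct {
  unsigned version;  /* format version of this delta part */
} DeltaPart;

/* Return how many leading parts this client can apply.  Objects covered
 * by the remaining (newer-format) parts would be fetched individually
 * over HTTP instead.  Hypothetical sketch. */
static size_t
n_applicable_parts (const DeltaPart *parts, size_t n_parts)
{
  size_t i;
  for (i = 0; i < n_parts; i++)
    if (parts[i].version > SUPPORTED_PART_VERSION)
      break;  /* everything from here on falls back to loose fetches */
  return i;
}
```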
Comment 61 Colin Walters 2015-01-14 16:01:38 UTC
Review of attachment 294494 [details] [review]:

Ok, this looks reasonable to me.
Comment 62 Alexander Larsson 2015-01-15 10:28:57 UTC
I like the versioning. The newer parts don't even necessarily have to be at the end; we can just skip whichever delta parts have a too-new version (if we store the version in the superblock).
Comment 63 Alexander Larsson 2015-01-15 10:32:04 UTC
If we want to make a delta-only repository, could we not also add a reverse map from each object csum to a list of (delta, chunk nr, offset, size) that store the object. Then you can do a sort of fallback from any version to any version via the deltas. It may not be optimal (if the web server doesn't support ranges), but it would at least *work*.
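One entry of the reverse map proposed above might look like this; with the offset and size in hand, a client could issue an HTTP Range request against the delta part (all names are hypothetical):

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Where a given object checksum can be found inside the existing
 * deltas: (delta, chunk nr, offset, size).  Illustrative sketch. */
typedef struct {
  const char *delta_name;  /* which delta contains the object */
  unsigned    chunk_nr;    /* which part within that delta */
  uint64_t    offset;      /* byte offset inside the part */
  uint64_t    size;        /* object length in bytes */
} ObjectLocation;

/* Build the request header for fetching just that object.
 * HTTP byte ranges are inclusive on both ends. */
static void
format_range_header (const ObjectLocation *loc, char *out, size_t outlen)
{
  snprintf (out, outlen, "Range: bytes=%" PRIu64 "-%" PRIu64,
            loc->offset, loc->offset + loc->size - 1);
}
```

If the web server doesn't support ranges, the client would have to download the whole part, which is the "not optimal, but works" case mentioned above.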
Comment 64 Colin Walters 2015-01-15 13:44:52 UTC
Yeah, I was just thinking it's easiest if we think of them as being at the end.  The initial delta support is a general-purpose "tar of new files".  Then we should probably add bsdiff for files where that works well.  And after that, we add Courgette?

Mmmm...that reverse map could be quite difficult to compute in general.  

You understand that at the moment there's a separation between a "compiler" and a "processor", right?  Right now the compiler is dead simple, it outputs an opcode WRITE(<contents>, <len>) for each object.

However, there are many files which are optimized very well "rsync style" by rolling checksums.  We can implement that purely in the compiler by reading chunks from the *previous commit* on disk, and copying them into the new object, without transmitting them at all.
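To make the "rsync style" idea concrete, here is a generic Adler-32-style rolling checksum: sliding the window by one byte updates the sums in O(1), which is what lets the compiler find chunks of the previous commit's files at arbitrary offsets. This is a textbook rollsum, not ostree's actual implementation, and all names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* s1 is the sum of the window's bytes; s2 is the sum of the running
 * s1 values (so earlier bytes are weighted more heavily). */
typedef struct { uint32_t s1, s2; size_t window; } Rollsum;

static void
rollsum_init (Rollsum *r, const uint8_t *buf, size_t window)
{
  r->s1 = r->s2 = 0;
  r->window = window;
  for (size_t i = 0; i < window; i++)
    {
      r->s1 += buf[i];
      r->s2 += r->s1;
    }
}

/* Slide the window one byte: drop `out`, add `in`, in O(1). */
static void
rollsum_roll (Rollsum *r, uint8_t out, uint8_t in)
{
  r->s1 += in - out;
  r->s2 += r->s1 - (uint32_t) r->window * out;
}

static uint32_t
rollsum_digest (const Rollsum *r)
{
  return (r->s2 << 16) | (r->s1 & 0xffff);
}
```

The invariant worth testing is that rolling the window forward gives the same digest as recomputing from scratch at the new offset.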
Comment 65 Alexander Larsson 2015-01-20 12:53:04 UTC
Why are the delta chunk size and the minimum size at which a file uses the fallback the same property? It seems to me you might want to use fallbacks for all files > 2 megs, but use larger chunk sizes for the smaller files. That way large files, where the overhead of the HTTP request is low, could be shared between deltas. (For instance, if you have multiple from-empty deltas.)
Comment 66 Alexander Larsson 2015-01-20 13:21:05 UTC
Created attachment 294976 [details] [review]
static delta generation: Separate max chunk size from min fallback size

There is no particular reason these have to be the same.
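The policy split this patch argues for could be sketched like this (hypothetical names and a deliberately naive greedy packer; not the actual ostree code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Objects at or above min_fallback bytes (uncompressed) become
 * individual HTTP fetches; smaller objects are packed greedily into
 * delta parts of at most max_chunk bytes.  Returns the number of parts
 * needed and reports the fallback count via out_n_fallbacks. */
static unsigned
plan_parts (const uint64_t *sizes, size_t n_objects,
            uint64_t max_chunk, uint64_t min_fallback,
            unsigned *out_n_fallbacks)
{
  unsigned n_parts = 0, n_fallbacks = 0;
  uint64_t current = 0;

  for (size_t i = 0; i < n_objects; i++)
    {
      if (sizes[i] >= min_fallback)
        {
          n_fallbacks++;          /* fetched on its own over HTTP */
          continue;
        }
      if (current == 0 || current + sizes[i] > max_chunk)
        {
          n_parts++;              /* close the current part, open a new one */
          current = 0;
        }
      current += sizes[i];
    }

  *out_n_fallbacks = n_fallbacks;
  return n_parts;
}
```

With the two thresholds decoupled, raising min_fallback only changes which objects bypass the parts, without forcing the parts themselves to grow.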
Comment 67 Colin Walters 2015-01-20 13:29:00 UTC
Review of attachment 294976 [details] [review]:

One small doc change request, otherwise looks good.

::: src/libostree/ostree-repo-static-delta-compilation.c
@@ -466,3 @@
   builder.fallback_objects = g_ptr_array_new_with_free_func ((GDestroyNotify)g_variant_unref);
 
-  if (!g_variant_lookup (params, "max-usize", "u", &max_usize))

Theoretically, this is API, so we couldn't remove max-usize.  But practically speaking I've been saying deltas are unstable WIP, and I'm fairly sure no one is yet calling the API.

::: src/ostree/ot-builtin-static-delta.c
@@ +57,3 @@
   { "gpg-sign", 0, 0, G_OPTION_ARG_STRING_ARRAY, &opt_key_ids, "GPG Key ID to sign the delta with", "key-id"},
   { "gpg-homedir", 0, 0, G_OPTION_ARG_STRING, &opt_gpg_homedir, "GPG Homedir to use when looking for keyrings", "homedir"},
+  { "min-fallback-size", 0, 0, G_OPTION_ARG_STRING, &opt_min_fallback_size, "Minimum uncompressed size in megabytes for fallback use", NULL},

I'd s/fallback/individual HTTP request/ or something to help clarify.
Comment 68 Alexander Larsson 2015-01-20 13:45:07 UTC
Review of attachment 294976 [details] [review]:

::: src/libostree/ostree-repo-static-delta-compilation.c
@@ -466,3 @@
   builder.fallback_objects = g_ptr_array_new_with_free_func ((GDestroyNotify)g_variant_unref);
 
-  if (!g_variant_lookup (params, "max-usize", "u", &max_usize))

Also, it's not going to "break" anything, just work slightly differently.
Comment 69 Alexander Larsson 2015-01-20 13:48:16 UTC
Comment on attachment 294976 [details] [review]
static delta generation: Separate max chunk size from min fallback size

Attachment 294976 [details] pushed as 6384310 - static delta generation: Separate max chunk size from min fallback size
Comment 70 Colin Walters 2015-01-22 02:52:09 UTC
I briefly investigated the applicability of "rsync style" rolling checksums to a real-world content set I have.  The first thing I looked at was golang ELF files (for example, kube-apiserver).  As expected, rollsums were useless.

Next, I looked at /usr/share/rpm/Packages.  Sadly, rollsums don't work well there either.  I didn't investigate why; probably Berkeley DB has sufficient randomness.  However, I suspect we could teach rpm/bdb how to canonicalize/sort entries to make it rollsum friendly.

An example win though would be selinux policy in /usr/etc/selinux/targeted/policy/policy.29.  It's a 4MB object that rollsums well (well, policy didn't change, but we rebuilt the file).

Beyond that though, in this content set the other changes were lots of pretty small files.  Some shell scripts, some time zone files.  Maybe we'd get some mileage out of rollsums for .py files, but I doubt much, and because they're text they're going to compress really well anyways.

Where rollsum works out well is binary data that isn't subject to cascading changes; the /usr/lib/locale/locale-archive would be a perfect example, except this tree isn't shipping it.
Comment 71 Colin Walters 2015-01-22 04:19:31 UTC
Some investigation of bsdiff is that it *might* be worth including.

Again looking at this 10MB kube-apiserver ELF file I have: bsdiff generates a 1MB patch, but a simple xz of the new file is just 2MB!  Obviously bsdiff is a win here...but I'm debating whether it's *enough* of a win over xz to be worth the engineering effort to support.

Similarly, a pathological case is /usr/lib/locale/locale-archive, which clocks in at 100MB in Fedora right now.  However, with xz it's just 3.5MB.  Ok, so bsdiff takes about 20 seconds and 1.6GB of resident memory to come up with a 12278-byte patch.  That's well under 1% of the compressed xz size, and obviously a notable win on space.  bspatch claims to only use O(n+m) memory, which would be a lot better, but still too high for small devices.

I think if we were to use bsdiff there would have to be some upper bound on the input chunk size.  In fact...I bet bsdiff would work about equally well if one divided up the input file into chunks.
Comment 72 Colin Walters 2015-01-22 14:31:00 UTC
bsdiff does pretty well on the rpmdb though:

-rw-r--r--. 1 root root 18M Dec 31  1969 t.bare/objects/c1/fb302402336c6486a4d5b4a5c19429e82e8193483eeb9fb9ba1df448d936fc.file
-rw-r--r--. 1 root root 17M Dec 31  1969 t.bare/objects/1d/0eb42bda8f3e3ffcf8df84de247ae7107e35d5d3039653a251fd87df4338d5.file
-rw-r--r-- 1 root root 3.2M Jan 22 09:12 Packages.xz
-rw-r--r-- 1 root root 992K Jan 22 09:11 Packages.bspatch

The delta is 1/3 the size of pure compression.  I'd say that's probably getting into "worth using" territory.
Comment 73 Colin Walters 2015-01-26 17:16:42 UTC
My offhand deltas v0 checklist:

 - versioning: This is started in https://github.com/cgwalters/ostree/tree/wip/deltas

 - Change the format so that content objects can be streamed directly instead of having to write a temp file.  I think the easiest might be to separate metadata and content objects in the deltaparts, and keep the content metadata in the deltapart or so?  I can take a look at this.  [walters]

 - Consider dropping GPG signatures of deltas, and instead rely solely on GPG signatures of the summary file

 - Support mirroring of deltas directly, rather than having pull --mirror explode them and have to regenerate them client side (this is a general pull issue, we should support passthrough of archive-z2 -> archive-z2).

 - Make a decision on having bsdiff or not, and which implementation.  Witness bugs like this:  https://code.google.com/p/chromium/issues/detail?id=260366  Deltarpm also has a completely rewritten implementation and they've ended up fixing several critical bugs over time, just recently: https://gitorious.org/deltarpm/deltarpm/commit/619eaf318b3420056c14933bd513201bfb8af494
Comment 74 Giuseppe Scrivano 2015-01-26 18:16:59 UTC
Another thing I had in mind is how older versions of ostree will manage newer deltapart files.

My idea is that after the deltaparts are processed in a pull request, ostree should traverse the commit and make sure that there are no missing objects.  If there are, they should be downloaded individually.

In this way we can have more deltaparts, each with a different compression, and older clients can still take advantage of the deltaparts they support.

RFC patch is here:

https://github.com/giuseppe/ostree/tree/giuseppe/add-pull-enable-syncfs

In the future we can think about recording the compression (or the version required to handle it) in the superblock, so that unsupported deltaparts are not downloaded.
Comment 75 Colin Walters 2015-02-02 16:10:40 UTC
Ok, a few updates from the current state:

https://github.com/GNOME/ostree/pull/54

- rsync style deltas shave ~6MB uncompressed on my test content set (probably ~2MB compressed).  Not bad, but not really great either.  Will still be helpful though for anyone who ships data sets like the locale-archive that are rollsum friendly.

 - I did some more analysis of bsdiff - before, I was only looking at modified files with the same path, but of course we also have things like /usr/lib/modules/$version which change path but are conceptually the same file.

Here, for example, is the ext4 kernel module; I believe this kernel update didn't change the code (or compiler, etc.).

-00644 0 0 890897 745c78072dd6b5c0f14e811b225c3fc0e1945a7ba46b194c6089ac0252c1355c /usr/lib/modules/3.10.0-123.13.1.el7.x86_64/kernel/fs/ext4/ext4.ko
-00644 0 0 890897 92c3484834f613490c7f844770efbd6e470816f2b0f43f5f80db95198e1653d8 /usr/lib/modules/3.10.0-123.9.2.el7.x86_64/kernel/fs/ext4/ext4.ko

There's tiny changes scattered between the two files; for example, every kernel module embeds its source kernel version.  It also looks like the module is signed, so there's a difference chunk at the end.


TL;DR:
 - ext4.ko.xz is 199532 bytes
 - ext4.bsdiff is 843 bytes

So the bsdiff is about 0.4% of the xz'd size!  That's amazing.  We saw that level of compression with locale-archive, but the fact that we get it for kernel modules too is a much stronger argument for supporting bsdiff out of the gate.

Uncompressed, all kernel modules are 111MB (wow), xz compressed tar is 21MB.  I just ran a script to test bsdiff against all modules, and I get a 310K tar of diffs, for an average of 0.1% of the size.  That's *really* good for a real world change.

So the next action item is to pick one of the bsdiff implementations:
 - deltarpm
 - chromium
 - mozilla
 - "upstream" (probably not, everyone's forked it for a reason)
Comment 76 Giuseppe Scrivano 2015-02-11 14:08:09 UTC
Would the rollsum compression perform better if it used the raw file content instead of the compressed files (if I am not misunderstanding the code)?

How would bsdiff and rollsum co-exist in ostree?  Maybe we can use the rollsum as a heuristic to detect files that are similar "enough", and then use bsdiff?

To proceed with bsdiff support, is it ok to add these two new instructions: OSTREE_STATIC_DELTA_OP_SET_READ_RAW_SOURCE[1] and OSTREE_STATIC_DELTA_OP_OPEN_BSDIFF_SPLICE_AND_CLOSE[2]?

1) Read and copy the origin file in a temporary buffer.
2) Apply bspatch to the temporary buffer, open the destination file, write to it and close.
Comment 77 Colin Walters 2015-02-11 19:53:57 UTC
It's going to be really hard to support pre-compressed content.  Which files are you thinking of?  Like how in Fedora 21+ the kernel modules are .xz compressed?
 https://lists.fedoraproject.org/pipermail/devel/2014-May/199323.html

But in general most of the things we want to bsdiff (executables) are not compressed, right?  

Or are you thinking of the server side?  That's handled for rollsums today with static gboolean get_unpacked_unlinked_content().
Comment 78 Giuseppe Scrivano 2015-02-12 00:36:12 UTC
No, sorry, I was talking server side.

I missed that when I looked at the ostree-repo-static-delta-processing.c file instead.  It seems that "static gboolean dispatch_set_read_source()" uses _ostree_repo_read_bare_fd; does that mean it works only with OSTREE_REPO_MODE_BARE and OSTREE_REPO_MODE_BARE_USER repos?

Since OSTREE_STATIC_DELTA_OP_SET_READ_SOURCE works on the uncompressed content, there won't be any need of an additional OSTREE_STATIC_DELTA_OP_SET_READ_RAW_SOURCE instruction as it would work in a similar way.
Does a new instruction OSTREE_STATIC_DELTA_OP_OPEN_BSDIFF_SPLICE_AND_CLOSE look good to you?  It would have only two arguments: the origin checksum and the destination file length (so the output can be allocated before executing bspatch).
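The proposed argument layout could be sketched like this (purely illustrative; the real opcode encoding is whatever ostree's delta format defines, and all names here are hypothetical):

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Arguments for the proposed OPEN_BSDIFF_SPLICE_AND_CLOSE instruction:
 * the origin object's checksum, plus the destination length so the
 * output buffer can be allocated before bspatch runs. */
typedef struct {
  char     origin_checksum[65]; /* 64 hex chars + NUL (SHA-256) */
  uint64_t result_length;       /* bytes to allocate for the patched file */
} BsdiffSpliceArgs;

/* A checksum argument would presumably be validated before use:
 * exactly 64 hex characters. */
static bool
valid_object_checksum (const char *csum)
{
  size_t n;
  for (n = 0; csum[n] != '\0'; n++)
    if (!isxdigit ((unsigned char) csum[n]))
      return false;
  return n == 64;
}
```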
Comment 79 Colin Walters 2015-02-12 14:59:32 UTC
Let's not try to support applying deltas between archive-z2 repositories - I can't think of an immediate use case.  I think most people doing archive->archive will want to replicate exactly.

I'd call the instruction OPEN_BSPATCH_AND_CLOSE (shorter and clearer, right?).  And wouldn't the instruction need an argument for the offset of the patch in the payload?
Comment 80 Colin Walters 2015-03-06 14:35:55 UTC
Ok, we landed bsdiff.  This all needs more testing, but I think as far as the mechanics of generating delta content go, it will work.

Before we call this v0 done though, I'd like to consider the delta metadata (and metadata in general).

See GPG signature discussion https://mail.gnome.org/archives/ostree-list/2015-February/msg00004.html

I'm also thinking that things might be cleaner if the deltas became content-addressed objects.
Comment 81 Colin Walters 2015-05-14 00:35:49 UTC
Giuseppe landed the GPG signatures on the summary file.

We still need some more "real world" testing of the current deltas on real content sets.
Comment 82 Colin Walters 2016-06-23 22:32:19 UTC
I think we can call this done.