After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 759442 - Transactions not safe against concurrent prunes
Transactions not safe against concurrent prunes
Status: RESOLVED OBSOLETE
Product: ostree
Classification: Infrastructure
Component: general
unspecified
Other Linux
: Normal normal
: ---
Assigned To: OSTree maintainer(s)
OSTree maintainer(s)
Depends on:
Blocks:
 
 
Reported: 2015-12-14 11:46 UTC by Alexander Larsson
Modified: 2017-04-26 14:35 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
repo: After renaming in all loose objects, ensure metadata is stable (3.51 KB, patch)
2015-12-14 11:51 UTC, Alexander Larsson
none Details | Review
repo: Add a file lock to protect operations against concurrent prune ops In general, we're pretty robust when there are multiple ongoing operations, because transactions are isolated to per-transaction staging dirs, and the transaction commit is a last-wr (8.77 KB, patch)
2015-12-14 11:51 UTC, Alexander Larsson
none Details | Review
prune: Add --non-blocking argument (3.26 KB, patch)
2015-12-14 11:51 UTC, Alexander Larsson
none Details | Review

Description Alexander Larsson 2015-12-14 11:46:20 UTC
In general, we're pretty robust when there are multiple ongoing operations, because transactions are isolated to per-transaction staging dirs, and the transaction commit is a last-writer-win replace operation on immutable data.

However, any non-atomic read operation can fail if it is concurrent with a prune, as then objects may disappear under the operations feet half-way.

This patch-set makes this robust by having a repo lock that is taken in a shared fashion for all transactions, and for checkouts, but is taken in exclusive mode for prune. It also adds a non-blocking prune mode that allows you to do opportunistic prunes that don't block if there is an ongoing operation.

There are a bunch of operations that are still unsafe (does not block prune), such as: cat, ls, show, log, diff, generate-delta, rev-parse. I don't think that is necessarily a huge problem, as these are mainly used in development or debugging, and a failure here will just be an error printed, it will never cause a broken repo.

This is a simpler version of what i proposed at:
https://mail.gnome.org/archives/ostree-list/2015-December/msg00024.html
(it takes a lock during the whole transaction rather than retrying the transaction under a lock when commiting it.)
Comment 1 Alexander Larsson 2015-12-14 11:51:19 UTC
Created attachment 317351 [details] [review]
repo: After renaming in all loose objects, ensure metadata is stable

When a transaction is finished and we have moved all the staged loose
objects into the repo we fsync all the object directory, to ensure the
filenames are stable before we update the refs files to point to the
new commits.

With out this an unclean shutdown after the transaction is finished
could result in a refs file that points to an incomplete commit.
Comment 2 Alexander Larsson 2015-12-14 11:51:24 UTC
Created attachment 317352 [details] [review]
repo: Add a file lock to protect operations against concurrent prune ops In general, we're pretty robust when there are multiple ongoing operations, because transactions are isolated to per-transaction staging dirs, and the transaction commit is a last-wr

However, any non-atomic read operation can fail if it is concurrent with a prune, as then objects may disappear under the operations feet half-way.

This patch-set makes this robust by having a repo lock that is taken in a shared fashion for all transactions, and for checkouts, but is taken in exclusive mode for prune. It also adds a non-blocking prune mode that allows you to do opportunistic prunes that don't block if there is an ongoing operation.

There are a bunch of operations that are still unsafe (does not block prune), such as: cat, ls, show, log, diff, generate-delta, rev-parse. I don't think that is necessarily a huge problem, as these are mainly used in development or debugging, and a failure here will just be an error printed, it will never cause a broken repo.

This is a simpler version of what i proposed at:
https://mail.gnome.org/archives/ostree-list/2015-December/msg00024.html
(it takes a lock during the whole transaction rather than retrying the transaction under a lock when commiting it.)
Comment 3 Alexander Larsson 2015-12-14 11:51:29 UTC
Created attachment 317353 [details] [review]
prune: Add --non-blocking argument
Comment 4 Dan Nicholson 2016-12-27 23:03:04 UTC
Any chance we can get this in? I think we're starting to hit this at Endless where we're sharing /ostree/repo with flatpak. Our OS updater and flatpak might run concurrently, and only the OS updater is using the sysroot lock to handle concurrency.
Comment 5 Joaquim Rocha 2017-04-26 14:09:07 UTC
Hi, any update on this?
We've recently got more reports about this so it'd be nice to get this in soon. Please let me know if there's something we can do to accelerate the integration.
Comment 6 Colin Walters 2017-04-26 14:35:14 UTC
Sorry, this fell off the radar; haven't been tracking things via bugzilla.  I rebased these patches, fixed some of the conflicts (although that needs auditing), and submitted a PR to github:

https://github.com/ostreedev/ostree/pull/813

Let's move the conversation there, thanks!