GNOME Bugzilla – Bug 759442
Transactions not safe against concurrent prunes
Last modified: 2017-04-26 14:35:14 UTC
In general, we're pretty robust when there are multiple ongoing operations, because transactions are isolated to per-transaction staging dirs, and the transaction commit is a last-writer-win replace operation on immutable data. However, any non-atomic read operation can fail if it is concurrent with a prune, as then objects may disappear under the operations feet half-way. This patch-set makes this robust by having a repo lock that is taken in a shared fashion for all transactions, and for checkouts, but is taken in exclusive mode for prune. It also adds a non-blocking prune mode that allows you to do opportunistic prunes that don't block if there is an ongoing operation. There are a bunch of operations that are still unsafe (does not block prune), such as: cat, ls, show, log, diff, generate-delta, rev-parse. I don't think that is necessarily a huge problem, as these are mainly used in development or debugging, and a failure here will just be an error printed, it will never cause a broken repo. This is a simpler version of what i proposed at: https://mail.gnome.org/archives/ostree-list/2015-December/msg00024.html (it takes a lock during the whole transaction rather than retrying the transaction under a lock when commiting it.)
Created attachment 317351 [details] [review] repo: After renaming in all loose objects, ensure metadata is stable When a transaction is finished and we have moved all the staged loose objects into the repo we fsync all the object directory, to ensure the filenames are stable before we update the refs files to point to the new commits. With out this an unclean shutdown after the transaction is finished could result in a refs file that points to an incomplete commit.
Created attachment 317352 [details] [review] repo: Add a file lock to protect operations against concurrent prune ops In general, we're pretty robust when there are multiple ongoing operations, because transactions are isolated to per-transaction staging dirs, and the transaction commit is a last-wr However, any non-atomic read operation can fail if it is concurrent with a prune, as then objects may disappear under the operations feet half-way. This patch-set makes this robust by having a repo lock that is taken in a shared fashion for all transactions, and for checkouts, but is taken in exclusive mode for prune. It also adds a non-blocking prune mode that allows you to do opportunistic prunes that don't block if there is an ongoing operation. There are a bunch of operations that are still unsafe (does not block prune), such as: cat, ls, show, log, diff, generate-delta, rev-parse. I don't think that is necessarily a huge problem, as these are mainly used in development or debugging, and a failure here will just be an error printed, it will never cause a broken repo. This is a simpler version of what i proposed at: https://mail.gnome.org/archives/ostree-list/2015-December/msg00024.html (it takes a lock during the whole transaction rather than retrying the transaction under a lock when commiting it.)
Created attachment 317353 [details] [review] prune: Add --non-blocking argument
Any chance we can get this in? I think we're starting to hit this at Endless where we're sharing /ostree/repo with flatpak. Our OS updater and flatpak might run concurrently, and only the OS updater is using the sysroot lock to handle concurrency.
Hi, any update on this? We've recently got more reports about this so it'd be nice to get this in soon. Please let me know if there's something we can do to accelerate the integration.
Sorry, this fell off the radar; haven't been tracking things via bugzilla. I rebased these patches, fixed some of the conflicts (although that needs auditing), and submitted a PR to github: https://github.com/ostreedev/ostree/pull/813 Let's move the conversation there, thanks!