GNOME Bugzilla – Bug 757611
concurrent pulls can break due to shared tmp/
Last modified: 2015-12-14 11:34:03 UTC
<jhiesey> walters: it looks like most of the ostree admin functions use the sysroot lock for exclusive access, but what about other operations like ostree pull?
<jhiesey> I can run multiple of those simultaneously and get errors about corrupted objects
<jhiesey> is there any mechanism in ostree to prevent that, or is this expected behavior?
<walters> mmm...yes, i think the only thing that's breaking that is the shared caching in repo/tmp
<walters> for partial objects
<walters> we could generate a random tmpdir for those per pull run, but that would lose out on resumes
To clarify on this bug: concurrent pulls were intended to be supported. During the pull process we only download objects we don't have, and the repo object store supports multiple writers with first-one-wins semantics. I think it's just this HTTP caching layer.

One thing that'd be slightly more elaborate is:
- Scan for tmp/partial-objects-XXXXX/
- For each one, try to lock it (lockfile)
- If we can't lock any, create a new one

This way, if you start a pull and then ctrl-c, we'll correctly resume on the next one. Truly concurrent pulls may do repeated downloads and won't share partial objects.
Created attachment 317210 [details] [review] Add glnx_steal_fd This is very useful in combination with glnx_close_fd
Created attachment 317211 [details] [review] Add glnx_mkdtempat Create a temporary directory using mkdirat.
Created attachment 317212 [details] [review] repo: Use per-transaction staging dir Concurrent pulls break since we're sharing the staging directory for all transactions in the repo. This makes us use a per-transaction directory. However, in order for resumes to work, we first look for existing staging directories and try to acquire an exclusive lock on them. If we can't find any staging directory, or they are all already locked, then we create a new one.
Review of attachment 317210 [details] [review]: OK.
Review of attachment 317211 [details] [review]: ::: glnx-dirfd.c @@ +235,3 @@ + * @error: Error + * + * Similar to g_mkdtemp_full, but using openat. Right. Though it looks like that code was based on an older version of glibc's glibc/sysdeps/posix/__gen_tempname(), which has since evolved: e.g. on x86_64 it mixes in `rdtsc` via the hp-timing.h abstraction, and also the pid. I wonder whether there's really any value in all of this versus just a `getrandom()`/read of /dev/urandom. Anyway, the code looks OK for now; maybe at some point we can drive this into glib or glibc.
Well, it was based on get_tmp_file() in glib, it was just that glib didn't have the "at" version.
Attachment 317210 [details] pushed as 4e99699 - Add glnx_steal_fd Attachment 317211 [details] pushed as 0313864 - Add glnx_mkdtempat
Review of attachment 317212 [details] [review]:

Hmm...this doesn't fix the race for what is basically the "HTTP cache" that allows us to resume interrupted pulls of individual objects/delta parts etc., right? The bootid/staging directory is the second phase, where we only put complete files, though not necessarily fsync'd ones.

If you look at write_object() you'll see we make another temp file (basically uncompressing, for content), and then in _ostree_repo_commit_loose_final() we just rename() them into staging. So any process writing into staging should only see complete files. However, if two processes both try to commit, one may see ENOENT as the other wins the rename(). So indeed we might need separate staging directories too, but it might be possible to sprinkle some `if (ENOENT && object_exists ()) /* ok */` into rename_pending_loose_objects().

Anyway, going back to the HTTP cache: the "find or create a temporary locked directory" code is likely going to need to be split into reusable functions so it can be shared with ostree-fetcher.c. Right?
Argh, you're right, we need separate directories for the downloading case too; I'll have a look at that.

For the staging directories, I think the "if (ENOENT && object_exists ())" approach sounds not very robust. We can run into all sorts of weird issues if the staging directory is shared. For instance, you might be done with one staging operation, then syncfs(), then start moving the files over to the repo, but at the same time someone else adds more un-synced files into the staging dir, and those get moved into the repo too.
(In reply to Alexander Larsson from comment #10)
> For the staging directories, I think the "if (ENOENT && object_exists ())"
> approach sounds not very robust. We can run into all sorts of weird issues
> if the staging directory is shared. For instance, you might be done with
> one staging operation, then syncfs(), then start moving the files over to
> the repo, but at the same time someone else adds more un-synced files into
> the staging dir, and those get moved into the repo too.

Yeah, good point. Separate dirs in both cases is probably sanest.
Created attachment 317225 [details] [review] repo: Add _ostree_repo_allocate_tmpdir helper This creates a subdirectory of the tmp dir with a given prefix, and takes a lockfile to ensure that nobody else is using the same directory. However, if a directory with the same prefix already exists and is not locked, that one is used instead. The latter is useful if you want to support some kind of resumable operation on the tmpdir. touch reused dirs
Created attachment 317226 [details] [review] repo: Use per-transaction staging dir Concurrent pulls break since we're sharing the staging directory for all transactions in the repo. This makes us use a per-transaction directory. However, in order for resumes to work, we first look for existing staging directories and try to acquire an exclusive lock on them. If we can't find any staging directory, or they are all already locked, then we create a new one.
Created attachment 317227 [details] [review] repo: Allocate a tmpdir for each OstreeFetcher to isolate concurrent downloads This way two pulls will not use the same tmpdir and accidentally overwrite each other's files. However, consecutive OstreeFetchers will reuse the tmpdirs, so that we can properly resume downloads of large objects.
New versions with separate dirs for staging and fetching.
Review of attachment 317225 [details] [review]: LGTM. ::: src/libostree/ostree-repo.c @@ +4694,3 @@ + * remove it due to being old when cleaning up the tmpdir + */ + (void)futimens (existing_tmpdir_fd, NULL); Ooh yeah, that would be an interesting race.
Review of attachment 317226 [details] [review]: Nice and clean now.
Review of attachment 317227 [details] [review]: One really minor question, LGTM though as is. ::: src/libostree/ostree-fetcher.c @@ +193,3 @@ + + self->tmpdir_dfd = -1; + self->tmpdir_lock = empty_lockfile; Doesn't it work to just do `self->tmpdir_lock = GLNX_LOCK_FILE_INIT`? Why the extra indirection?
You can't use inline struct initialization like that: src/libostree/ostree-fetcher.c: In function '_ostree_fetcher_init': src/libostree/ostree-fetcher.c:195:23: error: expected expression before '{' token self->tmpdir_lock = GLNX_LOCK_FILE_INIT; ^
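The underlying C rule: a brace list like `GLNX_LOCK_FILE_INIT` expands to is only valid as an initializer, not as the right-hand side of an assignment, hence the `empty_lockfile` indirection. A small illustration with a made-up struct standing in for GlnxLockFile:

```c
/* Hypothetical stand-ins for GlnxLockFile / GLNX_LOCK_FILE_INIT. */
struct lock_file { int fd; int initialized; };
#define LOCK_FILE_INIT { -1, 0 }

static void
reset_lock (struct lock_file *lk)
{
  /* *lk = LOCK_FILE_INIT;  -- error: expected expression before '{' */
  struct lock_file empty = LOCK_FILE_INIT;  /* fine: used as an initializer */
  *lk = empty;
  /* A C99 compound literal would also work in an assignment:
   *   *lk = (struct lock_file) LOCK_FILE_INIT;
   */
}
```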
Attachment 317225 [details] pushed as be19c88 - repo: Add _ostree_repo_allocate_tmpdir helper Attachment 317226 [details] pushed as f771461 - repo: Use per-transaction staging dir Attachment 317227 [details] pushed as 96eed95 - repo: Allocate a tmpdir for each OstreeFetcher to isolate concurrent downloads