GNOME Bugzilla – Bug 757611
concurrent pulls can break due to shared tmp/
Last modified: 2015-12-14 11:34:03 UTC
<jhiesey> walters: it looks like most of the ostree admin functions use the sysroot lock for exclusive access, but what about other operations like ostree pull?
<jhiesey> I can run multiple of those simultaneously and get errors about corrupted objects
<jhiesey> is there any mechanism in ostree to prevent that, or is this expected behavior?
<walters> mmm...yes, i think the only thing that's breaking that is the shared caching in repo/tmp
<walters> for partial objects
<walters> we could generate a random tmpdir for those per pull run, but that would lose out on resumes
To clarify on this bug: concurrent pulls were intended to be supported. During the pull process we only download objects we don't have, and the repo object store supports multiple writers with first-one-wins semantics. I think it's just this HTTP caching layer.

One thing that'd be slightly more elaborate is:
- Scan for tmp/partial-objects-XXXXX/
- For each one, try to lock it (lockfile)
- If we can't lock any, create a new one

This way, if you start a pull and then ctrl-c, we'll correctly resume on the next one. Truly concurrent pulls may do repeated downloads and won't share partial objects.
Created attachment 317210 [details] [review] Add glnx_steal_fd This is very useful in combination with glnx_close_fd
Created attachment 317211 [details] [review] Add glnx_mkdtempat Create a temporary directory using mkdirat.
Created attachment 317212 [details] [review] repo: Use per-transaction staging dir Concurrent pulls break since we're sharing the staging directory for all transactions in the repo. This makes us use a per-transaction directory. However, in order for resumes to work, we first look for existing staging directories and try to acquire an exclusive lock on them. If we can't find any staging directory, or they are all already locked, then we create a new one.
Review of attachment 317210 [details] [review]: OK.
Review of attachment 317211 [details] [review]: ::: glnx-dirfd.c @@ +235,3 @@ + * @error: Error + * + * Similar to g_mkdtemp_full, but using openat. Right. Though it looks like that code was based on an older version of glibc's glibc/sysdeps/posix/__gen_tempname(), which has since evolved: e.g. on x86_64 it mixes in `rdtsc` via the hp-timing.h abstraction, and also the pid. I wonder whether there's really any value in all of this versus just a `getrandom()`/read of /dev/urandom. Anyway, the code looks OK for now; maybe at some point we can drive this into glib or glibc.
Well, it was based on get_tmp_file() in glib, it was just that glib didn't have the "at" version.
Attachment 317210 [details] pushed as 4e99699 - Add glnx_steal_fd Attachment 317211 [details] pushed as 0313864 - Add glnx_mkdtempat
Review of attachment 317212 [details] [review]:

Hmm...this doesn't fix the race for what is basically the "HTTP cache" that allows us to resume interrupted pulls of individual objects/delta parts etc., right? The bootid/staging directory is the second phase, where we only put complete files, though not necessarily fsync'd ones.

If you look at write_object() you'll see we make another temp file (basically uncompressing, for content), and then in _ostree_repo_commit_loose_final() we just rename() them into staging. So any process writing into staging should only see complete files. However, if two processes both try to commit, one may see ENOENT as the other wins the rename(). So indeed we might need separate staging directories too, but it might be possible to sprinkle some `if (ENOENT && object_exists ()) /* ok */` into rename_pending_loose_objects().

Anyway, going back to the HTTP cache: the "find or create a temporary locked directory" code is likely going to need to be split into reusable functions so it can be shared with ostree-fetcher.c. Right?
Argh, you're right, we need separate directories for the downloading case too; I'll have a look at that.

For the staging directories, I think the "if (ENOENT && object_exists ())" approach sounds not very robust. We can run into all sorts of weird issues if the staging directory is shared. For instance, you might be done with one staging operation, then syncfs(), then start moving the files over to the repo, but at the same time someone else adds more un-synced files into the staging dir, and those get moved into the repo too.
(In reply to Alexander Larsson from comment #10)
> For the staging directories, I think the "if (ENOENT && object_exists ())"
> approach sounds not very robust. We can run into all sorts of weird issues
> if the staging directory is shared. For instance, you might be done with
> one staging operation, then syncfs(), then start moving the files over to
> the repo, but at the same time someone else adds more un-synced files into
> the staging dir, and those get moved into the repo too.

Yeah, good point. Separate dirs in both cases is probably sanest.
Created attachment 317225 [details] [review] repo: Add _ostree_repo_allocate_tmpdir helper This creates a subdirectory of the tmp dir with a given prefix, and takes a lockfile to ensure that nobody else is using the same directory. However, if a directory with the same prefix already exists and is not locked, that one is used instead. The latter is useful if you want to support some kind of resumable operation on the tmpdir. touch reused dirs
Created attachment 317226 [details] [review] repo: Use per-transaction staging dir Concurrent pulls break since we're sharing the staging directory for all transactions in the repo. This makes us use a per-transaction directory. However, in order for resumes to work, we first look for existing staging directories and try to acquire an exclusive lock on them. If we can't find any staging directory, or they are all already locked, then we create a new one.
Created attachment 317227 [details] [review] repo: Allocate a tmpdir for each OstreeFetcher to isolate concurrent downloads This way two pulls will not use the same tmpdir and accidentally overwrite each other's files. However, consecutive OstreeFetchers will reuse the tmpdirs, so that we can properly resume downloads of large objects.
New versions with separate dirs for staging and fetching.
Review of attachment 317225 [details] [review]: LGTM. ::: src/libostree/ostree-repo.c @@ +4694,3 @@ + * remove it due to being old when cleaning up the tmpdir + */ + (void)futimens (existing_tmpdir_fd, NULL); Ooh yeah, that would be an interesting race.
Review of attachment 317226 [details] [review]: Nice and clean now.
Review of attachment 317227 [details] [review]: One really minor question, LGTM though as is. ::: src/libostree/ostree-fetcher.c @@ +193,3 @@ + + self->tmpdir_dfd = -1; + self->tmpdir_lock = empty_lockfile; Doesn't it work to just do `self->tmpdir_lock = GLNX_LOCK_FILE_INIT`? Why the extra indirection?
You can't use inline struct initialization like that: src/libostree/ostree-fetcher.c: In function '_ostree_fetcher_init': src/libostree/ostree-fetcher.c:195:23: error: expected expression before '{' token self->tmpdir_lock = GLNX_LOCK_FILE_INIT; ^
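The underlying C rule: a brace list like `GLNX_LOCK_FILE_INIT` expands to is only valid as an initializer, not as the right-hand side of an assignment, hence the `empty_lockfile` indirection. A small illustration with a made-up struct standing in for GlnxLockFile:

```c
/* Hypothetical stand-ins for GlnxLockFile / GLNX_LOCK_FILE_INIT. */
struct lock_file { int fd; int initialized; };
#define LOCK_FILE_INIT { -1, 0 }

static void
reset_lock (struct lock_file *lk)
{
  /* *lk = LOCK_FILE_INIT;  -- error: expected expression before '{' */
  struct lock_file empty = LOCK_FILE_INIT;  /* fine: used as an initializer */
  *lk = empty;
  /* A C99 compound literal would also work in an assignment:
   *   *lk = (struct lock_file) LOCK_FILE_INIT;
   */
}
```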
Attachment 317225 [details] pushed as be19c88 - repo: Add _ostree_repo_allocate_tmpdir helper Attachment 317226 [details] pushed as f771461 - repo: Use per-transaction staging dir Attachment 317227 [details] pushed as 96eed95 - repo: Allocate a tmpdir for each OstreeFetcher to isolate concurrent downloads