After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 706456 - pull is still racy
pull is still racy
Product: ostree
Classification: Infrastructure
Component: general
Other Linux
: Normal normal
: ---
Assigned To: OSTree maintainer(s)
OSTree maintainer(s)
: 724115 (view as bug list)
Depends on:
Reported: 2013-08-21 01:56 UTC by Colin Walters
Modified: 2014-02-21 20:14 UTC
See Also:
GNOME target: ---
GNOME version: ---

race (14.34 KB, text/plain)
2014-01-19 23:17 UTC, Colin Walters
pull: Remove explicit threading (26.91 KB, patch)
2014-02-21 18:03 UTC, Colin Walters
none Details | Review

Description Colin Walters 2013-08-21 01:56:24 UTC
I haven't debugged this, but it appears to happen occasionally on the loaded build server.

The original commit was:

Then some follow up fixes which helped but didn't quite fix it:
Comment 1 Colin Walters 2014-01-19 23:17:45 UTC
Created attachment 266688 [details]

I caught an instance of this in action with newly added debugging prints.

We can see here that the problem is that there are pending content requests in flight from the metadata scan thread to the main thread, but the main thread has received idle status and chosen to exit.
Comment 2 Colin Walters 2014-01-20 08:44:06 UTC
Going to mark this one closed, will reopen if it reoccurs.
Comment 3 Colin Walters 2014-02-11 14:01:44 UTC
Sadly, still happening =(


$ ln -s blah /ostree/repo/transaction
$ rm /ostree/repo/refs/remotes -rf

Then do a pull again.
Comment 4 Colin Walters 2014-02-11 14:02:13 UTC
*** Bug 724115 has been marked as a duplicate of this bug. ***
Comment 5 Colin Walters 2014-02-21 18:03:52 UTC
Created attachment 269936 [details] [review]
pull: Remove explicit threading

Mixing async and threads has proved to be too much for my little mind.
It has race conditions that I've tried repeatedly to fix, but failed.

The threading here was scanning metadata objects - and there are
two parts to that:

1) Physically loading them from disk
2) Parsing them

Now #1 has been partially addressed by avoiding a storm of lstat() if
we're starting from a known working state.  If pull gets interrupted,
then we do need to rescan all objects.  Also, we can address this with
local metadata packfiles.

deep recursion though while we also have outstanding HTTP.

Anyways, let's move the needle back to reliability, and readd speed
more carefully.
Comment 6 Sjoerd Simons 2014-02-21 19:52:16 UTC
Looks ok to me, the one thing that did confuse me for a while is that the phase variable implicitely is set to OSTREE_PULL_PHASE_FETCHING_REFS (by assuming the first enum field will be 0).
Comment 7 Colin Walters 2014-02-21 20:14:51 UTC
Good point, fixed.  Anything to make this code clearer for the inevitable time when I have to debug it two years on and I've forgotten how it works...

Thanks for looking at this!