Bug 705152 – Race in glib/task.test

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 705152 - Race in glib/task.test


Summary:	Race in glib/task.test


Status:	RESOLVED FIXED

Product:	glib
Classification:	Platform
Component:	gio
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	gtkdev
QA Contact:	gtkdev

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2013-07-30 14:57 UTC by Colin Walters
Modified:	2013-08-15 22:22 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
gio/tests/task: fix a race condition in test_run_in_thread() (1.93 KB, patch) 2013-08-15 15:36 UTC, Dan Winship	reviewed	Details \| Review
gio/tests/task: fix a race condition in test_run_in_thread() (4.15 KB, patch) 2013-08-15 16:46 UTC, Dan Winship	committed	Details \| Review
test program (445 bytes, text/plain) 2013-08-15 20:52 UTC, Dan Winship		Details

Description Colin Walters 2013-07-30 14:57:16 UTC

"GLib-GIO:ERROR:../../../gio/tests/task.c:719:test_run_in_thread: assertion failed: (task == NULL)

The race is fairly obvious; mainloop quits before the object gets finalized in the other thread.

We need a condition variable here, or some other form of synchronization.

Comment 1 Dan Winship 2013-08-15 15:36:59 UTC

Created attachment 251748 [details] [review]
gio/tests/task: fix a race condition in test_run_in_thread()

====

A condition variable won't work because the ref is being held by GTask
iternals that we don't have access to from here. But we can wait for
the thread to drop its ref rather than just asserting that it already
has.

Comment 2 Colin Walters 2013-08-15 15:41:22 UTC

Review of attachment 251748 [details] [review]:

::: gio/tests/task.c
@@ +723,3 @@
+   */
+  while (task != NULL)
+    g_usleep (100);

Ugly =/

If you're doing stuff like this, I think "task" should be declared volatile.

What about using g_object_weak_ref() and a condition variable?

Comment 3 Dan Winship 2013-08-15 16:46:00 UTC

Created attachment 251751 [details] [review]
gio/tests/task: fix a race condition in test_run_in_thread()

When running a task in a thread, GTask may still be internally holding
a ref on the task in that thread even after the callback is called in
the original thread (depending on thread scheduling). Fix the test to
handle that by using a weak notify that signals a GCond, and wait for
that GCond from the main thread. (And add a corresponding check to
test_return_on_cancel().)

Comment 4 Colin Walters 2013-08-15 16:57:15 UTC

Review of attachment 251751 [details] [review]:

Looks a lot better, thanks!  Just one thing, but feel free to keep as is too:

::: gio/tests/task.c
@@ +1069,3 @@
   GCancellable *cancellable;
   volatile ThreadState thread_state;
+  volatile gboolean weak_notify_ran = FALSE;

Now conversely, I don't see a need for this to be volatile.  We have one read outside of a mutex/memory barrier, but that's fine sine we're asserting it's not written yet.

Comment 5 Dan Winship 2013-08-15 17:48:45 UTC

(In reply to comment #4)
> Now conversely, I don't see a need for this to be volatile.  We have one read
> outside of a mutex/memory barrier, but that's fine since we're asserting it's
> not written yet.

That doesn't follow; if weak_notify_ran is not volatile, then gcc might store it in a register, in which case the memory barrier is irrelevant because the code in the loop would only be looking at the value in the register, not the value in memory.

(Except... actually, we *don't* need volatile here, because gcc can't prove that g_cond_wait() doesn't have access to the pointer to weak_notify_ran that we passed into g_object_weak_ref() earlier, so it is forced to reload weak_notify_ran from memory after each loop iteration. But this was true of "task" in the original patch too. [Although that patch didn't have a memory barrier, so it was wrong anyway.] Anyway, this feels like a technicality, and I think I'm happier keeping the variable volatile.)

Comment 6 Dan Winship 2013-08-15 17:50:16 UTC

Attachment 251751 [details] pushed as a93d373 - gio/tests/task: fix a race condition in test_run_in_thread()

Comment 7 Colin Walters 2013-08-15 18:31:45 UTC

(In reply to comment #5)
> (In reply to comment #4)
> > Now conversely, I don't see a need for this to be volatile.  We have one read
> > outside of a mutex/memory barrier, but that's fine since we're asserting it's
> > not written yet.
> 
> That doesn't follow; if weak_notify_ran is not volatile, then gcc might store
> it in a register, in which case the memory barrier is irrelevant because the
> code in the loop would only be looking at the value in the register, not the
> value in memory.

I'm not a guru in this, but my understanding is that you don't need to declare variables read/written while holding a mutex as volatile.  The memory barrier will force a flush from the register to memory (and cache synchronization).

Comment 8 Dan Winship 2013-08-15 20:50:38 UTC

("Concurrency is hard; let's go shopping!")

(In reply to comment #7)
> I'm not a guru in this, but my understanding is that you don't need to declare
> variables read/written while holding a mutex as volatile.  The memory barrier
> will force a flush from the register to memory (and cache synchronization).

The last sentence is wrong. The memory barrier instruction doesn't know what registers the compiler is using to store what variables, and the compiler doesn't know that calling g_mutex_lock() is going to result in a memory barrier.

However, it seems like unless you're trying really hard to write perverse code, the compiler will always be able to tell that a piece of shared data is potentially shared (as with weak_notify_ran above), and so any (non-inlinable, non-const/pure) function call will result in it being forced to reload the data in case the function call changed it. So if you lock a mutex around reading a shared variable, then the variable doesn't need to be volatile... because g_mutex_lock() is a function call. Wacky.

Comment 9 Dan Winship 2013-08-15 20:52:28 UTC

Created attachment 251773 [details]
test program

compiled with -O0, this exits immediately, compiled with -O3, it hangs forever, unless you uncomment either of the two commented-out lines.

Comment 10 Colin Walters 2013-08-15 22:22:52 UTC

(In reply to comment #8)
> ("Concurrency is hard; let's go shopping!")
> 
> (In reply to comment #7)
> > I'm not a guru in this, but my understanding is that you don't need to declare
> > variables read/written while holding a mutex as volatile.  The memory barrier
> > will force a flush from the register to memory (and cache synchronization).
> 
> The last sentence is wrong. The memory barrier instruction doesn't know what
> registers the compiler is using to store what variables, and the compiler
> doesn't know that calling g_mutex_lock() is going to result in a memory
> barrier.

http://stackoverflow.com/questions/3208060/does-guarding-a-variable-with-a-pthread-mutex-guarantee-its-also-not-cached
http://stackoverflow.com/questions/10997636/clarifications-on-full-memory-barriers-involved-by-pthread-mutexes
http://stackoverflow.com/questions/6837699/are-mutex-lock-functions-sufficient-without-volatile

is useful discussion on the topic.  In particular in the last link:

  "As I said, pthread_mutex_*() includes a compiler barrier, e.g: on glibc/x86 pthread_mutex_lock() ends up calling the macro lll_lock(), which has a memory clobber, forcing the compiler to reload variables."

However...in glib we're wrapping pthread_mutex_lock so the memory barrier isn't inserted into the current compilation unit =(  This seems like a bug to me; we need to do the same thing glibc does.