GNOME Bugzilla – Bug 705152
Race in glib/task.test
Last modified: 2013-08-15 22:22:52 UTC
"GLib-GIO:ERROR:../../../gio/tests/task.c:719:test_run_in_thread: assertion failed: (task == NULL) The race is fairly obvious; mainloop quits before the object gets finalized in the other thread. We need a condition variable here, or some other form of synchronization.
Created attachment 251748 [details] [review] gio/tests/task: fix a race condition in test_run_in_thread() ==== A condition variable won't work because the ref is being held by GTask iternals that we don't have access to from here. But we can wait for the thread to drop its ref rather than just asserting that it already has.
Review of attachment 251748 [details] [review]: ::: gio/tests/task.c @@ +723,3 @@ + */ + while (task != NULL) + g_usleep (100); Ugly =/ If you're doing stuff like this, I think "task" should be declared volatile. What about using g_object_weak_ref() and a condition variable?
Created attachment 251751 [details] [review] gio/tests/task: fix a race condition in test_run_in_thread() When running a task in a thread, GTask may still be internally holding a ref on the task in that thread even after the callback is called in the original thread (depending on thread scheduling). Fix the test to handle that by using a weak notify that signals a GCond, and wait for that GCond from the main thread. (And add a corresponding check to test_return_on_cancel().)
Review of attachment 251751 [details] [review]: Looks a lot better, thanks! Just one thing, but feel free to keep as is too: ::: gio/tests/task.c @@ +1069,3 @@ GCancellable *cancellable; volatile ThreadState thread_state; + volatile gboolean weak_notify_ran = FALSE; Now conversely, I don't see a need for this to be volatile. We have one read outside of a mutex/memory barrier, but that's fine sine we're asserting it's not written yet.
(In reply to comment #4) > Now conversely, I don't see a need for this to be volatile. We have one read > outside of a mutex/memory barrier, but that's fine since we're asserting it's > not written yet. That doesn't follow; if weak_notify_ran is not volatile, then gcc might store it in a register, in which case the memory barrier is irrelevant because the code in the loop would only be looking at the value in the register, not the value in memory. (Except... actually, we *don't* need volatile here, because gcc can't prove that g_cond_wait() doesn't have access to the pointer to weak_notify_ran that we passed into g_object_weak_ref() earlier, so it is forced to reload weak_notify_ran from memory after each loop iteration. But this was true of "task" in the original patch too. [Although that patch didn't have a memory barrier, so it was wrong anyway.] Anyway, this feels like a technicality, and I think I'm happier keeping the variable volatile.)
Attachment 251751 [details] pushed as a93d373 - gio/tests/task: fix a race condition in test_run_in_thread()
(In reply to comment #5) > (In reply to comment #4) > > Now conversely, I don't see a need for this to be volatile. We have one read > > outside of a mutex/memory barrier, but that's fine since we're asserting it's > > not written yet. > > That doesn't follow; if weak_notify_ran is not volatile, then gcc might store > it in a register, in which case the memory barrier is irrelevant because the > code in the loop would only be looking at the value in the register, not the > value in memory. I'm not a guru in this, but my understanding is that you don't need to declare variables read/written while holding a mutex as volatile. The memory barrier will force a flush from the register to memory (and cache synchronization).
("Concurrency is hard; let's go shopping!") (In reply to comment #7) > I'm not a guru in this, but my understanding is that you don't need to declare > variables read/written while holding a mutex as volatile. The memory barrier > will force a flush from the register to memory (and cache synchronization). The last sentence is wrong. The memory barrier instruction doesn't know what registers the compiler is using to store what variables, and the compiler doesn't know that calling g_mutex_lock() is going to result in a memory barrier. However, it seems like unless you're trying really hard to write perverse code, the compiler will always be able to tell that a piece of shared data is potentially shared (as with weak_notify_ran above), and so any (non-inlinable, non-const/pure) function call will result in it being forced to reload the data in case the function call changed it. So if you lock a mutex around reading a shared variable, then the variable doesn't need to be volatile... because g_mutex_lock() is a function call. Wacky.
Created attachment 251773 [details] test program compiled with -O0, this exits immediately, compiled with -O3, it hangs forever, unless you uncomment either of the two commented-out lines.
(In reply to comment #8) > ("Concurrency is hard; let's go shopping!") > > (In reply to comment #7) > > I'm not a guru in this, but my understanding is that you don't need to declare > > variables read/written while holding a mutex as volatile. The memory barrier > > will force a flush from the register to memory (and cache synchronization). > > The last sentence is wrong. The memory barrier instruction doesn't know what > registers the compiler is using to store what variables, and the compiler > doesn't know that calling g_mutex_lock() is going to result in a memory > barrier. http://stackoverflow.com/questions/3208060/does-guarding-a-variable-with-a-pthread-mutex-guarantee-its-also-not-cached http://stackoverflow.com/questions/10997636/clarifications-on-full-memory-barriers-involved-by-pthread-mutexes http://stackoverflow.com/questions/6837699/are-mutex-lock-functions-sufficient-without-volatile is useful discussion on the topic. In particular in the last link: "As I said, pthread_mutex_*() includes a compiler barrier, e.g: on glibc/x86 pthread_mutex_lock() ends up calling the macro lll_lock(), which has a memory clobber, forcing the compiler to reload variables." However...in glib we're wrapping pthread_mutex_lock so the memory barrier isn't inserted into the current compilation unit =( This seems like a bug to me; we need to do the same thing glibc does.