GNOME Bugzilla – Bug 733500
Vala's handling of GMutex/GCond structs causes deadlocks
Last modified: 2014-09-07 02:28:14 UTC
After uploading GLib 2.41.2 to Utopic, an automatically triggered package test started failing with a timeout:
https://jenkins.qa.ubuntu.com/view/Utopic/view/AutoPkgTest/job/utopic-adt-shotwell/65/ARCH=amd64,label=adt/console
This is far from a minimal test case: it uses the autopilot functional testing framework to poke at Shotwell, and autopilot uses D-Bus to communicate with the application it is testing. GLib's own test suite passes. I can walk you through reproducing the failure if you want. An strace log is attached.
I bisected 2.41.1 to 2.41.2, and the offending commit is 49b59e5ac4428a6a99a85d699c3662f96efc4e9d: "GLib: implement GMutex natively on Linux".
Created attachment 281307: strace -f -tt -s1024
Created attachment 281478: trace
Without autopilot, you can reproduce this using Shotwell:
- Plug in a camera, or use something like umockdev to emulate one
- Open Shotwell
- Click the camera
- Click a picture
- "Import selected"
Shotwell has its own semaphore implementation, and it is hanging inside it.
Can you attach a backtrace with all threads? Thanks.
Created attachment 281483: thread apply all bt full
Sure, here you go.
Thanks for that. Confirms my suspicion that this might have had something to do with the condition variables rather than the locks...
Behold, generated Vala code:

    void abstract_semaphore_wait (AbstractSemaphore* self) {
        g_return_if_fail (self != NULL);
        g_mutex_lock (&self->priv->mutex);
        while (TRUE) {
            AbstractSemaphoreWaitAction _tmp0_ = 0;
            GMutex _tmp1_ = {0};
            _tmp0_ = abstract_semaphore_do_wait (self);
            if (!(_tmp0_ == ABSTRACT_SEMAPHORE_WAIT_ACTION_SLEEP)) {
                break;
            }
            _tmp1_ = self->priv->mutex;
            g_cond_wait (&self->priv->monitor, &_tmp1_);
        }
        g_mutex_unlock (&self->priv->mutex);
    }
More verbosely: Vala makes a by-value copy of the mutex and passes that copy to g_cond_wait(), so the copy gets unlocked rather than the original. When GMutex was implemented via pthreads it contained only a pointer, so this was OK (the copy was still a pointer to the same thing). Now that we put state directly into the struct, this sort of behaviour causes problems.
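The difference between the two layouts can be shown with a single-threaded toy model. This is only a sketch: `InlineLock` and `PointerLock` are hypothetical stand-ins for the two GMutex layouts described above, not GLib types, and nothing here is thread-safe.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy lock whose state lives inside the struct itself,
 * loosely analogous to the native GMutex. Not a real lock. */
typedef struct { bool locked; } InlineLock;

/* Hypothetical toy lock that only holds a pointer to shared state,
 * loosely analogous to the old pthreads-backed GMutex. */
typedef struct { InlineLock *impl; } PointerLock;

static void inline_lock(InlineLock *l)   { assert(!l->locked); l->locked = true; }
static void inline_unlock(InlineLock *l) { assert(l->locked);  l->locked = false; }

static void pointer_lock(PointerLock *l)   { inline_lock(l->impl); }
static void pointer_unlock(PointerLock *l) { inline_unlock(l->impl); }

/* Lock the original, copy it by value, unlock the copy.
 * Returns whether the ORIGINAL is still locked afterwards. */
bool inline_copy_leaves_original_locked(void)
{
    InlineLock original = { false };
    inline_lock(&original);
    InlineLock copy = original;   /* same pattern as _tmp1_ = self->priv->mutex */
    inline_unlock(&copy);         /* only the copy's state changes */
    return original.locked;
}

/* Same sequence, but the struct holds just a pointer to the state. */
bool pointer_copy_leaves_original_locked(void)
{
    InlineLock shared = { false };
    PointerLock original = { &shared };
    pointer_lock(&original);
    PointerLock copy = original;  /* copies the pointer, not the state */
    pointer_unlock(&copy);        /* releases the shared state */
    return original.impl->locked;
}
```

In this model the inline variant reports the original as still locked after the copy is unlocked, while the pointer variant reports it as released, which matches why the copy was harmless before the native implementation.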
It looks like a duplicate of 690686.
No, it's the same as the weakref bug that we already fixed. The fix is simply to add lvalue_access = false to the Mutex and Cond structs.
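For illustration, the shape a correct wait loop needs is one where the condition wait receives the address of the original mutex, with no temporary copy. The sketch below uses POSIX threads instead of GLib so that it stands alone. The `Semaphore` type and all function names are hypothetical, and this is not the actual output of the fixed compiler.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for Shotwell's semaphore private data. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  monitor;
    bool            ready;
} Semaphore;

/* Worker thread: set the flag under the lock and wake the waiter. */
static void *notifier(void *arg)
{
    Semaphore *s = arg;
    pthread_mutex_lock(&s->mutex);
    s->ready = true;
    pthread_cond_signal(&s->monitor);
    pthread_mutex_unlock(&s->mutex);
    return NULL;
}

/* Wait loop: the mutex handed to the condition wait is the very one
 * that was locked above, taken by address and never copied. */
static void semaphore_wait(Semaphore *s)
{
    pthread_mutex_lock(&s->mutex);
    while (!s->ready) {
        pthread_cond_wait(&s->monitor, &s->mutex);
    }
    pthread_mutex_unlock(&s->mutex);
}

/* Runs one notify/wait round trip; returns true if the wait finished
 * and observed the flag set by the worker. */
bool wait_completes(void)
{
    Semaphore s;
    pthread_t worker;
    bool ready;

    pthread_mutex_init(&s.mutex, NULL);
    pthread_cond_init(&s.monitor, NULL);
    s.ready = false;

    if (pthread_create(&worker, NULL, notifier, &s) != 0) {
        pthread_cond_destroy(&s.monitor);
        pthread_mutex_destroy(&s.mutex);
        return false;
    }
    semaphore_wait(&s);
    pthread_join(worker, NULL);

    ready = s.ready;
    pthread_cond_destroy(&s.monitor);
    pthread_mutex_destroy(&s.mutex);
    return ready;
}
```

Because the waiter and the notifier both operate on `s.mutex` itself, the unlock performed inside the condition wait releases the lock the notifier is blocked on, so the round trip completes.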
commit 7616e0339c2099243acaba7dd3cc47210db97bdd
Author: Luca Bruno <lucabru@src.gnome.org>
Date:   Sun Aug 3 20:42:09 2014 +0200

    Add lvalue_access = false to Mutex and Cond

    Fixes bug 733500

Please reopen if needed.
*** Bug 734262 has been marked as a duplicate of this bug. ***