GNOME Bugzilla – Bug 679439
Single-threaded clutter programs fail on platforms which don't permit unlocking an already unlocked mutex
Last modified: 2012-10-05 16:45:11 UTC
Clutter documentation seems to say that it is safe for a single-threaded program to call clutter_main() without wrapping it in clutter_threads_enter() / clutter_threads_leave() [1], and many of the supplied test programs follow that pattern. However, on a platform which performs strict checking on mutex operations, or when compiled with G_ERRORCHECK_MUTEXES, this is not the case. When clutter_main() calls clutter_threads_leave(), if there has not been a preceding clutter_threads_enter() to claim the mutex, an error is reported for trying to unlock an already unlocked mutex. e.g. $ ./test-actor [...] GLib (gthread-posix.c): Unexpected error from C library during 'pthread_mutex_unlock': Invalid argument. Aborting. [1] The section "Threading Model" in http://developer.gnome.org/clutter/1.10/clutter-General.html
to recap the discussion on IRC: requirements of Clutter: a) only one thread can initialize and use the GL context; b) existing code in either the form of: clutter_init(...); ... clutter_main(); or in the form of: clutter_init(...); clutter_threads_enter(); ... clutter_main(); clutter_threads_leave(); have to continue working; c) the clutter_threads_enter() and clutter_threads_leave() API has to continue working for developers writing X11-only applications, where said API can be used to denote critical sections inside a multi-threaded code; d) on the other hand, clutter_threads_enter() and clutter_threads_leave() are not portable, and won't do anything on non-X11 platforms. as a result of (a), the main loop inside Clutter needs to acquire a global lock during frame processing, i.e. for every iteration that can cause Clutter API to be called either internally or externally (cfr. the dispatch code inside clutter-master-clock.c). this means that everything that happens between clutter_init() and clutter_main() *has* to happen inside the same thread, if it involves Clutter API. because of (b), we need to release any lock held between init() and main(); as soon as g_main_loop_run() — which is a blocking function — has been called, we need to be able to re-acquire the lock from within the Master Clock, in order to do the dispatch of frame-related code like animation advancement, event processing, and scene drawing. given that both code examples in (b) are valid, we don't know if the lock was being held by the user prior to calling clutter_main(); thus we unilaterally release the lock if any was held, so that we can call g_main_loop_run() outside the lock. if that did not happen, the code would deadlock immediately, as the user has no way to release the lock. the (c) requirement of API and ABI compatibility makes it impossible for us to remove the implementation of clutter_threads_enter() and clutter_threads_leave() without effectively breaking existing apps, even if (d) is factored in. we cannot implicitly call clutter_threads_enter() as part of clutter_init(), because clutter_init() may fail or may not be called at all, or the app developer could be calling clutter_threads_enter() right after clutter_init(), thus deadlocking the application. this is not a trivial issue, and as such it won't probably be solved by a trivial patch; it's also a condition shared with GTK+, if gdk_threads_init() is called.
We talked this morning IRC with Emm.Bassi about this problem with the *BSD platforms. We at gnome@freebsd thought that it will be enough with something like: g_mutex_trylock g_mutex_unlock at clutter_threads_impl_unlock. If you hold the mutex already, g_mutex_trylock will return with an error, so the mutex can be unlocked. If we don't hold the mutex, trylock will lock it so it could be unlocked. That seems to allow us to use clutter (the mx's examples seem to work).
*** Bug 684771 has been marked as a duplicate of this bug. ***
I think this ought to be fixed in GLib, since it's the source of the API we're using.
fixed in glib how?
either g_mutex_unlock() fails on every platform (e.g. by setting strict policies) in which case we can code defensively around it (e.g. using trylock()+unlock()) everywhere, or by making sure that calling unlock() does not fail on all platforms. I'm not sure, but I don't seem to recall GMutex failing for an unlock() without a lock() before the GThread refactoring either, so this could also be defined as a regression.
It's not a regression, it's always been that way. (Or at least, nothing changed in glib; it's possible that glibc's behavior changed.) The glib docs say: Calling g_mutex_unlock() on a mutex that is not locked by the current thread leads to undefined behaviour. so this is a clutter bug, even if it happens to work on Linux. Bug 674822 suggests using ERRORCHECK mutexes by default (or at least making it easier to use them). Feel free to chime in there.
Dan, are there any particular tests in the GLib test suite to run to confirm this issue is or is not a GLib problem? Also for completeness' sake OpenBSD's PTHREAD_MUTEX_DEFAULT is set to PTHREAD_MUTEX_STRICT_NP (strict error checking).
You don't need a test case; the (current) definition of g_mutex_unlock() says that calling it on a non-locked mutex causes undefined behavior, so it is not even logically possible for there to be a bug; no matter what happens, it is behaving as documented.
Created attachment 225194 [details] [review] main: Do not release the lock if it hasn't been acquired On various systems, trying to release a mutex that hasn't been acquired will result in a run-time error. In order to avoid this, we trylock() the Big Clutter Lock™ and immediately unlock() it, regardless of the result; if the lock was already acquired, trylock() will immediately fail, and we can release it; if the lock was not acquired, trylock() will succeed, and we can release the lock immediately. This is necessary to maintain binary compatibility and invariants for Clutter applications doing: clutter_init() clutter_threads_enter() ... clutter_main() ... clutter_threads_leave() instead of the correct: clutter_init() clutter_threads_enter() ... clutter_threads_leave() clutter_main() clutter_threads_enter() ... clutter_threads_leave() With Clutter ≥ 1.12, the idiomatic form is: clutter_init() ... clutter_main() given that the public Big Clutter Lock™ acquire/release API has been deprecated, and nobody should take the lock outside of Clutter itself.
With this patch I can run all the examples and toys without them crashing on OpenBSD.
Attachment 225194 [details] pushed as 0da0e51 - main: Do not release the lock if it hasn't been acquired
Thanks a lot.