Bug 679439 – Single-threaded clutter programs fail on platforms which don't permit unlocking an already unlocked mutex

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 679439 - Single-threaded clutter programs fail on platforms which don't permit unlocking an already unlocked mutex


Summary:	Single-threaded clutter programs fail on platforms which don't permit unlocki...


Status:	RESOLVED FIXED

Product:	clutter
Classification:	Platform
Component:	general
Version:	1.10.x
Hardware:	Other Cygwin

Importance:	Normal normal
Target Milestone:	---
Assigned To:	clutter-maint
QA Contact:	clutter-maint

URL:
Whiteboard:

Duplicates:	684771 (view as bug list)
Depends on:
Blocks:

Reported:	2012-07-05 10:54 UTC by Jon Turney
Modified:	2012-10-05 16:45 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
main: Do not release the lock if it hasn't been acquired (2.21 KB, patch) 2012-09-26 08:54 UTC, Emmanuele Bassi (:ebassi)	committed	Details \| Review

Description Jon Turney 2012-07-05 10:54:45 UTC

Clutter documentation seems to say that it is safe for a single-threaded program to call clutter_main() without wrapping it in clutter_threads_enter() / clutter_threads_leave() [1], and many of the supplied test programs follow that pattern.

However, on a platform which performs strict checking on mutex operations, or when compiled with G_ERRORCHECK_MUTEXES, this is not the case.

When clutter_main() calls clutter_threads_leave(), if there has not been a preceding clutter_threads_enter() to claim the mutex, an error is reported for trying to unlock an already unlocked mutex.

e.g.
$ ./test-actor
[...]
GLib (gthread-posix.c): Unexpected error from C library during 'pthread_mutex_unlock': Invalid argument.  Aborting.

[1] The section "Threading Model" in http://developer.gnome.org/clutter/1.10/clutter-General.html

Comment 1 Emmanuele Bassi (:ebassi) 2012-07-05 13:51:55 UTC

to recap the discussion on IRC:

requirements of Clutter:

a) only one thread can initialize and use the GL context;
b) existing code in either the form of:

    clutter_init(...);
    ...
    clutter_main();

   or in the form of:

    clutter_init(...);
    clutter_threads_enter();
    ...
    clutter_main();
    clutter_threads_leave();

   have to continue working;
c) the clutter_threads_enter() and clutter_threads_leave() API has to continue working for developers writing X11-only applications, where said API can be used to denote critical sections inside a multi-threaded code;
d) on the other hand, clutter_threads_enter() and clutter_threads_leave() are not portable, and won't do anything on non-X11 platforms.

as a result of (a), the main loop inside Clutter needs to acquire a global lock during frame processing, i.e. for every iteration that can cause Clutter API to be called either internally or externally (cfr. the dispatch code inside clutter-master-clock.c). this means that everything that happens between clutter_init() and clutter_main() *has* to happen inside the same thread, if it involves Clutter API.

because of (b), we need to release any lock held between init() and main(); as soon as g_main_loop_run() — which is a blocking function — has been called, we need to be able to re-acquire the lock from within the Master Clock, in order to do the dispatch of frame-related code like animation advancement, event processing, and scene drawing. given that both code examples in (b) are valid, we don't know if the lock was being held by the user prior to calling clutter_main(); thus we unilaterally release the lock if any was held, so that we can call g_main_loop_run() outside the lock. if that did not happen, the code would deadlock immediately, as the user has no way to release the lock.

the (c) requirement of API and ABI compatibility makes it impossible for us to remove the implementation of clutter_threads_enter() and clutter_threads_leave() without effectively breaking existing apps, even if (d) is factored in.

we cannot implicitly call clutter_threads_enter() as part of clutter_init(), because clutter_init() may fail or may not be called at all, or the app developer could be calling clutter_threads_enter() right after clutter_init(), thus deadlocking the application.

this is not a trivial issue, and as such it won't probably be solved by a trivial patch; it's also a condition shared with GTK+, if gdk_threads_init() is called.

Comment 2 Gustau Pérez i Querol 2012-07-06 06:54:49 UTC

We talked this morning IRC with Emm.Bassi about this problem with the *BSD platforms. 

We at gnome@freebsd thought that it will be enough with something like:

     g_mutex_trylock
     g_mutex_unlock

at clutter_threads_impl_unlock. 

If you hold the mutex already, g_mutex_trylock will return with an error, so the mutex can be unlocked. If we don't hold the mutex, trylock will lock it so it could be unlocked. That seems to allow us to use clutter (the mx's examples seem to work).

Comment 3 Emmanuele Bassi (:ebassi) 2012-09-25 10:16:34 UTC

*** Bug 684771 has been marked as a duplicate of this bug. ***

Comment 4 Emmanuele Bassi (:ebassi) 2012-09-25 10:17:34 UTC

I think this ought to be fixed in GLib, since it's the source of the API we're using.

Comment 5 Dan Winship 2012-09-25 13:01:22 UTC

fixed in glib how?

Comment 6 Emmanuele Bassi (:ebassi) 2012-09-25 13:23:09 UTC

either g_mutex_unlock() fails on every platform (e.g. by setting strict policies) in which case we can code defensively around it (e.g. using trylock()+unlock()) everywhere, or by making sure that calling unlock() does not fail on all platforms.

I'm not sure, but I don't seem to recall GMutex failing for an unlock() without a lock() before the GThread refactoring either, so this could also be defined as a regression.

Comment 7 Dan Winship 2012-09-25 13:57:35 UTC

It's not a regression, it's always been that way. (Or at least, nothing changed in glib; it's possible that glibc's behavior changed.)

The glib docs say:

    Calling g_mutex_unlock() on a mutex that is not locked by the current
    thread leads to undefined behaviour.

so this is a clutter bug, even if it happens to work on Linux.

Bug 674822 suggests using ERRORCHECK mutexes by default (or at least making it easier to use them). Feel free to chime in there.

Comment 8 Jasper Lievisse Adriaanse 2012-09-25 15:21:15 UTC

Dan, are there any particular tests in the GLib test suite to run to confirm this issue is or is not a GLib problem?

Also for completeness' sake OpenBSD's PTHREAD_MUTEX_DEFAULT is set to PTHREAD_MUTEX_STRICT_NP (strict error checking).

Comment 9 Dan Winship 2012-09-25 16:03:04 UTC

You don't need a test case; the (current) definition of g_mutex_unlock() says that calling it on a non-locked mutex causes undefined behavior, so it is not even logically possible for there to be a bug; no matter what happens, it is behaving as documented.

Comment 10 Emmanuele Bassi (:ebassi) 2012-09-26 08:54:33 UTC

Created attachment 225194 [details] [review]
main: Do not release the lock if it hasn't been acquired

On various systems, trying to release a mutex that hasn't been acquired
will result in a run-time error.

In order to avoid this, we trylock() the Big Clutter Lock™ and
immediately unlock() it, regardless of the result; if the lock was
already acquired, trylock() will immediately fail, and we can release
it; if the lock was not acquired, trylock() will succeed, and we can
release the lock immediately.

This is necessary to maintain binary compatibility and invariants for
Clutter applications doing:

  clutter_init()
  clutter_threads_enter()
  ...
  clutter_main()
  ...
  clutter_threads_leave()

instead of the correct:

  clutter_init()
  clutter_threads_enter()
  ...
  clutter_threads_leave()
  clutter_main()
  clutter_threads_enter()
  ...
  clutter_threads_leave()

With Clutter ≥ 1.12, the idiomatic form is:

  clutter_init()
  ...
  clutter_main()

given that the public Big Clutter Lock™ acquire/release API has been
deprecated, and nobody should take the lock outside of Clutter itself.

Comment 11 Jasper Lievisse Adriaanse 2012-09-27 11:29:26 UTC

With this patch I can run all the examples and toys without them crashing on OpenBSD.

Comment 12 Emmanuele Bassi (:ebassi) 2012-10-05 16:36:27 UTC

Attachment 225194 [details] pushed as 0da0e51 - main: Do not release the lock if it hasn't been acquired

Comment 13 Jasper Lievisse Adriaanse 2012-10-05 16:45:11 UTC

Thanks a lot.