After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 735428 - [regression] gtk_main() tries to unlock an unlocked mutex
[regression] gtk_main() tries to unlock an unlocked mutex
Status: RESOLVED FIXED
Product: gtk+
Classification: Platform
Component: .General
2.24.x
Other Linux
: Normal major
: ---
Assigned To: gtk-bugs
gtk-bugs
Depends on:
Blocks:
 
 
Reported: 2014-08-26 05:49 UTC by Andreas Eibach
Modified: 2014-08-30 23:56 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
threads: Do not release the GDK lock if it hasn't been acquired yet (2.43 KB, patch)
2014-08-26 11:12 UTC, Emmanuele Bassi (:ebassi)
committed Details | Review
threads: Do not release the GDK lock if it hasn't been acquired yet (2.28 KB, patch)
2014-08-26 11:16 UTC, Emmanuele Bassi (:ebassi)
committed Details | Review

Description Andreas Eibach 2014-08-26 05:49:08 UTC
This happened to me with deadbeef (audio player) and here is the same with psensor:

https://bugs.launchpad.net/ubuntu/+source/psensor/+bug/1346299

"Attempt to unlock mutex that was not locked" is the most important thing here. I got the very same when launching deadbeef from console.

This was originally built under glib < 2.41, and failed once I updated to 2.41 yesterday.

Fortunately, the developer on Launchpad had the right idea, and I could fix it, which, however, was not accepted by deadbeef's main programmer.

It's indeed this line:
------
gdk_threads_init()
------

He let me know that this line must be in for older glib/gtk versions.
Whilst it's theoretically possible to "#ifdef out" the line for glib >= 2.41, he didn't seem too enthusiastic about that.
He pointed out that this is an obvious regression and that builds created under glib < 2.41 still have to work under new 2.41 no matter what.

At least removing that line was a welcome "fix" for me to be able to use deadbeef at all, since having this line active would have made deadbeef unavailable for the time being.
Comment 1 Alexey Yakovenko 2014-08-26 08:39:39 UTC
Hello, dear GLIB developers.

I'm a developer of deadbeef player, mentioned in the summary.

This bug is a duplicate of #735141, but I would like to explain the situation, before the bug is closed.

Your fix prevents any build of my project made in the past 5 years from running. Which means you broke the binary backwards compatibility. App developers can't change the binary builds which have already been released. My code doesn't contain anything criminal, and the crash happens deep inside of gtk_main, both in GTK2 and GTK3.

As many users have reported, the crash goes away, if gdk_threads_init() call is removed from my initialization function. So, if this function should not be called -- maybe GTK devs should make it NOP.

I need to repeat this again: changing the already released builds is NOT AN OPTION. You should not be breaking binary compatibility like that.

My use of the gdk_threads_init is in conformance with GTK documentation, and is called before gtk_init.
Comment 2 Alexey Yakovenko 2014-08-26 08:44:32 UTC
@Andreas: Adding #ifdef is not an option, because it's compile time option. Binary builds are built against GTK2.12 and GTK3.0.0, where this function was not declared deprecated.
Comment 3 Andreas Eibach 2014-08-26 09:24:35 UTC
Erm I was referring to future builds, of course. git of today and later. So if this tree is built henceforth, AND the #ifdef is in, it could be determined which glib is installed is in the system and reacted appropriately.

I'm aware that this won't fix the builds that exist already, of course; only those to come/to build.
Comment 4 Andreas Eibach 2014-08-26 09:32:30 UTC
@Alexey ... well this does not look good at all.
bug 735141 is marked NOTABUG, and this will mean there will not be a solution so soon. In other words, I might have to comment out gdk_threads_init() in the future many more times when compiling your upcoming trees because 735141 cheekily plays the ball back to the application developers, saying "fix your software; glib does it right; not a bug".
Comment 5 Alexey Yakovenko 2014-08-26 09:34:25 UTC
@Andreas: Programs don't work this way. The builds I make must work on systems with GTK2.12 and up, with corresponding versions of GLIB. Of course, I can check which version of GLIB is installed at runtime. But I can't fix the existing builds this way. And they are very important for many things, such as bug tracking (bisect), regression testing, and of course many people still prefer to run older versions, just because they don't like some of the changes in the new versions, or while they are waiting for a fix of a bug.

I hope that GLIB devs will be able to help find the solution to this problem.
Comment 6 Alexey Yakovenko 2014-08-26 09:39:49 UTC
@Andreas, can't you just roll-back the GLIB to previous version?
Comment 7 Emmanuele Bassi (:ebassi) 2014-08-26 10:42:30 UTC
trying to unlock mutexes that are not locked is not a regression, per se: it's always been an unportable behaviour, and the only reason it worked on Linux is because the pthreads implementation in the GNU libc is permissive by default (even though they strongly suggest to change the setting).

the fact that gdk_threads_init() triggers the error is the actual issue, and should be fixed in GDK.
Comment 8 Alexey Yakovenko 2014-08-26 10:50:25 UTC
Hi Emmanuele.

Can you then reassign the bug to GTK/GDK?
Comment 9 Alexey Yakovenko 2014-08-26 10:50:48 UTC
Oh, sorry, I didn't notice you've done that already. Thanks a lot.
Comment 10 Alexey Yakovenko 2014-08-26 10:54:11 UTC
Btw, you are correct that it always been unportable, because this call was always producing this behaviour on FreeBSD, and I have ifdef for that case, which disables gdk_threads_init there. But I can't remove the call entirely, because without it other issues appear on certain combinations of GTK/GLIB on linux.
Comment 11 Emmanuele Bassi (:ebassi) 2014-08-26 11:00:24 UTC
to be *absolutely* fair, calling gdk_threads_init() and then nothing else until gtk_main() is wrong, and it never should have worked in the first place.

the only reason why somebody would call gdk_threads_init() is if they want to use threads with GDK (let's forget for a moment that it is not portable), which means that the proper way to initialize a GTK application becomes:


  main {
    gdk_threads_init ();

    gdk_threads_enter ();

    gtk_init ();

    // set up your UI

    gtk_main ();

    // tear down your UI

    gdk_threads_leave ();

    exit();
  }

so that you acquire the lock to build your UI, then call gtk_main() which will release the lock, spin the main loop, and then re-acquire the lock before returning control to you.

this means that the mutex would be initialized and locked by the time gtk_main() would try to unlock it.

sadly, none of this was correctly documented, and is also not portable; people just started calling gdk_threads_init() at random, and it magically fixed threading bugs, and nobody actually cared about this stuff on Linux, because GNU libc is permissive by default.

by the looks of it, the gdk_threads_init() call was added to work around another bug, so obviously stuff changes underneath and what was a workaround is now breaking.

anyway, the fix is pretty trivial in GDK — it turns out that I already had to fix Clutter for a similar issue; the default GDK mutex unlock should trylock() and unlock(). I'll cook up a patch for both GTK+ 2 and GTK+ 3.

the *proper* fix, though, would be to either drop gdk_threads_init() from your code because a) it's deprecated; b) it's used as a workaround; and c) it's used in the wrong way.
Comment 12 Emmanuele Bassi (:ebassi) 2014-08-26 11:07:03 UTC
... the other option would be to fix the code to actually do the correct initialization.

(In reply to comment #10)
> Btw, you are correct that it always been unportable, because this call was
> always producing this behaviour on FreeBSD, and I have ifdef for that case,
> which disables gdk_threads_init there. But I can't remove the call entirely,
> because without it other issues appear on certain combinations of GTK/GLIB on
> linux.

that would have been a strong indication that you were doing a very wrong thing.

as I said, our documentation on the issue has always been fairly bad, so it's not entirely your fault.
Comment 13 Emmanuele Bassi (:ebassi) 2014-08-26 11:12:52 UTC
Created attachment 284493 [details] [review]
threads: Do not release the GDK lock if it hasn't been acquired yet

Since GLib ≥ 2.41, attempting to release an unlocked mutex will abort(),
as it happens on most systems already.

Given the lack of proper documentation on how to use GDK with threads,
there is code in the wild that does:

    gdk_threads_init ();
    gdk_init ();

    ...

    gtk_main ();

instead of the idiomatically correct:

    gdk_threads_init ();
    gdk_threads_enter ();

    gtk_init ();

    ...

    gtk_main ();

    ...

    gdk_threads_leave ();

Which means that gtk_main() will try to release the GDK lock, and thus
trigger an error from GLib.

we cannot really fix all the wrong code everywhere, and since it does
not cost us anything, we can work around the issue inside GDK itself, by
trying to acquire the GDK lock inside gdk_threads_leave() with
trylock().
Comment 14 Emmanuele Bassi (:ebassi) 2014-08-26 11:16:13 UTC
Created attachment 284494 [details] [review]
threads: Do not release the GDK lock if it hasn't been acquired yet

Since GLib ≥ 2.41, attempting to release an unlocked mutex will abort(),
as it happens on most systems already.

Given the lack of proper documentation on how to use GDK with threads,
there is code in the wild that does:

    gdk_threads_init ();
    gdk_init ();

    ...

    gtk_main ();

instead of the idiomatically correct:

    gdk_threads_init ();
    gdk_threads_enter ();

    gtk_init ();

    ...

    gtk_main ();

    ...

    gdk_threads_leave ();

Which means that gtk_main() will try to release the GDK lock, and thus
trigger an error from GLib.

we cannot really fix all the wrong code everywhere, and since it does
not cost us anything, we can work around the issue inside GDK itself, by
trying to acquire the GDK lock inside gdk_threads_leave() with
trylock().
Comment 15 Emmanuele Bassi (:ebassi) 2014-08-26 11:18:09 UTC
two patches:

 * attachment 284493 [details] [review] is the patch for GTK+ 2.24.
 * attachment 284494 [details] [review] is the patch for GTK+ master.
Comment 16 Emmanuele Bassi (:ebassi) 2014-08-26 11:23:52 UTC
by the by, if you're in the process of fixing your code, you may also want to look at the discussion happened here:

https://bugs.launchpad.net/ubuntu/+source/gnome-do/+bug/1344386

which shows that gnome-do had the same issue.
Comment 17 Alexey Yakovenko 2014-08-26 11:43:38 UTC
Indeed, my code looks like this:

g_thread_init (NULL);
gdk_threads_init ();
gtk_init (&argc, (char ***)&argv);
.......
gtk_main ();

so I don't have the gdk_threads_enter/leave, and this is wrong.

You're also right about the lack of documentation about this.

Thanks for the explanation, I will add the necessary fix in the next release of my app.

And it would be really great if your patch gets accepted, and the users can still run the current and old releases of my app.
Comment 18 Matthias Clasen 2014-08-26 12:19:08 UTC
Review of attachment 284493 [details] [review]:

Should we limit this to just the first call ? Thats the one where people get it wrong, around gtk_main(). In any case, +1
Comment 19 Matthias Clasen 2014-08-26 12:19:23 UTC
Review of attachment 284494 [details] [review]:

ok
Comment 20 Andreas Eibach 2014-08-26 17:53:26 UTC
>Thanks for the explanation, I will add the necessary fix in the next release of
my app.

Great! Looking forward to that. So I am *not* going to downgrade my GLIB as you suggested, since 

- even current ddb trunk works with my "workaround" 

- I want to keep an up-to-date testing system _with_ 2.41 for your upcoming builds that contain the fix
Comment 21 Matthias Clasen 2014-08-27 00:08:03 UTC
Comment on attachment 284494 [details] [review]
threads: Do not release the GDK lock if it hasn't been acquired yet

Attachment 284493 [details] pushed as 79c3ff3 - threads: Do not release the GDK lock if it hasn't been acquired yet
Attachment 284494 [details] pushed as 79c3ff3 - threads: Do not release the GDK lock if it hasn't been acquired yet
Comment 22 Matthias Clasen 2014-08-27 00:09:12 UTC
Attachment 284493 [details] pushed as fbf38d1 - threads: Do not release the GDK lock if it hasn't been acquired yet
Comment 23 Allison Karlitskaya (desrt) 2014-08-30 03:57:37 UTC
Sorry that I didn't notice this earlier, but a small comment:

The trylock approach here may introduce a theoretical regression: I think that posix is within its rights to hand you a recursive mutex by default.  On  non-Linux where we still use POSIX, attempting to relock that same mutex from the same thread (which is an undefined operation) could result in you successfully acquiring the lock for the second time.
Comment 24 Emmanuele Bassi (:ebassi) 2014-08-30 10:36:29 UTC
(In reply to comment #23)
> Sorry that I didn't notice this earlier, but a small comment:
> 
> The trylock approach here may introduce a theoretical regression: I think that
> posix is within its rights to hand you a recursive mutex by default.  On 
> non-Linux where we still use POSIX, attempting to relock that same mutex from
> the same thread (which is an undefined operation) could result in you
> successfully acquiring the lock for the second time.

the pthreads docs say that the PTHREAD_MUTEX_DEFAULT value *may* be mapped to any of the NORMAL, RECURSIVE, or ERRORCHECK values (or to something else entirely, just to make it even less useful). in our implementation we use PTHREAD_MUTEX_ADAPTIVE_NP if it's available, otherwise we fall back to what I assume is the default implementation.

to be fair, having pthread return a recursive mutex as an implementation detail would be very weird to me; recursive mutexes have more side effects than normal ones, and require a fairly different coding style. "Programming with POSIX Threads" seem to imply that DEFAULT will likely be an vendor/implementation specific setting mapping to either NORMAL or ERRORCHECK.

we could ask for a PTHREAD_MUTEX_NORMAL by default, if ADAPTIVE is not available, if we want to be on the safe side.
Comment 25 Emmanuele Bassi (:ebassi) 2014-08-30 10:48:14 UTC
(In reply to comment #24)
 
> we could ask for a PTHREAD_MUTEX_NORMAL by default, if ADAPTIVE is not
> available, if we want to be on the safe side.

or we could always request an ERRORCHECK mutex, since:

 - the futex-based implementation we use on Linux does error checking
 - the other POSIX platforms we support (i.e. *BSD and Apple) seem to
   prefer the ERRORCHECK value as the PTHREAD_MUTEX_DEFAULT

in any case, I'd probably wait and see if something bad happens. after all, the API in question (gdk_threads_*) is deprecated and people should be moving away from it.
Comment 26 Andreas Eibach 2014-08-30 23:56:33 UTC
Certainly, but this won't happen overnight for sure.