GNOME Bugzilla – Bug 735428
[regression] gtk_main() tries to unlock an unlocked mutex
Last modified: 2014-08-30 23:56:33 UTC
This happened to me with deadbeef (audio player) and here is the same with psensor: https://bugs.launchpad.net/ubuntu/+source/psensor/+bug/1346299 "Attempt to unlock mutex that was not locked" is the most important thing here. I got the very same when launching deadbeef from console. This was originally built under glib < 2.41, and failed once I updated to 2.41 yesterday. Fortunately, the developer on Launchpad had the right idea, and I could fix it, which, however, was not accepted by deadbeef's main programmer. It's indeed this line: ------ gdk_threads_init() ------ He let me know that this line must be in for older glib/gtk versions. Whilst it's theoretically possible to "#ifdef out" the line for glib >= 2.41, he didn't seem too enthusiastic about that. He pointed out that this is an obvious regression and that builds created under glib < 2.41 still have to work under new 2.41 no matter what. At least removing that line was a welcome "fix" for me to be able to use deadbeef at all, since having this line active would have made deadbeef unavailable for the time being.
Hello, dear GLIB developers. I'm a developer of deadbeef player, mentioned in the summary. This bug is a duplicate of #735141, but I would like to explain the situation, before the bug is closed. Your fix prevents any build of my project made in the past 5 years from running. Which means you broke the binary backwards compatibility. App developers can't change the binary builds which have already been released. My code doesn't contain anything criminal, and the crash happens deep inside of gtk_main, both in GTK2 and GTK3. As many users have reported, the crash goes away, if gdk_threads_init() call is removed from my initialization function. So, if this function should not be called -- maybe GTK devs should make it NOP. I need to repeat this again: changing the already released builds is NOT AN OPTION. You should not be breaking binary compatibility like that. My use of the gdk_threads_init is in conformance with GTK documentation, and is called before gtk_init.
@Andreas: Adding #ifdef is not an option, because it's compile time option. Binary builds are built against GTK2.12 and GTK3.0.0, where this function was not declared deprecated.
Erm I was referring to future builds, of course. git of today and later. So if this tree is built henceforth, AND the #ifdef is in, it could be determined which glib is installed is in the system and reacted appropriately. I'm aware that this won't fix the builds that exist already, of course; only those to come/to build.
@Alexey ... well this does not look good at all. bug 735141 is marked NOTABUG, and this will mean there will not be a solution so soon. In other words, I might have to comment out gdk_threads_init() in the future many more times when compiling your upcoming trees because 735141 cheekily plays the ball back to the application developers, saying "fix your software; glib does it right; not a bug".
@Andreas: Programs don't work this way. The builds I make must work on systems with GTK2.12 and up, with corresponding versions of GLIB. Of course, I can check which version of GLIB is installed at runtime. But I can't fix the existing builds this way. And they are very important for many things, such as bug tracking (bisect), regression testing, and of course many people still prefer to run older versions, just because they don't like some of the changes in the new versions, or while they are waiting for a fix of a bug. I hope that GLIB devs will be able to help find the solution to this problem.
@Andreas, can't you just roll-back the GLIB to previous version?
trying to unlock mutexes that are not locked is not a regression, per se: it's always been an unportable behaviour, and the only reason it worked on Linux is because the pthreads implementation in the GNU libc is permissive by default (even though they strongly suggest to change the setting). the fact that gdk_threads_init() triggers the error is the actual issue, and should be fixed in GDK.
Hi Emmanuele. Can you then reassign the bug to GTK/GDK?
Oh, sorry, I didn't notice you've done that already. Thanks a lot.
Btw, you are correct that it always been unportable, because this call was always producing this behaviour on FreeBSD, and I have ifdef for that case, which disables gdk_threads_init there. But I can't remove the call entirely, because without it other issues appear on certain combinations of GTK/GLIB on linux.
to be *absolutely* fair, calling gdk_threads_init() and then nothing else until gtk_main() is wrong, and it never should have worked in the first place. the only reason why somebody would call gdk_threads_init() is if they want to use threads with GDK (let's forget for a moment that it is not portable), which means that the proper way to initialize a GTK application becomes: main { gdk_threads_init (); gdk_threads_enter (); gtk_init (); // set up your UI gtk_main (); // tear down your UI gdk_threads_leave (); exit(); } so that you acquire the lock to build your UI, then call gtk_main() which will release the lock, spin the main loop, and then re-acquire the lock before returning control to you. this means that the mutex would be initialized and locked by the time gtk_main() would try to unlock it. sadly, none of this was correctly documented, and is also not portable; people just started calling gdk_threads_init() at random, and it magically fixed threading bugs, and nobody actually cared about this stuff on Linux, because GNU libc is permissive by default. by the looks of it, the gdk_threads_init() call was added to work around another bug, so obviously stuff changes underneath and what was a workaround is now breaking. anyway, the fix is pretty trivial in GDK — it turns out that I already had to fix Clutter for a similar issue; the default GDK mutex unlock should trylock() and unlock(). I'll cook up a patch for both GTK+ 2 and GTK+ 3. the *proper* fix, though, would be to either drop gdk_threads_init() from your code because a) it's deprecated; b) it's used as a workaround; and c) it's used in the wrong way.
... the other option would be to fix the code to actually do the correct initialization. (In reply to comment #10) > Btw, you are correct that it always been unportable, because this call was > always producing this behaviour on FreeBSD, and I have ifdef for that case, > which disables gdk_threads_init there. But I can't remove the call entirely, > because without it other issues appear on certain combinations of GTK/GLIB on > linux. that would have been a strong indication that you were doing a very wrong thing. as I said, our documentation on the issue has always been fairly bad, so it's not entirely your fault.
Created attachment 284493 [details] [review] threads: Do not release the GDK lock if it hasn't been acquired yet Since GLib ≥ 2.41, attempting to release an unlocked mutex will abort(), as it happens on most systems already. Given the lack of proper documentation on how to use GDK with threads, there is code in the wild that does: gdk_threads_init (); gdk_init (); ... gtk_main (); instead of the idiomatically correct: gdk_threads_init (); gdk_threads_enter (); gtk_init (); ... gtk_main (); ... gdk_threads_leave (); Which means that gtk_main() will try to release the GDK lock, and thus trigger an error from GLib. we cannot really fix all the wrong code everywhere, and since it does not cost us anything, we can work around the issue inside GDK itself, by trying to acquire the GDK lock inside gdk_threads_leave() with trylock().
Created attachment 284494 [details] [review] threads: Do not release the GDK lock if it hasn't been acquired yet Since GLib ≥ 2.41, attempting to release an unlocked mutex will abort(), as it happens on most systems already. Given the lack of proper documentation on how to use GDK with threads, there is code in the wild that does: gdk_threads_init (); gdk_init (); ... gtk_main (); instead of the idiomatically correct: gdk_threads_init (); gdk_threads_enter (); gtk_init (); ... gtk_main (); ... gdk_threads_leave (); Which means that gtk_main() will try to release the GDK lock, and thus trigger an error from GLib. we cannot really fix all the wrong code everywhere, and since it does not cost us anything, we can work around the issue inside GDK itself, by trying to acquire the GDK lock inside gdk_threads_leave() with trylock().
two patches: * attachment 284493 [details] [review] is the patch for GTK+ 2.24. * attachment 284494 [details] [review] is the patch for GTK+ master.
by the by, if you're in the process of fixing your code, you may also want to look at the discussion happened here: https://bugs.launchpad.net/ubuntu/+source/gnome-do/+bug/1344386 which shows that gnome-do had the same issue.
Indeed, my code looks like this: g_thread_init (NULL); gdk_threads_init (); gtk_init (&argc, (char ***)&argv); ....... gtk_main (); so I don't have the gdk_threads_enter/leave, and this is wrong. You're also right about the lack of documentation about this. Thanks for the explanation, I will add the necessary fix in the next release of my app. And it would be really great if your patch gets accepted, and the users can still run the current and old releases of my app.
Review of attachment 284493 [details] [review]: Should we limit this to just the first call ? Thats the one where people get it wrong, around gtk_main(). In any case, +1
Review of attachment 284494 [details] [review]: ok
>Thanks for the explanation, I will add the necessary fix in the next release of my app. Great! Looking forward to that. So I am *not* going to downgrade my GLIB as you suggested, since - even current ddb trunk works with my "workaround" - I want to keep an up-to-date testing system _with_ 2.41 for your upcoming builds that contain the fix
Comment on attachment 284494 [details] [review] threads: Do not release the GDK lock if it hasn't been acquired yet Attachment 284493 [details] pushed as 79c3ff3 - threads: Do not release the GDK lock if it hasn't been acquired yet Attachment 284494 [details] pushed as 79c3ff3 - threads: Do not release the GDK lock if it hasn't been acquired yet
Attachment 284493 [details] pushed as fbf38d1 - threads: Do not release the GDK lock if it hasn't been acquired yet
Sorry that I didn't notice this earlier, but a small comment: The trylock approach here may introduce a theoretical regression: I think that posix is within its rights to hand you a recursive mutex by default. On non-Linux where we still use POSIX, attempting to relock that same mutex from the same thread (which is an undefined operation) could result in you successfully acquiring the lock for the second time.
(In reply to comment #23) > Sorry that I didn't notice this earlier, but a small comment: > > The trylock approach here may introduce a theoretical regression: I think that > posix is within its rights to hand you a recursive mutex by default. On > non-Linux where we still use POSIX, attempting to relock that same mutex from > the same thread (which is an undefined operation) could result in you > successfully acquiring the lock for the second time. the pthreads docs say that the PTHREAD_MUTEX_DEFAULT value *may* be mapped to any of the NORMAL, RECURSIVE, or ERRORCHECK values (or to something else entirely, just to make it even less useful). in our implementation we use PTHREAD_MUTEX_ADAPTIVE_NP if it's available, otherwise we fall back to what I assume is the default implementation. to be fair, having pthread return a recursive mutex as an implementation detail would be very weird to me; recursive mutexes have more side effects than normal ones, and require a fairly different coding style. "Programming with POSIX Threads" seem to imply that DEFAULT will likely be an vendor/implementation specific setting mapping to either NORMAL or ERRORCHECK. we could ask for a PTHREAD_MUTEX_NORMAL by default, if ADAPTIVE is not available, if we want to be on the safe side.
(In reply to comment #24) > we could ask for a PTHREAD_MUTEX_NORMAL by default, if ADAPTIVE is not > available, if we want to be on the safe side. or we could always request an ERRORCHECK mutex, since: - the futex-based implementation we use on Linux does error checking - the other POSIX platforms we support (i.e. *BSD and Apple) seem to prefer the ERRORCHECK value as the PTHREAD_MUTEX_DEFAULT in any case, I'd probably wait and see if something bad happens. after all, the API in question (gdk_threads_*) is deprecated and people should be moving away from it.
Certainly, but this won't happen overnight for sure.