Bug 674822 – Use ERRORCHECK mutexs

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 674822 - Use ERRORCHECK mutexs


Summary:	Use ERRORCHECK mutexs


Status:	RESOLVED OBSOLETE

Product:	glib
Classification:	Platform
Component:	gthread
Version:	2.32.x
Hardware:	Other All

Importance:	Normal normal
Target Milestone:	---
Assigned To:	gtkdev
QA Contact:	gtkdev

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2012-04-25 18:33 UTC by Kurt Miller
Modified:	2018-05-24 14:03 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Use PTHREAD_MUTEX_ERRORCHECK type mutexes by default (715 bytes, patch) 2012-04-25 18:33 UTC, Kurt Miller	none	Details \| Review

Description Kurt Miller 2012-04-25 18:33:46 UTC

Created attachment 212829 [details] [review]
Use PTHREAD_MUTEX_ERRORCHECK type mutexes by default

glibc's implementation of PTHREAD_MUTEX_ADAPTIVE_NP and PTHREAD_MUTEX_DEFAULT allows many application bugs to go unchecked and hide them by not failing. For example, unlocking a mutex that is not owned by the current thread is allowed, unlocking a mutex that is unowned is allowed.

As a result of glibc's implementation, applications that use glib will not know when there are application bugs which can have unpredictable consequences such has one thread unlocking another threads lock. These type of bugs lead to crashes that are very difficult to find. In addition, both PTHREAD_MUTEX_ADAPTIVE_NP and PTHREAD_MUTEX_DEFAULT type mutexes can be implemented differently across platforms because PTHREAD_MUTEX_ADAPTIVE_NP is non-portable and because PTHREAD_MUTEX_DEFAULT is not strictly defined by Posix.

With the introduction of checking the return values of pthread_mutex_lock/unlock in version 2.31.0 of glib, applications now behave differently across platforms.

I am proposing that glib explicitly set the mutex type to PTHREAD_MUTEX_ERRORCHECK so that serious application bugs are caught and reported on Linux as well as for consistant behavior across all platforms.

The following applications have issues that are hidden by the use of non-error check type mutexes: sonata, subtitleeditor, gftp, gajim

Comment 1 Antoine Jacoutot 2012-04-27 13:35:40 UTC

Hi.

I can confirm the behavior difference between Linux and OpenBSD. The aforementioned applications abort/segfault here.
I'm expecting the same will happen to at least NetBSD and FreeBSD as well when they update to a newer glib.
It would be nice to have a consistent behavior between glib-supported platforms no matter what the default mutex type is on the operating system (and it would probably help catch bugs earlier on in the process).

Comment 2 Allison Karlitskaya (desrt) 2012-04-27 13:43:18 UTC

I don't think I'm happy with turning this on by default.  Perhaps we could have a debugging environment variable that does it, though.

Another thing worth mentioning is that I plan on rewriting the mutex implementation, at least on Linux to use atomics and futex.  I tried it out in a branch and the performance was better than what we get with glibc with less memory used as well.  This new mutex implementation would likely be extremely 'permissive' in terms of not catching errors...

Comment 3 Kurt Miller 2012-04-27 14:21:50 UTC

(In reply to comment #2)
> I don't think I'm happy with turning this on by default.  Perhaps we could have
> a debugging environment variable that does it, though.

I think a debugging environment variable is a cop-out. It is unlikely that developers using glib will know about it and take the time to test with it. These application bugs will continue to go unnoticed and cause intermittent and variable crashes.

Since glib is a core library that many applications build upon, why would the choice be made for speed over correctness? Writing threaded application is hard for many programmers. They will make mistakes that are easy to catch with a few extra asm instructions and branches.  

> Another thing worth mentioning is that I plan on rewriting the mutex
> implementation, at least on Linux to use atomics and futex.  I tried it out in
> a branch and the performance was better than what we get with glibc with less
> memory used as well.

> This new mutex implementation would likely be extremely
> 'permissive' in terms of not catching errors...

I highly recommend that you don't allow completely ridiculous actions like one thread unlocking another threads lock. The speed cost of the check is minimal and the benefit great.

Comment 4 Dan Winship 2012-04-27 14:35:17 UTC

We already have a knob for fast-vs-safe. So we could do:

#ifndef G_DISABLE_CHECKS
#define g_mutex_init g_errorcheck_mutex_init
#define g_mutex_lock g_errorcheck_mutex_lock
#endif

Comment 5 Antoine Jacoutot 2012-04-27 15:23:30 UTC

(In reply to comment #4)
> We already have a knob for fast-vs-safe. So we could do:
> 
> #ifndef G_DISABLE_CHECKS
> #define g_mutex_init g_errorcheck_mutex_init
> #define g_mutex_lock g_errorcheck_mutex_lock
> #endif

I think this would a very nice way to go actually. I'll also make it easier for people running into crashes to report the bug to the corresponding upstream because they would then easily be reproducible by unsetting G_DISABLE_CHECKS.

Using an environment variable would be even simpler but if the whole mutex handling is rewritten, I'm not sure how this would fit in.

To summarize I think having an easy way for people running Linux to reproduce the bugs we see because we don't run with default mutex type set to PTHREAD_MUTEX_NORMAL would be a great benefit.

Comment 6 Matthias Clasen 2012-04-27 15:26:55 UTC

No, I don't think that is good - we basically recommend to never disable checks, so this is pretty similar to a 'default on'

Comment 7 Antoine Jacoutot 2012-04-27 15:28:51 UTC

(In reply to comment #6)
> No, I don't think that is good - we basically recommend to never disable
> checks, so this is pretty similar to a 'default on'

In that case, a separate runtime env variable maybe?

Comment 8 Kurt Miller 2012-04-30 14:30:30 UTC

(In reply to comment #6)
> we basically recommend to never disable checks

If the default is to check for errors instead of for speed, then I don't understand the resistance to select the mutex type that checks for errors.

It comes down to what is the philosophy of glib: correctness over speed; or speed over correctness?

Due to the poor choices made by glibc to not check for errors, glib needs to decide on its philosophy. Checking for errors will result in more stable applications because problems will be found by the developers of the applications upfront.

Comment 9 GNOME Infrastructure Team 2018-05-24 14:03:34 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/540.