Bug 697316 – abort trying to unlock an non existent mutex

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 697316 - abort trying to unlock an non existent mutex


Summary:	abort trying to unlock an non existent mutex


Status:	RESOLVED FIXED

Product:	tracker
Classification:	Core
Component:	Miners
Version:	0.16.x
Hardware:	Other OpenBSD

Importance:	Normal major
Target Milestone:	---
Assigned To:	Martyn Russell
QA Contact:

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2013-04-05 09:10 UTC by Antoine Jacoutot
Modified:	2013-05-01 08:45 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Suggest patch to fix this bug (1.86 KB, patch) 2013-04-30 19:53 UTC, Martyn Russell	none	Details \| Review

Description Antoine Jacoutot 2013-04-05 09:10:04 UTC

Hi.

I'm having a crash in tracker due to what it seems like unlocking a non existing or already unlocked mutex...
On BSD systems, this is an issue because PTHREAD_MUTEX_DEFAULT is set to PTHREAD_MUTEX_STRICT_NP which means we end up with an abort().

$ /usr/local/libexec/tracker-miner-fs                                                                                                  

(tracker-miner-fs:14000): Tracker-WARNING **: Locale '0' is not set, defaulting to C locale
Abort trap (core dumped)

(gdb) bt

+ Trace 231730

#0 kill
at <stdin> line 2
#1 abort
at /usr/src/lib/libc/stdlib/abort.c line 70
#2 pthread_mutex_unlock
at /usr/src/lib/librthread/rthread_sync.c line 218
#3 g_mutex_unlock
from /usr/local/lib/libglib-2.0.so.3600.0
#4 tracker_sparql_backend_get
from /usr/local/lib/libtracker-sparql-0.16.so.0.0
#5 tracker_sparql_backend_get_internal
from /usr/local/lib/libtracker-sparql-0.16.so.0.0
#6 tracker_sparql_connection_get
from /usr/local/lib/libtracker-sparql-0.16.so.0.0
#7 tracker_file_notifier_init
from /usr/local/lib/libtracker-miner-0.16.so.0.0
#8 g_type_create_instance
from /usr/local/lib/libgobject-2.0.so.3600.0
#9 g_object_constructor
from /usr/local/lib/libgobject-2.0.so.3600.0
#10 g_object_newv
from /usr/local/lib/libgobject-2.0.so.3600.0
#11 g_object_new_valist
from /usr/local/lib/libgobject-2.0.so.3600.0
#12 g_object_new
from /usr/local/lib/libgobject-2.0.so.3600.0
#13 tracker_miner_fs_init
from /usr/local/lib/libtracker-miner-0.16.so.0.0
#14 g_type_create_instance
from /usr/local/lib/libgobject-2.0.so.3600.0
#15 g_object_constructor
from /usr/local/lib/libgobject-2.0.so.3600.0
#16 g_object_newv
from /usr/local/lib/libgobject-2.0.so.3600.0
#17 g_object_new_valist
from /usr/local/lib/libgobject-2.0.so.3600.0
#18 g_initable_new_valist
from /usr/local/lib/libgio-2.0.so.3600.0
#19 g_initable_new
from /usr/local/lib/libgio-2.0.so.3600.0
#20 tracker_miner_files_new
from /usr/local/libexec/tracker-miner-fs
#21 main
from /usr/local/libexec/tracker-miner-fs

Comment 1 Antoine Jacoutot 2013-04-09 07:59:12 UTC

Ok it seems that the issue is that tracker_sparql_backend_get() does
this:

g_mutex_lock (&tracker_sparql_backend_door);
{
	... if something bad, then goto __finally3
}
	result = _result_;
	{
		g_mutex_unlock (&tracker_sparql_backend_door);
	}

	return result;

{
__finally3:
{
	g_mutex_unlock (&tracker_sparql_backend_door);
}

So when tracker_sparql_backend_get behaves correctly, you will hit the
second g_mutex_unlock() in __finally3 on a mutex that has already been
unlocked.

I am not familiar with vala so I cannot provide a patch for
tracker-backend.vala, but directly removing the first g_mutex_unlock()
call from tracker-backend.c does the trick for me and AFAIU, __finally3
is always reached so there is no chance to leave a locked mutex around.

At least, that's my understanding...

Comment 2 Jürg Billeter 2013-04-09 08:08:03 UTC

(In reply to comment #1)
> So when tracker_sparql_backend_get behaves correctly, you will hit the
> second g_mutex_unlock() in __finally3 on a mutex that has already been
> unlocked.

There is a return statement before __finally3, which means that __finally3 will never be reached without 'goto __finally3'.

Comment 3 Antoine Jacoutot 2013-04-09 08:10:14 UTC

(In reply to comment #2)
> (In reply to comment #1)
> > So when tracker_sparql_backend_get behaves correctly, you will hit the
> > second g_mutex_unlock() in __finally3 on a mutex that has already been
> > unlocked.
> 
> There is a return statement before __finally3, which means that __finally3 will
> never be reached without 'goto __finally3'.

Ah right, I missed that. The C file is quite hard to read with all the line numbers comments :-/
The issue still stands though; I'm not quite sure why I'm getting this double unlock.

Comment 4 Antoine Jacoutot 2013-04-10 16:42:52 UTC

So, I recompiled our thread library using
#define PTHREAD_MUTEX_DEFAULT PTHREAD_MUTEX_NORMAL

With this, the crash does not occur. So there is indeed an issue with unlocking an unlocked/uninitialized mutex...
I am really lost in vala, and could use some help :-/

Thanks.

Comment 5 Martyn Russell 2013-04-12 10:42:09 UTC

After some investigation with Antoine, this is down to this in Glib:

#ifdef PTHREAD_ADAPTIVE_MUTEX_INITIALIZER_NP
  {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init (&attr);
    pthread_mutexattr_settype (&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
    pattr = &attr;
  }
#endif

  if G_UNLIKELY ((status = pthread_mutex_init (mutex, pattr)) != 0)
    g_thread_abort (status, "pthread_mutex_init");

#ifdef PTHREAD_ADAPTIVE_MUTEX_NP
  pthread_mutexattr_destroy (&attr);
#endif

--

On Linux systems, the definition _INITIALIZER_NP exists so we don't have a problem with the _unlock() issue because it just fails if it can't _unlock(). For BSD (and likely freebsd, netbsd) it seems _DEFAULT is set to PTHREAD_MUTEX_STRICT_NP which will abort on _unlock() failures. That's why Antoine's #define fixes it for him in libpthread.

I would suggest that we add a line there after the if G_UNLIKELY (...), something like:

+ #ifndef PTHREAD_ADAPTIVE_MUTEX_INITIALIZER_NP
+   pthread_mutexattr_settype (&attr, PTHREAD_MUTEX_NORMAL);
+ #endif

Otherwise, the behaviour is largely inconsistent across GLib platforms.

Jürg, any thoughts before I see what the GLib team have to say about this idea?

Comment 6 Martyn Russell 2013-04-12 10:43:53 UTC

> On Linux systems, the definition _INITIALIZER_NP exists so we don't have a
> problem with the _unlock() issue because it just fails if it can't _unlock().
> For BSD (and likely freebsd, netbsd) it seems _DEFAULT is set to
> PTHREAD_MUTEX_STRICT_NP which will abort on _unlock() failures. That's why
> Antoine's #define fixes it for him in libpthread.

I forgot to add in case it wasn't obvious, but it seems _INITIALIZER_NP is undefined, hence _DEFAULT being used, which then infers _STRICT_NP.

Comment 7 Antoine Jacoutot 2013-04-12 14:55:45 UTC

To extend a bit, if you set the mutex type to PTHREAD_MUTEX_ERRORCHECK on Linux, you should see the same error as I do:

GLib (gthread-posix.c): Unexpected error from C library during 'pthread_mutex_unlock': Operation not permitted.  Aborting.

Comment 8 Kurt Miller 2013-04-15 19:08:27 UTC

Hi Martyn,

Antoine asked me to take a look at this. It appears to be a bug in vala, but I'm not 100% on that since this my first time looking at a vala program. In anycase, what is happening is the static initilization of the 'door' mutex is occuring *after* it has been locked in the get() method. This has the effect of wiping out the existing lock and leaving the lock in an unlocked state.

Antoine is correct, this bug should be visable when the mutex type is set to PTHREAD_MUTEX_ERRORCHECK on any platform.

Take a look at the following stack trace which occurs after the 'door' lock has been locked (more comments after trace below):

+ Trace 231798

#0 tracker_sparql_backend_class_init
at tracker-backend.c line 3824
#1 g_type_class_ref
from /usr/local/lib/libgobject-2.0.so.3600.0
#2 g_object_newv
from /usr/local/lib/libgobject-2.0.so.3600.0
#3 g_object_new
from /usr/local/lib/libgobject-2.0.so.3600.0
#4 tracker_sparql_backend_construct
at /home/martyn/Source/tracker/src/libtracker-sparql-backend/tracker-backend.vala line 30
#5 tracker_sparql_backend_get
at /home/martyn/Source/tracker/src/libtracker-sparql-backend/tracker-backend.vala line 224
#6 tracker_sparql_backend_get_internal
at /home/martyn/Source/tracker/src/libtracker-sparql-backend/tracker-backend.vala line 242
#7 tracker_sparql_connection_get
at /home/martyn/Source/tracker/src/libtracker-sparql-backend/tracker-backend.vala line 382
#8 tracker_file_notifier_init
at tracker-file-notifier.c line 1270
#9 g_type_create_instance
from /usr/local/lib/libgobject-2.0.so.3600.0
#10 g_object_constructor
from /usr/local/lib/libgobject-2.0.so.3600.0
#11 g_object_newv
from /usr/local/lib/libgobject-2.0.so.3600.0
#12 g_object_new_valist
from /usr/local/lib/libgobject-2.0.so.3600.0
#13 g_object_new
from /usr/local/lib/libgobject-2.0.so.3600.0
#14 tracker_miner_fs_init
at tracker-miner-fs.c line 584
#15 g_type_create_instance
from /usr/local/lib/libgobject-2.0.so.3600.0
#16 g_object_constructor
from /usr/local/lib/libgobject-2.0.so.3600.0
#17 g_object_newv
from /usr/local/lib/libgobject-2.0.so.3600.0
#18 g_object_new_valist
from /usr/local/lib/libgobject-2.0.so.3600.0
#19 g_initable_new_valist
from /usr/local/lib/libgio-2.0.so.3600.0
#20 g_initable_new
from /usr/local/lib/libgio-2.0.so.3600.0
#21 tracker_miner_files_new
at tracker-miner-files.c line 2646
#22 main
at tracker-main.c line 935


In frame 6 calls tracker_sparql_connection_get(), door is locked (however nothing has called tracker_sparql_backend_class_init() yet. Then at line 224 the static initialization of the class is triggered:

 #5  0x000013490ee2e83b in tracker_sparql_backend_get (cancellable=0x0, error=0x7f7ffffccd18)
    at /home/martyn/Source/tracker/src/libtracker-sparql-backend/tracker-backend.vala:224
224                                     result = new Tracker.Sparql.Backend ();

This leads down to tracker_sparql_backend_class_init() being called after the lock was taken. tracker_sparql_backend_class_init() initializes the mutex and leaves the mutex unlocked. Later then it is unlocked but it is already unlocked.

It seems to me vala should automatically trigger static initialization of static member variables as the first step of executing a static class function.

Regards,
-Kurt

Comment 9 Martyn Russell 2013-04-23 15:25:58 UTC

I am not so sure this is Vala.
I recently changed this code in commit 1e1d8b0cd8996702695eabf9073810e88257195e

I did this because the upstream GLib mutex APIs have been deprecated (GStaticMutex), so I switched to GMutex. Previously (as you can see), we were using GStaticMutex and it was using static initialisation with G_STATIC_MUTEX_INIT (which AFAICS) just sets the structure to NULL (or 0s it). However, when we called g_static_mutex_lock(), that API creates the new lock _BEFORE_ locking it. You can see this with the g_static_mutex_get_mutex_impl() function and the functions defined in glib/gthread.h like G_LOCK_DEFINE_STATIC.

So my question was during all this, is this really a Vala thing (i.e. race condition or sequencing issue) or is it a GMutex vs GStaticMutex thing. The answer is definitely the later. I don't believe this is a Vala thing at all, but what's clear is, we were abusing the static mutex code paths before and now we can't rely on those.

It seems to me that the g_mutex_lock() API we're now using has no checks for initialisation which the former g_static_mutex_lock() had. Sadly, the documentation states we should use g_mutex_lock() but doesn't make any point about race conditions or initialisation cases like this one that may occur. So we're probably not the only ones seeing this issue.

The solution here is to either add code to check for initialisation before using the door.lock() calls or add some way to have a GStaticMutex type functionality in again. These are really the same solution. The GStaticMutex functionality was exactly what we needed here because we have a singleton and don't want the creation of that to be a race condition. We could potentially use some g_atomic_pointer_compare_and_exchange() type API here to fix this (which is replicating what some of the mutex APIs were/are doing anyway).

I don't consider this a Vala issue though, this is all about the underlying GLib mutex APIs we're using and the _get() API for the connection doesn't require an existing singleton or TrackerSparqlConnection object/class to exist before calling _get().

Jürg any comments?

Comment 10 Allison Karlitskaya (desrt) 2013-04-24 15:51:20 UTC

hi Martyn,

GMutex indeed handles the 'static initialisation' case and is documented as such, even including example code to demonstrate:

"""

Example 3. Using GMutex to protected a shared variable

int
give_me_next_number (void)
{
  static GMutex mutex;
  static int current_number = 0;
  int ret_val;

  g_mutex_lock (&mutex);
  ret_val = current_number = calc_next_number (current_number);
  g_mutex_unlock (&mutex);

  return ret_val;
}


Notice that the GMutex is not initialised to any particular value. Its placement in static storage ensures that it will be initialised to all-zeros, which is appropriate.

If a GMutex is placed in other contexts (eg: embedded in a struct) then it must be explicitly initialised using g_mutex_init().

"""


It might be that you have found a really tricky situation in which there is a race in this initialisation but very many people are using it in exactly this way already, without problems.

I think that Antoine's analysis of the problem is very likely correct: you're probably unlocking a mutex that's not locked.  We've seen many issues like this and the BSD guys are always arguing that we should change to error-checking mutexes in Linux by default specifically so developers would catch these kinds of problems sooner.

Comment 11 Martyn Russell 2013-04-24 16:17:05 UTC

(In reply to comment #10)
> hi Martyn,
> 
> GMutex indeed handles the 'static initialisation' case and is documented as
> such, even including example code to demonstrate:
> 
> """
> 
> Example 3. Using GMutex to protected a shared variable
> 
> int
> give_me_next_number (void)
> {
>   static GMutex mutex;
>   static int current_number = 0;
>   int ret_val;
> 
>   g_mutex_lock (&mutex);
>   ret_val = current_number = calc_next_number (current_number);
>   g_mutex_unlock (&mutex);
> 
>   return ret_val;
> }
> 
> 
> Notice that the GMutex is not initialised to any particular value. Its
> placement in static storage ensures that it will be initialised to all-zeros,
> which is appropriate.

Yea, we're actually using it in Vala like this:

https://git.gnome.org/browse/tracker/tree/src/libtracker-sparql-backend/tracker-backend.vala#n209

So the:

  static Mutex door = Mutex ();

The corresponding C is:

  static GMutex tracker_sparql_backend_door;
  static GMutex tracker_sparql_backend_door = {0};
 
> If a GMutex is placed in other contexts (eg: embedded in a struct) then it must
> be explicitly initialised using g_mutex_init().

Yea, we're not doing that. It's part of a class's private members which needs to be initiated before anything else to avoid singleton creation race conditions.

> I think that Antoine's analysis of the problem is very likely correct: you're
> probably unlocking a mutex that's not locked.  We've seen many issues like this

Antoine's analysis is the next logical progression from what i've described. That is to say that what he describes happens next anyway. I was trying to identify why. I checked his stack trace and the sources and confirmed what Antoine was finding.

So run through this with me then Ryan because I am not totally convinced:

1. We have a GMutex initialised with {0};

2. We call _lock() which looks like this:
   g_mutex_lock (GMutex *mutex)
   {
     if G_UNLIKELY ((status = pthread_mutex_lock (g_mutex_get_impl (mutex))) != 0)
     ...
   }

3. Here is the _impl() function:
   g_mutex_get_impl (GMutex *mutex)
   {
     pthread_mutex_t *impl = g_atomic_pointer_get (&mutex->p);
     ...
   }

So mutex->p should be NULL, where &mutex has a value and the contents of the union have been zeroed with {0}.

So looking at the code right now, this should work as you say in your example without any changes.

So what am I missing?

Comment 12 Allison Karlitskaya (desrt) 2013-04-24 17:08:20 UTC

Reading your code, I'm pretty sure I found the problem:

In the class_init C code emitted by Vala, I see this:

        g_mutex_init (&_tmp0_);
        tracker_sparql_backend_door = _tmp0_;
}


And I notice that you are using the '_door' lock from static functions (ie: it's possible that you take the lock before the class is initialised).

I think this sequence of events is extremely likely:

1) user calls static new Connection get()
2) lock is taken
3) new instance of object is created while lock is held, causing class_init
   to happen
4) class_init does _mutex_init(), replacing the already-existing mutex with
   a new one
5) you unlock this new mutex.  boom.

Comment 13 Kurt Miller 2013-04-24 18:13:47 UTC

(In reply to comment #12)
> Reading your code, I'm pretty sure I found the problem:
> 
> In the class_init C code emitted by Vala, I see this:
> 
>         g_mutex_init (&_tmp0_);
>         tracker_sparql_backend_door = _tmp0_;
> }
> 
> 
> And I notice that you are using the '_door' lock from static functions (ie:
> it's possible that you take the lock before the class is initialised).
> 
> I think this sequence of events is extremely likely:
> 
> 1) user calls static new Connection get()
> 2) lock is taken
> 3) new instance of object is created while lock is held, causing class_init
>    to happen
> 4) class_init does _mutex_init(), replacing the already-existing mutex with
>    a new one
> 5) you unlock this new mutex.  boom.

Yes. This is exactly what is happening. It is also the reason I think this is a vala bug.

Accessing a static member variable should trigger static initialization of the class prior to using that variable - not after.

Comment 14 Allison Karlitskaya (desrt) 2013-04-24 18:28:31 UTC

It certainly does seem like Vala could be doing better here.

It should either:

a) use a ctor to setup the value, instead of class_init

b) use some sort of 'ensure' g_once_* type of thing at all points of access

c) similarly (as you suggest) hold a class ref at all points of access


I really like 'a', but it would depend on constructor support in the C compiler, which is a non-standard feature.

Comment 15 Martyn Russell 2013-04-24 18:42:31 UTC

Ryan, Kurt, thank you both for the additional comments and analysis. I will cook up a patch here and do some testing to see what makes sense.

My initial thought is to try and NOT initialise with Mutex(), I suspect this is what is adding the g_mutex_init(&_tmp0_); C code. Failing that, I quite like the look of 'b'. I think 'a' still allows for a race condition of the singleton creation and you could end up with 2 classes competing there.

Thanks again for the help! :)

Comment 16 Allison Karlitskaya (desrt) 2013-04-24 19:08:01 UTC

(In reply to comment #15)
> I think 'a' still allows for a race condition of the singleton
> creation and you could end up with 2 classes competing there.


When I said ctor, I meant in the __attribute__((constructor)) case (ie: run once, before main() runs).

Comment 17 Jürg Billeter 2013-04-25 13:18:00 UTC

(In reply to comment #14)
> It certainly does seem like Vala could be doing better here.

(a) is not an option for Vala due to the lack of portability you already mentioned.

One option may be to simply make the issue obvious to the developer. E.g., only accept static variable initializers that are accepted as initializer expression in C as well. Require the developer to explicitly add an assignment to class_init or some other function for more complex initializers.

Comment 18 Martyn Russell 2013-04-30 19:53:19 UTC

Created attachment 242960 [details] [review]
Suggest patch to fix this bug

Antoine, Kurt, any chance you could test this patch?

Ryan, I tried this patch and it seems to work here on Ubuntu 13.04 with GLib 2.36. The patch seems to remove the g_mutex_init() call from the generated sources that Vala spits out. This should avoid the re-initialisation of the mutex. The atomic operations in the mutex internals should avoid race conditions too AFAICS.

Jürg, if you think this isn't enough or prefer we tackle this another way, let me know.

Thanks all.

Comment 19 Antoine Jacoutot 2013-05-01 00:52:28 UTC

(In reply to comment #18)
> Created an attachment (id=242960) [details] [review]
> Suggest patch to fix this bug
> 
> Antoine, Kurt, any chance you could test this patch?

Sure but:
CDPATH="${ZSH_VERSION+.}:" && cd . && /usr/local/bin/valac --pkg gio-2.0 -g --target-glib=2.34 ../../src/libtracker-sparql/tracker-sparql-0.16.vapi ../../src/libtracker-bus/tracker-bus.vapi ../../src/libtracker-direct/tracker-direct.vapi  -C tracker-backend.vala
error: ../../src/libtracker-bus/tracker-bus.vapi not found
error: ../../src/libtracker-direct/tracker-direct.vapi not found
Compilation failed: 2 error(s), 0 warning(s)

I suppose I need to use a git checkout and cannot just patch the latest released tarball?

Comment 20 Antoine Jacoutot 2013-05-01 06:47:12 UTC

Bah, removing the vala.stamp files did the trick and I was able to package this.
IT WORKS :-) No more abort().

I've already committed it to the OpenBSD ports tree since even if it is eventually not accepted as the correct solution, it is still better than what we have right now.

Thank you guys!

Comment 21 Martyn Russell 2013-05-01 08:45:01 UTC

(In reply to comment #20)
> Bah, removing the vala.stamp files did the trick and I was able to package
> this.
> IT WORKS :-) No more abort().
> 
> I've already committed it to the OpenBSD ports tree since even if it is
> eventually not accepted as the correct solution, it is still better than what
> we have right now.
> 
> Thank you guys!

Great. I am going to commit this anyway. I think if we need a different solution, we will likely have to build on this patch anyway. Since it works here and fixes your issue, it's better than no commit at all in the short term.

This problem has been fixed in the development version. The fix will be available in the next major software release. Thank you for your bug report.

I should be releasing with this later today (0.16.1).

Comment 22 Martyn Russell 2013-05-01 08:45:37 UTC

I should add, I checked with Jürg on IRC before pushing.