After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 585375 - Performance and Contention problems with g_type_class_ref and type_rw_lock
Performance and Contention problems with g_type_class_ref and type_rw_lock
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: gobject
2.22.x
Other All
: Normal normal
: ---
Assigned To: gtkdev
gtkdev
: 595394 (view as bug list)
Depends on:
Blocks: 557100 599954
 
 
Reported: 2009-06-10 18:47 UTC by Antoine Tremblay
Modified: 2011-02-18 15:59 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
gstbuffer stress test (2.71 KB, text/x-csrc)
2009-09-21 07:50 UTC, Edward Hervey
  Details
gobject/gtype.c: Lockless type reffing/unreffing. Fixes #585375 (16.33 KB, patch)
2009-09-23 18:14 UTC, Edward Hervey
none Details | Review
gobject/gtype.c: Lockless type reffing/unreffing. Fixes #585375 (13.12 KB, patch)
2009-09-24 07:46 UTC, Edward Hervey
none Details | Review
gobject/tests: Add a test that refs/unrefs a GType in several threads. (2.56 KB, patch)
2009-09-24 08:56 UTC, Edward Hervey
none Details | Review
gobject/tests: New test for dynamic class creation/destruction (7.67 KB, patch)
2009-09-24 10:20 UTC, Edward Hervey
none Details | Review
Move ref_count from TypeNode->data to TypeNode (7.65 KB, patch)
2009-09-24 10:51 UTC, Benjamin Otte (Company)
none Details | Review
Pass the TypeNode to type_data_last_unref_Wm() (3.32 KB, patch)
2009-09-24 10:51 UTC, Benjamin Otte (Company)
none Details | Review
Move setting the refcount to the end of the function (970 bytes, patch)
2009-09-24 10:51 UTC, Benjamin Otte (Company)
none Details | Review
Add a NODE_REFCOUNT getter (6.73 KB, patch)
2009-09-24 10:51 UTC, Benjamin Otte (Company)
none Details | Review
type_data_unref_Wm => type_data_unref_U (4.10 KB, patch)
2009-09-24 10:52 UTC, Benjamin Otte (Company)
none Details | Review
Make all accesses of Node->ref_count atomic (2.43 KB, patch)
2009-09-24 10:52 UTC, Benjamin Otte (Company)
none Details | Review
Make type_data_unref_U not take locks in the common case (1.55 KB, patch)
2009-09-24 10:52 UTC, Benjamin Otte (Company)
none Details | Review
Make ClassData->init_State atomic (3.87 KB, patch)
2009-09-24 10:52 UTC, Benjamin Otte (Company)
none Details | Review
Reorganize g_type_class_ref() (1.54 KB, patch)
2009-09-24 10:52 UTC, Benjamin Otte (Company)
none Details | Review
Add type_data_ref_U() and use it in g_type_class_ref() (3.56 KB, patch)
2009-09-24 10:52 UTC, Benjamin Otte (Company)
none Details | Review

Description Antoine Tremblay 2009-06-10 18:47:44 UTC
------------------------

The gory part  : 

After having hudge performance decreases passed a certain point in a GStreamer based
multithreaded application. I started profiling the mutexes in glib and found :

Profiling Result : (Note that this backtrace is only the first taken path for the mutex)

Mutex : 0x7f89b902aca8 , type 0, num_locks : 8143886 num_read_locks 0, num_write_locks 0 max_read_wait 0, max_write_wait 0
bt : /opt/trx-HEAD/glib/2.18.0//lib/libglib-2.0.so.0(g_my_profiling_add_mutex+0xe1) [0x7f89b82170d1]
bt : /opt/trx-HEAD/glib/2.18.0//lib/libgthread-2.0.so.0 [0x7f89b89e2601]
bt : /opt/trx-HEAD/glib/2.18.0//lib/libglib-2.0.so.0(g_static_rw_lock_writer_lock+0x64) [0x7f89b8217494]
bt : /opt/trx-HEAD/glib/2.18.0//lib/libgobject-2.0.so.0(g_type_init_with_debug_flags+0x4a) [0x7f89b8e0bc8a]
bt : /opt/trx-HEAD/gstreamer/gstreamer/0.10.20/lib/libgstreamer-0.10.so.0 [0x7f89b9052b57]
bt : /opt/trx-HEAD/glib/2.18.0//lib/libglib-2.0.so.0(g_option_context_parse+0x58) [0x7f89b81fd288]
bt : /opt/trx-HEAD/gstreamer/gstreamer/0.10.20/lib/libgstreamer-0.10.so.0(gst_init_check+0xad) [0x7f89b9051e8d]
bt : /opt/trx-HEAD/gstreamer/gstreamer/0.10.20/lib/libgstreamer-0.10.so.0(gst_init+0x17) [0x7f89b9051f77]



This mutex is taken 8 million times for 100 gstreamer transcoding  pipelines during 5 minutes
Or about 26 000 times per second!!

The next mutex most taken is taken about 800 000 times.. 

The mutex in question is : g_type_rw_lock (or more presisely  it's own real ->mutex)

This is a read write lock that is used to protect the class information of a type.

------------------------


To better understand the problem here's a quick intro on how to types work in glib :

Type generation : 
   	
	A type is created by a get_type function like gst_buffer_get_type ();
	This creates a static GTypeInfo structure. This contains the function 
	pointers used to do class and instance initialisation for example.

Type registration :

     The first time the get_type function is called for a for a type.
     g_type_register_static will be called with the GTypeInfo structure
     and a parent structure.

     This will register the type in a tree of types using the parent type.
     This tree of types consists of TypeInfo nodes with the fundamental
     glib types as their root. 

     Once this is done a dynamic TypeData structure is created 
     (data = g_malloc(sizeof (TypeData)..) this data pointer is part of the 
     TypeInfo node. And the GTypeInfo information is copied
     on that structure.

Refcounting :

    One important thing to consider about this type system that is directly
    related to the mutex contention problem is the refcounting of these types.
    
    Once your TypeData structure is created, any new object of that type
    will increment a refcount in this structure. So that if there are no more
    objects using that type. The TypeData structure will be freed. 

    (Do not get confused with the register_"static" .. static types are
    still refcounted and are "dynamic" in that sense.. the static in that func
    has nothing to do what that)

Instance creation :

       To create an object you use g_object_new or more generally
       g_type_create_instance.

       This will call g_type_class_ref on the type of the object since
       a new object for that class is created. And g_type_class_unref
       when it is destructed.

       And here the fun starts...

Locking :

     Since the TypeInfo->data pointer for the TypeData structure 
     for a type can be created and destroyed as it's refcount
     reaches 0 it must somehow be locked.

     Glib is using a read/write lock called g_type_rw_lock for that.
    
     At this point, I think it's worth taking a look at how a read/write lock works in glib :
     
     The two functions used to take a read or write lock for the g_type_rw_lock mutex are :
     g_static_rw_lock_reader_lock and g_static_rw_lock_writer_lock 

     Checking the implementation of g_static_rw_lock_reader_lock :

       g_static_mutex_lock (&lock->mutex);
       lock->want_to_read++;	   

         while (lock->have_writer || lock->want_to_write)
   	       g_static_rw_lock_wait (&lock->read_cond, &lock->mutex);

         lock->want_to_read--;
  	 lock->read_counter++;
	 g_static_mutex_unlock (&lock->mutex);

     And for g_static_rw_lock_writer_lock :

      g_static_mutex_lock (&lock->mutex);
      lock->want_to_write++;

        while (lock->have_writer || lock->read_counter)
   	       g_static_rw_lock_wait (&lock->write_cond, &lock->mutex);
  	
	 lock->want_to_write--;
	 lock->have_writer = TRUE;
	 g_static_mutex_unlock (&lock->mutex);	

   And : g_static_rw_lock_wait is a simple g_cond_wait on lock->mutex

   The unlock functions lock the lock->mutex and change the counts / flags 
   as expected.
   
   So in fact each _lock operation on a rw lock can lock the mutex twice
   once to change it's counter and once to retake the lock after.
   And each unlock takes the mutex once.

   So as you do a rw_read_lock /unlock you do at least 2 locks and 2 unlocks
   on the same mutex. And these locks are only there to protect the rw locks 
   counters. I think this could be done atomicly and create much less overhead
   for a lot of usecases.

   Ok so to end my kinda lengthy parenthesis on the rw_lock mutexes suffice it
   to say if you use a rw lock "mutex" a lot .. you will lock a lot more of real mutexes
   on the way and multiply your contention.


Contention :

   So now that we know that it's possible to have contention with a rw lock 
   why do we have so much contention ?

   Because each call to g_object_new / g_type_create_instance must call 
   g_type_class_ref and the destruction _unref and take the type_rw_lock!!

   So each time an object is created or destroyed in your whole application
   this static rw lock is taken !!!

   On multithreaded applications that create a lot of objects like applications
   based on GStreamer that creates an enormous amount of GstBuffers  
   that becomes a real problem.

   I did some profiling with LTTng and saw that at some point the contention was
   so great that the futex started taking the slow-path in the kernel and this
   created a hudge performance decrease and mutilple calls from the kernel
   to switch the process from a cpu to another... litteraly killing the performance
   of the whole application.


So What can be done :

 The ugly patch :
     
     A quick and easy workaround is to optimise the common path to g_class_ref
     for a type of object like GstBuffer so that the lock is not taken and that it is
     reffed only once and not unreffed and that the class information in the case of 
     g_class_ref is returned directly. But this assumes that your type will stay reffed 
     for the whole duration of the program.

     In the case of my gstreamer based application this gets rid of the problem
     as GstBuffers are by far the most instanciated objects.

The clean patch :
    
    This would make use of atomic operations to do the refcounting and possibly
    do that refcounting in the static GTypeInfo structure...  And use a lock
    only to protect the creation /destruction of the  TypeInfo->data pointer.
    
    Or some other idea... ?
Comment 1 Antoine Tremblay 2009-06-10 19:49:03 UTC
oops forget about adding refcounting in the GTyepInfo struct.. this is const.. and would require a large rewrite of the using apps... :(

Must be another way
Comment 2 Wim Taymans 2009-09-16 08:32:44 UTC
or change glib to use proper posix read/write locks. I might give that a try today.
Comment 3 Lennart Poettering 2009-09-16 18:21:45 UTC
also, see this:

http://0pointer.de/blog/projects/mutrace
Comment 4 Edward Hervey 2009-09-17 10:17:27 UTC
*** Bug 595394 has been marked as a duplicate of this bug. ***
Comment 5 Edward Hervey 2009-09-20 11:06:57 UTC
I took wim's work and rewrote GStaticRWLock to use native rwlock

It's pending work, but brings a 10-15% boost to the performance test ABOVE alexander's patches.

http://cgit.freedesktop.org/~bilboed/glib/log/
Comment 6 Edward Hervey 2009-09-21 07:48:54 UTC
Using a native rwlock doesn't really help us in the end :(

I wrote a simple stress case for GStreamer which spawns threads that create/destroys gstreamer buffers (which are miniobjects, as lightweight as you can get). (I'm on a dual core system).

#./gstbufferstress 1 500000
main(): Creating 1 threads.
total 0:00:00.283225000 - average 0:00:00.000000566  - Thread 0
*** total 0:00:00.283791000 - average 0:00:00.000000567  - Done creating 500000 buffers

When using 2 threads... the results can vary drastically:

#./gstbufferstress 2 500000
main(): Creating 2 threads.
total 0:00:02.457827000 - average 0:00:00.000004915  - Thread 1
total 0:00:02.464033000 - average 0:00:00.000004928  - Thread 0
*** total 0:00:02.464540000 - average 0:00:00.000002464  - Done creating 1000000 buffers

#./gstbufferstress 2 500000
main(): Creating 2 threads.
total 0:00:00.779137000 - average 0:00:00.000001558  - Thread 0
total 0:00:00.802021000 - average 0:00:00.000001604  - Thread 1
*** total 0:00:00.823024000 - average 0:00:00.000000823  - Done creating 1000000 buffers


So... it ranges from... being 50% slower than a single thread.. to being 5 times slower. (while the numbers in an individual thread going up 3-9 times).

FWIW, Let's imagine the top scenario with 100 pipelines.
$ ./gstbufferstress 100 500000
main(): Creating 100 threads.
total 0:01:27.863730000 - average 0:00:00.000175727  - Thread 37
total 0:01:31.278407000 - average 0:00:00.000182556  - Thread 64
...
total 0:02:09.661608000 - average 0:00:00.000259323  - Thread 13
total 0:02:09.707792000 - average 0:00:00.000259415  - Thread 50
*** total 0:02:09.756068000 - average 0:00:00.000002595  - Done creating 50000000 buffers

Ideally we want to reach (time_per_buffer_single_threaded / num_cores)... not got the complete other direction and have that multiplied by 5.



The culprit is the "one-lock-to-rule-them-all" design in gtype.c

I'm having a look at splitting this up in a more MT-friendly design (while obviously keeping it API/ABI compatible).
Comment 7 Edward Hervey 2009-09-21 07:50:03 UTC
Created attachment 143571 [details]
gstbuffer stress test

The buffer stress test used above.

compile with `pkg-config --cflags --libs gstreamer-0.10`
Comment 8 Wim Taymans 2009-09-21 08:17:10 UTC
(In reply to comment #6)
> Using a native rwlock doesn't really help us in the end :(
> 

It will but the first problem to solve is not using native rw locks but changing glib to not take the writer lock all the time.
Comment 9 Edward Hervey 2009-09-21 08:39:34 UTC
(In reply to comment #8)
> (In reply to comment #6)
> > Using a native rwlock doesn't really help us in the end :(
> > 
> 
> It will but the first problem to solve is not using native rw locks but
> changing glib to not take the writer lock all the time.

Ergo the "really" in my sentence :)
Comment 10 Edward Hervey 2009-09-23 18:14:30 UTC
Created attachment 143818 [details] [review]
gobject/gtype.c: Lockless type reffing/unreffing. Fixes #585375

This is basically done by the following:
* Move ref_count from TypeData to TypeNode
* Only access ref_count unlocked with g_atomic_int_*()

Passes all checks.
Comment 11 Edward Hervey 2009-09-23 18:15:46 UTC
It does need some cleanups though (about to go out for Merce Festival, but thought people would like to try it out).

Testing, feedback, comments more than welcome.
Comment 12 Benjamin Otte (Company) 2009-09-23 18:42:27 UTC
The patch is a bit big, it'd be nice if the stuff that marks as /* (UN)LOCKED */ were a different patch.
I also don't like that you make the locked and the unlocked code call the same function.

Last but not least, the patch is buggy. 
- You need to do the first ref/last unref while holding the lock. Otherwise there's a race between unlocking the mutex and changing the refcount.
- You need to unref first thing after locking, otherwise other code can happily take a reference while you're destructing the class data.

All of this is a bit ugly because you can call g_type_class_ref() on a type without a reference and the code needs to account for that.

Do you have a test where you just do do { g_type_class_unref (g_type_class_ref (my_type)); } while (TRUE); in lots of threads?

Anyway, I guess I'm gonna look at this tomorrow :)
Comment 13 Edward Hervey 2009-09-23 18:46:47 UTC
will fix that tomorrow. Just thought people could have a shoot at it.
Comment 14 Edward Hervey 2009-09-24 07:30:36 UTC
I'm cleaning/fixing the patch according to comment #12
Comment 15 Edward Hervey 2009-09-24 07:46:15 UTC
Created attachment 143873 [details] [review]
gobject/gtype.c: Lockless type reffing/unreffing. Fixes #585375

This is basically done by the following:
* Move ref_count from TypeData to TypeNode
* Only access ref_count unlocked with g_atomic_int_*()
* ref_count corresponds to the existence of TypeData, therefore
  only the creation of destruction of TypeData is protected by the
  WRITE_LOCK.

Modified according to comments by Benjamin Otte

Passes all checks.
Comment 16 Edward Hervey 2009-09-24 08:56:49 UTC
Created attachment 143883 [details] [review]
gobject/tests: Add a test that refs/unrefs a GType in several threads.
Comment 17 Edward Hervey 2009-09-24 10:20:29 UTC
Created attachment 143890 [details] [review]
gobject/tests: New test for dynamic class creation/destruction

Starts 3 threads which ref/unref a dynamic class.
Comment 18 Benjamin Otte (Company) 2009-09-24 10:51:36 UTC
Created attachment 143891 [details] [review]
Move ref_count from TypeNode->data to TypeNode
Comment 19 Benjamin Otte (Company) 2009-09-24 10:51:41 UTC
Created attachment 143892 [details] [review]
Pass the TypeNode to type_data_last_unref_Wm()

Previously the GType was looked up just for calling the function
Also moves the unref functions together in the code.
Comment 20 Benjamin Otte (Company) 2009-09-24 10:51:47 UTC
Created attachment 143893 [details] [review]
Move setting the refcount to the end of the function

This is a safety feature for when making it atomic later.
Comment 21 Benjamin Otte (Company) 2009-09-24 10:51:53 UTC
Created attachment 143894 [details] [review]
Add a NODE_REFCOUNT getter

This is useful when moving the code to be atomic.
It also will make that patch smaller.
Comment 22 Benjamin Otte (Company) 2009-09-24 10:52:00 UTC
Created attachment 143895 [details] [review]
type_data_unref_Wm => type_data_unref_U

Make the type unref function not hold any locks when called. This
makes it easier to optimize it to be atomic later.
Comment 23 Benjamin Otte (Company) 2009-09-24 10:52:06 UTC
Created attachment 143896 [details] [review]
Make all accesses of Node->ref_count atomic

This does not change any locking behavior at all, it just replaces
simple getters/setters of the variable with atomic versions.

The ref_count variable was kept as unsigned, even though that requires
casting for all operations, to mirror GObject->refcount.
Comment 24 Benjamin Otte (Company) 2009-09-24 10:52:12 UTC
Created attachment 143897 [details] [review]
Make type_data_unref_U not take locks in the common case
Comment 25 Benjamin Otte (Company) 2009-09-24 10:52:19 UTC
Created attachment 143898 [details] [review]
Make ClassData->init_State atomic

This is necessary to make g_type_class_ref() lockless.
Comment 26 Benjamin Otte (Company) 2009-09-24 10:52:26 UTC
Created attachment 143899 [details] [review]
Reorganize g_type_class_ref()

Moves the first check out of the lock, as it's not required.
Comment 27 Benjamin Otte (Company) 2009-09-24 10:52:32 UTC
Created attachment 143900 [details] [review]
Add type_data_ref_U() and use it in g_type_class_ref()

The function returns TRUE if the type was previously initialized and can
be easily reused. It returns FALSE and does not take a reference if the
type is not referenced yet.

g_type_class_ref() uses this to avoid taking locks in the common path,
which speeds up object creation a lot - in particular in multithreaded
applications.
Comment 28 Benjamin Otte (Company) 2009-09-24 10:56:31 UTC
Here's a huge patchset I came up with when splitting the above patch from Edward to understand it better. The patches are on top of the gobject-performance branch (where I intend to push them once Edward says it looks ok to him).

I hope they work the intended way, I didn't do extensive testing. They certainly pass the attached testcase.
Comment 29 Wim Taymans 2009-09-25 08:59:30 UTC
Antoine, any chance that you could try the optimisations that have been done in the gobject-performance branch in glib git?
Comment 30 Antoine Tremblay 2009-09-25 13:39:01 UTC
I should have time next week to test it :)
Comment 31 Alexander Larsson 2009-10-02 19:14:16 UTC
The gobject-performance branch is now rebased onto the gobject-performance-2 branch based on glib 2.18, with some stuff merged.
Comment 32 Edward Hervey 2009-10-03 07:21:30 UTC
Alex, you got me scared there :) It's rebased against 2.22.1 and not 2.18

I'll clean up my persona repos soon, got some additions to the performance and performance-threaded tests, as well as some other tests (that unfortunately show bugs that were present before).
Comment 33 Javier Jardón (IRC: jjardon) 2009-11-30 23:54:40 UTC
As gobject-performance-2 has been merged, Can this bug be closed?