GNOME Bugzilla – Bug 599870
[API] micro-optimisation: add GST_DEBUG_FUNCPTR_FULL
Last modified: 2009-12-02 00:59:24 UTC
Created attachment 146401 [details] [review] Add GST_DEBUG_FUNCPTR_FULL() and use it in core Edward mentioned the other day that GST_DEBUG_FUNCPTR showed up in his list when benchmarking decodebin2 prerolling, so that got me curious. Turns out that in the course of playbin2 prerolling to play a video we are re-registering about 1100 function names/pointers we've already registered. Almost all of these come from gstpad.c and gstghostpad.c, and every attempt will incur a hash table lookup protected by a mutex. Not sure if contention was actually observed or not, but it's conceivable and is likely to be more noticable in more complex dyanamic pipelines (buzztard etc.?). Enter GST_DEBUG_FUNCPTR_FULL() which will shortcut the registration based on a local (static) gboolean. Admittedly not very elegant, but shortcuts 99% of the re-registration attempts already, just by using it in gstpad.c and gstghostpad.c. Not entirely sure if it's worth it at all, absolute numbers are still fairly small after all, or if there's a nicer way or if it should just be kept internal, but I thought I'd put it out there for other people to comment on.
Makes sense... letting instantiation block on a global mutex is probably not the best idea :) Changing that mutex to a RW lock and letting _gst_debug_register_funcptr() first take a R lock to check if it already is inserted and only take a W lock if it isn't might be a cleaner fix though.
Review of attachment 146401 [details] [review]: ::: gst/gstinfo.h @@ +1398,2 @@ #define GST_DEBUG_FUNCPTR(ptr) (ptr) +#define GST_DEBUG_FUNCPTR_FULL(ptr,inited) (ptr) When debugging is disabled, shouldn't that 'inited' boolean be removed somehow (since it won't be used) ?
(In reply to comment #1) > Makes sense... letting instantiation block on a global mutex is probably not > the best idea :) > > Changing that mutex to a RW lock and letting _gst_debug_register_funcptr() > first take a R lock to check if it already is inserted and only take a W lock > if it isn't might be a cleaner fix though. Use a rwlock will be even worse, since the glib rwlock isn't a native implementation, but instead takes two regular locks... so best case scenario you'd be now taking two lock, and worst case scenario four :)
Yes, I'm talking about sane RW locks :) Also, didn't you write a patch to let the RW locks use the pthread RW locks because of this? :P
> When debugging is disabled, shouldn't that 'inited' boolean be removed somehow > (since it won't be used) ? I guess so. I wasn't going to litter the code with #ifndef #endif though unless a compiler actually complained though. Filed it in the bridges-to-cross-when-we-get-there-section for now.
> Changing that mutex to a RW lock and letting _gst_debug_register_funcptr() > first take a R lock to check if it already is inserted and only take a W lock > if it isn't might be a cleaner fix though. Depends on whether the issue was primarily contention (doubtful?) or the hash table lookups I guess.
(In reply to comment #4) > Yes, I'm talking about sane RW locks :) Also, didn't you write a patch to let > the RW locks use the pthread RW locks because of this? :P Also you could use #ifdef HAVE_PTHREAD a pthread RW lock implementation inside gstinfo.c for that hash table, use GStaticRWMutex if GLib new enough and contains your patch and otherwise continue to use the broken normal mutex otherwise. (In reply to comment #6) > Depends on whether the issue was primarily contention (doubtful?) or the hash > table lookups I guess. I was expecting that contention was the problem here. Hash table lookups should be rather fast in this case, it should more or less just be somearray[(funcptr % somevalue)] and check if the pointers are equal (*) If there are multiple function pointers mapped to the same bucket this would just be another (very few) indirections through a linked list (chained hash table implementation) or some more of (*) (open addressing). Another source of overhead would be the static mutex implementation btw, Edward said it's very inefficient too... so maybe a GMutex instead of a GStaticMutex would help too.
(In reply to comment #6) > > Changing that mutex to a RW lock and letting _gst_debug_register_funcptr() > > first take a R lock to check if it already is inserted and only take a W lock > > if it isn't might be a cleaner fix though. > > Depends on whether the issue was primarily contention (doubtful?) or the hash > table lookups I guess. both actually. For every single element/pad/object that you create that uses those decorators... you'll be: (1) introducing extra contention, what's worse not once but many times (for every GST_DEBUG_FUNCPTR call) (2) using g_hash_table ... when you shouldn't need to Maybe we could adjust the macro to be a G_STMT with the volatile inside.
Thinking back to my non-existent CS days, shouldn't the ideal hash table lookup in this case be lockless? (Or rather, lockless check followed by lock;recheck;register;unlock.) If this is not the case, let's fix glib.
(In reply to comment #9) > Thinking back to my non-existent CS days, shouldn't the ideal hash table lookup > in this case be lockless? (Or rather, lockless check followed by > lock;recheck;register;unlock.) If this is not the case, let's fix glib. Well, GHashTable simply isn't threadsafe and you need to do the locking from the outside :(
(In reply to comment #8) > (In reply to comment #6) > > > Changing that mutex to a RW lock and letting _gst_debug_register_funcptr() > > > first take a R lock to check if it already is inserted and only take a W lock > > > if it isn't might be a cleaner fix though. > > > > Depends on whether the issue was primarily contention (doubtful?) or the hash > > table lookups I guess. > Maybe we could adjust the macro to be a G_STMT with the volatile inside. That's probably the best solution... any reason not to do that? :)
> > Maybe we could adjust the macro to be a G_STMT with the volatile inside. > > That's probably the best solution... any reason not to do that? :) I didn't know it was possible to do that in a portable way? (afaik this would rely on gcc extensions) (Also, while it's cleaner API wise it's really a bit ugly under the hood, with it adding 5-20 static booleans per function where just a single one would do just as well in practice...) FWIW, my plan was to just put the macro into gstprivate.h for now and commit the gst/*pad* bits, which takes care of 90% of the calls and presumably contention already, unless we can make the macro suggestion work in a portable way.