GNOME Bugzilla – Bug 755260
decodebin: Fix a race condition accessing the decode_chain field.
Last modified: 2015-12-01 17:42:09 UTC
In our application I run into this issue quite frequently. This patch seems to help, this is what I usually would run into: 1:11:49.314779015 18194 0xb20d9d40 WARN basesrc gstbasesrc.c:3460:gst_base_src_start_complete:<source> pad not activated yet 1:11:49.315069258 18194 0xb20d9d40 WARN GST_PADS gstpad.c:1058:gst_pad_set_active:<qtdemux456:sink> Failed to activate pad 1:11:49.315317966 18194 0xb20d9d40 WARN decodebin gstdecodebin2.c:2218:connect_pad:<decodebin294> Couldn't set qtdemux456 to PAUSED GLib (gthread-posix.c): Unexpected error from C library during 'pthread_mutex_lock': Invalid argument. Aborting. [New LWP 24512] Program received signal SIGABRT, Aborted. [Switching to LWP 24512] 0xffffe424 in __kernel_vsyscall () (gdb) thread apply all bt
+ Trace 235472
Thread 11 (LWP 24512)
This bug is related to bug #690420 and may be a duplicate of bug #752651.
Created attachment 311671 [details] [review] Patch
Review of attachment 311671 [details] [review]: I think you found something here that caused a lot of problems in the past, there should be a few more spurious decodebin2 crash bug reports about this ::: gst/playback/gstdecodebin2.c @@ +2844,3 @@ + + decode_bin->decode_chain = gst_decode_chain_new (decode_bin, NULL, pad); + chain = gst_decode_chain_ref (decode_bin->decode_chain); It's worrying that the chain can get destroyed here if we don't hold the expose lock. Maybe the code below should be called with some lock held instead so that shutting down would block on that (e.g. the chain lock)? Also this is probably not the only problematic place then
This patch effectively introduces a new state for the GstDecodeChain. Freeing and then actually destroying it (when the ref count hits zero). As far as I can tell, freeing the chain and its content is always done with the expose lock held, it however won't actually be destroyed unless no one else (e.g. a type_found signal scheduled to be executed) might care about it. So if anyone else (like type_found) may access the chain and it already got freed (but not destroyed), it can still lock the chain's mutex and check if it was freed. The reason I'm adding a ref in the quoted hunk is that because I need to unlock the expose lock prior to calling into analyze_new_pad(). Because it needs to be unlocked, a state change could happen that would otherwise free and destroy the decode_chain. Because it keeps an additional reference prior to calling into analyze_new_pad(), a state change allows the decode_chain to be freed, but it won't actually be destroyed (meaning the chain's mutex is still usable) until both analyze_new_pad() and the state change handler no longer need the GstDecodeChain. Within analyze_new_pad() access to the chain doesn't really need the expose lock, it only needs the chain's lock itself. The expose lock really only should be necessary when accessing the decode_chain member, which means if any code needs to use the debug chain and unlock the expose chain, it should be ref'ed and when it no longer is needed it should be unref'ed. I hope this makes sense.
You're right, though, I think the analyze_new_pad() function should probably lock the chain's mutex as well...
I pushed this patch along the ones from https://bugzilla.gnome.org/show_bug.cgi?id=752651 The test case from the other bug works for quite a while (though not 100%) with all those patches. Any further fix can be merged later if necessary. Thanks commit 2c62aad159694b0f4fb1d19297abc4c38650bcb9 Author: Thomas Bluemel <tbluemel@control4.com> Date: Fri Nov 6 14:21:14 2015 +0000 [PATCH] Fix a race condition accessing the decode_chain field. Make sure that any access to the GstDecodeBin's decode_chain field is protected using the EXPOSE_LOCK. Also add a simple reference counter to the GstDecodeChain structure so that when the type_found signal fires it can hold onto the decode chain even while the EXPOSE_LOCK is not held. This should fix a race condition if the type_found signal fires right in the middle of a state change that messes with the same decode chain. https://bugzilla.gnome.org/show_bug.cgi?id=755260