After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 755260 - decodebin: Fix a race condition accessing the decode_chain field.
decodebin: Fix a race condition accessing the decode_chain field.
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-plugins-base
git master
Other Linux
: Normal normal
: 1.7.1
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2015-09-19 12:32 UTC by GstBlub
Modified: 2015-12-01 17:42 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Patch (4.53 KB, patch)
2015-09-19 12:32 UTC, GstBlub
committed Details | Review

Description GstBlub 2015-09-19 12:32:31 UTC
In our application I run into this issue quite frequently.  This patch seems to help, this is what I usually would run into:

1:11:49.314779015 18194 0xb20d9d40 WARN                 basesrc gstbasesrc.c:3460:gst_base_src_start_complete:<source> pad not activated yet
1:11:49.315069258 18194 0xb20d9d40 WARN                GST_PADS gstpad.c:1058:gst_pad_set_active:<qtdemux456:sink> Failed to activate pad
1:11:49.315317966 18194 0xb20d9d40 WARN               decodebin gstdecodebin2.c:2218:connect_pad:<decodebin294> Couldn't set qtdemux456 to PAUSED
GLib (gthread-posix.c): Unexpected error from C library during 'pthread_mutex_lock': Invalid argument.  Aborting.
[New LWP 24512]

Program received signal SIGABRT, Aborted.
[Switching to LWP 24512]
0xffffe424 in __kernel_vsyscall ()
(gdb) thread apply all bt

Thread 11 (LWP 24512)

  • #0 __kernel_vsyscall
  • #1 raise
    from /lib/libc.so.6
  • #2 abort
    from /lib/libc.so.6
  • #3 ??
    from /usr/lib/libglib-2.0.so.0
  • #4 connect_pad
    at gstdecodebin2.c line 2222
  • #5 analyze_new_pad
    at gstdecodebin2.c line 1742
  • #6 type_found
    at gstdecodebin2.c line 2535
  • #7 ffi_call_SYSV
    from /usr/lib/libffi.so.6
  • #8 ffi_call
    from /usr/lib/libffi.so.6
  • #9 g_cclosure_marshal_generic
    from /usr/lib/libgobject-2.0.so.0
  • #10 g_closure_invoke
    from /usr/lib/libgobject-2.0.so.0
  • #11 ??
    from /usr/lib/libgobject-2.0.so.0
  • #12 g_signal_emit_valist
    from /usr/lib/libgobject-2.0.so.0
  • #13 g_signal_emit
    from /usr/lib/libgobject-2.0.so.0
  • #14 gst_type_find_element_loop
    at gsttypefindelement.c line 1091
  • #15 gst_task_func
    at gsttask.c line 317
  • #16 start_thread
    from /lib/libpthread.so.0
  • #17 clone
    from /lib/libc.so.6

This bug is related to bug #690420 and may be a duplicate of bug #752651.
Comment 1 GstBlub 2015-09-19 12:32:55 UTC
Created attachment 311671 [details] [review]
Patch
Comment 2 Sebastian Dröge (slomo) 2015-09-19 16:05:24 UTC
Review of attachment 311671 [details] [review]:

I think you found something here that caused a lot of problems in the past, there should be a few more spurious decodebin2 crash bug reports about this

::: gst/playback/gstdecodebin2.c
@@ +2844,3 @@
+
+    decode_bin->decode_chain = gst_decode_chain_new (decode_bin, NULL, pad);
+    chain = gst_decode_chain_ref (decode_bin->decode_chain);

It's worrying that the chain can get destroyed here if we don't hold the expose lock. Maybe the code below should be called with some lock held instead so that shutting down would block on that (e.g. the chain lock)?

Also this is probably not the only problematic place then
Comment 3 GstBlub 2015-09-19 20:33:20 UTC
This patch effectively introduces a new state for the GstDecodeChain.  Freeing and then actually destroying it (when the ref count hits zero).  As far as I can tell, freeing the chain and its content is always done with the expose lock held, it however won't actually be destroyed unless no one else (e.g. a type_found signal scheduled to be executed) might care about it.  So if anyone else (like type_found) may access the chain and it already got freed (but not destroyed), it can still lock the chain's mutex and check if it was freed.

The reason I'm adding a ref in the quoted hunk is that because I need to unlock the expose lock prior to calling into analyze_new_pad().  Because it needs to be unlocked, a state change could happen that would otherwise free and destroy the decode_chain.  Because it keeps an additional reference prior to calling into analyze_new_pad(), a state change allows the decode_chain to be freed, but it won't actually be destroyed (meaning the chain's mutex is still usable) until both analyze_new_pad() and the state change handler no longer need the GstDecodeChain.  Within analyze_new_pad() access to the chain doesn't really need the expose lock, it only needs the chain's lock itself.  The expose lock really only should be necessary when accessing the decode_chain member, which means if any code needs to use the debug chain and unlock the expose chain, it should be ref'ed and when it no longer is needed it should be unref'ed.

I hope this makes sense.
Comment 4 GstBlub 2015-09-19 20:48:34 UTC
You're right, though, I think the analyze_new_pad() function should probably lock the chain's mutex as well...
Comment 5 Vincent Penquerc'h 2015-12-01 17:42:09 UTC
I pushed this patch along the ones from https://bugzilla.gnome.org/show_bug.cgi?id=752651

The test case from the other bug works for quite a while (though not 100%) with all those patches. Any further fix can be merged later if necessary.

Thanks

commit 2c62aad159694b0f4fb1d19297abc4c38650bcb9
Author: Thomas Bluemel <tbluemel@control4.com>
Date:   Fri Nov 6 14:21:14 2015 +0000

    [PATCH] Fix a race condition accessing the decode_chain field.
    
    Make sure that any access to the GstDecodeBin's decode_chain
    field is protected using the EXPOSE_LOCK.  Also add a simple
    reference counter to the GstDecodeChain structure so that when
    the type_found signal fires it can hold onto the decode chain
    even while the EXPOSE_LOCK is not held.  This should fix a
    race condition if the type_found signal fires right in the
    middle of a state change that messes with the same decode
    chain.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=755260