GNOME Bugzilla – Bug 784911
aggregate: chain_internal function called from two different threads for the same pad
Last modified: 2017-12-05 15:42:25 UTC
The muxer is proprietary so it's been edited out, but here's a backtrace I keep getting when remuxing a specific file:
+ Trace 237647
Thread 4 (Thread 0x7ffff11df700 (LWP 31461))
Thread 2 (Thread 0x7ffff21e1700 (LWP 31459))
It's probably also not a good idea that while waiting the pad holds the flush lock.
git bisect showed me this commit as first bad: commit 20bf97f08912f0edb72bfdcdde4e5c40acb29823 Author: Olivier Crête <olivier.crete@collabora.com> Date: Sun May 21 14:34:13 2017 +0200 aggregator: Check for the result of caps events https://bugzilla.gnome.org/show_bug.cgi?id=782918 Therefore, this bug is most likely related to https://bugzilla.gnome.org/show_bug.cgi?id=784846 .
Created attachment 355557 [details] [review] aggregator: Don't block if adding to the tail of the queue If we're adding to the tail of the queue, it's because we're converting a gap event, so don't block there it means we're calling from the output thread.
The patch I just attached hasn't been too tested, let mw know if it fixes your problem and we can QA it a bit more before merging.
Actually the patch doesn't fix the problem seen in #784846 , also need to not take the flush lock if head==TRUE, and then figure out why it still doesnt work.. grrr..
Created attachment 355558 [details] [review] aggregator: Don't take flush lock from output thread Instead just take it in the chain function. This should fix this bug entirely.
Created attachment 355559 [details] [review] aggregator: Don't take flush lock from output thread Instead just take it in the chain function. Oops, previous patch ignored the erorr cases, this is more simple anyway.
Works, thanks a lot! :)
Review of attachment 355557 [details] [review]: Looks good.
Review of attachment 355559 [details] [review]: Seems fair, though can you update the comment next to the flush lock, so we remember that it shouldn't be taken from the srcpad stream thread. Also, maybe a comment somewhere next to chain_internal that it can be called from the srcpad thread for GAP handling ?
Created attachment 355598 [details] [review] 0001-aggregator-Don-t-take-flush-lock-from-output-thread.patch Thanks for reviewing those! :) I added the comments.
Review of attachment 355598 [details] [review]: Still missing an update to the priv->flush_lock comment. ::: gst-libs/gst/base/gstaggregator.c @@ +2488,3 @@ +/* This function can be called from the srcpad streaming thread for GAP + * handling. Don't get the flush lock in here */ Documentation should state: - It's called normally from sinkpad stream threads, in which case it's called with the FLUSH_LOCK held. - In may also be called from the srcpad thread in the case of GAP handling,
(In reply to Nicolas Dufresne (stormer) from comment #12) > - In may also be called from the srcpad thread in the case of GAP handling, Actually it can't be taken from the output thread, otherwise we get a deadlock.
(In reply to Olivier Crête from comment #13) > (In reply to Nicolas Dufresne (stormer) from comment #12) > > - In may also be called from the srcpad thread in the case of GAP handling, > > Actually it can't be taken from the output thread, otherwise we get a > deadlock. You miss-read, the *function* is called from those two threads, not the lock.
Created attachment 362005 [details] [review] aggregator: Don't take flush lock from output thread Updated patch based on Nicolas's comments and rebased on current git master.
Patches merged commit 25ea37e7d9b54bb47218b1f7d7c51c4f56d25171 Author: Olivier Crête <olivier.crete@collabora.com> Date: Thu Jul 13 19:03:19 2017 -0400 aggregator: Don't take flush lock from output thread Instead just take it in the chain function. https://bugzilla.gnome.org/show_bug.cgi?id=784911 commit bb5a8ebec32b65d7ce8bbcb063020cc166fe57c3 Author: Olivier Crête <olivier.crete@collabora.com> Date: Thu Jul 13 18:38:34 2017 -0400 aggregator: Don't block if adding to the tail of the queue If we're adding to the tail of the queue, it's because we're converting a gap event, so don't block there it means we're calling from the output thread. https://bugzilla.gnome.org/show_bug.cgi?id=784911