After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 784911 - aggregate: chain_internal function called from two different threads for the same pad
aggregate: chain_internal function called from two different threads for the ...
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-plugins-bad
git master
Other Linux
: Normal normal
: 1.13.1
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks: 784846
 
 
Reported: 2017-07-13 15:00 UTC by Vivia Nikolaidou
Modified: 2017-12-05 15:42 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
aggregator: Don't block if adding to the tail of the queue (1.12 KB, patch)
2017-07-13 22:41 UTC, Olivier Crête
committed Details | Review
aggregator: Don't take flush lock from output thread (1.18 KB, patch)
2017-07-13 23:06 UTC, Olivier Crête
none Details | Review
aggregator: Don't take flush lock from output thread (2.05 KB, patch)
2017-07-13 23:09 UTC, Olivier Crête
none Details | Review
0001-aggregator-Don-t-take-flush-lock-from-output-thread.patch (2.52 KB, patch)
2017-07-14 14:16 UTC, Vivia Nikolaidou
needs-work Details | Review
aggregator: Don't take flush lock from output thread (2.38 KB, patch)
2017-10-21 10:07 UTC, Olivier Crête
committed Details | Review

Description Vivia Nikolaidou 2017-07-13 15:00:55 UTC
The muxer is proprietary so it's been edited out, but here's a backtrace I keep getting when remuxing a specific file:


Thread 4 (Thread 0x7ffff11df700 (LWP 31461))

  • #0 syscall
    at ../sysdeps/unix/sysv/linux/x86_64/syscall.S line 38
  • #1 g_cond_wait
    at ././glib/gthread-posix.c line 1395
  • #2 gst_aggregator_pad_chain_internal
    at gstaggregator.c line 2560
  • #3 gst_pad_chain_data_unchecked
    at gstpad.c line 4205
  • #4 gst_pad_push_data
    at gstpad.c line 4457
  • #5 gst_pad_push
    at gstpad.c line 4576
  • #6 gst_queue_push_one
    at gstqueue.c line 1365
  • #7 gst_queue_loop
    at gstqueue.c line 1517
  • #8 gst_task_func
    at gsttask.c line 332
  • #9 g_thread_pool_thread_proxy
    at ././glib/gthreadpool.c line 307
  • #10 g_thread_proxy
    at ././glib/gthread.c line 784
  • #11 start_thread
    at pthread_create.c line 333
  • #12 clone
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S line 97

Thread 2 (Thread 0x7ffff21e1700 (LWP 31459))

  • #0 syscall
    at ../sysdeps/unix/sysv/linux/x86_64/syscall.S line 38
  • #1 g_mutex_lock_slowpath
    at ././glib/gthread-posix.c line 1313
  • #2 g_mutex_lock
    at ././glib/gthread-posix.c line 1337
  • #3 gst_aggregator_pad_chain_internal
    at gstaggregator.c line 2512
  • #4 gst_aggregator_default_sink_event
    at gstaggregator.c line 1493
  • #5 gst_nbmxfmux_sink_event(GstAggregator*, GstAggregatorPad*, GstEvent*)
    at libgstnbmxfmux.c line 584
  • #6 check_events
    at gstaggregator.c line 786
  • #7 gst_aggregator_iterate_sinkpads
    at gstaggregator.c line 392
  • #8 gst_aggregator_aggregate_func
    at gstaggregator.c line 1105
  • #9 gst_task_func
    at gsttask.c line 332
  • #10 g_thread_pool_thread_proxy
    at ././glib/gthreadpool.c line 307
  • #11 g_thread_proxy
    at ././glib/gthread.c line 784
  • #12 start_thread
    at pthread_create.c line 333
  • #13 clone
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S line 97

Comment 1 Sebastian Dröge (slomo) 2017-07-13 15:19:16 UTC
It's probably also not a good idea that while waiting the pad holds the flush lock.
Comment 2 Vivia Nikolaidou 2017-07-13 15:28:49 UTC
git bisect showed me this commit as first bad:

commit 20bf97f08912f0edb72bfdcdde4e5c40acb29823
Author: Olivier Crête <olivier.crete@collabora.com>
Date:   Sun May 21 14:34:13 2017 +0200

    aggregator: Check for the result of caps events
    
    https://bugzilla.gnome.org/show_bug.cgi?id=782918


Therefore, this bug is most likely related to https://bugzilla.gnome.org/show_bug.cgi?id=784846 .
Comment 3 Olivier Crête 2017-07-13 22:41:45 UTC
Created attachment 355557 [details] [review]
aggregator: Don't block if adding to the tail of the queue

If we're adding to the tail of the queue, it's because we're converting
a gap event, so don't block there it means we're calling from the output
thread.
Comment 4 Olivier Crête 2017-07-13 22:42:30 UTC
The patch I just attached hasn't been too tested, let mw know if it fixes your problem and we can QA it a bit more before merging.
Comment 5 Olivier Crête 2017-07-13 22:52:16 UTC
Actually the patch doesn't fix the problem seen in #784846 , also need to not take the flush lock if head==TRUE, and then figure out why it still doesnt work.. grrr..
Comment 6 Olivier Crête 2017-07-13 23:06:33 UTC
Created attachment 355558 [details] [review]
aggregator: Don't take flush lock from output thread

Instead just take it in the chain function.

This should fix this bug entirely.
Comment 7 Olivier Crête 2017-07-13 23:09:33 UTC
Created attachment 355559 [details] [review]
aggregator: Don't take flush lock from output thread

Instead just take it in the chain function.

Oops, previous patch ignored the erorr cases, this is more simple anyway.
Comment 8 Vivia Nikolaidou 2017-07-14 10:46:21 UTC
Works, thanks a lot! :)
Comment 9 Nicolas Dufresne (ndufresne) 2017-07-14 13:16:19 UTC
Review of attachment 355557 [details] [review]:

Looks good.
Comment 10 Nicolas Dufresne (ndufresne) 2017-07-14 13:25:35 UTC
Review of attachment 355559 [details] [review]:

Seems fair, though can you update the comment next to the flush lock, so we remember that it shouldn't be taken from the srcpad stream thread. Also, maybe a comment somewhere next to chain_internal that it can be called from the srcpad thread for GAP handling ?
Comment 11 Vivia Nikolaidou 2017-07-14 14:16:55 UTC
Created attachment 355598 [details] [review]
0001-aggregator-Don-t-take-flush-lock-from-output-thread.patch

Thanks for reviewing those! :) I added the comments.
Comment 12 Nicolas Dufresne (ndufresne) 2017-07-14 14:26:38 UTC
Review of attachment 355598 [details] [review]:

Still missing an update to the priv->flush_lock comment.

::: gst-libs/gst/base/gstaggregator.c
@@ +2488,3 @@
 
+/* This function can be called from the srcpad streaming thread for GAP
+ * handling. Don't get the flush lock in here */

Documentation should state:
- It's called normally from sinkpad stream threads, in which case it's called with the FLUSH_LOCK held.
- In may also be called from the srcpad thread in the case of GAP handling,
Comment 13 Olivier Crête 2017-07-17 23:19:43 UTC
(In reply to Nicolas Dufresne (stormer) from comment #12)
> - In may also be called from the srcpad thread in the case of GAP handling,

Actually it can't be taken from the output thread, otherwise we get a deadlock.
Comment 14 Nicolas Dufresne (ndufresne) 2017-07-18 01:17:15 UTC
(In reply to Olivier Crête from comment #13)
> (In reply to Nicolas Dufresne (stormer) from comment #12)
> > - In may also be called from the srcpad thread in the case of GAP handling,
> 
> Actually it can't be taken from the output thread, otherwise we get a
> deadlock.

You miss-read, the *function* is called from those two threads, not the lock.
Comment 15 Olivier Crête 2017-10-21 10:07:05 UTC
Created attachment 362005 [details] [review]
aggregator: Don't take flush lock from output thread

Updated patch based on Nicolas's comments and rebased on current git master.
Comment 16 Olivier Crête 2017-10-23 10:08:53 UTC
Patches merged

commit 25ea37e7d9b54bb47218b1f7d7c51c4f56d25171
Author: Olivier Crête <olivier.crete@collabora.com>
Date:   Thu Jul 13 19:03:19 2017 -0400

    aggregator: Don't take flush lock from output thread
    
    Instead just take it in the chain function.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=784911

commit bb5a8ebec32b65d7ce8bbcb063020cc166fe57c3
Author: Olivier Crête <olivier.crete@collabora.com>
Date:   Thu Jul 13 18:38:34 2017 -0400

    aggregator: Don't block if adding to the tail of the queue
    
    If we're adding to the tail of the queue, it's because we're converting
    a gap event, so don't block there it means we're calling from the output
    thread.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=784911