After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 792341 - pad: Deadlock after upgrade from 1.12.2 to 1.12.4 (regression)
pad: Deadlock after upgrade from 1.12.2 to 1.12.4 (regression)
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gstreamer (core)
1.12.4
Other Linux
: Normal blocker
: 1.12.5
Assigned To: GStreamer Maintainers
GStreamer Maintainers
: 792460 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2018-01-08 20:48 UTC by Carlos Alberto Lopez Perez
Modified: 2018-01-16 16:27 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
thread apply all bt (34.40 KB, text/plain)
2018-01-08 20:48 UTC, Carlos Alberto Lopez Perez
  Details
thread apply all bt full (74.49 KB, text/plain)
2018-01-08 20:49 UTC, Carlos Alberto Lopez Perez
  Details
Result of "t a a bt" in gdb (83.78 KB, text/plain)
2018-01-11 21:26 UTC, Alex Băluț
  Details
gstpad: Release pending g_cond_wait() when stopping task (892 bytes, patch)
2018-01-15 17:16 UTC, Edward Hervey
accepted-commit_now Details | Review
gstpad: Release pending g_cond_wait() when stopping/pausing task (1.26 KB, patch)
2018-01-15 17:18 UTC, Edward Hervey
committed Details | Review
stack trace after patch (51.87 KB, text/plain)
2018-01-15 17:40 UTC, Christoph Reiter (lazka)
  Details
gstpad: Avoid stream-dead-lock on deactivation (1.84 KB, patch)
2018-01-16 09:24 UTC, Edward Hervey
committed Details | Review

Description Carlos Alberto Lopez Perez 2018-01-08 20:48:29 UTC
Created attachment 366517 [details]
thread apply all bt

I have WPE (WebKit) running on a Freescale (NXP) i.MX6 board (ARMv7).

The browser (WPE) is running on a Weston session and its configured to leverage accelerated video decoding via gstreamer-imx and gstreamer-gl (glupload).

The application is a web page that plays videos in a loop and changes from one video to the next by pre-loading the next video some seconds before the current one stops playing and making it appear on the screen at the same time it hides and destroys the old one (internally this uses a different gstreamer pipeline per <video> element)

This was working fine with 1.12.2, but after an upgrade to 1.12.4 it started to freeze after a few minutes of playing and a restart was needed.

I tracked this down to the commits 884e4b78105db30c21a0bea96cc666252140ee37 and 1a6c4be242d6d684b03bea6cba339207358faf79 ( from bug 790431 ) that were merged in 1.12.4.

I tested to revert this two commits locally and the freeze is gone.

I attached GDB to the WPEWebProcess when it was in frozen state and I dumped some backtraces (one with "thread apply all bt" and the other one with "thread apply all bt full"). Find both the files attached here. 

I think the deadlock happens between thread 1 and thread 13, as both are inside gst_pad_set_active()
Comment 1 Carlos Alberto Lopez Perez 2018-01-08 20:49:04 UTC
Created attachment 366518 [details]
thread apply all bt full
Comment 2 Thibault Saunier 2018-01-09 14:22:38 UTC
There is some confusion here, you link to #790431 but the commits you link are from #781018. From your explanation your issue is actually related to #781018 meaning the problem is in gst-plugins-bad (actually -base in master as gl plugins were moved there).

Please confirm where the regression comes from.
Comment 3 Carlos Alberto Lopez Perez 2018-01-09 14:36:53 UTC
(In reply to Thibault Saunier from comment #2)
> There is some confusion here, you link to #790431 but the commits you link
> are from #781018. From your explanation your issue is actually related to
> #781018 meaning the problem is in gst-plugins-bad (actually -base in master
> as gl plugins were moved there).
> 
> Please confirm where the regression comes from.

Sorry, I pasted the wrong git sha1 above.

The commits I was referring to should be:

6019d8c3cdfc346051c86cb83fba59d4e16111b4 gstpad: Make pad (de)activation atomic
4fbdffd8d686b92d4d88a2571cc1ee2183407987 gstpad: Make calls to GstPadActivateFunction MT-safe

Both from gstreamer core repository and related to bug 790431
Comment 4 Alex Băluț 2018-01-11 21:26:58 UTC
Created attachment 366687 [details]
Result of "t a a bt" in gdb

I also see this problem in Pitivi. I can investigate to give more details if it helps.
Comment 5 Sebastian Dröge (slomo) 2018-01-13 10:24:40 UTC
*** Bug 792460 has been marked as a duplicate of this bug. ***
Comment 6 Sebastian Dröge (slomo) 2018-01-13 10:28:56 UTC
Both backtraces are the same. One thread is reactivating the typefind pad in PULL mode (from the demuxer wanting to run in PULL mode itself instead of having typefind run in PULL mode), the other thread (from change_state on decodebin) is stopping the typefind task (and thus deactivating the pad).

This should probably never happen in parallel and be protected by some mutex until the other one is finished.
Comment 7 Edward Hervey 2018-01-15 17:16:06 UTC
Created attachment 366851 [details] [review]
gstpad: Release pending g_cond_wait() when stopping task

Otherwise we would deadlock waiting forever for the streaming lock
to be released
Comment 8 Edward Hervey 2018-01-15 17:18:54 UTC
Created attachment 366852 [details] [review]
gstpad: Release pending g_cond_wait() when stopping/pausing task

Otherwise we would deadlock waiting forever for the streaming lock
to be released
Comment 9 Sebastian Dröge (slomo) 2018-01-15 17:19:56 UTC
Comment on attachment 366851 [details] [review]
gstpad: Release pending g_cond_wait() when stopping task

same for pause_task() as you said on IRC
Comment 10 Edward Hervey 2018-01-15 17:25:25 UTC
Comment on attachment 366852 [details] [review]
gstpad: Release pending g_cond_wait() when stopping/pausing task

Pushed to master and 1.12
Comment 11 Christoph Reiter (lazka) 2018-01-15 17:40:52 UTC
Created attachment 366854 [details]
stack trace after patch

I still get a deadlock with current master
Comment 12 Edward Hervey 2018-01-16 09:24:23 UTC
Created attachment 366874 [details] [review]
gstpad: Avoid stream-dead-lock on deactivation

The following case can happen when two thread try to activate and
deactivate a pad at the same time:
T1: starts to deactivate, calls pre_activate(), sets in_activation
    to TRUE and carries on
T2: starts to activate, calls pre_activate(), in_activation is TRUE
    so it waits on the GCond
T1: calls post_activate(), tries to acquire the streaming lock ..
    but can't because T1 is currently holding it

With this patch, the deadlock will no longer happen but does not
solve the problem that:
T2: will resume activation of the pad, set the pad mode to the target
   one (PUSH or PULL) and eventually the streaming lock gets released.
T1: is able to finish calling post_activate() ... but ... the pad
   wasn't deactivated (T2 was the last one to "activate" the pad.
Comment 13 Christoph Reiter (lazka) 2018-01-16 10:13:45 UTC
No more deadlocks here, thanks!
Comment 14 Edward Hervey 2018-01-16 11:09:43 UTC
Thanks for confirming.
Comment 15 Sebastian Dröge (slomo) 2018-01-16 16:27:43 UTC
Can you open another bug for the remaining issue Edward?