GNOME Bugzilla – Bug 792341
pad: Deadlock after upgrade from 1.12.2 to 1.12.4 (regression)
Last modified: 2018-01-16 16:27:43 UTC
Created attachment 366517 [details] thread apply all bt I have WPE (WebKit) running on a Freescale (NXP) i.MX6 board (ARMv7). The browser (WPE) is running on a Weston session and its configured to leverage accelerated video decoding via gstreamer-imx and gstreamer-gl (glupload). The application is a web page that plays videos in a loop and changes from one video to the next by pre-loading the next video some seconds before the current one stops playing and making it appear on the screen at the same time it hides and destroys the old one (internally this uses a different gstreamer pipeline per <video> element) This was working fine with 1.12.2, but after an upgrade to 1.12.4 it started to freeze after a few minutes of playing and a restart was needed. I tracked this down to the commits 884e4b78105db30c21a0bea96cc666252140ee37 and 1a6c4be242d6d684b03bea6cba339207358faf79 ( from bug 790431 ) that were merged in 1.12.4. I tested to revert this two commits locally and the freeze is gone. I attached GDB to the WPEWebProcess when it was in frozen state and I dumped some backtraces (one with "thread apply all bt" and the other one with "thread apply all bt full"). Find both the files attached here. I think the deadlock happens between thread 1 and thread 13, as both are inside gst_pad_set_active()
Created attachment 366518 [details] thread apply all bt full
There is some confusion here, you link to #790431 but the commits you link are from #781018. From your explanation your issue is actually related to #781018 meaning the problem is in gst-plugins-bad (actually -base in master as gl plugins were moved there). Please confirm where the regression comes from.
(In reply to Thibault Saunier from comment #2) > There is some confusion here, you link to #790431 but the commits you link > are from #781018. From your explanation your issue is actually related to > #781018 meaning the problem is in gst-plugins-bad (actually -base in master > as gl plugins were moved there). > > Please confirm where the regression comes from. Sorry, I pasted the wrong git sha1 above. The commits I was referring to should be: 6019d8c3cdfc346051c86cb83fba59d4e16111b4 gstpad: Make pad (de)activation atomic 4fbdffd8d686b92d4d88a2571cc1ee2183407987 gstpad: Make calls to GstPadActivateFunction MT-safe Both from gstreamer core repository and related to bug 790431
Created attachment 366687 [details] Result of "t a a bt" in gdb I also see this problem in Pitivi. I can investigate to give more details if it helps.
*** Bug 792460 has been marked as a duplicate of this bug. ***
Both backtraces are the same. One thread is reactivating the typefind pad in PULL mode (from the demuxer wanting to run in PULL mode itself instead of having typefind run in PULL mode), the other thread (from change_state on decodebin) is stopping the typefind task (and thus deactivating the pad). This should probably never happen in parallel and be protected by some mutex until the other one is finished.
Created attachment 366851 [details] [review] gstpad: Release pending g_cond_wait() when stopping task Otherwise we would deadlock waiting forever for the streaming lock to be released
Created attachment 366852 [details] [review] gstpad: Release pending g_cond_wait() when stopping/pausing task Otherwise we would deadlock waiting forever for the streaming lock to be released
Comment on attachment 366851 [details] [review] gstpad: Release pending g_cond_wait() when stopping task same for pause_task() as you said on IRC
Comment on attachment 366852 [details] [review] gstpad: Release pending g_cond_wait() when stopping/pausing task Pushed to master and 1.12
Created attachment 366854 [details] stack trace after patch I still get a deadlock with current master
Created attachment 366874 [details] [review] gstpad: Avoid stream-dead-lock on deactivation The following case can happen when two thread try to activate and deactivate a pad at the same time: T1: starts to deactivate, calls pre_activate(), sets in_activation to TRUE and carries on T2: starts to activate, calls pre_activate(), in_activation is TRUE so it waits on the GCond T1: calls post_activate(), tries to acquire the streaming lock .. but can't because T1 is currently holding it With this patch, the deadlock will no longer happen but does not solve the problem that: T2: will resume activation of the pad, set the pad mode to the target one (PUSH or PULL) and eventually the streaming lock gets released. T1: is able to finish calling post_activate() ... but ... the pad wasn't deactivated (T2 was the last one to "activate" the pad.
No more deadlocks here, thanks!
Thanks for confirming.
Can you open another bug for the remaining issue Edward?