GNOME Bugzilla – Bug 778830
v4l2dec: Fix race when going from PAUSED to READY
Last modified: 2017-04-07 18:05:52 UTC
See commit message
Created attachment 346071 [details] [review] v4l2dec: Fix race when going from PAUSED to READY Running `gst-validate-launcher -t validate.file.playback.change_state_intensive.vorbis_vp8_1_webm` on odroid XU4 (s5p-mfc v4l2 driver) often leads to: ERROR:../subprojects/gst-plugins-good/sys/v4l2/gstv4l2videodec.c:215:gst_v4l2_video_dec_stop: assertion failed: (g_atomic_int_get (&self->processing) == FALSE) This happens when the following race happens: - T0: Main thread - T1: Upstream streaming thread - T2. v4l2dec processing thread) [The decoder is in PAUSED state] T0. The validate scenario runs `Executing (36/40) set-state: state=null repeat=40` T1- The decoder handles a frame T2- A decoded frame is push downstream T2- Downstream returns FLUSHING as it is already flushing changing state T2- The decoder stops its processing thread and sets `->processing = FALSE` T1- The decoder handles another frame T1- `->process` is FALSE so the decoder restarts its streaming thread T0- In v4l2dec-> stop the processing thread is stopped NOTE: At this point the processing thread loop never started. T0- assertion failed: (g_atomic_int_get (&self->processing) == FALSE) The taken solution in that patch is to delay the setting of `v4l2dec->process` to the time where the processing loop is actually started.
Review of attachment 346071 [details] [review]: Have you tested some aggressive seeks (where we don't wait for the seek to complete) ? This patch needs work since it make gst_v4l2_video_dec_loop_stopped() and some other code useless. It's all complicated, and I never found a good solution. What happens is that sometimes, the pad task will stop before gst_v4l2_video_dec_loop() is called. In this case, we ended up with bugs, since the handle_frame() should return FLUSHING. I think I was confused when I wrote this. Later Philipp Zabel found that we can read and sync on the pad task state. This is being used in _finish() function. I think the right solution is to completely remove this processing state and to use the pad state instead to detect early flush (while _process() is being called).
> Have you tested some aggressive seeks (where we don't wait for the seek to complete) ? This patch needs work since it make gst_v4l2_video_dec_loop_stopped() and some other code useless. Yeah, I ran the validate testsuite with all the scrub seeking tests with that patch and there seem to be no regression at all (and the testsuite results are pretty good on the platform, nice work!). > It's all complicated, and I never found a good solution. What happens is that sometimes, the pad task will stop before gst_v4l2_video_dec_loop() is called. In this case, we ended up with bugs, since the handle_frame() should return FLUSHING. My patch is about that case in the case of going back to READY, I am not sure what is your concerns about seeking here. > I think I was confused when I wrote this. Later Philipp Zabel found that we can read and sync on the pad task state. This is being used in _finish() function. I think the right solution is to completely remove this processing state and to use the pad state instead to detect early flush (while _process() is being called). That makes sense, will give that a try.
Created attachment 346389 [details] [review] v4l2dec: Fix race when going from PAUSED to READY Running `gst-validate-launcher -t validate.file.playback.change_state_intensive.vorbis_vp8_1_webm` on odroid XU4 (s5p-mfc v4l2 driver) often leads to: ERROR:../subprojects/gst-plugins-good/sys/v4l2/gstv4l2videodec.c:215:gst_v4l2_video_dec_stop: assertion failed: (g_atomic_int_get (&self->processing) == FALSE) This happens when the following race happens: - T0: Main thread - T1: Upstream streaming thread - T2. v4l2dec processing thread) [The decoder is in PAUSED state] T0. The validate scenario runs `Executing (36/40) set-state: state=null repeat=40` T1- The decoder handles a frame T2- A decoded frame is push downstream T2- Downstream returns FLUSHING as it is already flushing changing state T2- The decoder stops its processing thread and sets `->processing = FALSE` T1- The decoder handles another frame T1- `->process` is FALSE so the decoder restarts its streaming thread T0- In v4l2dec-> stop the processing thread is stopped NOTE: At this point the processing thread loop never started. T0- assertion failed: (g_atomic_int_get (&self->processing) == FALSE) Here I am removing the whole ->processing logic to base it all on the GstTask state to avoid duplicating the knowledge.
Created attachment 346390 [details] [review] pad: Add API to get the current state of a task Avoiding the user to need to deal with the locking himself etc. API: gst_pad_task_get_state
Comment on attachment 346390 [details] [review] pad: Add API to get the current state of a task >+ * Get @pad task state. If not task is currently >+ * set, GST_TASK_STOPPED is returned. not task -> no task >+ * Returns: The current state of @pad' task. @pad' -> pad's >+GstTaskState >+gst_pad_get_task_state (GstPad * pad) >+{ >+ ... >+ g_return_val_if_fail (GST_IS_PAD (pad), FALSE); Should probably return a GstTaskState and not a boolean :)
Created attachment 346392 [details] [review] pad: Add API to get the current state of a task - Fix documenation typos - Return a GstTaskStatus on assertions
Review of attachment 346389 [details] [review]: It would also be nice to make a version with an internal version of pad_get_state(), so we could merge in 1.10. ::: sys/v4l2/gstv4l2videodec.c @@ +478,3 @@ /* When flushing, decoding thread may never run */ + if (gst_pad_get_task_state (GST_VIDEO_DECODER_SRC_PAD (self)) != + GST_TASK_STOPPED) { Is that really possible in the task implementation ? When I read gst_task_func(), the state will always be stopped when this is called. That's because we take a ref in start_task, and will only leave gst_task_func() when the state is STOPPED. What this is trying to do, is to catch the case were we flush before our task function get called. Because we have no guaranty to be called once. In that case, the output_flow won't be set properly. Maybe with a small rework, we could manage to initially set the output_flow to FLUSHING, and get away with that ?
Review of attachment 346392 [details] [review]: ::: gst/gstpad.c @@ +6090,3 @@ + task = GST_PAD_TASK (pad); + if (task == NULL) + goto no_task; I think the goto is not really needed here. Just set ret = GST_TASK_STOPPED, and then else ... ::: gst/gstpad.h @@ +1418,3 @@ gboolean gst_pad_pause_task (GstPad *pad); gboolean gst_pad_stop_task (GstPad *pad); +GstTaskState gst_pad_get_task_state (GstPad *pad); Does not seem to align with other declarations.
Comment on attachment 346392 [details] [review] pad: Add API to get the current state of a task commit 6326c764ee85fa17f1d7838f6e4d5824e07e3132 Author: Thibault Saunier <thibault.saunier@osg.samsung.com> Date: Fri Feb 17 15:48:17 2017 -0300 pad: Add API to get the current state of a task Avoiding the user to need to deal with the locking himself etc. API: gst_pad_task_get_state https://bugzilla.gnome.org/show_bug.cgi?id=778830
Are you still looking into this issue ?
Yes, I have an updated version of the patch which seems to work well on odroid and wanted to check if I could test with vivid, gonna propose tomorrow.
Created attachment 349475 [details] [review] v4l2dec: Fix race when going from PAUSED to READY Running `gst-validate-launcher -t validate.file.playback.change_state_intensive.vorbis_vp8_1_webm` on odroid XU4 (s5p-mfc v4l2 driver) often leads to: ERROR:../subprojects/gst-plugins-good/sys/v4l2/gstv4l2videodec.c:215:gst_v4l2_video_dec_stop: assertion failed: (g_atomic_int_get (&self->processing) == FALSE) This happens when the following race happens: - T0: Main thread - T1: Upstream streaming thread - T2. v4l2dec processing thread) [The decoder is in PAUSED state] T0. The validate scenario runs `Executing (36/40) set-state: state=null repeat=40` T1- The decoder handles a frame T2- A decoded frame is push downstream T2- Downstream returns FLUSHING as it is already flushing changing state T2- The decoder stops its processing thread and sets `->processing = FALSE` T1- The decoder handles another frame T1- `->process` is FALSE so the decoder restarts its streaming thread T0- In v4l2dec-> stop the processing thread is stopped NOTE: At this point the processing thread loop never started. T0- assertion failed: (g_atomic_int_get (&self->processing) == FALSE) Here I am removing the whole ->processing logic to base it all on the GstTask state to avoid duplicating the knowledge.
Review of attachment 349475 [details] [review]: Loogs good to me. I'm quite happy the we are finally removing some of the atomic state.
commit 7b7a809818a0fa77dee1e340b42473a1e6e9ea8a Author: Thibault Saunier <thibault.saunier@osg.samsung.com> Date: Fri Feb 17 10:01:08 2017 -0300 v4l2dec: Fix race when going from PAUSED to READY Running `gst-validate-launcher -t validate.file.playback.change_state_intensive.vorbis_vp8_1_webm` on odroid XU4 (s5p-mfc v4l2 driver) often leads to: ERROR:../subprojects/gst-plugins-good/sys/v4l2/gstv4l2videodec.c:215:gst_v4l2_video_dec_stop: assertion failed: (g_atomic_int_get (&self->processing) == FALSE) This happens when the following race happens: - T0: Main thread - T1: Upstream streaming thread - T2. v4l2dec processing thread) [The decoder is in PAUSED state] T0. The validate scenario runs `Executing (36/40) set-state: state=null repeat=40` T1- The decoder handles a frame T2- A decoded frame is push downstream T2- Downstream returns FLUSHING as it is already flushing changing state T2- The decoder stops its processing thread and sets `->processing = FALSE` T1- The decoder handles another frame T1- `->process` is FALSE so the decoder restarts its streaming thread T0- In v4l2dec-> stop the processing thread is stopped NOTE: At this point the processing thread loop never started. T0- assertion failed: (g_atomic_int_get (&self->processing) == FALSE) Here I am removing the whole ->processing logic to base it all on the GstTask state to avoid duplicating the knowledge. https://bugzilla.gnome.org/show_bug.cgi?id=778830