GNOME Bugzilla – Bug 774254
vaapi: deadlock during seek of certain media in gst-validate
Last modified: 2017-01-10 12:23:15 UTC
Tested video : GH1_00094_1920x1280.MTS Reproduced step : run gst-play-1.0 GH1_00094_1920x1280.MTS and just seek a couple of times. This deadlock leads to several timeout failure in TCs of gst-validate.
fwiw it is probably simpler to use gstv-validate to debug this kind of things as the tests are more reproducible, you can for example run: gst-validate-1.0 playbin uri=file://..../GH1_00094_1920x1280.MTS --set-scenario=seek_forward Also gst-validate-launcher has many options so it is usable for debugging purposes (--debug help you get into gdb when a test times out for example, --gdb runs the tests in gdb and lets you debug them on failure/timeout), latest master will also print a stack trace on segfault or timeout by default. Just saying as I am working on making gst-validate a debugging tool :)
(In reply to Thibault Saunier from comment #1) > fwiw it is probably simpler to use gstv-validate to debug this kind of > things as the tests are more reproducible, you can for example run: > > gst-validate-1.0 playbin uri=file://..../GH1_00094_1920x1280.MTS > --set-scenario=seek_forward > > Also gst-validate-launcher has many options so it is usable for debugging > purposes (--debug help you get into gdb when a test times out for example, > --gdb runs the tests in gdb and lets you debug them on failure/timeout), > latest master will also print a stack trace on segfault or timeout by > default. > > Just saying as I am working on making gst-validate a debugging tool :) Thanks Thibalut. :) I hope that feature of printing stack trace would come soon!
This issue is about timestamp. First of all, definately I'm not a specialist of decoder/encoder, but I'm trying to describe what this problem is. In case of this media, vaapih264dec decodes 2 frames per one gst buffer from upstream, which is unusual (to me at least :P...) At the moment, for the first frame of two, it can't get pts from gst_video_decoder_get_timestamp_at_offset, because offset is not enough to get pts from preserved list, then set GST_CLOCK_TIME_NONE. (See https://cgit.freedesktop.org/gstreamer/gst-plugins-base/tree/gst-libs/gst/video/gstvideodecoder.c#n1968 ) With this timestamp, decoder calls finish_frame and pushed it to downstream with "guessed timestamp", which might be wrong after seek (if buffer with invalid pts is passed from upstream... this media is the case) Then let's see this deadlock. If a pipeline running this media has only video pipeline, seek is working fine. But with audio pipeline, deadlock happens as the following. 1. seek 2. video pipeline reaches ASYNC_DONE, but waiting for another pipeline. 3. audio pipeline failed to reach ASYNC_DONE. => Because video pipeline is trying to push buffers downstream even though the buffer has invalid pts. Pushing buffer succeeds after change with guessed pts. But after some push, surface exhaustion happens finally. With libav decoder, the video frames(with invalid pts) are all dropped and working fine.
Created attachment 339998 [details] [review] decoder: h264: set out_frame's pts only if the frame has available pts I'm not sure if this patch is proper fix or just workaround. Please review this.
Did you check why gst_adapter_prev_pts() [1] is returning GST_CLOCK_TIME_NONE? is it because the frame is a DISCONT? 1. https://cgit.freedesktop.org/gstreamer/gstreamer-vaapi/tree/gst-libs/gst/vaapi/gstvaapidecoder.c#n334
(In reply to Víctor Manuel Jáquez Leal from comment #5) > Did you check why gst_adapter_prev_pts() [1] is returning > GST_CLOCK_TIME_NONE? is it because the frame is a DISCONT? > > 1. > https://cgit.freedesktop.org/gstreamer/gstreamer-vaapi/tree/gst-libs/gst/ > vaapi/gstvaapidecoder.c#n334 Well, as far as I see, decode_step is not used at all except for test decoder in tests/
(In reply to Hyunjun Ko from comment #6) > (In reply to Víctor Manuel Jáquez Leal from comment #5) > > Did you check why gst_adapter_prev_pts() [1] is returning > > GST_CLOCK_TIME_NONE? is it because the frame is a DISCONT? > > > > 1. > > https://cgit.freedesktop.org/gstreamer/gstreamer-vaapi/tree/gst-libs/gst/ > > vaapi/gstvaapidecoder.c#n334 > > Well, as far as I see, decode_step is not used at all except for test > decoder in tests/ You're right. We should get rid of all the rotten/unused code in libgstvaapi. > Because video pipeline is trying to push buffers downstream even though the > buffer has invalid pts. Pushing buffer succeeds after change with guessed pts. > But after some push, surface exhaustion happens finally. We should rethink the whole surface_ready_mutex mechanism. > With libav decoder, the video frames(with invalid pts) are all dropped and > working fine. Where does libav do that? I only see the "ghost" frame removal: https://cgit.freedesktop.org/gstreamer/gst-libav/tree/ext/libav/gstavviddec.c#n1583 I don't know, perhaps that's our problem.
(In reply to Víctor Manuel Jáquez Leal from comment #7) > > We should rethink the whole surface_ready_mutex mechanism. > Totally agree! though I have no idae of this currently. > > With libav decoder, the video frames(with invalid pts) are all dropped and > > working fine. > > Where does libav do that? > > I only see the "ghost" frame removal: > https://cgit.freedesktop.org/gstreamer/gst-libav/tree/ext/libav/gstavviddec. > c#n1583 > > I don't know, perhaps that's our problem. Drop's done in VideoDecoder in gst_video_decoder_clip_and_push_buf [libav] Seek -> got buffer from upstrem, but pts is out of segment -> decoded -> dropped this frame in gst_video_decoder_clip_and_push_buf until valid pts(inside segment) comes. [vaapi] Seek -> got buffer from upstrem, but pts is out of segment -> decoded -> because of what i mentioned above, guessed timestamp in gst_video_decoder_prepare_finish_frame, which results in pts inside segemnt -> pushed continuously. -> surface exhaustion until ASYNC_DONE in audio pipeline. Note that in case of libav decoder, it's packetized, which means doesn't call add_frame/have_frame in GstVideoDecoder. So it's not being through what i mentioned in gst_video_decoder_get_timestamp_at_offset, which is called only in have_frame. So it doesn't have to guess timestamp because pts is not GST_CLOCK_TIME_NONE, but out of segmet and dropped finally
Comment on attachment 339998 [details] [review] decoder: h264: set out_frame's pts only if the frame has available pts We now know what is the problem. But this is not the way to fix it. We rather fix videodecoder base class, or build up our internal solution in vaapivideodecode as is done in videodecoder
(In reply to Víctor Manuel Jáquez Leal from comment #9) > Comment on attachment 339998 [details] [review] [review] > decoder: h264: set out_frame's pts only if the frame has available pts > > We now know what is the problem. But this is not the way to fix it. We > rather fix videodecoder base class, or build up our internal solution in > vaapivideodecode as is done in videodecoder Totally agreed. I'll be working on this issue next week.
IIUC, the problem explained above occurs only if decoder has to create cloned picture. This picture's pts is set by parent picture in gst_vaapi_picture_create, which means it doesn't need to set pts again.
Created attachment 342474 [details] [review] libs: decoder: h264: doesn't set codec frame's pts in case of cloned picture If it's cloned picture, it has pts from parent picture already. In addition, base decoder doesn't set valid pts to the frame corresponding to cloned picture. See gst_video_decoder_get_timestamp_at_offset in gstvideodecoder.c This is similar to the previous patch, but I think this is better than it.
Review of attachment 342474 [details] [review]: :D great! ::: gst-libs/gst/vaapi/gstvaapidecoder_h264.c @@ +3269,3 @@ base_picture->type = GST_VAAPI_PICTURE_TYPE_NONE; base_picture->view_id = pi->view_id; base_picture->voc = pi->voc; since type, view_id and voc are also set in gst_vaapi_picture_create() when a picture clone is created, I wonder is should also put those assignation inside of the if. Also use G_LIKELY for optimization, since it is going to be the common case (not having a parent_picture)
> since type, view_id and voc are also set in gst_vaapi_picture_create() when > a picture clone is created, I wonder is should also put those assignation > inside of the if. Also use G_LIKELY for optimization, since it is going to > be the common case (not having a parent_picture) Yes. I think so about other assignation. And good idea about G_LIKELY. I have never seen this case except for this media.
Created attachment 343209 [details] [review] libs: decoder: h264: doesn't update some attributes in case of cloned picture If it's cloned picture, it has pts from parent picture already. In addition, base decoder doesn't set valid pts to the frame corresponding to cloned picture.
Committed with some modifications in the commit log and the code commentary. commit 58e7b575bbcaac8a9b1d442087ccf0dcb34313d5 Author: Hyunjun Ko <zzoon@igalia.com> Date: Tue Jan 10 13:49:27 2017 +0900 libs: decoder: h264: don't update cloned attributes If the frame is a cloned picture, its PTS comes from its parent picture. In addition, the base decoder doesn't set a valid PTS to the frame corresponding to the cloned picture. https://bugzilla.gnome.org/show_bug.cgi?id=774254
Also pushed for branch 1.10 commit c94b83ebdc6470fc28f599b7139b722dc16399df Author: Hyunjun Ko <zzoon@igalia.com> Date: Tue Jan 10 13:49:27 2017 +0900 libs: decoder: h264: don't update cloned attributes If the frame is a cloned picture, its PTS comes from its parent picture. In addition, the base decoder doesn't set a valid PTS to the frame corresponding to the cloned picture. https://bugzilla.gnome.org/show_bug.cgi?id=774254