After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 774254 - vaapi: deadlock during seek of certain media in gst-validate
vaapi: deadlock during seek of certain media in gst-validate
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gstreamer-vaapi
git master
Other Linux
: Normal normal
: 1.10.3
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks: 774241
 
 
Reported: 2016-11-11 10:32 UTC by Hyunjun Ko
Modified: 2017-01-10 12:23 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
decoder: h264: set out_frame's pts only if the frame has available pts (1.18 KB, patch)
2016-11-16 08:53 UTC, Hyunjun Ko
none Details | Review
libs: decoder: h264: doesn't set codec frame's pts in case of cloned picture (1.54 KB, patch)
2016-12-26 06:38 UTC, Hyunjun Ko
none Details | Review
libs: decoder: h264: doesn't update some attributes in case of cloned picture (1.71 KB, patch)
2017-01-10 04:52 UTC, Hyunjun Ko
committed Details | Review

Description Hyunjun Ko 2016-11-11 10:32:11 UTC
Tested video : GH1_00094_1920x1280.MTS

Reproduced step :
run gst-play-1.0 GH1_00094_1920x1280.MTS
and just seek a couple of times.

This deadlock leads to several timeout failure in TCs of gst-validate.
Comment 1 Thibault Saunier 2016-11-11 11:40:15 UTC
fwiw it is probably simpler to use gstv-validate to debug this kind of things as the tests are more reproducible, you can for example run:

gst-validate-1.0 playbin uri=file://..../GH1_00094_1920x1280.MTS --set-scenario=seek_forward

Also gst-validate-launcher has many options so it is usable for debugging purposes (--debug help you get into gdb when a test times out for example, --gdb runs the tests in gdb and lets you debug them on failure/timeout), latest master will also print a stack trace on segfault or timeout by default.

Just saying as I am working on making gst-validate a debugging tool :)
Comment 2 Hyunjun Ko 2016-11-16 08:27:22 UTC
(In reply to Thibault Saunier from comment #1)
> fwiw it is probably simpler to use gstv-validate to debug this kind of
> things as the tests are more reproducible, you can for example run:
> 
> gst-validate-1.0 playbin uri=file://..../GH1_00094_1920x1280.MTS
> --set-scenario=seek_forward
> 
> Also gst-validate-launcher has many options so it is usable for debugging
> purposes (--debug help you get into gdb when a test times out for example,
> --gdb runs the tests in gdb and lets you debug them on failure/timeout),
> latest master will also print a stack trace on segfault or timeout by
> default.
> 
> Just saying as I am working on making gst-validate a debugging tool :)

Thanks Thibalut. :)
I hope that feature of printing stack trace would come soon!
Comment 3 Hyunjun Ko 2016-11-16 08:50:52 UTC
This issue is about timestamp.
First of all, definately I'm not a specialist of decoder/encoder, but I'm trying to describe what this problem is.

In case of this media, vaapih264dec decodes 2 frames per one gst buffer from upstream, which is unusual (to me at least :P...)
At the moment, for the first frame of two, it can't get pts from gst_video_decoder_get_timestamp_at_offset, because offset is not enough to get pts from preserved list, then set GST_CLOCK_TIME_NONE.
(See https://cgit.freedesktop.org/gstreamer/gst-plugins-base/tree/gst-libs/gst/video/gstvideodecoder.c#n1968 )

With this timestamp, decoder calls finish_frame and pushed it to downstream with "guessed timestamp", which might be wrong after seek (if buffer with invalid pts is passed from upstream... this media is the case)

Then let's see this deadlock.
If a pipeline running this media has only video pipeline, seek is working fine.
But with audio pipeline, deadlock happens as the following.
1. seek
2. video pipeline reaches ASYNC_DONE, but waiting for another pipeline.
3. audio pipeline failed to reach ASYNC_DONE.
=> Because video pipeline is trying to push buffers downstream even though the buffer has invalid pts. Pushing buffer succeeds after change with guessed pts. 
But after some push, surface exhaustion happens finally.

With libav decoder, the video frames(with invalid pts) are all dropped and working fine.
Comment 4 Hyunjun Ko 2016-11-16 08:53:00 UTC
Created attachment 339998 [details] [review]
decoder: h264: set out_frame's pts only if the frame has  available pts

I'm not sure if this patch is proper fix or just workaround.
Please review this.
Comment 5 Víctor Manuel Jáquez Leal 2016-11-16 10:24:09 UTC
Did you check why gst_adapter_prev_pts() [1] is returning GST_CLOCK_TIME_NONE? is it because the frame is a DISCONT?

1. https://cgit.freedesktop.org/gstreamer/gstreamer-vaapi/tree/gst-libs/gst/vaapi/gstvaapidecoder.c#n334
Comment 6 Hyunjun Ko 2016-11-17 01:17:11 UTC
(In reply to Víctor Manuel Jáquez Leal from comment #5)
> Did you check why gst_adapter_prev_pts() [1] is returning
> GST_CLOCK_TIME_NONE? is it because the frame is a DISCONT?
> 
> 1.
> https://cgit.freedesktop.org/gstreamer/gstreamer-vaapi/tree/gst-libs/gst/
> vaapi/gstvaapidecoder.c#n334

Well, as far as I see, decode_step is not used at all except for test decoder in tests/
Comment 7 Víctor Manuel Jáquez Leal 2016-11-17 10:04:14 UTC
(In reply to Hyunjun Ko from comment #6)
> (In reply to Víctor Manuel Jáquez Leal from comment #5)
> > Did you check why gst_adapter_prev_pts() [1] is returning
> > GST_CLOCK_TIME_NONE? is it because the frame is a DISCONT?
> > 
> > 1.
> > https://cgit.freedesktop.org/gstreamer/gstreamer-vaapi/tree/gst-libs/gst/
> > vaapi/gstvaapidecoder.c#n334
> 
> Well, as far as I see, decode_step is not used at all except for test
> decoder in tests/

You're right. We should get rid of all the rotten/unused code in libgstvaapi.

> Because video pipeline is trying to push buffers downstream even though the 
> buffer has invalid pts. Pushing buffer succeeds after change with guessed pts. 
> But after some push, surface exhaustion happens finally.

We should rethink the whole surface_ready_mutex mechanism.
 
> With libav decoder, the video frames(with invalid pts) are all dropped and 
> working fine.

Where does libav do that?

I only see the "ghost" frame removal: 
https://cgit.freedesktop.org/gstreamer/gst-libav/tree/ext/libav/gstavviddec.c#n1583

I don't know, perhaps that's our problem.
Comment 8 Hyunjun Ko 2016-11-18 00:46:58 UTC
(In reply to Víctor Manuel Jáquez Leal from comment #7)
> 
> We should rethink the whole surface_ready_mutex mechanism.
>  
Totally agree! though I have no idae of this currently.

> > With libav decoder, the video frames(with invalid pts) are all dropped and 
> > working fine.
> 
> Where does libav do that?
> 
> I only see the "ghost" frame removal: 
> https://cgit.freedesktop.org/gstreamer/gst-libav/tree/ext/libav/gstavviddec.
> c#n1583
> 
> I don't know, perhaps that's our problem.

Drop's done in VideoDecoder in gst_video_decoder_clip_and_push_buf

[libav]
Seek
-> got buffer from upstrem, but pts is out of segment
-> decoded
-> dropped this frame in gst_video_decoder_clip_and_push_buf until valid pts(inside segment) comes.

[vaapi]
Seek
-> got buffer from upstrem, but pts is out of segment
-> decoded
-> because of what i mentioned above, guessed timestamp in gst_video_decoder_prepare_finish_frame, which results in pts inside segemnt
-> pushed continuously.
-> surface exhaustion until ASYNC_DONE in audio pipeline.

Note that in case of libav decoder, it's packetized, which means doesn't call add_frame/have_frame in GstVideoDecoder. So it's not being through what i mentioned in  gst_video_decoder_get_timestamp_at_offset, which is called only in have_frame. So it doesn't have to guess timestamp because pts is not GST_CLOCK_TIME_NONE, but out of segmet and dropped finally
Comment 9 Víctor Manuel Jáquez Leal 2016-12-14 15:29:47 UTC
Comment on attachment 339998 [details] [review]
decoder: h264: set out_frame's pts only if the frame has  available pts

We now know what is the problem. But this is not the way to fix it. We rather fix videodecoder base class, or build up our internal solution in vaapivideodecode as is done in videodecoder
Comment 10 Hyunjun Ko 2016-12-15 07:06:38 UTC
(In reply to Víctor Manuel Jáquez Leal from comment #9)
> Comment on attachment 339998 [details] [review] [review]
> decoder: h264: set out_frame's pts only if the frame has  available pts
> 
> We now know what is the problem. But this is not the way to fix it. We
> rather fix videodecoder base class, or build up our internal solution in
> vaapivideodecode as is done in videodecoder

Totally agreed.
I'll be working on this issue next week.
Comment 11 Hyunjun Ko 2016-12-26 06:37:28 UTC
IIUC, the problem explained above occurs only if decoder has to create cloned picture. This picture's pts is set by parent picture in gst_vaapi_picture_create, which means it doesn't need to set pts again.
Comment 12 Hyunjun Ko 2016-12-26 06:38:50 UTC
Created attachment 342474 [details] [review]
libs: decoder: h264: doesn't set codec frame's pts in case of cloned picture

If it's cloned picture, it has pts from parent picture already.
In addition, base decoder doesn't set valid pts to the frame
corresponding to cloned picture.

See gst_video_decoder_get_timestamp_at_offset in gstvideodecoder.c


This is similar to the previous patch, but I think this is better than it.
Comment 13 Víctor Manuel Jáquez Leal 2017-01-06 13:04:38 UTC
Review of attachment 342474 [details] [review]:

:D great!

::: gst-libs/gst/vaapi/gstvaapidecoder_h264.c
@@ +3269,3 @@
   base_picture->type = GST_VAAPI_PICTURE_TYPE_NONE;
   base_picture->view_id = pi->view_id;
   base_picture->voc = pi->voc;

since type, view_id and voc are also set in gst_vaapi_picture_create() when a picture clone is created, I wonder is should also put those assignation inside of the if. Also use G_LIKELY for optimization, since it is going to be the common case (not having a parent_picture)
Comment 14 Hyunjun Ko 2017-01-10 04:44:05 UTC
> since type, view_id and voc are also set in gst_vaapi_picture_create() when
> a picture clone is created, I wonder is should also put those assignation
> inside of the if. Also use G_LIKELY for optimization, since it is going to
> be the common case (not having a parent_picture)

Yes. I think so about other assignation.
And good idea about G_LIKELY. I have never seen this case except for this media.
Comment 15 Hyunjun Ko 2017-01-10 04:52:02 UTC
Created attachment 343209 [details] [review]
libs: decoder: h264: doesn't update some attributes in case of cloned picture

If it's cloned picture, it has pts from parent picture already.
In addition, base decoder doesn't set valid pts to the frame
corresponding to cloned picture.
Comment 16 Víctor Manuel Jáquez Leal 2017-01-10 12:07:48 UTC
Committed with some modifications in the commit log and the code commentary.

commit 58e7b575bbcaac8a9b1d442087ccf0dcb34313d5
Author: Hyunjun Ko <zzoon@igalia.com>
Date:   Tue Jan 10 13:49:27 2017 +0900

    libs: decoder: h264: don't update cloned attributes

    If the frame is a cloned picture, its PTS comes from its parent
    picture.  In addition, the base decoder doesn't set a valid PTS to
    the frame corresponding to the cloned picture.

    https://bugzilla.gnome.org/show_bug.cgi?id=774254
Comment 17 Víctor Manuel Jáquez Leal 2017-01-10 12:23:15 UTC
Also pushed for branch 1.10

commit c94b83ebdc6470fc28f599b7139b722dc16399df
Author: Hyunjun Ko <zzoon@igalia.com>
Date:   Tue Jan 10 13:49:27 2017 +0900

    libs: decoder: h264: don't update cloned attributes

    If the frame is a cloned picture, its PTS comes from its parent
    picture.  In addition, the base decoder doesn't set a valid PTS to
    the frame corresponding to the cloned picture.

    https://bugzilla.gnome.org/show_bug.cgi?id=774254