Bug 796521 - msdk: Playback not smooth by using ximagesink.
Status: RESOLVED OBSOLETE
Product: GStreamer
Classification: Platform
Component: gst-plugins-bad
Version: git master
Hardware: Other Linux
Importance: Normal normal
Target Milestone: git master
Assigned To: GStreamer Maintainers
Depends on: 796699
Blocks: 789886
Reported: 2018-06-07 02:10 UTC by Fei
Modified: 2018-11-03 14:25 UTC
See Also:
GNOME target: ---
GNOME version: ---

Description Fei 2018-06-07 02:10:06 UTC
Steps to reproduce:
============================================
$ gst-launch-1.0 filesrc location=/media/ts/Sally.ts '!' tsdemux '!' mpegvideoparse '!' msdkmpeg2dec '!' msdkvpp '!' video/x-raw,format=NV12 '!' videoconvert '!' ximagesink

The video is not rendered smoothly. If the video/x-raw,format=NV12 caps filter is removed:
$ gst-launch-1.0 filesrc location=/media/ts/Sally.ts '!' tsdemux '!' mpegvideoparse '!' msdkmpeg2dec '!' msdkvpp '!' videoconvert '!' ximagesink

The video renders smoothly.
Comment 1 sreerenj 2018-06-27 21:50:27 UTC
Part of this issue is covered in https://bugzilla.gnome.org/show_bug.cgi?id=796699. A vpp fix might also be needed to close this bug.
Comment 2 sreerenj 2018-07-03 22:43:07 UTC
I think it is a performance issue in the driver or in MediaSDK.

Somehow, the dmabuf-backed NV12 to regular (non-dmabuf) NV12 VA surface operation takes much more time: it is extremely slow, and the frame rate drops by more than 50%.

With BGRx as the output format, it works without any issue.
Comment 3 Nicolas Dufresne (ndufresne) 2018-07-18 19:22:14 UTC
Can you still reproduce this in master? We have now introduced processing-deadline to address these issues.
Comment 4 Nicolas Dufresne (ndufresne) 2018-07-18 19:24:48 UTC
(In reply to sreerenj from comment #2) 
> Somehow, the dmabuf-backed NV12 to regular (non-dmabuf) NV12 VA surface
> operation takes much more time: it is extremely slow, and the frame rate
> drops by more than 50%.

Did you check the backing memory type? It sounds like the NV12 data is not CPU-cacheable while the BGRx data is CPU-cacheable. If that is the case, the performance impact on CPU processing will be huge.
Comment 5 sreerenj 2018-07-19 21:33:40 UTC
(In reply to Nicolas Dufresne (ndufresne) from comment #3)
> Can you still reproduce this in master?
Yes
Comment 6 sreerenj 2018-07-19 21:43:15 UTC
(In reply to Nicolas Dufresne (ndufresne) from comment #4)
> (In reply to sreerenj from comment #2) 
> > Somehow, the dmabuf-backed NV12 to regular (non-dmabuf) NV12 VA surface
> > operation takes much more time: it is extremely slow, and the frame rate
> > drops by more than 50%.
> 
> Did you check the backing memory type? It sounds like the NV12 data is
> not CPU-cacheable while the BGRx data is CPU-cacheable. If that is the
> case, the performance impact on CPU processing will be huge.

IIUC, the video memory is USWC (uncacheable, speculative write-combining), and a regular memcpy() from USWC to system memory is always slow. But I don't know why BGRx gives better performance. Might it be related to the size and alignment of the BGRx data?

This issue is reproducible with both gstreamer-vaapi and gst-msdk with any 720p video.
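For scale, here is a minimal C sketch of the per-frame byte counts at 720p (1280x720). It assumes tightly packed planes with the stride equal to the row width; real VA surfaces add driver-specific pitch alignment and padding, which this ignores. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* NV12: a full-resolution 8-bit Y plane followed by a half-height plane of
 * interleaved U/V bytes (4:2:0). Assumes stride == width (no padding). */
static size_t nv12_frame_size(size_t width, size_t height)
{
  assert(width % 2 == 0 && height % 2 == 0);
  size_t y_plane = width * height;
  /* width/2 U/V pairs per row, 2 bytes each, for height/2 rows */
  size_t uv_plane = width * (height / 2);
  return y_plane + uv_plane;
}

/* BGRx: a single packed plane, 4 bytes per pixel (the x byte is padding). */
static size_t bgrx_frame_size(size_t width, size_t height)
{
  return width * height * 4;
}
```

Under that assumption an NV12 frame is 1,382,400 bytes and a BGRx frame is 3,686,400 bytes, so the BGRx output carries about 2.7 times more data per frame. If BGRx is nonetheless faster, raw size alone does not seem to explain the difference, which points toward the access pattern or alignment rather than the byte count.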

gstreamer-vaapi:

GST_VAAPI_ENABLE_DIRECT_RENDERING=1 gst-launch-1.0 -v filesrc location=Sally.ts ! tsdemux ! mpegvideoparse ! vaapimpeg2dec ! vaapipostproc ! video/x-raw,format=NV12 ! videoconvert ! xvimagesink

Here we explicitly enable direct rendering (internally, this maps the VASurface with vaDeriveImage()).

In normal use cases, gstreamer-vaapi does not use direct rendering.

gst-msdk:

gst-msdk always uses direct mapping, so the issue is easily reproducible with any pipeline, e.g.:
gst-launch-1.0 -v filesrc location=Sally.ts ! tsdemux ! mpegvideoparse ! msdkmpeg2dec ! msdkvpp ! video/x-raw,format=NV12 ! videoconvert ! xvimagesink
Comment 7 Nicolas Dufresne (ndufresne) 2018-07-19 21:47:47 UTC
Note that this issue is not new; it has been around for years.
Comment 8 Nicolas Dufresne (ndufresne) 2018-07-19 21:54:03 UTC
The issue might come from the way we read the NV12 memory in videoconvert. When you output BGRx, the code that reads the memory simply does a memcpy(), whereas in the converter we do random pixel access, and I think this could be extremely slow without a cache. What is the reason for not producing cacheable buffers on a memory-coherent system like the Intel platform?
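To make the access-pattern argument concrete, here is a minimal C sketch contrasting the two read patterns. It is not videoconvert's actual code (which unpacks lines through its own optimized routines); `copy_plane`, `nv12_to_bgrx` and `fixed_to_u8` are hypothetical helpers, and the BT.601 limited-range integer coefficients are a common textbook choice assumed for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bulk path (roughly what a plain copy does): one memcpy() per row,
 * so the source is read sequentially in large chunks. */
static void copy_plane(uint8_t *dst, size_t dst_stride,
                       const uint8_t *src, size_t src_stride,
                       size_t row_bytes, size_t rows)
{
  for (size_t y = 0; y < rows; y++)
    memcpy(dst + y * dst_stride, src + y * src_stride, row_bytes);
}

/* Round a fixed-point (x256) value and clamp it to 0..255.
 * Negative sums are clamped before shifting. */
static uint8_t fixed_to_u8(int v)
{
  v += 128;
  if (v < 0)
    return 0;
  v >>= 8;
  return v > 255 ? 255 : (uint8_t) v;
}

/* Per-pixel path: NV12 -> BGRx. At the source level, every output pixel
 * reads Y, U and V individually, and each interleaved UV row is read again
 * for two consecutive Y rows. On uncached (USWC) source memory, such small
 * scattered reads tend to be far more expensive than sequential copies. */
static void nv12_to_bgrx(uint8_t *dst, size_t dst_stride,
                         const uint8_t *y_plane, size_t y_stride,
                         const uint8_t *uv_plane, size_t uv_stride,
                         size_t width, size_t height)
{
  assert(width % 2 == 0 && height % 2 == 0);
  for (size_t y = 0; y < height; y++) {
    const uint8_t *y_row = y_plane + y * y_stride;
    const uint8_t *uv_row = uv_plane + (y / 2) * uv_stride; /* shared by 2 rows */
    uint8_t *out = dst + y * dst_stride;
    for (size_t x = 0; x < width; x++) {
      size_t uv = x & ~(size_t) 1;     /* U at the even byte, V right after */
      int c = y_row[x] - 16;
      int d = uv_row[uv] - 128;        /* U */
      int e = uv_row[uv + 1] - 128;    /* V */
      out[4 * x + 0] = fixed_to_u8(298 * c + 516 * d);           /* B */
      out[4 * x + 1] = fixed_to_u8(298 * c - 100 * d - 208 * e); /* G */
      out[4 * x + 2] = fixed_to_u8(298 * c + 409 * e);           /* R */
      out[4 * x + 3] = 0xff;                                     /* x padding */
    }
  }
}
```

A common mitigation under this model is to stage the frame into cacheable memory with the sequential path first and run the per-pixel conversion on that copy; whether that helps with the drivers discussed here is untested.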
Comment 9 sreerenj 2018-07-23 22:28:50 UTC
It is possible to get either cached or uncached memory in the output.

Libva allows two methods to map video memory:

Method_1: map directly with vaDeriveImage() (supported by the iHD driver and vaapi-intel-driver).
In usual use cases, the VASurface wraps tiled (Y-tiled) video memory, and the memory mapped using the vaDeriveImage() API is tiled too. In this case, the driver uses drm_intel_gem_bo_map_gtt() to map the tiled memory, which produces USWC memory.

Method_2: create a VAImage structure using the vaCreateImage() API and a user-specified video format, then download the VASurface (which holds the decoded content) into it using shaders or VPP by invoking the vaGetImage() API.
This produces linear memory that is cacheable by the CPU.


We chose Method_2 as the default in gstreamer-vaapi when the video memory needs to be mapped for software operations.

But gst-msdk only supports Method_1, so we are stuck with USWC memory. The iHD driver did not properly support vaGetImage(); I am not sure about the current status, though.
Comment 10 GStreamer system administrator 2018-11-03 14:25:46 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/729.