After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 795195 - gst_vaapi_texture_glx_put_surface_unlocked is blocked 75-80% of the time
gst_vaapi_texture_glx_put_surface_unlocked is blocked 75-80% of the time
Status: RESOLVED OBSOLETE
Product: GStreamer
Classification: Platform
Component: gstreamer-vaapi
1.14.0
Other Linux
: Normal normal
: git master
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2018-04-12 11:12 UTC by Daniel van Vugt
Modified: 2018-11-03 15:53 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
kabylake-rt.prof.pdf (23.25 KB, application/pdf)
2018-04-12 11:14 UTC, Daniel van Vugt
  Details
haswell-rt.prof.pdf (21.23 KB, application/pdf)
2018-04-12 11:14 UTC, Daniel van Vugt
  Details
fix-795195-v1.patch (1.48 KB, patch)
2018-04-16 10:29 UTC, Daniel van Vugt
reviewed Details | Review

Description Daniel van Vugt 2018-04-12 11:12:48 UTC
gst_vaapi_texture_glx_put_surface_unlocked is blocked 75-80% of the time.

This means if you need more than 20-25% CPU to decode and render a video, you're going to miss frames.

I noticed this with two different machines. While they both spend around 75% of their (real) time blocked in gst_vaapi_texture_glx_put_surface_unlocked, only the one with the lower CPU requirement (10%) was able to render the video smoothly. The other machine which requires 40% CPU time to play the same video stutters badly, probably because 40%+75% > 100%.
Comment 1 Daniel van Vugt 2018-04-12 11:14:17 UTC
Created attachment 370849 [details]
kabylake-rt.prof.pdf

Real time profile from Kaby Lake.
Comment 2 Daniel van Vugt 2018-04-12 11:14:50 UTC
Created attachment 370850 [details]
haswell-rt.prof.pdf

Real time profile from the Haswell.
Comment 3 Daniel van Vugt 2018-04-12 11:16:55 UTC
Note: On platforms that use EGL instead of GLX, there is no problem at all.
Comment 4 Daniel van Vugt 2018-04-12 11:18:26 UTC
Also, decoding to ximagesink avoids the problem and is much smoother than decoding to glimagesink (!)
Comment 5 Daniel van Vugt 2018-04-13 09:54:43 UTC
I've now got a working patch to solve this bug and will polish it up for review next week.

Although I seem to have incorrectly assumed that fixing this bug would solve my own totem playback issues. Seems it doesn't, but the improved efficiency will still be worth having.
Comment 6 Daniel van Vugt 2018-04-16 10:29:48 UTC
Created attachment 370982 [details] [review]
fix-795195-v1.patch

Here's a much simpler fix than what I was working with last week...
Comment 7 Víctor Manuel Jáquez Leal 2018-04-17 09:05:39 UTC
Review of attachment 370982 [details] [review]:

Thanks for the patch Daniel.

::: gst-libs/gst/vaapi/gstvaapitexture_glx.c
@@ +362,3 @@
 
+  /* Ensure vaPutSurface has really finished populating the pixmap */
+  glXWaitX ();

A couple comments:

1. This patch is focused only on Intel driver, where vaSyncSurfce boils down to DRM_IOCTL_I915_GEM_SET_DOMAIN ioctl (there are many comments in the intertubes complaining about the latency of this ioctl)

2. We need to test this patch on gallium driver, particularly in radeon, when decoding H.264, since it is where vaSyncSurface also flushes the pending operations.

3. Perhaps this is not the correct approach to this bug, perhaps the bug is in vaapi-intel-driver or even in the kernel. Thus, we would be hiding the bug

4. This patch replacing vaSyncSurface with glXWaitX looks to me similar like adding a sleep in the code :D in order to avoid a race condition.
Comment 8 Daniel van Vugt 2018-04-17 09:15:18 UTC
I can't help with gallium. Although I finally got hold of a radeon machine recently, I haven't yet got vaapi working with it. So that's a whole different issue.

While I'm not completely sure the patch is correct, I don't think comparing it to adding a sleep is correct either. vaSyncSurface is sleeping dramatically more than glXWaitX does. That's the bug.
Comment 9 Víctor Manuel Jáquez Leal 2018-04-17 09:34:09 UTC
(In reply to Daniel van Vugt from comment #8)

> While I'm not completely sure the patch is correct, I don't think comparing
> it to adding a sleep is correct either. 

Agree. I'm sorry. I shouldn't made that comparison.

> vaSyncSurface is sleeping
> dramatically more than glXWaitX does. That's the bug.

AFAIK, the bug is that the kernel's driver is taking too much to flush the pending operations on a single surface.
Comment 10 Daniel van Vugt 2018-04-17 09:51:39 UTC
I'm not really sure.

Maybe it would be helpful to forget this bug for a moment and have a look at what motivated it -> bug 795325
Comment 11 Víctor Manuel Jáquez Leal 2018-04-17 15:03:22 UTC
I asked in #intel-gfx

I just c&p the chat for sake of a real fix in intel-vaapi-driver

ceyusa	 hi.. I have seen a couple complains about the latency when calling DRM_IOCTL_I915_GEM_SET_DOMAIN.. do you have some info in this regard?
mkuoppal	ceyusa: changing the domain will take time as we might need to wait for rendering to finish and flush
ceyusa	mkuoppal: thanks
ceyusa	out of curiosity: do you have any reported bug related?
*	ickle lost irc history
ickle	ceyusa: related to what?
mkuoppal	set domain latency
ceyusa	that
ickle	ceyusa: friends don't let friends use set-domain
ceyusa	ickle: hehehe
ickle	if it weren't for random swapping, we could just let userspace handle cache domains
ceyusa	because in va, vaSyncSurface boils down to that ioctl
ickle	right, that's just a wait-for-rendering and manipulate cpu caches
ickle	SyncSurface should just be GEM_WAIT to avoid the cpu cache manipulation side-effects
ickle	and preferably have userspace fences to avoid touching the kernel until it wants to sleep+wait
ceyusa	agree
Comment 12 Daniel van Vugt 2018-04-18 01:19:32 UTC
Does that mean I should move this to an intel-vaapi-driver bug on github?
Comment 13 Daniel van Vugt 2018-04-18 12:52:39 UTC
Comment on attachment 370982 [details] [review]
fix-795195-v1.patch

I'm going to obsolete the glXWaitX patch. It's only right in theory. Whereas in practice it's possibly wrong if vaPutSurface is too low-level to be affected by it.

Plus I think frame rates are slightly reduced using that patch, suggesting it's not syncing the right thing.
Comment 14 GStreamer system administrator 2018-11-03 15:53:53 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gstreamer-vaapi/issues/90.