GNOME Bugzilla – Bug 795195
gst_vaapi_texture_glx_put_surface_unlocked is blocked 75-80% of the time
Last modified: 2018-11-03 15:53:53 UTC
gst_vaapi_texture_glx_put_surface_unlocked is blocked 75-80% of the time. This means if you need more than 20-25% CPU to decode and render a video, you're going to miss frames. I noticed this with two different machines. While they both spend around 75% of their (real) time blocked in gst_vaapi_texture_glx_put_surface_unlocked, only the one with the lower CPU requirement (10%) was able to render the video smoothly. The other machine which requires 40% CPU time to play the same video stutters badly, probably because 40%+75% > 100%.
Created attachment 370849 [details] kabylake-rt.prof.pdf Real time profile from Kaby Lake.
Created attachment 370850 [details] haswell-rt.prof.pdf Real time profile from the Haswell.
Note: On platforms that use EGL instead of GLX, there is no problem at all.
Also, decoding to ximagesink avoids the problem and is much smoother than decoding to glimagesink (!)
I've now got a working patch to solve this bug and will polish it up for review next week. Although I seem to have incorrectly assumed that fixing this bug would solve my own totem playback issues. Seems it doesn't, but the improved efficiency will still be worth having.
Created attachment 370982 [details] [review] fix-795195-v1.patch Here's a much simpler fix than what I was working with last week...
Review of attachment 370982 [details] [review]: Thanks for the patch Daniel. ::: gst-libs/gst/vaapi/gstvaapitexture_glx.c @@ +362,3 @@ + /* Ensure vaPutSurface has really finished populating the pixmap */ + glXWaitX (); A couple comments: 1. This patch is focused only on Intel driver, where vaSyncSurfce boils down to DRM_IOCTL_I915_GEM_SET_DOMAIN ioctl (there are many comments in the intertubes complaining about the latency of this ioctl) 2. We need to test this patch on gallium driver, particularly in radeon, when decoding H.264, since it is where vaSyncSurface also flushes the pending operations. 3. Perhaps this is not the correct approach to this bug, perhaps the bug is in vaapi-intel-driver or even in the kernel. Thus, we would be hiding the bug 4. This patch replacing vaSyncSurface with glXWaitX looks to me similar like adding a sleep in the code :D in order to avoid a race condition.
I can't help with gallium. Although I finally got hold of a radeon machine recently, I haven't yet got vaapi working with it. So that's a whole different issue. While I'm not completely sure the patch is correct, I don't think comparing it to adding a sleep is correct either. vaSyncSurface is sleeping dramatically more than glXWaitX does. That's the bug.
(In reply to Daniel van Vugt from comment #8) > While I'm not completely sure the patch is correct, I don't think comparing > it to adding a sleep is correct either. Agree. I'm sorry. I shouldn't made that comparison. > vaSyncSurface is sleeping > dramatically more than glXWaitX does. That's the bug. AFAIK, the bug is that the kernel's driver is taking too much to flush the pending operations on a single surface.
I'm not really sure. Maybe it would be helpful to forget this bug for a moment and have a look at what motivated it -> bug 795325
I asked in #intel-gfx I just c&p the chat for sake of a real fix in intel-vaapi-driver ceyusa hi.. I have seen a couple complains about the latency when calling DRM_IOCTL_I915_GEM_SET_DOMAIN.. do you have some info in this regard? mkuoppal ceyusa: changing the domain will take time as we might need to wait for rendering to finish and flush ceyusa mkuoppal: thanks ceyusa out of curiosity: do you have any reported bug related? * ickle lost irc history ickle ceyusa: related to what? mkuoppal set domain latency ceyusa that ickle ceyusa: friends don't let friends use set-domain ceyusa ickle: hehehe ickle if it weren't for random swapping, we could just let userspace handle cache domains ceyusa because in va, vaSyncSurface boils down to that ioctl ickle right, that's just a wait-for-rendering and manipulate cpu caches ickle SyncSurface should just be GEM_WAIT to avoid the cpu cache manipulation side-effects ickle and preferably have userspace fences to avoid touching the kernel until it wants to sleep+wait ceyusa agree
Does that mean I should move this to an intel-vaapi-driver bug on github?
Comment on attachment 370982 [details] [review] fix-795195-v1.patch I'm going to obsolete the glXWaitX patch. It's only right in theory. Whereas in practice it's possibly wrong if vaPutSurface is too low-level to be affected by it. Plus I think frame rates are slightly reduced using that patch, suggesting it's not syncing the right thing.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gstreamer-vaapi/issues/90.