GNOME Bugzilla – Bug 685282
dvdspu: figure out how to make it work with hardware decoders and subpicture overlays
Last modified: 2018-11-03 13:13:08 UTC
Getting the DVD SPU to paint generically is a requirement for allowing the DVD elements to plug / output hardware decoder caps. Here's a conversation we had about it on IRC:

Sep 26 01:37:04 * thaytan wonders how SPU works
Sep 26 01:37:19 <bilboed> thaytan, with vdpau ?
Sep 26 01:37:24 <thaytan> *nod*
Sep 26 01:37:35 <thaytan> how it should work, that is
Sep 26 01:37:36 <bilboed> they have also VdpBitmapSurface
Sep 26 01:38:01 <bilboed> so VdpVideoSurface => YUV stuff, VdpBitmapSurface => RGB stuff
Sep 26 01:38:11 <bilboed> VdpOutputSurface => output of the compositor for display
Sep 26 01:38:40 <bilboed> you can create VdpBitmapSurface and map(write) stuff on it
Sep 26 01:39:01 <bilboed> then you give it to the compositor with coordinates (and whatever else) and that's basically it
Sep 26 01:39:09 <thaytan> hmmm
Sep 26 01:39:26 <bilboed> so as a result ... I'm also gonna have to figure out how to solve hw-compositing
Sep 26 01:39:28 <thaytan> not sure I see how that fits into a GStreamer/rsndvdbin context
Sep 26 01:39:31 <bilboed> in a generic fashion that is
Sep 26 01:39:55 <bilboed> thaytan, was thinking you could slap GstOverlayMeta (or sth like that) with attached GstBuffers
Sep 26 01:40:06 <bilboed> thaytan, maybe videomixer or some generic element could do that
Sep 26 01:40:09 <thaytan> I guess it's a vdpspu element with video and subpicture inputs as currently with dvdspu
Sep 26 01:40:25 <thaytan> except the video pad accepts vdp output surface caps
Sep 26 01:40:38 <bilboed> I'd prefer to avoid creating yet-another-custom-element
Sep 26 01:40:45 <thaytan> but, I don't know what it outputs
Sep 26 01:41:12 <thaytan> bilboed: I don't know how to build it generically
Sep 26 01:41:48 <thaytan> the dvdspu element uses the video stream passing through to a) paint onto b) uses the timestamps to crank the SPU state machine
Sep 26 01:42:44 <bilboed> what do you need more apart from "put this RGB bitmap at these coordinates for this timestamp and this duration"
Sep 26 01:43:48 <thaytan> it needs the timestamps and segment info on the video stream so it knows which pixels to generate
Sep 26 01:44:04 <thaytan> it's more "here's a video frame, what's the overlay?"
Sep 26 01:44:13 <thaytan> also, dvdspu works in YUV too
Sep 26 01:44:13 <bilboed> sure, but it doesn't care about the *content* of that frame
Sep 26 01:44:28 <thaytan> bilboed: not if it's not doing the compositing, no
Sep 26 01:44:34 <bilboed> right
Sep 26 01:45:11 <bilboed> so it could see the stream go through, watch/collect segment/timestamps and decide what subpicture to attach to it (without *actually* doing any processing and letting downstream handle that)
Sep 26 01:45:13 <thaytan> but the model is still to pass the video buffer stream through the spu element so it can see the timing info it needs, right?
Sep 26 01:45:22 <thaytan> oh, of course
Sep 26 01:45:28 <thaytan> that's what I was suggesting, I guess I wasn't clear
Sep 26 01:45:35 <__tim> thaytan, dvdspu should support GstVideoOverlayComposition imho
Sep 26 01:45:38 <bilboed> I'm not *that* familiar with SPU
Sep 26 01:45:42 <bilboed> also, what __tim said :)
Sep 26 01:45:54 <bilboed> like that I don't have to solve it in 500 different elements
Sep 26 01:45:56 <thaytan> ok, I guess I'll have to look at the GstVideoOverlayComposition API
Sep 26 01:46:54 * bilboed is not looking forward "at all" to fixing this for cluster-gst
Sep 26 01:47:12 <__tim> it's very dumb, you can provide one or more ARGB or AYUV rectangles and either use helper API to put them onto the raw video, or attach them to the buffer; the sink or whatever can then take over the overlaying using that
Sep 26 01:47:40 <__tim> and it will do a bunch of conversions for you and cache them if the sink or whatever does the overlaying doesn't support what you supplied
Sep 26 01:48:54 <thaytan> well, that sounds feasible - although less efficient than dvdspu painting natively if the composite is in software
Sep 26 01:49:28 <thaytan> maybe it can be extended to add RLE AYUV rectangles as a format though?
Sep 26 01:49:44 <__tim> thaytan, how sure are you of that? because basically you have to parse the RLE data for every single frame, right? is that really so much faster than blitting some ready-made rectangle using orc?
Sep 26 01:50:04 <thaytan> dvdspu gets to skip a lot of transparent pixels
Sep 26 01:52:21 <__tim> yeah, but it's if else and loops etc. You might be right, I'm just curious how much difference it actually makes. Also, you don't have to use the API to blit your pixels, you can still do that as you do now and only attach the AYUV rectangle if downstream supports that
Sep 26 01:54:13 <__tim> it's just convenient because you only have one code path
Sep 26 01:55:01 <thaytan> it sounds like a structural improvement
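The trade-off __tim and thaytan debate here - parsing RLE data every frame versus blitting ready-made rectangles - hinges on how cheaply transparent runs can be skipped. A minimal pure-C sketch of run-based blending that skips transparent spans (the types and names are hypothetical illustrations, not dvdspu's actual code):

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical SPU-style RLE run: a pixel count plus a palette index.
 * Fully transparent runs let the blender skip whole spans instead of
 * alpha-blending every pixel individually. */
typedef struct {
    uint16_t length;   /* number of pixels in this run            */
    uint8_t  color;    /* palette index                           */
    uint8_t  alpha;    /* 0 = fully transparent, 255 = opaque     */
} SpuRun;

/* Blend one line of RLE runs over a line of 8-bit luma pixels.
 * Returns the number of pixels actually blended (skipped runs excluded). */
size_t
blend_rle_line (uint8_t *dst, size_t width,
                const SpuRun *runs, size_t n_runs,
                const uint8_t *palette_y)
{
    size_t x = 0, blended = 0;

    for (size_t i = 0; i < n_runs && x < width; i++) {
        size_t len = runs[i].length;
        if (len > width - x)
            len = width - x;

        if (runs[i].alpha == 0) {
            x += len;              /* transparent: skip the whole span */
            continue;
        }

        uint8_t fg = palette_y[runs[i].color];
        for (size_t j = 0; j < len; j++, x++) {
            /* integer alpha blend: dst = (fg*a + dst*(255-a)) / 255 */
            dst[x] = (uint8_t) ((fg * runs[i].alpha +
                                 dst[x] * (255 - runs[i].alpha)) / 255);
        }
        blended += len;
    }
    return blended;
}
```

For typical DVD subpictures, where most runs are transparent, the inner blend loop only ever touches a small fraction of the line, which is the saving thaytan alludes to.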
I've ported dvdspu to use the GstVideoOverlayComposition API, and I'd like some feedback. The patches work well on my side for both PGS and VobSub formats, whether composition is done by a hardware renderer or by blending the overlay directly into the video frames.

Note that I'm using an ARGB overlay. I agree that video is usually YUV, so an AYUV overlay would be faster for blending; however, in my case I can only use ARGB for hardware composition, and there is currently no way to negotiate the supported overlay formats with the downstream elements.

Right now a single overlay rectangle is allocated, and the SPU objects are rendered inside this rectangle. The rectangle is kept as small as possible on purpose: instead of allocating the full overlay size, only the surface of the cropped objects is allocated, to minimize memory usage. The coordinates of the rendered objects are therefore translated to match the rectangle position in the overlay.

The patch also changes how the SPU coordinates are calculated against the video frame. In BD or DVD rips, black borders are usually removed and the video may be scaled at the same aspect ratio, but the SPU data is not corrected to account for the cropping or scaling, so it was not rendered correctly (see Bug #663750). This patch tries to fix that by calculating the scaling/translation necessary to render the overlay as intended in the original content. The scaling itself is easy with the overlay composition API. When playing a DVD these heuristics are a no-op, since the video frame size matches the overlay size.

Interlacing should be handled transparently by the overlay composition API, so Bug #667224 should be fixed too.

Performance and memory usage should not be too far from the previous implementation, since it also used intermediate buffers for A/U/V blending. Performance could be improved a lot by caching the overlay composition; I'm not sure how to do that yet.
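The scaling/translation described above can be sketched as a small pure-C helper (hypothetical names and heuristic; the real patch works on GstVideoOverlayRectangle coordinates): map an SPU rectangle given against the original display size onto the actual decoded frame, using a uniform fixed-point scale and a centring offset.

```c
#include <assert.h>
#include <stdint.h>

/* SPU coordinates are expressed against the original display size
 * (spu_w x spu_h), but the decoded video may be a scaled and/or
 * border-cropped version of it (vid_w x vid_h). This sketch scales
 * uniformly to fit and centres the result; 16.16 fixed point avoids
 * floating-point ops. All names are hypothetical. */
typedef struct { int x, y, w, h; } Rect;

Rect
fit_spu_rect (Rect spu, int spu_w, int spu_h, int vid_w, int vid_h)
{
    /* uniform scale factor: the smaller of the two axis ratios */
    int64_t sx = ((int64_t) vid_w << 16) / spu_w;
    int64_t sy = ((int64_t) vid_h << 16) / spu_h;
    int64_t s = sx < sy ? sx : sy;

    /* centre the scaled SPU plane on the video frame */
    int off_x = (vid_w - (int) (((int64_t) spu_w * s) >> 16)) / 2;
    int off_y = (vid_h - (int) (((int64_t) spu_h * s) >> 16)) / 2;

    Rect out;
    out.x = off_x + (int) (((int64_t) spu.x * s) >> 16);
    out.y = off_y + (int) (((int64_t) spu.y * s) >> 16);
    out.w = (int) (((int64_t) spu.w * s) >> 16);
    out.h = (int) (((int64_t) spu.h * s) >> 16);
    return out;
}
```

When the frame size matches the SPU display size the scale is 1 and the offsets are 0, which is the DVD no-op case mentioned above.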
Created attachment 253844 [details] [review] dvdspu: render to ARGB overlay
Created attachment 253845 [details] [review] matroskademux: send custom event with DVD subtitles frame size This patch helps the downstream dvdspu element to render the VobSub overlay at the right position and size, in case the video was scaled or cropped to remove black borders. The VobSub SPU data coordinates are calculated against the size set in the IDX file, which is parsed here.
Created attachment 253846 [details] [review] dvdspu: handle frame size event from upstream Use the frame size from the IDX file to make VobSub overlay positioning and scale correct.
I've fixed some issues related to caps negotiation, vobsub artifacts and added overlay cache, please use the patches here instead: https://github.com/rawoul/gst-plugins-bad/commits/dvdspu
"The performance level and memory usage should not be too far from the previous implementation, since before there were intermediate buffers for A/U/V blending." The performance level is something you could actually measure. My recollection is that it made a big difference - it was one of the reasons I wrote it to render incrementally onto the frame instead of into a large caching buffer and blending the whole thing. I don't remember how big a difference, but it was something like the overlay consuming as much CPU as the MPEG decode. The intermediate AYUV buffers in the existing code don't trigger the same level of CPU consumption because a) it only does work for lines with actual pixels, b) it only blends the minimum left-right span that contains coloured pixels, and c) the pixel buffers are small enough to usually still be in cache when they get blended. Also, there were no floating-point ops in the previous code.
Sorry - that may have sounded overly negative. Thanks for working on this - it's a cool set of changes :)
I agree that your blending code is probably faster when blending directly onto a YUV frame and cropping out the transparent areas. The floating-point operations can easily be removed by using an AYUV overlay instead of ARGB, which would make blending faster anyway. As I said, I'm using premultiplied ARGB because in my use case I can render the overlay in hardware with this format. Since the overlay is now cached, only the blending operation is done for each video frame once the overlay has been rendered; I don't think that should be too expensive. I'll try to run some benchmarks and report them. Your code was probably the best way to do it; unfortunately it only works for planar YUV video formats... and most importantly it doesn't allow hardware rendering. I'm open to any suggestions to make it faster :)
(In reply to comment #7) > Sorry - that may have sounded overly negative. Thanks for working on this - > it's a cool set of changes :) Thanks for looking into this :)
What's the status of this? Sounds very interesting :)
Just an idea: does anyone see any potential issues in adding multiple GstVideoOverlayRectangles to the composition instead of just one big rectangle? Basically, during the RLE process we'd remember where the actual non-transparent pixels were, group them into rectangles, and then add a rectangle for each area. The buffers could share the underlying framebuffer to prevent copying (or maybe not, since most of the area would be transparent and unused). The benefit would be significantly reduced blending (since most transparent pixels wouldn't need to be blended); downstream could also adjust rectangle positions if the dimensions don't match (which is sometimes the case with ripped DVDs).
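The grouping described above could look roughly like the following sketch (a hypothetical helper, not GStreamer API): scan the overlay's alpha mask, merge consecutive non-empty rows into bands, and tighten each band to the x-extent of its opaque pixels. Each band would then back one overlay rectangle.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical band type: one tight rectangle of opaque pixels. */
typedef struct { int x, y, w, h; } Band;

/* Walk an alpha mask row by row and emit one Band per contiguous run of
 * non-empty rows, shrunk to the tight x-extent of its opaque pixels.
 * Returns the number of bands written to out (at most max_bands). */
size_t
find_opaque_bands (const uint8_t *alpha, int width, int height,
                   Band *out, size_t max_bands)
{
    size_t n = 0;
    int in_band = 0, x0 = 0, x1 = 0, y0 = 0;

    for (int y = 0; y <= height; y++) {
        int lo = width, hi = -1;
        if (y < height) {
            for (int x = 0; x < width; x++) {
                if (alpha[y * width + x] != 0) {
                    if (x < lo) lo = x;
                    hi = x;
                }
            }
        }
        if (hi >= 0) {                     /* row has opaque pixels */
            if (!in_band) { in_band = 1; y0 = y; x0 = lo; x1 = hi; }
            if (lo < x0) x0 = lo;
            if (hi > x1) x1 = hi;
        } else if (in_band) {              /* band just ended */
            if (n < max_bands)
                out[n++] = (Band) { x0, y0, x1 - x0 + 1, y - y0 };
            in_band = 0;
        }
    }
    return n;
}
```

This is row-band segmentation rather than full connected-component analysis, which keeps it one linear pass and matches how subtitle lines are usually laid out.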
(In reply to comment #11) > Just an idea, does anyone see any potential issues in adding multiple > GstVideoOverlayRectangles to the composition instead of just one big rectangle? We've been doing this for a while for custom OSD elements in s/w and h/w (vaapi/libva) based pipelines with ARGB overlay data, and it works like a charm. In h/w accelerated pipelines there is no benefit in terms of blending performance, because vaapi and vdpau can do this completely in hardware. Though, in the case of vaapi there is a limit of max. 8 (vdpau-driver/libva) or 4 (current intel-driver) subpictures (i.e. GstVideoOverlayRectangles), iirc.
In my case it shouldn't impact performance because I do the blending in the screen framebuffer in hardware too. Your idea of repositioning the rectangles would have to be done in the dvdspu element though, since the blending with the video frame is done just after creating the rectangles when no downstream element can handle the overlay metadata.
I shouldn't assume that my use case is generic enough, though. I can always have an extra element downstream that does the repositioning and composition.
First of all, thanks for this work. I tested it a bit with one of the samples I have, and it works well with cluttersink (+ overlay composition patch). By the way, what's the current state of the work?
This is pretty neat indeed and the improvements are worthwhile. What exactly is blocking this?
Nothing much. I think the AYUV -> ARGB conversion code should be removed for this to be merged upstream; the overlay composition API can handle it. The performance loss when using the blending path has not been measured against the current optimized path, which only works on YUV video.

There's also a small issue with the frame-size and CLUT events sent from matroskademux. I need to make them sticky, but the event name has to be unique so that both events can be kept on the pad. So we have to change:

application/x-gst-dvd, event = "dvd-set-frame-size"
application/x-gst-dvd, event = "dvd-spu-clut-change"

to something like:

application/x-gst-dvd-set-frame-size
application/x-gst-dvd-spu-clut-change

Otherwise the frame size or the CLUT will be lost on track change. Unfortunately I don't have time to work on this ATM; maybe Matthieu has made some progress on it?
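A toy model of why the rename is needed (hypothetical code; the real sticky-event store lives on GstPad): GStreamer keeps at most one sticky custom event per structure name, so two events that both use the name application/x-gst-dvd overwrite each other, and only unique names can coexist.

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-in for a pad's sticky-event store. Storing a second
 * event with the same structure name replaces the first - exactly the
 * failure mode described in the comment above. */
#define MAX_STICKY 8

typedef struct { char name[64]; char payload[64]; } StickyEvent;
typedef struct { StickyEvent ev[MAX_STICKY]; int count; } StickyStore;

void
store_sticky (StickyStore *s, const char *name, const char *payload)
{
    for (int i = 0; i < s->count; i++) {
        if (strcmp (s->ev[i].name, name) == 0) {   /* same name: replace */
            strncpy (s->ev[i].payload, payload, sizeof s->ev[i].payload - 1);
            return;
        }
    }
    if (s->count < MAX_STICKY) {
        strncpy (s->ev[s->count].name, name, sizeof s->ev[0].name - 1);
        strncpy (s->ev[s->count].payload, payload,
                 sizeof s->ev[0].payload - 1);
        s->count++;
    }
}
```

With the shared application/x-gst-dvd name the CLUT event evicts the frame-size event (or vice versa); with per-event names both survive a track change.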
I've rebased my work on master and included proper caps negotiation for the overlay composition meta, as was done by Matthieu Bouron and Nicolas Dufresne in other plugins using overlays. The patches can be found here: https://github.com/rawoul/gst-plugins-bad/commits/dvdspu https://github.com/rawoul/gst-plugins-good/commits/dvdspu The first iteration of the render patch works with the ARGB format, and I've reverted to AYUV in the next patch. I'm keeping both patches in case we want to run performance tests for the overlay in either ARGB or AYUV.
Would it be possible to rebase and attach the patches so we can finally review and merge?
Created attachment 300885 [details] [review] matroskademux: send custom event with DVD subtitles frame size
Created attachment 300886 [details] [review] dvdspu: pass dvdspu argument to set_caps functions
Created attachment 300887 [details] [review] dvdspu: render to ARGB overlay
Created attachment 300888 [details] [review] dvdspu: allow suffix in dvd event name to allow multiple sticky dvd events
Created attachment 300889 [details] [review] dvdspu: handle frame size event from upstream
Created attachment 300890 [details] [review] dvdspu: cache overlay composition
Created attachment 300891 [details] [review] dvdspu: improve negotiation of overlay composition
I've rebased the github branches against master and posted the patches. Ideally someone with embedded hardware without overlay support could check that performance is OK with these patches, since unfortunately the generic overlay blending code is not very optimised.
Created attachment 309736 [details] [review] dvdspu: pass dvdspu argument to set_caps functions
Created attachment 309737 [details] [review] dvdspu: render to AYUV overlay
Created attachment 309738 [details] [review] dvdspu: render to ARGB overlay instead of AYUV
Created attachment 309739 [details] [review] dvdspu: allow suffix in dvd event name to allow multiple sticky dvd events
Created attachment 309740 [details] [review] dvdspu: handle frame size event from upstream
Created attachment 309741 [details] [review] dvdspu: cache overlay composition
Created attachment 309742 [details] [review] dvdspu: improve negotiation of overlay composition
I've updated the patches and fixed a lot of small issues. I've tested with a few DVDs and also matroska files, which render fine. I've also improved the overlay composition negotiation again, based on the recent textoverlay work. VobSub subtitles embedded in matroska are now rendered much better than before; the scale/position now matches vlc and mpv for the same files. You can also find the patches here: https://github.com/rawoul/gst-plugins-bad/commits/dvdspu I hope this can be reviewed now!
I think somewhere down the road it would be nice to have a more generic SPU interface than sending all these events with the application/x-gst-dvd prefix, but I'm not sure what that API should be.

So far these patches look good.

I'm still a little worried about the performance impact of blending an entire display-area sized region. My memory is that it was a non-trivial impact in 2004-2005 when I wrote the original code. It's something we could optimise later, though - perhaps keep an auxiliary array of "height" entries that stores the X position of the first non-transparent pixel on each line, so blending can be skipped.

Instead of only scaling down to fit, is it worth considering using the set-frame-size event to allocate a larger downstream frame as the target for blending into? In any case, I think a mechanism for communicating the intended display size to the sinks would be useful when using GstVideoOverlayComposition. I have a rip here where the video was originally 1080p, and they've trimmed the black bars to encode 1920x800. When the black bars are added back in for display, the PGS subtitles should appear there - starting below Y=870 or so.

Between dvbsuboverlay, dvdspu and textoverlay, we probably have enough experience to start factoring out a set of common overlay/SPU behaviours into a base class, with features like negotiation, colour conversion and software blending fallbacks in there.

Besides all that, there is something wrong with these patches, but I don't know what yet. On my Back To The Future DVD, certain subpicture entries in the bonus 'Animated Anecdotes' track don't display - but only some of them.
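The auxiliary-array idea above can be sketched in plain C (hypothetical code, not a patch): build a per-line table of the first and last non-transparent pixel while rendering, then blend only those spans.

```c
#include <assert.h>
#include <stdint.h>

/* Record, for each overlay line, the first and last non-transparent pixel.
 * first_x[y] == width and last_x[y] == -1 mark a fully transparent line. */
void
build_span_table (const uint8_t *alpha, int width, int height,
                  int *first_x, int *last_x)
{
    for (int y = 0; y < height; y++) {
        int lo = width, hi = -1;
        for (int x = 0; x < width; x++) {
            if (alpha[y * width + x] != 0) {
                if (x < lo) lo = x;
                hi = x;
            }
        }
        first_x[y] = lo;
        last_x[y] = hi;
    }
}

/* Blend only the recorded span on each line (8-bit planes for brevity).
 * Fully transparent lines cost nothing: the inner loop never runs. */
void
blend_with_spans (uint8_t *dst, int dst_stride,
                  const uint8_t *src, const uint8_t *alpha,
                  int width, int height,
                  const int *first_x, const int *last_x)
{
    for (int y = 0; y < height; y++) {
        for (int x = first_x[y]; x <= last_x[y]; x++) {
            uint8_t a = alpha[y * width + x];
            dst[y * dst_stride + x] = (uint8_t)
                ((src[y * width + x] * a +
                  dst[y * dst_stride + x] * (255 - a)) / 255);
        }
    }
}
```

The table is built once per rendered overlay and reused for every frame the overlay stays on screen, which is where the win over per-pixel alpha tests would come from.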
Hi Jan, thanks for taking a look. The patches on the github branch are more recent, mostly fixing issues with PGS rendering introduced by my previous patches. Now everything works fine, and I also handle the cropping window properly. Since 1.6 has been released, I'm now trying to make the individual patches easier to review (separating overlay meta negotiation and rendering into separate patches). I'll try to finish this work tonight.

(In reply to Jan Schmidt from comment #36)
> I think somewhere down the road, it would be nice to have more generic SPU
> interface than sending all these events with application/x-gst-dvd prefix,
> but I'm not sure what that API should be.
>
> So far these patches look good.
>
> I'm still a little worried about the performance impact blending an entire
> display-area sized region. My memory is that it was a non-trivial impact in
> 2004-2005 when I wrote the original code. It's something we could optimise
> later though - perhaps keep an auxiliary array of "height" entries that
> stores the X position of the first non-transparent pixel for each line so it
> can skip blending.

I agree, and blending performance could be better if the overlay format could be dynamic instead of hardcoding ARGB or AYUV. We would use AYUV when blending, to avoid conversions, and ARGB otherwise, since we have no way yet to negotiate the overlay format with downstream and most hardware works in ARGB. The rendering code does skip transparent pixels, so I hope the impact is not too big. Clearing the full frame should be pretty fast.

> Instead of only scaling down to fit, is it worth considering using the
> set-frame-size event to allocate a larger downstream frame as the target for
> blending into? I think in any case a mechanism for communicating the
> intended display size to the sinks would be useful when using
> GstVideoOverlayComposition. I have a rip here where the video was
> originally 1080p, and they've trimmed the black bars to encode 1920x800.
> When the black bars are added back in for display, the PGS titles should
> appear there - starting below Y=870 or so.

I'm not sure what to do here. Right now I have the exact same result as VLC for vobsub and PGS. mpv renders PGS slightly differently; it doesn't scale them.

> Between dvbsuboverlay, dvdspu and textoverlay, we probably have enough
> experience to start factoring out a set of common overlay/SPU behaviours
> into a base class and have features like negotiation, colour conversion and
> software blending fallbacks in there.

Yes, the overlay meta negotiation is not easy to get right and is a big copy-paste across all the subtitle renderers.

> Besides all that, there is something wrong with these patches, but I don't
> know what yet. On my Back To The Future DVD, they make it not display
> certain subpicture entries in the bonus 'Animated Anecdotes' track, but only
> some.

The issue you're facing might be related to the overlay cache not being cleared in some cases, which I might have overlooked. Can you check if reverting the 'dvdspu: cache overlay composition' patch helps?
Confirming what I said on IRC: Reverting the cache composition makes the animated subs play again. It looks like the composition caching only allows for subs to change when a new event or packet arrives, but actually VOB subs are a sequence of commands to be executed in turn at particular timestamps - we need a way for the vobsub implementation to flag that advancing the SPU did or did not change the output and use it to invalidate the cache.
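The invalidation scheme proposed here might look like the following sketch (hypothetical types; the real fix would live in the vobsub SPU implementation): the state machine reports whether executing its pending commands changed the display, and the cached composition is re-rendered only when it did.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical SPU state: a schedule of timestamped commands plus an
 * opaque revision counter for "what's currently on screen". */
typedef struct {
    int64_t next_cmd_ts;     /* timestamp of the next SPU command, -1 = none */
    int     visible_state;   /* bumped whenever a command alters the display */
    void   *cached_comp;     /* where the cached composition would hang      */
} SpuState;

/* Advance the state machine to ts, executing every command scheduled at
 * or before it. Returns true if the visible output changed - the signal
 * the cache needs. (Toy schedule: one command per second.) */
bool
spu_advance (SpuState *spu, int64_t ts)
{
    bool changed = false;
    while (spu->next_cmd_ts >= 0 && ts >= spu->next_cmd_ts) {
        spu->visible_state++;
        spu->next_cmd_ts += 1000;
        changed = true;
    }
    return changed;
}

/* In the element's chain function one would then do (sketch):
 *   if (spu_advance (spu, ts) || spu->cached_comp == NULL)
 *       re-render the overlay and store it in spu->cached_comp;
 *   attach spu->cached_comp to the outgoing frame;
 */
```

This keeps the cache valid across frames where the commands were executed but produced no visible change, while still catching the animated-subtitle case where commands fire without any new packet arriving.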
(In reply to Arnaud Vrac from comment #37)
> > Instead of only scaling down to fit, is it worth considering using the
> > set-frame-size event to allocate a larger downstream frame as the target for
> > blending into? I think in any case a mechanism for communicating the
> > intended display size to the sinks would be useful when using
> > GstVideoOverlayComposition. I have a rip here where the video was
> > originally 1080p, and they've trimmed the black bars to encode 1920x800. When
> > the black bars are added back in for display, the PGS titles should appear
> > there - starting below Y=870 or so.
>
> I'm not sure what to do here. Right now I have the exact same result as VLC
> for vobsub and PGS. mpv renders PGS slightly differently, it doesn't scale
> them.

To take an example sub from the test sample I'm talking about:

Win ID 0 x 404 y 872 w 1111 h 142

So on the original disc, this was targeted to display from y=872 to 1014. With 140-pixel black bars ((1080 - 800) / 2) added to the 1920x800 video, the video displays from y=140 to 940 on the screen. The sub should cover y=872 to 1014, so it should appear at the bottom of the video, with about 74 pixels sticking off the bottom. On a real disc, the video would actually be encoded at 1920x1080, so placement would be a non-issue.

btw, can you please update the remaining patches on the bug?
Created attachment 312279 [details] [review] dvdspu: render to AYUV overlay Updated the gstspu_fit_overlay_rectangle function to preserve the SPU window aspect ratio as much as possible when required. This is now enabled for PGS subtitles. Since black borders are removed on BD rips, and subtitles are usually placed on top of the video rather than in the black borders, the result is the same as on the original video.
Created attachment 312280 [details] [review] dvdspu: render to ARGB overlay instead of AYUV
Created attachment 312281 [details] [review] dvdspu: negotiate overlay composition meta with downstream
Created attachment 312282 [details] [review] dvdspu: cache overlay composition I haven't fixed the too aggressive caching yet, so this patch should not be pushed. It's useful for benchmarks.
Created attachment 312283 [details] [review] dvdspu: handle frame size event from upstream
I've updated the patches; they are rebased on master and are the same as on my github dvdspu branch. Jan, the PGS subtitles should look better now, can you confirm? The cache issue is still there, I haven't had time to fix it yet.

I have some numbers from playing a DVD rip as fast as possible; unfortunately they are not good when blending:

$ time gst-launch-1.0 file://$PWD/12_MONKEYS.mkv ! matroskademux name=dmx dmx.video_0 ! queue ! avdec_mpeg2video ! d.video dmx.subtitle_0 ! queue ! d.subpicture dvdspu name=d ! fakesink

blended AYUV overlay, with cache: 873.82s user 17.35s system 146% cpu 10:07.05 total
blended AYUV overlay, no cache: 893.13s user 57.03s system 142% cpu 11:05.09 total
blended ARGB overlay, with cache: 1191.11s user 17.00s system 130% cpu 15:23.82 total
blended ARGB overlay, no cache: 1210.60s user 53.36s system 129% cpu 16:19.72 total
no overlay (direct blending): 445.42s user 19.90s system 246% cpu 3:09.12 total

So the new code is much slower when using the generic blending code. Most of the time is spent in gst_video_blend, plus converting ARGB to YUV before blending when the overlay is rendered in ARGB. I don't think we can ever match the original performance, since it is highly optimised for the YUV-to-YUV blending case. I see multiple things that could be improved:

- keep the old code path for blending. This could make the code pretty complicated though.
- crop the overlay rectangles to only contain visible pixels. This could be done in two ways:
  * with GstCropMeta; unfortunately the overlay code does not account for this meta.
  * by tweaking the overlay GstVideoMeta width, height and plane offset. We must make sure the alignment is correct though.

Caching doesn't help much when blending, since rendering the overlay is fast compared to the blending operation.
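The second cropping variant above (tweaking the video meta's width, height and plane offset) is just pointer arithmetic. Sketched here with a hypothetical struct rather than the real GstVideoMeta; as noted, alignment requirements would still need checking before doing this for real.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical plane view: enough of a video meta to show the idea.
 * Cropping moves the plane offset to the first pixel of the sub-rectangle
 * and shrinks the visible size; the stride is unchanged, so row n of the
 * crop is still at offset + n * stride. */
typedef struct {
    size_t offset;   /* byte offset of the first pixel of the plane */
    int    stride;   /* bytes per row (unchanged by cropping)       */
    int    width;    /* visible width  in pixels                    */
    int    height;   /* visible height in pixels                    */
} PlaneView;

/* Crop a 4-bytes-per-pixel (ARGB) plane view to rect (x, y, w, h). */
PlaneView
crop_argb_view (PlaneView v, int x, int y, int w, int h)
{
    PlaneView out = v;
    out.offset = v.offset + (size_t) y * (size_t) v.stride
                          + (size_t) x * 4;
    out.width = w;
    out.height = h;
    return out;
}
```

Using the window from the 1080p sample discussed earlier (x=404 y=872 w=1111 h=142) on a 1920x1080 ARGB plane, the cropped view addresses only the visible sub-rectangle without any copy.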
One option to improve the overall performance here would be to support an overlay format that matches the DVD and PGS input format more closely: passing overlay regions in a run-length encoded format instead of as arrays of raw pixels, which (for DVD/Blu-ray at least) are frequently mostly transparent.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/78.