GNOME Bugzilla – Bug 668483
video: add support for global-alpha multiplicator for overlay rectangles
Last modified: 2012-03-27 08:23:50 UTC
Created attachment 205855 [details] [review] Add support for global alpha multiplicator to video-blend and gstoverlaycomposition interface Hi, the new Overlay Composition API (http://gstreamer.freedesktop.org/data/doc/gstreamer/head/gst-plugins-base-libs/html/gst-plugins-base-libs-gstvideooverlaycomposition.html) can be used to overlay generic OSD elements over raw surfaces. An often used feature of osd elements is fading them in and out. This is done bei adjusting the alpha values of the overlayed pixels, which can be achieved on a per-pixel basis or -- if supported by the API -- globally for a whole overlay element/component/rectangle. The latter does not introduce performance overhead if supported by the blending engine, whereas the former does. This request asks to add support for a global-alpha multiplicator per composition rectangle to allow for efficient fading support of the new Overlay Composition API. 2 patches attached: - add global_alpha flag to the GstVideoOverlayRectangle struct and the relvant API calls and add support for global_alpha value in video_blend() func - update textoverlay, which already makes use of the new API We were not always sure whether the modification of the API calls where in line with the API design principles. Please let us know if you prefer further changes to the patches. Thanks for your review and for the new API which is a great improvement for OSD-elements, Holger
Created attachment 205856 [details] [review] textoverlay: update to support the new global-alpha param
Created attachment 206157 [details] [review] Add support for global alpha multiplicator to video-blend and gstoverlaycomposition interface updated patch for video-overlay-composition.c: don't apply global_alpha in gst_video_overlay_rectangle_(un)premultiply()
Thanks for working on this. > We were not always sure whether the modification of the API calls where > in line with the API design principles. The API hasn't been released yet, so we could still make changes like this if we wanted to. However, in this case I'm not sure if that's really necessary to add this parameter to the constructor. I think for most use cases defaulting to 1.0 is sufficient, and if anyone wants to set a different global alpha they can use the setter API after creating the rectangle, no? There's one bit that I think might be missing from the patch. At first glance it looks like the patch only caters for a consumer/renderer that supports global alpha, but it's not clear what happens if the overlay consumer/renderer doesn't support global alpha. If I'm not mistaken, it currently just ignores the global alpha parameter then. What I think should happen is that it automagically modifies the overlay pixel data (making it writable first if needed) and multiplies all the alpha values in it with the global alpha if the caller doesn't support global alpha. We'll need to add a flag to signal whether a non-1.0 global alpha is set, and for the consumer/renderer to signal to us whether they can handle it or not (in _get_pixels*()).
Thanks for your comments. > I think for most use cases defaulting to 1.0 is sufficient, and if anyone wants > to set a different global alpha they can use the setter API after creating the > rectangle, no? I agree, that was what I thought first, but then wanted to be complete in the constructing call. Will change ... > What I think should happen is that it automagically modifies the overlay pixel > data (making it writable first if needed) and multiplies all the alpha values > in it with the global alpha if the caller doesn't support global alpha. Ok, that would be a nice extension. That's basically the step we perform so far in the producer-plugin. I will move it into the overlay-composition code then and implement it equivalently to the PREMULTIPLIED_ALPHA flag/functionality, defaulting to 'handle global alpha inside OverlayComposition-API' and leave pixel-values untouched only if requested by consumer/renderer. (VA-API definesvaSetSubpictureGlobalAlpha() in its interface, but it's mostly unimplemented in the backends). I think we will need to backup the initial per-pixel alpha values inside the GstVideoOverlayRectangle because (for fading for example) multiple calls to rect_set_global_alpha() on the same pixel-data are to be expected and global_alpha multiplication should always be relative to the initial per-pixel alpha values. This backup-array will be treated like the pixel-values, i.e. not copied in a _copy()-call. One detail that is unclear to me is how you want seq_num handling to be done. The per-rectangle seq_num *must* be changed after a call to set_global_alpha() on the producer side so that the consumer/renderer can detect the changed pixel-data and update its specific overlays/subpictures (at least thats how its done in the vaapi-case). Do you think the seq_num should updated automatically or on demand? For example, consider the case of an already created overlay_composition+rects that will be updated per-chain call. producer-side calls: gst_video_overlay_composition_make_writable(comp); # -> unlikely updating the seq_nums gst_video_overlay_rectangle_set_global_alpha(rect1, val); Should a gst_video_overlay_rectangle_update_seq_num() be introduced here or make set_global_alpha() update the seq_num automatically? I would propose the last. Same holds for gst_video_overlay_rectangle_set_render_rectangle().
Created attachment 208266 [details] [review] Add support for global alpha multiplicator to video-blend and gstoverlaycomposition interface updated the patch Now global_alpha is applied automatically in get_pixels*() unless requested differently by caller. Some notes: - As proposed per-rectangle seq_num is updated automatically if global_alpha is changed. - The current implementation is 'optimized' for speed and saving memory. I first tried an implementation integrating nicely into the internal caching of preprocessed/scaled rectangles, meaning that after each application of global_alpha the resulting pixel-data was added to the rectangle cache. This turned out to be too inefficient in terms of CPU-usage and memory consumption and made osd-fading pratically unusable in our application on STBs. Therefore I kept caching only for scaled and un/premultiplied rectangle-data and implemented the application of global_alpha to be done on the fly in get_pixels_argb_internal() on the pixel-data per scaled rectangle. I think this is reasonable as the need for caching of faded overlays is questionable. This patch also fixes some issues in the current (now tested ;-) code that handles GST_VIDEO_OVERLAY_FORMAT_FLAG_PREMULTIPLIED_ALPHA: - crash ("double free") for the unscaled-case in gst_video_overlay_rectangle_get_pixels_argb_internal() - gst_video_overlay_rectangle_(un)premultiply(): use both the wrong channel for alpha - set FLAG_PREMULTIPLIED correctly when adding scaled-rects to cache - evaluate FLAG_PREMULTIPLIED correctly when searching rect-cache Thanks for reviewing, Holger
Thanks for the update. Could you please split out the bug fixes for the PREMULTIPLIED stuff into a separate patch? I'm sure it was somewhat tested and used, but perhaps not in all variants (a small unit test addition would be even better of course).
Heh. It was tested with the textoverlay case. I did not have test cases for the other three cases :D
Was not supposed to blame someone ;-) I tried adding some tests, but was always running in the timeout of the test framework with only the already existing tests. Some tracing showed that the orc-routines (orc_resample_bilinear_u32()) called by video_blend_scale_linear_RGBA() never return. Not sure whether this is a problem with my orc-setup or with the code itself, have no experience with orc.
Created attachment 208333 [details] [review] Fixes for premultiplied alpha patch 1/2: PREMULTIPLIED_ALPHA fixes
Created attachment 208334 [details] [review] Add support for global_alpha patch 2/2 (presupposes patch 1) some additional fixes
Thanks. Not sure why they wouldn't return with the existing tests either, but you can try running the test with something like -base/tests/check $ ORC_CODE=backup make libs/video.check (or ORC_CODE=emulate) and see if that makes a difference.
Comment on attachment 208333 [details] [review] Fixes for premultiplied alpha Pushed the fixes for premultiplied alpha, with some minor modifications here and there (e.g. use g_memdup()). I fixed the unpremultiplying/premultiplying functions differently (KISS). Also added a unit test. Next time "git format-patch" formatted patches would be great, ideally one per bug/issue. commit 87a9e5634e3c44a02bb1dad9233a2cbd2c4f5311 Author: Tim-Philipp Müller <tim.muller@collabora.co.uk> Date: Wed Mar 14 17:59:31 2012 +0000 tests: add unit test for premultiplied alpha handling of video overlay rectangles https://bugzilla.gnome.org/show_bug.cgi?id=668483 commit 2c4b379470963ea2baff1e85db4f4568e53e83d2 Author: Tim-Philipp Müller <tim.muller@collabora.co.uk> Date: Wed Mar 14 17:46:23 2012 +0000 video: overlay-composition: fix alpha premultiply and unpremultiply Fix component offsets for little endian systems. https://bugzilla.gnome.org/show_bug.cgi?id=668483 commit 027f5bb47182130ae1edb30de789e42f4305e2ca Author: Holger Kaelberer <hk@getslash.de> Date: Wed Mar 14 17:28:57 2012 +0000 video: overlay-composition: fix rectangle caching after alpha (un)premultiplying If we are asked to (un)premultiply,we need to create the new rectangle with the right flags, so we can find it properly on subsequent cache lookups (also because it's wrong otherwise). https://bugzilla.gnome.org/show_bug.cgi?id=668483 commit 7a21d1eb32a02139236378f66e8a245cba5d35e0 Author: Holger Kaelberer <hk@getslash.de> Date: Wed Mar 14 17:18:47 2012 +0000 video: overlay-composition: fix crash when doing premultiplied<->unpremultiplied alpha conversion We need to copy the pixels before messing with them, not least because the buffer creation code below assumes it's ok to take ownership. Fixes crash caused by double-free. https://bugzilla.gnome.org/show_bug.cgi?id=668483 commit 6b7f25a2f07082be5fb95d676bf90fda91511204 Author: Holger Kaelberer <hk@getslash.de> Date: Wed Mar 14 16:42:24 2012 +0000 video: overlay-composition: check the right flags when searching for a cached rectangle Compare the flags of the *cached* rectangle to the desired flags when checking for a suitable rectangle in the cache. https://bugzilla.gnome.org/show_bug.cgi?id=668483
> - As proposed per-rectangle seq_num is updated automatically if global_alpha is > changed. About the seq_num: I'm a bit undecided here. On the one hand, the seq_nums are in the end just an optimisation - if they are useful, fine, if not, tough luck. So updating the seq_num in _rectangle_set_global_alpha() isn't wrong and correctly signals that for all practical purposes the rectangle has changed. So it's an ok thing to do. We can go with that. However, from a consumer/overlay perspective, what one actually wants to know if whether the underlying pixel data changed, right? For a consumer that doesn't support global alpha, it doesn't make a difference. But for a consumer who does support global alpha - they want to know that the pixels are the same as last time, and only the global alpha changed, so they can use the old object and transform it accordingly. Or at least that's how I imagine it. Is this what APIs would support? Or does it not work like that anyway? (Ultimately it's just an optimisation, so not a blocker). If so, one could introduce an additional seqnum for the underlying pixels excl. global alpha basically. But then, perhaps a consumer that does support global alpha should just check the GstBuffer pointers it gets - we could put a seqnum into GST_BUFFER_OFFSET() or so which could be used along the same lines. On a side note, if the seqnum of the rectangle is updated like that in _rectangle_set_global_alpha(), the composition the rectangle belongs to (if any) won't know about the change, so min_seq_num_used might not be as high as it could be. Having said that, we don't provide API to query min_seq_num yet, so it doesn't matter anyway. > - The current implementation is 'optimized' for speed and saving memory. I > first tried an implementation integrating nicely into the internal caching of > preprocessed/scaled rectangles, meaning that after each application of > global_alpha the resulting pixel-data was added to the rectangle cache. This > turned out to be too inefficient in terms of CPU-usage and memory consumption Why inefficient in terms of CPU usage? As for memory consumption: there's no reason why one can't remove cached rectangles. It would be perfectly reasonable to only keep "the last" modified rectangle around. > and made osd-fading pratically unusable in our application on STBs. Therefore I > kept caching only for scaled and un/premultiplied rectangle-data and > implemented the application of global_alpha to be done on the fly in > get_pixels_argb_internal() on the pixel-data per scaled rectangle. I think this > is reasonable as the need for caching of faded overlays is questionable. That's fine with me. It's an implementation detail / internal optimisation anyway, and could still be added later if desired. So the case where you need to apply global alpha on the rectangle, but no scaling is needed/wanted is handled in the "scaling code path" as well, right? Some nitpicks: - in gst_video_overlay_rectangle_extract_alpha() and gst_video_overlay_rectangle_apply_global_alpha() please use the ARGB_A define I added with the above commits (it could be derived from the format using gst_video_* API, but there's no particular benefit at this point while we only support one format. Providing the compiler with the fixed number might lead to marginally faster code.) - update c1,c2,c3, alpha_channel in apply_global_alpha() to new ARGB_* defines as well - g_free() is NULL-safe (re. finalize and extract_alpha) - no need to check the return value of g_malloc(), it will always suceed (or abort). There's g_try_malloc() if desired, but probably not needed here. Just assume it will work and do away with the non-error handling in apply_global_alpha() and extract_alpha() I'd say :-) - in apply_global_alpha(), how sure are you that no one else is holding a ref to rect->pixels, meaning it mustn't me modified? perhaps do a gst_buffer_make_writable() which will make a copy if anyone else is holding a ref and keep the original buf if not. (This then leads us back to the seqnum question I guess...) - just use 4 instead of bpp everywhere - would really love some unit tests, but not a deal breaker ;)
> > However, from a consumer/overlay perspective, what one actually wants to know > if whether the underlying pixel data changed, right? For a consumer that > doesn't support global alpha, it doesn't make a difference. But for a consumer > who does support global alpha - they want to know that the pixels are the same > as last time, and only the global alpha changed, so they can use the old object > and transform it accordingly. Or at least that's how I imagine it. Is this what > APIs would support? Or does it not work like that anyway? Yes, I think it does, good point! A renderer supporting global-alpha itself should not need to re-upload unchanged pixel-data, This should be handled. > (Ultimately it's just > an optimisation, so not a blocker). If so, one could introduce an additional > seqnum for the underlying pixels excl. global alpha basically. ... and maybe add GstVideoOverlayFormatFlags param to gst_video_overlay_rectangle_get_seqnum(). (Although the API has already been released?) > But then, > perhaps a consumer that does support global alpha should just check the > GstBuffer pointers it gets Also a possibility, one that would already be possible without further changes, but maybe a bit hackish? > - we could put a seqnum into GST_BUFFER_OFFSET() or > so which could be used along the same lines. You lost me here. > > On a side note, if the seqnum of the rectangle is updated like that in > _rectangle_set_global_alpha(), the composition the rectangle belongs to (if > any) won't know about the change, so min_seq_num_used might not be as high as > it could be. Having said that, we don't provide API to query min_seq_num yet, > so it doesn't matter anyway. True, but this depends also on the usage of the API on the producer side. As there is no _composition_remove_rectangle() the only way for the producer to update an already existing rectangle for changed data is _composition_unref(cached_comp); _composition_new(cached_rect); foreach (cached_rects as r) ... _add_rectangle(r) and practically (as does textoverlay and our plugin by now), for convenience these steps are done per _chain() call, assuming that the renderer/vaapisink checks seq_nums per rectangle. This is, btw, a bit unclear in the API description. Also draft-subtitle-overlays.txt is not very clear about the expected handling of composition-lifetime vs. rectangle-lifetime. > > > > - The current implementation is 'optimized' for speed and saving memory. I > > first tried an implementation integrating nicely into the internal caching of > > preprocessed/scaled rectangles, meaning that after each application of > > global_alpha the resulting pixel-data was added to the rectangle cache. This > > turned out to be too inefficient in terms of CPU-usage and memory consumption > > Why inefficient in terms of CPU usage? > > As for memory consumption: there's no reason why one can't remove cached > rectangles. It would be perfectly reasonable to only keep "the last" modified > rectangle around. Well, the showstopper when testing the first code was actually CPU usage due to the massive memdup/memcpy done in each fading step/per frame for multiple composition rects. Therefore keeping "only the last" in the scaled_rectangles list is already too expensive as copying memory is involved. But "the last" is actually already always there, because the result of the last global_alpha multiplication is always in GstOverlayRectangle.pixels ;-) Ok, for the rest: will change or verify. Will also try again for the test-cases, but your hints for getting ORC work did not work here (timeout due to non-returning calls), so will have problems testing the tests :-) Please let me know how you decide for the seq_num issue, to avoid further re-coding ;-) Thanks, Holger
> > But then, perhaps a consumer that does support global > > alpha should just check the GstBuffer pointers it gets > > Also a possibility, one that would already be possible without further changes, > but maybe a bit hackish? > > > - we could put a seqnum into GST_BUFFER_OFFSET() or > > so which could be used along the same lines. > > You lost me here. A consumer can't 100% rely on the buffer pointer values to identify the same buffer (unless it keeps a ref to the buffer itself), since the buffer structure could be recycled by GSlice etc. GST_BUFFER_OFFSET() is a field in the buffer structure, which we could abuse to store a seqnum, which could then be used to uniquely identify buffers. > True, but this depends also on the usage of the API on the producer side. > As there is no _composition_remove_rectangle() the only way for the producer to > update an already existing rectangle for changed data is > > _composition_unref(cached_comp); > _composition_new(cached_rect); > foreach (cached_rects as r) > ... _add_rectangle(r) They should be able to do: comp = composition_make_writable (comp); rect = get_rectangle (comp, 0) rectange_set_global_alpha (rect, X) I didn't see the point of doing a remove_rectangle, since you have to make the composition writable anyway before doing that, and then you could just not bother and not cache the composition in the producer and only keep on to the rectangles and always create a new composition. But we could add a _remove_rectangle() or _steal_rectangle() if that makes things easier. > This is, btw, a bit unclear in the API description. Also > draft-subtitle-overlays.txt is not very clear about the expected handling of > composition-lifetime vs. rectangle-lifetime. Perhaps that could be expanded upon. It depends on the elements involved really. So far, all that caching is mostly theoretical (I think), not sure it's implemented anywhere (perhaps in gst-vaapi?) > Please let me know how you decide for the seq_num issue, to avoid further > re-coding ;-) I don't know what to do about that right now, let's just update the seq_num as you do and ignore the issue for now. I think this needs more thought. Perhaps document in _get_seqnum() that those aren't as useful as they may seem.
(In reply to comment #15) > > A consumer can't 100% rely on the buffer pointer values to identify the same > buffer (unless it keeps a ref to the buffer itself), since the buffer structure > could be recycled by GSlice etc. GST_BUFFER_OFFSET() is a field in the buffer > structure, which we could abuse to store a seqnum, which could then be used to > uniquely identify buffers. I see > > I didn't see the point of doing a remove_rectangle, since you have to make the > composition writable anyway before doing that, and then you could just not > bother and not cache the composition in the producer and only keep on to the > rectangles and always create a new composition. But we could add a > _remove_rectangle() or _steal_rectangle() if that makes things easier. Maybe for convenience, but everything is already possible with what is there. I think, I was just expecting something else, and was a bit confused about keeping rectangles and recreating their container in each step -- coming from an application context, where a 'composition' (or its equivalent) is something more static: you add children to it in response to key-events, and remove them when they're faded out ... ;-) I'm fine with the current interface. Yes, maybe a more verbose usage documentation in the overview part. > > I don't know what to do about that right now, let's just update the seq_num as > you do and ignore the issue for now. I think this needs more thought. Perhaps > document in _get_seqnum() that those aren't as useful as they may seem. Ok, added something there. > So the case where you need to apply global alpha on the rectangle, but no > scaling is needed/wanted is handled in the "scaling code path" as well, right? Yes, it was, which is in fact an unnecessary step. I changed this, with the new patch, no interim cached rect is created anymore. Thanks for your comments, I changed the minor details.
Created attachment 210274 [details] [review] Add support for global alpha multiplicator updated the patch
Created attachment 210277 [details] [review] Unit tests for global alpha Managed to write up some test-cases, too ;-) (Recompiling liborc from git fixed my problems). I was complete in iterating over all combinations of premultiplied/unpremultiplied vs. scaled/not-scaled vs. global alpha applied/not applied, and also tested the more theoretical combinations. Just throw away whatever you think is not necessary. Also note, that the tests take account of the fact that you end up with rounding errors, when you apply global-alpha to premultiplied pixel-data. Think this is a reasonable tradeoff you pay for performance.
Great, let's get this in: commit 35a17ac152a3563875624988eb94d3209aa19e80 Author: Tim-Philipp Müller <tim.muller@collabora.co.uk> Date: Sun Mar 25 00:31:41 2012 +0000 video: overlay-composition: blending micro-optimisation commit 79953f27a86c9c550e2ccaa68eb37d56ab63575a Author: Tim-Philipp Müller <tim.muller@collabora.co.uk> Date: Sun Mar 25 00:22:29 2012 +0000 video: overlay-composition: try to avoid floating point maths in inner loop Try to avoid floating point maths for each pixel to be blended in inner loop, and try to avoid the multiplication entirely for the most common case of the global alpha being 1. Could probably be refactored a bit more. commit 32679e1826267a7b63a69bfbc7f0f4c9c6addb15 Author: Tim-Philipp Müller <tim.muller@collabora.co.uk> Date: Sat Mar 24 19:47:10 2012 +0000 video: overlay-composition: some minor clean-ups extract_alpha and apply_global alpha always return TRUE really, so just do away with the return value. Convert a g_return_if_fail() into a g_assert(), since this is only to check internal consistency and not a guard for public API. Add some locking. https://bugzilla.gnome.org/show_bug.cgi?id=668483 commit 9d1b331004bb1d6dba91eaea294af6adcadc29a3 Author: Holger Kaelberer <hk@getslash.de> Date: Sat Mar 24 19:38:26 2012 +0000 tests: add unit test for video overlay composition global alpha support https://bugzilla.gnome.org/show_bug.cgi?id=668483 commit 76c0881549e73efb4995ac8b38d596d51d1cc0fe Author: Holger Kaelberer <hk@getslash.de> Date: Sat Mar 24 19:31:29 2012 +0000 video: overlay-composition: add support for global alpha multiplicator https://bugzilla.gnome.org/show_bug.cgi?id=668483
PS: I'm sure people would be interested to have a peek at your OSD elements based on this. Are they available somewhere?
Thanks Our OSD plugin is part of a project for one of our customers. I will clarify whether they agree on publishing it ...