GNOME Bugzilla – Bug 741030
theoradec: Sets video-meta width/height from padded values
Last modified: 2015-03-13 12:17:22 UTC
It looks like theoradec is setting width/height in the GstVideoMeta to padded width and height, which leads to assertion in VideoFrame when mapping. Width an and height should be the same as caps in GstVideoMeta, non-padded. ** (gst-launch-1.0:4595): CRITICAL **: gst_video_frame_map_id: assertion 'info->height == meta->height' failed Program received signal SIGTRAP, Trace/breakpoint trap.
+ Trace 234391
Thread 140737181808384 (LWP 4603)
$1 = (GstVideoInfo *) 0x7fffdc004158 (gdb) print *info $2 = {finfo = 0x7ffff0d8abe8 <formats+488>, interlace_mode = GST_VIDEO_INTERLACE_MODE_PROGRESSIVE, flags = GST_VIDEO_FLAG_NONE, width = 1920, height = 1080, size = 3110400, views = 1, chroma_site = GST_VIDEO_CHROMA_SITE_NONE, colorimetry = {range = GST_VIDEO_COLOR_RANGE_16_235, matrix = GST_VIDEO_COLOR_MATRIX_BT601, transfer = GST_VIDEO_TRANSFER_BT709, primaries = GST_VIDEO_COLOR_PRIMARIES_UNKNOWN}, par_n = 1, par_d = 1, fps_n = 24, fps_d = 1, offset = {0, 2073600, 2592000, 0}, stride = {1920, 960, 960, 0}, plane_size = { 2073600, 518400, 518400, 0}} (gdb) print *meta $3 = {meta = {flags = (GST_META_FLAG_POOLED | GST_META_FLAG_LOCKED), info = 0x7fffd8002800}, buffer = 0x7fffdc0051b0, flags = GST_VIDEO_FRAME_FLAG_NONE, format = GST_VIDEO_FORMAT_I420, id = 0, width = 1920, height = 1088, n_planes = 3, offset = {0, 2088960, 2611200, 0}, stride = {1920, 960, 960, 0}, map = 0x7ffff0b3a740 <default_map>, unmap = 0x7ffff0b3a8dd <default_unmap>}
Are you sure GstVideoMeta is supposed to be non-padded values ? I do not see documentation of whether yes or no, but I do see there is also a GstVideoCropMeta, and GstVideoMeta contains stride data, which would seem to make more sense if it was to use padded values.
No, it's being discussed. My own reflection was that having padded width/height around would allow receiver to validate the crop area instead of going out of bound if the crop is wrong. At the same time, theoradec uses 16x16 alignment and avoid chroma shifting issues where needed. Base on these documented constraints we could replace this code with VideoAlignment as being done in gst-libav. The benefit is that downstream don't need to know there is padding, all they need to handle is the stride.
*** Bug 741894 has been marked as a duplicate of this bug. ***
IMHO the width/height in the videometa should be the width/height that can be used to map the corresponding buffer and properly access all the pixels of the buffer. As a consequence that would mean that it shouldn't take the crop meta into account in general. The crop meta is something that is applied *on top* of the frame contained in the buffer, not something that is applied to the frame in the buffer (i.e. updating all the metas and buffer size etc). But this also means that the width/height on the caps are different to the width/height inside the videometa when the cropmeta is used. Because the width/height in the caps is supposed to be the visible one. So elements that support the cropmeta will have to properly handle the cropmeta (they claim to support it in the ALLOCATION query!), which means that they should also be aware of the uncropped width/height and use that to map the buffer (using the cropped width/height to map the buffer will break things!). An intelligent element could of course apply the crop meta to the buffer itself in some cases, but in that case there would be no crop meta left. E.g. for non-subsampled formats you could just update the width/height, strides and plane offsets inside the videometa.
Created attachment 295858 [details] [review] theoradec: Fix decoding in the presence of GstVideoCropMeta Store the video info of the internal frame decode width/height separate to the exposed (cropped) frame info, so that it can be used for mapping the downstream allocated video frame buffer correctly when using GstVideoCropMeta. Fixes playback of files with sizes that aren't a multiple of 16-pixels width or height.
It's a bit late to redefine which width/heights we pass around in GstVideoMeta and GstVideoCropMeta, isn't it?
(In reply to comment #6) > It's a bit late to redefine which width/heights we pass around in GstVideoMeta > and GstVideoCropMeta, isn't it? I don't think it has ever been defined before. This assert in GstVideoFrame is the first time we enforce something. The problem with setting cropped size in VideoMeta is mostly conceptual. Basically, a crop from theroadec can fall outside the "map region". This is prone to errors like short allocation, and also prevent generic API like GstVideo to validate anything. There would be another benefit on having VideoMeta having padded size, which would be that video filter could work just fine even if they don't know about crop meta. They would apply the effect on the entire surface. This would be a bit too much work, but would be fine for uniform effects like color transform (for overlays, not so much though). Note that I still think for this decoder, we should use the video alignment like libav do. The nature of padding is exactly the same.
(note that meanwhile, I have no personal objection in merging this patch, as it would be no different situation then it was before, but would work at least)
There are only a few cases where a transform could work on the entire padded surface safely, really. You can't even do colourspace transforms on them unless you can guarantee there'll be not subpixel blending from undefined padding pixels outside the cropped area. Perhaps it would be better to add new fields to GstVideoMeta to provide the entire uncropped size, or a new GstVideoBufferMeta (GstVideoMapMeta?) with a separate GstVideoInfo
(In reply to comment #9) > There are only a few cases where a transform could work on the entire padded > surface safely, really. You can't even do colourspace transforms on them unless > you can guarantee there'll be not subpixel blending from undefined padding > pixels outside the cropped area. Well, in the scenario you are using a crop, it is expected that the full padded image is sub-sampled aligned. The crop portion may not be though. If the crop area is what is sub-sample aligned, then you should be using VideoAlignment like libav is doing. Note that this is hypothetical, just like libav, theora alignment respect the subsampling, so both full image and sub-image are in fact aligned. > > Perhaps it would be better to add new fields to GstVideoMeta to provide the > entire uncropped size, or a new GstVideoBufferMeta (GstVideoMapMeta?) with a > separate GstVideoInfo The width and hight in VideoMeta is alread a duplicate from caps (hence use less if cropped). So I prefer Sebastion suggestion on comment 4. Which makes this size useful.
(In reply to comment #10) > (In reply to comment #9) > > There are only a few cases where a transform could work on the entire padded > > surface safely, really. You can't even do colourspace transforms on them unless > > you can guarantee there'll be not subpixel blending from undefined padding > > pixels outside the cropped area. > > Well, in the scenario you are using a crop, it is expected that the full padded > image is sub-sampled aligned. The crop portion may not be though. If the crop > area is what is sub-sample aligned, then you should be using VideoAlignment > like libav is doing. Note that this is hypothetical, just like libav, theora > alignment respect the subsampling, so both full image and sub-image are in fact > aligned. I meant in the general case, a transform can't operate on the padded buffer contents directly. There are a few specific cases where it can - such as converting from I420 to some other colourspace that has a 1:1 mapping between input and output pixels, but in the general case it can't - so all those elements need to understand crop metas anyway. Specific filters that always calculate one output pixel from the contents of one input pixel could work, but we don't really have a lot of those - maybe only videobalance? A colourspace conversion from RGB to I420, for example, can't operate on the uncropped buffer, because it will calculate sub-sampled chroma pixels at the edges that are corrupted by undefined data outside the crop region. > > Perhaps it would be better to add new fields to GstVideoMeta to provide the > > entire uncropped size, or a new GstVideoBufferMeta (GstVideoMapMeta?) with a > > separate GstVideoInfo > > The width and hight in VideoMeta is alread a duplicate from caps (hence use > less if cropped). So I prefer Sebastion suggestion on comment 4. Which makes > this size useful. I agree it's weird that you can't look at an allocated video buffer and work out which total useful working area it has, and that the extra accounting elements need to do seems wrong.
Created attachment 295905 [details] Theora video encoded from 1025x1025 with pic_x = pic_y = 1 The provided patch still fails with this video.
Failing pipeline: gst-play-1.0 /home/nicolas/test1.ogg --videosink="ximagesink"
An interesting fact is that the 1025x1025 video show up as 1026x1026
(In reply to comment #12) > Created an attachment (id=295905) [details] > Theora video encoded from 1025x1025 with pic_x = pic_y = 1 > > The provided patch still fails with this video. This video plays with xvimagesink, but is output as 1026x1026. It fails using ximagesink, with an assertion: ** (gst-play-1.0:9029): CRITICAL **: gst_video_frame_map_id: assertion 'info->width == meta->width' failed This assertion actually happens in videoconvert, because the GstVideoMeta on the buffer it is trying to map actually contains a width of 1040.
There is something interesting going on during the decoding: theoradec gets the header packet, and then negotiates to 1026x1026:
+ Trace 234606
Nice work, bugzilla. Failed to past the rest of the comment after the traceback. So, after the type_packet, theora proceeds to the decode a buffer, at which point it renegotiates to 1040x1040 uncropped buffer size, while keeping the caps the same. videofilter/videoconvert don't like that, and fails to map the result.
+ Trace 234607
As I mentioned on IRC before, I think this should work the following way: - CAPS event contains the display width/height (otherwise applications will become unhappy, GstDiscoverer reports wrong information, etc) - caps inside the ALLOCATION query contain the memory width/height - GstVideoMeta contains the memory width/height - GstVideoCropMeta contains how much should be cropped relative to the values inside the video meta and the ALLOCATION query caps Every other way of doing things will break in one way or another and is inconsistent. The above requires that every element that claims to support the CROP meta to behave like this instead of doing random things like they do currently. Every element that actually handles the CROP meta will need all 3 information available anyway (display size, memory size, crop size... and from 2 of this you can calculate the other). As a side note, this requires to pass the GstVideoInfo from the memory width/height caps to be passed to gst_video_frame_map()... which is the only reasonable value anyway. The display caps have completely different strides, plane offsets, etc.
If you have memory size + crop size, you can calculate display size, but other combinations will be missing some piece of info. I agree this sounds like the best way to make things work. I wonder what things like gst-libav decoders are doing at the moment - what are the caps on the allocation query? Why do things generally seem to work?
To answer my own question - it 'mostly works' because the only things using GstVideoCropMeta so far are x[v]imagesink, theoradec, mpeg2dec, the vaapi elements and a couple of other video sinks.
This used to work in 1.4, so should really be fixed for 1.6. Also happens with real-world files, e.g. gst-play-1.0 http://download.blender.org/durian/trailer/sintel_trailer-1080p.ogv
The only change since 1.4 is someone added asserts that things were mapping buffers with the right videoinfo, right?
In which case, I think the only reason that clip works in 1.4 is because it's OK to map a 1920x1088 pixel I420 buffer using 1920x1080 GstVideoInfo
Yes, it only fails now because we added assertions to check if parameters actually make sense. Or put differently, it only worked before by accident although code was passing wrong parameters :)
I get that, but it clearly seems to have worked for most people most of the time (seeing that we didn't get a lot of bug reports indicating otherwise either). It doesn't really matter why this did work. We should make it work again one way or another.
Sure, see comment #18. And we might want to remove the assertions again for a few major releases and convert them into GST_ERRORs.
The reason is never been an issue, is that most encoder use 0,0 as start offset (top-left aligned), which means the visible region is always inside the expected map region. This basically mean, the crop meta isn't being used in real life (including sintel here).
(In reply to Sebastian Dröge (slomo) from comment #18) > As I mentioned on IRC before, I think this should work the following way: > - CAPS event contains the display width/height (otherwise applications will > become unhappy, GstDiscoverer reports wrong information, etc) > - caps inside the ALLOCATION query contain the memory width/height > - GstVideoMeta contains the memory width/height > - GstVideoCropMeta contains how much should be cropped relative to the > values inside the video meta and the ALLOCATION query caps > > Every other way of doing things will break in one way or another and is > inconsistent. > > The above requires that every element that claims to support the CROP meta > to behave like this instead of doing random things like they do currently. > Every element that actually handles the CROP meta will need all 3 > information available anyway (display size, memory size, crop size... and > from 2 of this you can calculate the other). > > As a side note, this requires to pass the GstVideoInfo from the memory > width/height caps to be passed to gst_video_frame_map()... which is the only > reasonable value anyway. The display caps have completely different strides, > plane offsets, etc. I think this design (or redesign ?) works for this use case. It's going to require a large review and rework. About the allocation caps, a lot of baseclass will not cooperate. I think for a single (and aging) decoder, it's not worth before next development phase. We could simply always copy crop. (In reply to Jan Schmidt from comment #19) > I agree this sounds like the best way to make things work. I wonder what > things like gst-libav decoders are doing at the moment - what are the caps > on the allocation query? Why do things generally seem to work? gst-libav uses another mechanism. GstVideoAlignement. This mechanism is perfect fit as long as your crop area is aligned according to the pixel sub-sampling. Theora only recommend respecting this alignment, but does not enforce it. Allowing any random crop rectangle to be used. Going through the theora file I have, none (other then that one I have crafted) are miss-aligned. While the code seems to work with miss-aligned crop area, I'm not sure it's handling the chroma sub-sampling and sitting correctly. With the video alignment, how it works is that we setup a GstVideoMeta with proper offset and strides. It's a bit like creating a sub-image, where by keeping the stride you can offset the start of each plane and reduce the size to compensate.
Created attachment 298215 [details] [review] [PATCH] theoradec: Disable usage of crop meta This is a temporary workaround that simply disables usage of crop meta for now. https://bugzilla.gnome.org/show_bug.cgi?id=741030 --- ext/theora/gsttheoradec.c | 5 +++++ 1 file changed, 5 insertions(+)
That patch is in case we don't make it before 1.6. I won't merge it unless I am asked to.
(In reply to Nicolas Dufresne (stormer) from comment #28) > I think this design (or redesign ?) works for this use case. It's going to > require a large review and rework. About the allocation caps, a lot of > baseclass will not cooperate. I think for a single (and aging) decoder, it's > not worth before next development phase. We could simply always copy crop. I think this is how it always was supposed to be, and that the ALLOCATION query always contains the uncropped/memory caps while the CAPS event contains the cropped/display caps was necessary before already. The only difference is that we are now more picky about what is allowed to be passed to gst_video_frame_map(). So I don't think it's a very big change
(In reply to Sebastian Dröge (slomo) from comment #31) > I think this is how it always was supposed to be, and that the ALLOCATION > query always contains the uncropped/memory caps while the CAPS event > contains the cropped/display caps was necessary before already. The only > difference is that we are now more picky about what is allowed to be passed > to gst_video_frame_map(). So I don't think it's a very big change It's a big change because you need to review all the filters that exist to make sure they reject CROP meta if they don't support it. It's big change, because baseclasses are the one to choose the caps in the allocation query. It's also a big change because most sink expect the caps and the allocation caps to be the same.
So how do we proceed here? It seems there are four issues: a) base classes don't allow to set the caps in the ALLOCATION query, resulting in them being the display/cropped caps. So the sink only gets them in the query, but later e.g. the decoder sets the uncropped caps in the buffer pool configuration. That was always the case in theoradec and mpeg2dec before, so not really a new problem. b) theoradec (and possibly other code) assumes that passing a "random" video-info to GstVideoFrame API does something sensible. It now causes an assertion as it should Jan's patch should fix that, but apparently it breaks something else for some reason. Not sure why, it should make no difference to what we had before. c) Nicolas thinks that by default basetransform passes through metas in the ALLOCATION query if the subclass runs in in_place mode. I didn't check this but I hope it's not true and something else was weird in his tests :) basetransform should drop all metas by default, that's why the filter_meta vfunc exists. Subclasses like GstVideoFilter will than retain GstVideoMeta but will drop everything else. d) Many filters and textoverlay and things claim to support CROP meta, assuming that they don't need to do anything and in the worst case just also do their work on the bigger area and are just a bit slower because of that. That's wrong though, at least for elements that are positioning something. It will work as long as only cropping at the bottom or left is done, but otherwise the elements will have to translate. All these are not new problems except for b), as such don't seem like blockers. b) should be fixed, and Jan's patch should do that and we should get it in after making sure it doesn't break anything. The others should be fixed at some point Opinions?
(In reply to Sebastian Dröge (slomo) from comment #33) > c) Nicolas thinks that by default basetransform passes through metas in the > ALLOCATION query if the subclass runs in in_place mode. I didn't check this > but I hope it's not true and something else was weird in his tests :) > basetransform should drop all metas by default, that's why the filter_meta > vfunc exists. Subclasses like GstVideoFilter will than retain GstVideoMeta > but will drop everything else. I've checked, it's not true indeed, it drops all the metas by default. Which is the most sensible thing to do. So no worries there.
(In reply to Sebastian Dröge (slomo) from comment #33) > d) Many filters and textoverlay and things claim to support CROP meta, > assuming that they don't need to do anything and in the worst case just also > do their work on the bigger area and are just a bit slower because of that. > That's wrong though, at least for elements that are positioning something. > It will work as long as only cropping at the bottom or left is done, but > otherwise the elements will have to translate. Bottom or *right* but yes. In rare cases (color convert) it could work if the crop rectangle is inside the caps width and height. Arguably, should the caps be cropped (and allocation query caps not) ? It would help to know what to do if videocrop wanted to apply a crop meta.
(In reply to Sebastian Dröge (slomo) from comment #33) > All these are not new problems except for b), as such don't seem like > blockers. b) should be fixed, and Jan's patch should do that and we should > get it in after making sure it doesn't break anything. The others should be > fixed at some point > For your interest, Jan patch would fail in gst-validate small run. Didn't have time to dig into it.
(In reply to Nicolas Dufresne (stormer) from comment #35) > Arguably, should the caps be cropped (and allocation query caps not) ? > It would help to know what to do if videocrop wanted to apply a crop meta. That's the idea. Caps in the CAPS event and query are the display caps, in the ALLOCATION query they should be the uncropped caps (and in the bufferpool config). Currently they are not in the ALLOCATION query due to missing API in base classes.
(In reply to Nicolas Dufresne (stormer) from comment #36) > (In reply to Sebastian Dröge (slomo) from comment #33) > > All these are not new problems except for b), as such don't seem like > > blockers. b) should be fixed, and Jan's patch should do that and we should > > get it in after making sure it doesn't break anything. The others should be > > fixed at some point > > > > For your interest, Jan patch would fail in gst-validate small run. Didn't > have time to dig into it. Should we just get it in for now though? I mean currently gst-validate will fail with the assertion in gst_video_frame_map(), and the patch is at least a step in the right direction. And having it merged might allow us to easily get some more information on what goes wrong
(In reply to Sebastian Dröge (slomo) from comment #38) > (In reply to Nicolas Dufresne (stormer) from comment #36) > > (In reply to Sebastian Dröge (slomo) from comment #33) > > > All these are not new problems except for b), as such don't seem like > > > blockers. b) should be fixed, and Jan's patch should do that and we should > > > get it in after making sure it doesn't break anything. The others should be > > > fixed at some point > > > > > > > For your interest, Jan patch would fail in gst-validate small run. Didn't > > have time to dig into it. > > Should we just get it in for now though? I mean currently gst-validate will > fail with the assertion in gst_video_frame_map(), and the patch is at least > a step in the right direction. And having it merged might allow us to easily > get some more information on what goes wrong No, currently gst-validate don't hit these assertion. It breaks the case where there is no cropping, the one that still works, the only one currently tested in gst-validate iirc.
The only quick fix is to remove the assertions (maybe turned them into warnings)
(In reply to Nicolas Dufresne (stormer) from comment #39) > (In reply to Sebastian Dröge (slomo) from comment #38) > > (In reply to Nicolas Dufresne (stormer) from comment #36) > > > (In reply to Sebastian Dröge (slomo) from comment #33) > > > > All these are not new problems except for b), as such don't seem like > > > > blockers. b) should be fixed, and Jan's patch should do that and we should > > > > get it in after making sure it doesn't break anything. The others should be > > > > fixed at some point > > > > > > > > > > For your interest, Jan patch would fail in gst-validate small run. Didn't > > > have time to dig into it. > > > > Should we just get it in for now though? I mean currently gst-validate will > > fail with the assertion in gst_video_frame_map(), and the patch is at least > > a step in the right direction. And having it merged might allow us to easily > > get some more information on what goes wrong > > No, currently gst-validate don't hit these assertion. It breaks the case > where there is no cropping, the one that still works, the only one currently > tested in gst-validate iirc. I'm not sure how gst-validate would fail with this patch applied then - uncropped should have both the allocation and caps GstVideoMeta be the same and work regardless.
I ran gst-validate, but I don't see the failure you mentioned above with the theoradec patch applied. The only failure I can generate is when trying to play a video with odd width/height, gst-validate complains that the theoradec output caps have width=1026, which doesn't match what oggdemux is putting on the theoradec sinkpad (width=1025). Can you provide some more detail about the gst-validate failure you saw so I can reproduce it?
I'm re-running it with today's version, might have been a bug in the test. Do you realize that the performance gain of using crop meta for this decoder is negligible ? We always copy out anyway and the size difference between cropped and un-cropped is really small. I think the only thing it may improve is color correctness if chroma is not aligned (which is never the case in correctly encoded streams). And for this case it fails right now it seems (unless the attached stream is just proven broken, I don't know, I can try to generate more such streams). It feels like this failure is really important to fix, otherwise it's not worth keeping the crop code.
Indeed tests are clean, but my worry remains, since it fails with the provided test file and: gst-launch-1.0 filesrc location=test.ogg ! oggdemux ! theoradec ! videoconvert ! ximagesink ** (gst-launch-1.0:15368): CRITICAL **: gst_video_frame_map_id: assertion 'info->width == meta->width' failed Which is extremely weird, since in this case, theoradec should be cropping by itself.
Review of attachment 295858 [details] [review]: As there is no more visible regression now.
Review of attachment 298215 [details] [review]: Hopefully we don't need this one.
I talked about the ximagesink case a bit in comment #15/16/17 - that seems to be about the general confusion / mismatch on how cropmeta interacts with caps and the allocation/buffer GstVideoMetas. We need to extend the base classes to let sub-classes manage the separate GstVideoMetas properly - we should do that in a separate bug.
Well, in the ximagesink case, because videoconvert does not support CROP_META, there should be no CROP_META ;-P
Actually, videoconvert pretend to support CROP META. I'm not sure this is right. er the less, this use to work. The idea was that video_frame_map() was a convenience so you pass whatever you know about the caps, and it will figure-out the right size from the meta. Now that we have assertions, the convenience is gone. I think in the end the assertions are wrong, they should only make sure that the meta size is bigger or equal to the provided info.
I think I disagree with that. It seems dangerous to just allow inconsistent knowledge of the dimensions between what the element knows and what comes in the buffers. It seems like a problem waiting to happen.
I actually sent this patch, so at least while figuring-out we don't have a regressed state. commit 274984e83bf1bd501ae9ce2111e3a0b86597ecfb Author: Nicolas Dufresne <nicolas.dufresne@collabora.com> Date: Fri Mar 13 10:30:43 2015 +0000 video-frame: Relax width/height assertion When doing CROP META it is exepcted that the width and/or height in the GstVideoMeta is bigger or equal to the caps negotiated size. Generally speaking, the concern is mostly that the padded width/height is not being negotiated. That the size in the meta is bigger can lead to visual corruption if there is bugs, but not crash. The idea is that video frame allow not having to care about VideoMeta and pool caps (actually pool caps are only relevant for setting up the allocation, nothing else should be concerned).