GNOME Bugzilla – Bug 611157
video: API to signal stereoscopic and multiview video
Last modified: 2015-08-16 13:40:48 UTC
Video streams in GStreamer need to be enhanced to support interleaving for stereoscopic video. Right now we have flags for interlaced video only [4]. 3D video can be packaged as interlaced or side by side. In the case of side by side, it can be left/right or top/bottom packing. Left/right is popular for images as it also allows parallel or cross-eyed viewing. Top/bottom is more efficient as one can process the video without strides. I'd like to get comments on how we could support this, in order to have an agreed opinion for a potential GSoC project. http://gstreamer.freedesktop.org/wiki/Video3DSupport
I wonder if we would rather want to have the packing (side by side/over-under/...) in the caps. It could be done similarly to channels and channel-positions for audio.
After more thinking and research I am convinced we need caps extensions: the caps would tell whether it is a multichannel video stream (channels={1,2}) and, in the case of 2, how the frames are packed (channel-layout={mono,separate,stereo-interleaved,stereo-over-under,stereo-left-right}). The flags on the buffers would indicate whether it is the left or right frame for channel-layout=separate (misusing GST_VIDEO_BUFFER_TFF is out of the question, as e.g. the MVC Stereo High profile supports interlaced too).
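For illustration, caps along the lines of this proposal could look like the following sketch on 0.10-era raw caps. The `channels` and `channel-layout` field names are taken from this comment only; they are hypothetical and not part of any released GStreamer API:

```
video/x-raw-yuv, format=(fourcc)I420, width=(int)1280, height=(int)720,
    channels=(int)2, channel-layout=(string)stereo-left-right
```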
I think we can drop the buffer flags and, for now, only support packed video (both views in one buffer). There is now some discussion about new caps here: http://gstreamer.freedesktop.org/wiki/NewGstVideo
Wim was suggesting to use http://cgit.freedesktop.org/~wtay/gstreamer/tree/?h=buffermeta for this, but I don't see how this can help here. We could get this in without the flags, as the industry seems to agree on the packed layouts.
How do you detect that an .avi (or .mov or .flv) demuxer's video output is a 3D video stream? Are 3D tags (or MIME types) already defined for the most popular muxers/demuxers?
For avi/flv you will need to let the user specify that it is e.g. side by side. Yes, that is stupid, as e.g. for avi one could easily introduce a new chunk for it. IMHO there is an extension for mp4/3gp in the works to add that info. Also, if the codec is h264 MVC, it could be a stereo stream.
Any progress on this? Or are we waiting for 0.11 now? ;)
In the ITU spec for H264 (Annex D.2.22 and Annex H) it seems that the stereo video information is coded in SEI; I think that h264parse should be improved to extract this info. ITU spec: http://www.itu.int/rec/T-REC-H.264-201003-I/en I don't know how other codecs like VC1 or VP8 handle 3D video.
I think we should do it in 0.11. Extracting the info e.g. in h264parse and mp4demux would be a good preparatory step, but I don't see how to proceed if we miss the caps details and buffer flags.
(In reply to comment #8)
> I don't know how other codec like VC1 or VP8 handle 3D video.

Stereoscopic profiles were also added to MPEG-2 last year.
(In reply to comment #9)
> I think we should do it in 0.11. Extracting the info e.g. in h264parse and
> mp4demux would be good preparatory step, but I don't see how to proceed if we
> miss the caps details and buffer flags.

The problem is that stereo and multiview buffer flags are not expressive enough, though they could serve as a hint that some more descriptive metadata is attached to the buffer. E.g. for side by side, you also have options to flip one view. Besides, the MVC standard usually talks about "constituent frames" (0 or 1); the actual meaning of those is described in the SEI message. (Just adding myself to Cc: for now, getting back to this later.)
Adding myself to cc. The following bugs might be interesting for you guys (both are related to MVC): https://bugzilla.gnome.org/show_bug.cgi?id=694346 https://bugzilla.gnome.org/show_bug.cgi?id=685215
I have created a new bug for tracking MVC stream parsing support in the h264_videoparser element: https://bugzilla.gnome.org/show_bug.cgi?id=696135 . Any comments regarding Stefan's caps proposal in comment 2?
This proposal is likely completely outdated, we need to re-think this in 1.0 terms. There is already some provision in the new video API, such as having multiple views/fields, but I think more signalling is needed. We need to see what the common variants/needs are, I don't think we can get away with something overly simple. (Note: I didn't actually look at the proposal again at this point)
Okay, I am taking some initiative to continue this work :) I would like to propose the following:

typedef enum {
  GST_VIDEO_VIEW_TYPE_MONO,
  GST_VIDEO_VIEW_TYPE_STEREO,
  GST_VIDEO_VIEW_TYPE_MULTI
} GstVideoViewType;

typedef enum {
  GST_VIDEO_STEREO_SEQUENTIAL_PROGRESSIVE,
  GST_VIDEO_STEREO_SEQUENTIAL_ROW_INTERLEAVED,
  GST_VIDEO_STEREO_PACKED_ROW_INTERLEAVED,
  GST_VIDEO_STEREO_PACKED_COLUMN_INTERLEAVED,
  GST_VIDEO_STEREO_PACKED_SIDE_BY_SIDE,
  GST_VIDEO_STEREO_PACKED_TOP_BOTTOM,
  GST_VIDEO_STEREO_PACKED_CHECK_BOARD_INTERLEAVED,
} GstVideoStereoType;

typedef enum {
  GST_VIDEO_STEREO_TYPE_LEFT_VIEW_FIRST = (1 << 0),
  /* or we can keep separate flags for all combinations, e.g.
     LEFT_FRAME_AS_LEFT_VIEW, TOP_FRAME_AS_LEFT_VIEW etc.:
     unnecessary but more clarity */
  GST_VIDEO_STEREO_TYPE_LEFT_VIEW_FLIPPED = (1 << 2),
  GST_VIDEO_STEREO_TYPE_RIGHT_VIEW_FLIPPED = (1 << 3)
} GstVideoStereoContentType;
Additional to this something will need to be defined to map the different views of GstVideoFrame (i.e. the id in gst_video_frame_map_id()) to the left/right/whatever frame. What's the use case of GST_VIDEO_VIEW_TYPE_MULTI?
(In reply to comment #16)
> Additional to this something will need to be defined to map the different views
> of GstVideoFrame (i.e. the id in gst_video_frame_map_id()) to the
> left/right/whatever frame.
>
> What's the use case of GST_VIDEO_VIEW_TYPE_MULTI?

To identify multiview streams. A stereoscopic stream has only 2 views.
None of these fields or caps are needed, IMO. See also docs/design/part-mediatype-video-raw.txt.

Basically: the views property defines the number of views; by default 1, 2 is stereo, >2 is multiview. With GstVideoMeta you define the different views: you can use the strides and offsets to do side-by-side or top/bottom or interleaved etc. You can access each view (or field of a view in interlaced) with the frame id; see also gst_buffer_get_video_meta_id(). Then you define a new GstMeta to define the meaning of each frame id. By default id 0 is left, id 1 is right (0, 2 for interlaced). The new meta is really only interesting if you do multiview, so it is left undefined for now.

The advantage of this is that it is completely backwards compatible. Color conversion, scaling and display will simply only work on frame 0 as before, until they all become multiview aware.

What is not possible with this is to interleave each view vertically. My thinking is that this requires (a) new pixel format(s).
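The strides/offsets idea above can be sketched in plain C. This is a minimal illustration under assumptions, not the GstVideoMeta API itself (the struct and function names here are hypothetical): one packed side-by-side I420 buffer is described with per-view plane offsets, so mapping view 0 (left) or view 1 (right) needs no copy and the strides stay those of the full packed frame.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-view layout: where each plane of one view starts in
 * the packed buffer, and which stride to walk it with. */
typedef struct {
  size_t offset[3];   /* per-plane start of this view in the buffer */
  size_t stride[3];   /* per-plane stride of the full packed frame  */
} view_layout;

static view_layout
i420_side_by_side_view (size_t width, size_t height, int view_id)
{
  view_layout v;
  size_t y_size = width * height;
  size_t uv_size = (width / 2) * (height / 2);

  /* Plane offsets of the full packed frame (I420: Y, then U, then V). */
  const size_t base[3]   = { 0, y_size, y_size + uv_size };
  const size_t stride[3] = { width, width / 2, width / 2 };

  for (int p = 0; p < 3; p++) {
    /* The right view (id 1) starts half a row further into each plane. */
    v.offset[p] = base[p] + (view_id ? stride[p] / 2 : 0);
    v.stride[p] = stride[p];
  }
  return v;
}
```

For a 640x480 packed frame, view 0 starts at the regular I420 plane offsets, while view 1's Y plane starts 320 bytes in; an element that is not multiview aware and only looks at view 0 simply sees the left half, which is what makes the scheme backwards compatible.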
I would like to share some details about the stereoscopic content available (this is my current understanding :)). So far I have seen two types of stereo videos (for explanation I am taking h264 as the stream and mp4 as the container format):

type 1: This stereo video content storage is based on the MPEG Application Format. To support this in GStreamer we don't need to change anything in the parser or decoder. The stereo video is either a packed format (two views in a single frame) or sequential frames (progressive only). It supports 2 types of packed formats: 1) side_by_side 2) vertical_interleaved. Here left and right views are just like regular frames, but packed as a single frame. This is *NOT* using any h264 stereoscopic extension or multiview extension to encode the video; instead the two views are packed and encoded as a single frame (using regular h264 profiles). So for supporting this (I am using qtdemux as an example):
-- We need to parse the GstVideoStereoType in qtdemux.
-- We can also parse the GstVideoStereoContentType in qtdemux.

type 2: This type is based on the Annex H extension of the h264 spec: more optimized stereo encoding based on stereoscopic profiles. Unfortunately I haven't seen this type of video sample in any container :). We can parse the GstVideoStereoType from qtdemux, but we will get the GstVideoStereoContentType only during stream parsing (SEI payloads), either from h264parse or from decoders supporting it.
There are raw h264 samples available on the web which are encoded with this extension, and all of them are sequential-view streams: meaning "first_frame_is_left_view/second_frame_is_right_view" or "first_frame_is_right_view/second_frame_is_left_view" type. As per the FramePackingArrangement info in the h264 SEI payload, if the stream type is GST_VIDEO_STEREO_PACKED_ROW_INTERLEAVED, GST_VIDEO_STEREO_PACKED_COLUMN_INTERLEAVED, GST_VIDEO_STEREO_PACKED_SIDE_BY_SIDE, GST_VIDEO_STEREO_PACKED_TOP_BOTTOM, or GST_VIDEO_STEREO_PACKED_CHECK_BOARD_INTERLEAVED, we need to do some upsampling for each view based on Figures D-1 to D-11 of the h264 spec, because each view has an SAR equal to the SAR of the frame. Initially we can support side-by-side packed and sequential streams.

Needing more effort: for handling GST_VIDEO_STEREO_PACKED_COLUMN_INTERLEAVED we need some vertical-line-based de-interlacing method, or some element to do that. Likewise for GST_VIDEO_STEREO_PACKED_CHECK_BOARD_INTERLEAVED we need quincunx sampling.
As a reply to comment 18: Hi Wim, sorry, I am a bit confused by the VideoMeta approach. Can you please check type 1 in my comment 19? For streams stored based on the MPEG Application Format (I think most of the available S3D videos are based on this, though I'm not completely sure), we get the view-specific information at the demuxer level, and there is no view information which is codec specific. So we won't get any view-specific information within the decoder or parser. So I think we somehow need to communicate the view-specific information to downstream elements from the demuxer itself. AFAIK it is not possible to add GstVideoMeta to buffers within demuxers! Am I missing something?
(In reply to comment #20)
> ... we will
> get the view_specific_information in demuxer level. And there is no
> view_information which is codec specific.

Ah ok, yes you need something elsewhere then. Either something in the caps or, if it changes for each buffer, some flags or metadata.
I'd put it into the caps for encoded streams (e.g. h264) and let the decoder convert that into useful information in the GstVideoMeta (and a new meta for multiview content).
(In reply to comment #22)
> I'd put it into the caps for encoded streams (e.g. h264) and let the decoder
> convert that into useful information in the GstVideoMeta (and a new meta for
> multiview content).

Are you preferring to append the stereo info to the existing codec-data, or to send it as new codec-data? That would lead to more work in the parser, I think :). I prefer to add it as separate caps fields based on my initial proposal (comment 15), and then decoders/post-processing elements can convert them to GstVideoMeta. I will dig into this a bit more and update later.
New caps field, don't mess with the codec_data :)
Okay :) I will add some utility functions to pbutils.
> I will add some utility functions to pbutils.

If they're not codec-specific they should probably go into libgstvideo.
(In reply to comment #26)
> > I will add some utility functions to pbutils.
>
> If they're not codec-specific they should probably go into libgstvideo.

I think pbutils is the right place, since libgstvideo is for raw video. I will add some patches first.
Created attachment 239990 [details] [review] pbutils: Add utility functions to handle Stereo video streams.
We might also need a new buffer flag (GST_BUFFER_FLAG_STEREO) to handle stereo/mono mixed streams.
Hi, I am not convinced by the mono_stereo flag. Either you know it's mono or stereo, or you don't know and you can't really infer anything. :) Some of the stereo video types are not really masks; e.g. what would top-bottom-half|top-bottom-full mean? Probably use a mask that specifies half resolution instead. But then you also have a possible issue with what e.g. interleaved-row|half-resolution would mean. In the end, why not make the whole thing plain ids like you have for FPA modes? BTW, you also have a gap in the enumerations.
BTW, have you got a chance to download the 2012 MPEG-2 spec with stereo support? You mentioned it in the patch, but marked it as "not implemented yet". I believe they should be similar to H.264 MVC flags.
I like the idea of having a flag mentioning left-view-first, this is symmetric with TFF for interlaced contents and this clearly expresses how to place constituent frames 0 and 1, respectively. Though, I am not convinced this is to be called GstStereoVideoArrangementType. Probably just make it GstStereoVideoFrameFlags?
For multiview, I think the -multi flag is irrelevant because it's up to the downstream element to select the view it wants to render. The decoder will simply produce all decoded frames and the associated view-ids. For that, we simply need to attach some view-id for example.
(In reply to comment #30)
> Hi, I am not convinced by the mono_stereo flag. Either you know that's mono or
> stereo, or you don't know and you can't really infer anything. :)

Until now I have only seen MPEG-A mention stereo/mono mixed streams. The "svmi" (stereo video media information) atom in the "stbl" container gives the information about whether the stream is stereo or mono, how the constituent frames are arranged if it is a stereo frame, etc. We will communicate this info to downstream elements through caps. Once that gets negotiated, we only need to know whether the incoming buffer is stereo or mono, which means the mono_stereo flag is enough, right? Does that make sense?

> Some of the stereo video types are not really masks. e.g. what would
> top-bottom-half|top-bottom-full mean? Probably use a mask that specifies half
> resolution instead. But then, you have also a possible issue with what would
> e.g. interleaved-row|half-resolution mean? In the end, why not make the whole
> thing plain ids like you have for fpa modes? BTW, you also have a gap in the
> enumerations.

The stereo types with the suffix "HALF" indicate that the resolution has been reduced by half, either horizontally or vertically, in order to pack the views into a single frame, which means we need an upsampling operation. I have seen that only MPEG-A's latest amendment has separate specification type values for half/full arrangements; for other specs it is just top_bottom or side_by_side, and based on the description of the upsampling operations (as per 14496-10) we can assume these all belong to the "-HALF" type. We might remove the "top-bottom-half|top-bottom-full" and "side-by-side-half|side-by-side-full" enums. I haven't seen any explanation of interleaved-row-{half,full} anywhere. :)

FPA mode: you mean the frame packing arrangement type in h264? Yup, we can remove "top-bottom-half|top-bottom-full"/"side-by-side-half|side-by-side-full" and make them plain ids starting from zero.
But unfortunately these id values are not unique across all specs :). So we can provide our own unique ids and map them based on the different schemes, like I did in stereo-video-utils.c.
(In reply to comment #31)
> BTW, have you got a chance to download the 2012 MPEG-2 spec with stereo
> support? You mentioned it in the patch, but marked it as "not implemented yet".
> I believe they should be similar to H.264 MVC flags.

I haven't seen any 2012 release of MPEG-2! But there are amendments/corrigenda for MPEG-2:2000. Do you have any link to buy/download 13818-2:2012? (I don't think ISO/IEC published anything for MPEG-2 in 2012; I could be wrong, of course :))
(In reply to comment #35)
> (In reply to comment #31)
> > BTW, have you got a chance to download the 2012 MPEG-2 spec with stereo
> > support? You mentioned it in the patch, but marked it as "not implemented yet".
> > I believe they should be similar to H.264 MVC flags.
>
> I haven't seen any 2012 release of MPEG-2 !!! But there are
> Amendments/corrigendas for mpeg-2:2000 . Do you have any link to buy/download
> the 13818-2:2012 (I don't think ISO/IEC published something for mpeg2 in
> 2012,,i could be wrong also:))?

IIRC, it was scheduled for release in September 2012, thus superseding all amendments. What interests you might be the H.262 Amendment 4, which was consolidated into the "2012" MPEG-2 edition. It's only available to TIES members last I checked, that's probably why. I will try to sort this out internally, or be prepared to expense stuff. :)
(In reply to comment #32)
> I like the idea of having a flag mentioning left-view-first, this is symmetric
> with TFF for interlaced contents and this clearly expresses how to place
> constituent frames 0 and 1, respectively. Though, I am not convinced this is to
> be called GstStereoVideoArrangementType. Probably just make it
> GstStereoVideoFrameFlags?

GstVideoFrame is for raw video, while the StereoVideoArrangement types also describe encoded data :). And theoretically, stereo content in one frame is not really one frame, right? Anyway, I am not insistent; we can change it based on other opinions.
(In reply to comment #36)
> (In reply to comment #35)
> > (In reply to comment #31)
> > > BTW, have you got a chance to download the 2012 MPEG-2 spec with stereo
> > > support? You mentioned it in the patch, but marked it as "not implemented yet".
> > > I believe they should be similar to H.264 MVC flags.
> >
> > I haven't seen any 2012 release of MPEG-2 !!! But there are
> > Amendments/corrigendas for mpeg-2:2000 . Do you have any link to buy/download
> > the 13818-2:2012 (I don't think ISO/IEC published something for mpeg2 in
> > 2012,,i could be wrong also:))?
>
> IIRC, it was scheduled for release in September 2012, thus superseding all
> amendments. What interests you might be the H.262 Amendment 4, which was
> consolidated into the "2012" MPEG-2 edition. It's only available to TIES
> members last I checked, that's probably why. I will try to sort this out
> internally, or be prepared to expense stuff. :)

Okay :) How about ISO/IEC 13818-2:2000/Amd 1:2001, ISO/IEC 13818-2:2000/Amd 2:2007 and ISO/IEC 13818-2:2000/Amd 3:2010? Do you know whether these include Annex L of 13818-2 (which is the part for stereoscopic content, AFAIK)?
(In reply to comment #38)
> (In reply to comment #36)
> > (In reply to comment #35)
> > > (In reply to comment #31)
> > > > BTW, have you got a chance to download the 2012 MPEG-2 spec with stereo
> > > > support? You mentioned it in the patch, but marked it as "not implemented yet".
> > > > I believe they should be similar to H.264 MVC flags.
> > >
> > > I haven't seen any 2012 release of MPEG-2 !!! But there are
> > > Amendments/corrigendas for mpeg-2:2000 . Do you have any link to buy/download
> > > the 13818-2:2012 (I don't think ISO/IEC published something for mpeg2 in
> > > 2012,,i could be wrong also:))?
> >
> > IIRC, it was scheduled for release in September 2012, thus superseding all
> > amendments. What interests you might be the H.262 Amendment 4, which was
> > consolidated into the "2012" MPEG-2 edition. It's only available to TIES
> > members last I checked, that's probably why. I will try to sort this out
> > internally, or be prepared to expense stuff. :)
>
> Okay...:)..How about the ISO/IEC_13818-2:2000/Amd 1:2001, ISO/IEC
> 13818-2:2000/Amd 2:2007 and ISO/IEC 13818-2:2000/Amd 3:2010 . Do you know
> whether these includes the ANNEX-L of 13818-2 (which is the part for
> stereoscopic content AFAIK)

Aha, you are right: http://www.itu.int/rec/T-REC-H.262-201202-T!Amd4 :)
(In reply to comment #33)
> For multiview, I think the -multi flag is irrelevant because it's up to the
> downstream element to select the view it wants to render. The decoder will
> simply produce all decoded frames and the associated view-ids. For that, we
> simply need to attach some view-id for example.

Okay, so there won't be any case of multiple views packed together in a single frame; instead it would always be sequential frames, right? Anyway, stereo-video-utils only handles the stereo views, so it makes sense to remove the -MULTI flag. Thanks.
I can see another use case for more buffer flags: as per MPEG-A, there is a type of stream in which StereoType == GST_STEREO_VIDEO_SEQUENTIAL_VIEW_TRACKS. For these streams there are separate tracks for left-view frames and right-view frames, so I think we somehow need to mark the outgoing buffers as left view or right view based on the track id.
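As a minimal sketch of that marking step (illustrative only: the flag name mirrors the proposal in this thread, the struct is hypothetical, and real GStreamer buffer-flag values differ), a demuxer handling SEQUENTIAL_VIEW_TRACKS streams could tag each outgoing buffer from the track id like this:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single flag bit: set = left view, unset = right view. */
#define VIDEO_BUFFER_FLAG_LEFT_VIEW (1u << 0)

typedef struct { uint32_t flags; } stream_buffer;

static void
tag_view_from_track (stream_buffer * buf, int track_id, int left_track_id)
{
  if (track_id == left_track_id)
    buf->flags |= VIDEO_BUFFER_FLAG_LEFT_VIEW;    /* buffer from left-view track */
  else
    buf->flags &= ~VIDEO_BUFFER_FLAG_LEFT_VIEW;   /* buffer from right-view track */
}
```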
Created attachment 240109 [details] [review] pbutils: Add utility functions to handle Stereo video parameters. This is a slightly cleaned-up version of the previous patch.
Created attachment 240129 [details] [review] qtdemux: Add StereoVideo support for the MPEG-A storage format. The MPEG-A format provides an extension to the ISO base media file format to store stereoscopic content encoded with different codecs like h264 and mpeg4:2. The StereoVideo media information (svmi) atom provides the stereo-video-specific parameters. The StereoVideo information for the MPEG-A format can also be supplied through the 'stvi' atom (ref: ISO/IEC 14496-12, ISO/IEC 23000-11), which is not implemented in this patch. There are a few stereo samples available to buy :) http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=54377
I think it is better to move the qtdemux patch to a new bug. Then we will have four separate bugs to track s3d development: in GStreamer, qtdemux, h264_codec_parser and h264_videoparser. But we need to finalize the pbutils API first.
Created attachment 240130 [details] [review] pbutils: Fix the documentation build warning I noticed a documentation build warning associated with encoding-profile.c. Sorry to add it here; I didn't want to create a separate bug for that :)
Comment on attachment 240130 [details] [review] pbutils: Fix the documentation build warning Thanks.
I think it is better to add the VideoMeta for stereo buffers from the base decoder. Any other suggestions? Otherwise it will lead to code duplication in many decoders. Also, we need to write a couple of mapping functions for the different raw video formats to set the stride and offset for the different views in the VideoMeta. Then the difficult task would be to write the composition element to combine the different views. It seems that Matroska also supports stereo content storage.
stereo-video-utils needs a few more changes plus new APIs, and this needs more thought. Also, it is better to move the stereo-video code from pbutils to gst-libs/gst/video, since we need to parse the stereo info from caps within the decoder to set the VideoMeta. Anyway, pbutils already links against libgstvideo, so moving the code to gst-libs/gst/video is no problem. I will come up with some more patches and proposals later.
Created attachment 241090 [details] [review] video-frame: Add more Stereo specific buffer flags Introducing two new buffer flags, GST_VIDEO_BUFFER_FLAG_STEREO and GST_VIDEO_BUFFER_FLAG_LEFT_VIEW, to handle stereo video buffers. In a stereo/mono mixed stream, the buffer flag _FLAG_STEREO is used to specify whether the frame is stereo or mono. For storage formats like MPEG-A, demuxers are responsible for setting this buffer flag. If the stereo video stream has different tracks for left view and right view, then the buffer flag _FLAG_LEFT_VIEW marks a frame as the left view; if unset, the frame is the right view. Again, demuxers are responsible for setting this buffer flag.
Created attachment 241091 [details] [review] video-frame: Add new Stereo specific frame flags. Introducing new VideoFrame flags GST_VIDEO_FRAME_FLAG_STEREO, GST_VIDEO_FRAME_FLAG_H_FLIPPED and GST_VIDEO_FRAME_FLAG_V_FLIPPED. The _FLAG_STEREO flag indicates that the frame has stereo video content; it is used to identify the type of frame in a stereo/mono mixed stream. The _FLAG_H_FLIPPED flag indicates that the frame's stereo video content has been flipped horizontally. The _FLAG_V_FLIPPED flag indicates that the frame's stereo video content has been flipped vertically.
Created attachment 241092 [details] [review] video: Add StereoVideo support, initial implementation. This includes utility functions to set StereoVideo information on caps, intended mainly for demuxers.
Created attachment 241093 [details] [review] video-stereo: Add APIs to parse stereo video information from caps It provides some structures and APIs to parse the StereoVideoInformation from caps, intended for decoders that set the VideoMeta.
Created attachment 241094 [details] [review] videodecoder: new API to set StereoVideoInformation. This one is a proposal only, just a prototype: a subclass implementation can use the new API gst_video_decoder_set_stereo_info() to set the stereo video info on the base decoder. Upstream elements might not provide all the stereo details, like horizontal_flip and vertical_flip; these need to be parsed and set on the base decoder by the individual decoders. By default, the base decoder will parse the stereo video information from the upstream caps and cache it in a GstStereoVideoInfo.
Created attachment 241096 [details] [review] Add videometa for left/right views (prototype) This is also a proposal, prototype only. video-stereo: add a new API gst_stereo_video_buffer_add_video_meta() to set a VideoMeta on a buffer which has stereo video content. videodecoder: add the VideoMeta to the buffers to handle left/right views just before pushing them downstream.
Created attachment 241097 [details] [review] qtdemux: Add StereoVideo support for the MPEG-A storage format. The MPEG-A format provides an extension to the ISO base media file format to store stereoscopic content encoded with different codecs like h264 and mpeg4:2. The StereoVideo media information (svmi) atom provides the stereo-video-specific parameters. The StereoVideo information for the MPEG-A format can also be supplied through the 'stvi' atom (ref: ISO/IEC 14496-12, ISO/IEC 23000-11), which is not implemented in this patch.
Patches 1, 2, 3 and 4 enable the stereo video support. Patches 5 and 6 are proposals; these patches are a kind of prototype. Attachment 7 adds s3d support in qtdemux (only handling MPEG-A, the svmi header).

Some comments about patches 5 and 6: I have added a method gst_stereo_video_buffer_add_video_meta() to video-stereo.c which is not implemented, just a proposal. I also tried the side-by-side mapping for I420 with this method, but did not add it to the patch since the code is ugly. We need some proper way to map all raw formats against all combinations of GstStereoVideoFrameType; I don't know yet how that would look :)

Suppose the pipeline is something like filesrc ! demuxer ! decoder ! video3dcompositor ! xvimagesink. Then the buffers allocated from the pool already have an associated VideoMeta which may have different strides, so we should handle this case as well. It seems like a large set of combinations is possible: all raw formats should have a mapping for every GstStereoVideoFrameType.
Is it possible for someone to review/push these patches ? Other s3d works are based on this.
Review of attachment 241090 [details] [review]: Why don't we put these flags into a GstMeta?
Review of attachment 241091 [details] [review]:

::: gst-libs/gst/video/video-frame.h
@@ +39,3 @@
 * @GST_VIDEO_FRAME_FLAG_RFF: The video frame has the repeat flag
 * @GST_VIDEO_FRAME_FLAG_ONEFIELD: The video frame has one field
+ * @GST_VIDEO_FRAME_FLAG_STEREO: The video frame has stereo content

Should it be set for non-mixed-mode stereo all the time?

@@ +41,3 @@
+ * @GST_VIDEO_FRAME_FLAG_STEREO: The video frame has stereo content
+ * @GST_VIDEO_FRAME_FLAG_H_FLIPPED: The video frame has flipped horizontally
+ * @GST_VIDEO_FRAME_FLAG_V_FLIPPED: The video frame has flipped vertically

This is different to left-right and bottom-top, right? It is really *flipping*, for whatever reason.
Review of attachment 241092 [details] [review]:

::: gst-libs/gst/video/video-stereo.c
@@ +36,3 @@
+} StereoFrameType;
+
+static StereoFrameType sf_types[] = {

constify

@@ +61,3 @@
+
+const gchar *
+get_stereo_frame_string_from_type (GstStereoVideoFrameType type)

frame_type_to_string() maybe? Also everything should be gst_video_* and GstVideo*

@@ +85,3 @@
+const gchar *
+gst_stereo_video_frame_get_type_string (GstStereoVideoScheme scheme,
+    guint frame_type)

Shouldn't the frame_type be some enum type?

@@ +168,3 @@
+const gchar *
+gst_stereo_video_frame_get_layout_string (GstStereoVideoScheme scheme,
+    guint frame_layout)

Shouldn't the frame_layout be some enum type?

@@ +212,3 @@
+gst_stereo_video_caps_set_stereo_info (GstCaps * caps,
+    GstStereoVideoScheme scheme, GstVideoChannelLayout channel_layout,
+    guint frame_type, guint frame_layout)

Shouldn't the frame_type and frame_layout be some enum type?

::: gst-libs/gst/video/video-stereo.h
@@ +41,3 @@
+  GST_STEREO_VIDEO_SCHEME_ISO_IEC_13818_2,
+  GST_STEREO_VIDEO_SCHEME_UNKNOWN
+} GstStereoVideoScheme;

GstVideo* and gst_video_* everywhere

@@ +53,3 @@
+typedef enum {
+  GST_VIDEO_CHANNEL_LAYOUT_STEREO,
+  GST_VIDEO_CHANNEL_LAYOUT_MONO_STEREO,

Maybe another value for MONO here?

@@ +54,3 @@
+  GST_VIDEO_CHANNEL_LAYOUT_STEREO,
+  GST_VIDEO_CHANNEL_LAYOUT_MONO_STEREO,
+  GST_VIDEO_CHANNEL_LAYOUT_UNKNOWN

And multiview?

@@ +76,3 @@
+ * packed over-under in a single frame.
+ * @GST_STEREO_VIDEO_FRAME_TYPE_PACKED_CHECK_BOARD_INTERLEAVED: 2 views are
+ * packed in a single frame as check-board interleaved (quincunx sampling).

Maybe call the enum value quincunx then, in case there are other similar check-board-like patterns in the future

@@ +109,3 @@
+  GST_STEREO_VIDEO_FRAME_LAYOUT_LEFT_VIEW_FIRST = (1 << 0),
+  GST_STEREO_VIDEO_FRAME_LAYOUT_HORIZONTALLY_FLIPPED = (1 << 2),
+  GST_STEREO_VIDEO_FRAME_LAYOUT_VERTICALLY_FLIPPED = (1 << 3),

These last two flags are also proposed as buffer flags, and here they're for the caps.
Why two places? Can it change frame-by-frame?

@@ +110,3 @@
+  GST_STEREO_VIDEO_FRAME_LAYOUT_HORIZONTALLY_FLIPPED = (1 << 2),
+  GST_STEREO_VIDEO_FRAME_LAYOUT_VERTICALLY_FLIPPED = (1 << 3),
+  GST_STEREO_VIDEO_FRAME_LAYOUT_UNKNOWN = (1 << 4)

Is this a flags type or an enum? For flags, UNKNOWN should be 0.
Review of attachment 241093 [details] [review]:

::: gst-libs/gst/video/video-stereo.c
@@ +69,3 @@
 };

+static StereoFrameLayout sf_layouts[] = {

constify

@@ +114,3 @@
+
+GstStereoVideoFrameLayout
+get_stereo_frame_layout_from_string (const gchar * s)

_to_string() variants too maybe

::: gst-libs/gst/video/video-stereo.h
@@ +133,3 @@
+  /* Caps doesn't have field for flipping flags */
+  gboolean h_flipped;
+  gboolean v_flipped;

Padding missing
Review of attachment 241094 [details] [review]:

::: gst-libs/gst/video/gstvideodecoder.h
@@ +367,3 @@
+void gst_video_decoder_set_stereo_info (GstVideoDecoder *dec,
+    GstStereoVideoInfo *sv_info);

This should probably be part of the GstVideoCodecState
These patches are too much to handle and reason about in one go, IMO. Let's step back and identify 2 cases:

1) compressed frames coming from a demuxer
2) uncompressed frames coming from a decoder

For 2) we should use GstVideoMeta metadata on buffers, and in the caps you have the number of views available. Only stereo is currently defined, and we assume left is view 0, right is view 1.

What's not possible with 2) currently?

- We can't do separate left-right frames arriving in the decoder without decoder support. For this the decoder needs to accumulate 2 frames and then place them in the outgoing buffer with GstVideoMeta. Do we add this to the video decoder base class? The decoder needs to know which frame is left and right.
- We can't do left/right interleaved every pixel, or checkerboard, or anything that is not a rectangular left/right part of the decoded image. For this we would need a new pixel format, or the decoder needs to transform this to something we support.
- We can't do flipping of planes horizontally or vertically. We could add this as flags on the metadata. Horizontal flipping could be done with strides. This would also need support in video sinks or other elements. Maybe we would use separate metadata to define the transform on a video frame?
- Something else?

For 2) to work we need to pass the right info to the decoder, because it is usually the demuxer that knows the layout etc. of the frames. So we need a way to transport this info; the usual way is to do this with caps.

I would like some caps field that is a simple string describing the layout, similar to the colorimetry caps field. The reason is that we don't want to negotiate N fields. Maybe also similar to how interlaced content works? I don't like the idea of passing this info with metadata; our parsers don't deal with metadata well, and I have no idea if the metadata would make it to the decoders.
It also sounds too complicated for what it is:

- in separate frames (flags on buffers define left/right)
- in one frame (which portion is left/right, where is it and how big is it)
- mixed (some frames mono, others stereo, a flag says which it is)

> @GST_STEREO_VIDEO_FRAME_TYPE_SEQUENTIAL_PROGRESSIVE: Frame sequential type.
> @GST_STEREO_VIDEO_FRAME_TYPE_SEQUENTIAL_ROW_INTERLEAVED: Sequential row
> interleaved.

What are these? How are frames transported to the decoder in these methods? I don't like how GstStereoVideoScheme creeps into the API. We should define an API to express 3D video in GStreamer; how to convert to this from any other scheme should live somewhere else and is not related.
I think these cases are handled in the patch sets. Demuxers will set all the parsed information in caps. Decoders set the meta based on the caps from the demuxer and internally parsed information (if any). I did this in the base decoder and provided an API for subclass implementations. All the video content handling should be the duty of the video3dpostprocessing element + videosink. The flipping flags indicate whether the content is flipped or not; the content manipulation is again the duty of the video3dpostprocessing element. Why do we need to accumulate two buffers in decoders? The flags GST_STEREO_VIDEO_FRAME_TYPE_SEQUENTIAL_PROGRESSIVE and GST_STEREO_VIDEO_FRAME_TYPE_SEQUENTIAL_ROW_INTERLEAVED indicate that frames are not packed together. What is the problem here? There is no GstStereoVideoScheme conversion in the patches... is there? GstStereoVideoScheme is basically just a helper for demuxers. Its right place would be pbutils, but I added it to gst-libs/gst/video to avoid a lot of duplication.
IMHO, it might be good to change the bug description to "Add Stereoscopic Video Support" or something like that.
(In reply to comment #58)
> Review of attachment 241090 [details] [review]:
> 
> Why don't we put these flags into a GstMeta?

These fields are similar to the _TFF and _RFF flags, and they need to get mapped during gst_video_frame_map_id(), like the other flags. I feel that this is the correct place.
(In reply to comment #59)
> Review of attachment 241091 [details] [review]:
> 
> ::: gst-libs/gst/video/video-frame.h
> @@ +39,3 @@
> * @GST_VIDEO_FRAME_FLAG_RFF: The video frame has the repeat flag
> * @GST_VIDEO_FRAME_FLAG_ONEFIELD: The video frame has one field
> + * @GST_VIDEO_FRAME_FLAG_STEREO: The video frame has stereo content
> 
> Should it be set for non-mixed-mode stereo all the time?

Yes. Would you like me to add a new _FLAG_MONO?

> @@ +41,3 @@
> + * @GST_VIDEO_FRAME_FLAG_STEREO: The video frame has stereo content
> + * @GST_VIDEO_FRAME_FLAG_H_FLIPPED: The video frame has flipped horizontally
> + * @GST_VIDEO_FRAME_FLAG_V_FLIPPED: The video frame has flipped vertically
> 
> This is different two left-right and bottom-top, right? It is really *flipping*
> for whatever reason

It indicates that the frame content is flipped (it may be the left view or top view, etc.). When the video3dpostproc element maps the frame with gst_video_frame_map_id(), it will check this flag and take the necessary steps if needed.
(In reply to comment #60)
> Review of attachment 241092 [details] [review]:
> 
> frame_type_to_string() maybe?
> 
> Also everything should be gst_video_* and GstVideo*

I was thinking that StereoVideo gives more readability :). Will change this.

> @@ +85,3 @@
> +const gchar *
> +gst_stereo_video_frame_get_type_string (GstStereoVideoScheme scheme,
> +    guint frame_type)
> 
> Shouldn't the frame_type be some enum type?

*NO*. The frame_type is the integer parsed from the encoded data, while the enum value is unique to GStreamer for each type. These APIs are for demuxers, which are meant to parse the data and call this API, because different schemes have different values for the frame type. For example: in GST_STEREO_VIDEO_SCHEME_ISO_IEC_23000_11, 0x00 indicates packed side-by-side; in GST_STEREO_VIDEO_SCHEME_ISO_IEC_14496_10, 0x00 indicates checkerboard interleaving.

> @@ +168,3 @@
> +const gchar *
> +gst_stereo_video_frame_get_layout_string (GstStereoVideoScheme scheme,
> +    guint frame_layout)
> 
> Shouldn't the frame_layout be some enum type?

Same explanation :)

> @@ +212,3 @@
> +gst_stereo_video_caps_set_stereo_info (GstCaps * caps,
> +    GstStereoVideoScheme scheme, GstVideoChannelLayout channel_layout,
> +    guint frame_type, guint frame_layout)
> 
> Shouldn't the frame_type and frame_layout be some enum type?

Same explanation :)

> ::: gst-libs/gst/video/video-stereo.h
> @@ +41,3 @@
> +  GST_STEREO_VIDEO_SCHEME_ISO_IEC_13818_2,
> +  GST_STEREO_VIDEO_SCHEME_UNKNOWN
> +} GstStereoVideoScheme;
> 
> GstVideo* and gst_video_* everywhere
> 
> @@ +53,3 @@
> +typedef enum {
> +  GST_VIDEO_CHANNEL_LAYOUT_STEREO,
> +  GST_VIDEO_CHANNEL_LAYOUT_MONO_STEREO,
> 
> Maybe another value for MONO here?

By default it is mono. Maybe it can change like this:
GST_VIDEO_CHANNEL_LAYOUT_STEREO = 1
GST_VIDEO_CHANNEL_LAYOUT_MONO_STEREO = 2
Any objection?
> @@ +54,3 @@
> +  GST_VIDEO_CHANNEL_LAYOUT_STEREO,
> +  GST_VIDEO_CHANNEL_LAYOUT_MONO_STEREO,
> +  GST_VIDEO_CHANNEL_LAYOUT_UNKNOWN
> 
> And multiview?

I had a multiview flag before and removed it, since we are only thinking about stereo video at the moment. But no objection to adding it if necessary.

> @@ +76,3 @@
> + *     packed over-under in a single frame.
> + * @GST_STEREO_VIDEO_FRAME_TYPE_PACKED_CHECK_BOARD_INTERLEAVED: 2 views are
> + *     packed in a single frame as check-board interleaved (quincunx
> sampling).
> 
> Maybe call the enum value quincunx then, in case there are other similar
> check-board-like patterns in the future

Okay. :)

> @@ +109,3 @@
> +  GST_STEREO_VIDEO_FRAME_LAYOUT_LEFT_VIEW_FIRST = (1 << 0),
> +  GST_STEREO_VIDEO_FRAME_LAYOUT_HORIZONTALLY_FLIPPED = (1 << 2),
> +  GST_STEREO_VIDEO_FRAME_LAYOUT_VERTICALLY_FLIPPED = (1 << 3),
> 
> These last two flags are also proposed as buffer flags, and here they're for
> the caps. Why two places? Can it change frame-by-frame?

I just added it for completeness. I also added a comment in video-stereo.h, maybe you missed it: /* Caps doesn't have a field for flipping flags */. The only spec which supports this at the moment is GST_STEREO_VIDEO_SCHEME_ISO_IEC_14496_10, and we won't get this info from the demuxer; we will only get it when parsing SEI headers.

> @@ +110,3 @@
> +  GST_STEREO_VIDEO_FRAME_LAYOUT_HORIZONTALLY_FLIPPED = (1 << 2),
> +  GST_STEREO_VIDEO_FRAME_LAYOUT_VERTICALLY_FLIPPED = (1 << 3),
> +  GST_STEREO_VIDEO_FRAME_LAYOUT_UNKNOWN = (1 << 4)
> 
> Is this a flags type or an enum? For flags, UNKNOWN should be 0
(In reply to comment #62)
> Review of attachment 241094 [details] [review]:
> 
> ::: gst-libs/gst/video/gstvideodecoder.h
> @@ +367,3 @@
> +void gst_video_decoder_set_stereo_info (GstVideoDecoder *dec,
> +                                        GstStereoVideoInfo *sv_info);
> 
> This should probably be part of the GstVideoCodecState

gst_video_stereo_info_init(), gst_video_stereo_info_from_caps() and gst_video_stereo_buffer_add_video_meta() are only for the base video decoder. No other element needs to use them, not even subclass implementations (as far as I know). The only thing a subclass implementation needs to do is invoke gst_video_decoder_set_stereo_info() if it parsed any stereo video info. So I thought it better to keep a separate structure instead of cluttering the existing video utility structures.
"I would like some caps that is a simple string describing the layout, similar to the colorimetry caps field." So Wim would like to use something like the colorimetry caps field. I think this would be good; I will look into it. Any other suggestions? I will rearrange the APIs based on the feedback.
I am tired of restructuring the code each time :) Okay, so as per the current feedback from Slomo and Wim I would like to propose the following:

Only one helper API for demuxers
------------------------------------------------------
/* The parameters frame_type_val and frame_layout_val are the values parsed
 * from the encoded stream. This API will find the unique enum for these
 * values based on the input GstVideoStereoScheme. The caps field for
 * stereo-info will look like this:
 * stereo-info: "GstVideoChannelLayout:GstVideoStereoFrameType:GstVideoStereoFrameLayout:GstVideoStereoFlip"
 * which means stereo-info is the key and its value is a concatenated string
 * of enums: g_strdup_printf ("%d:%d:%d:%d", channel_layout, frame_type,
 * frame_layout, frame_flip);
 * AFAIK the flip information is codec-specific and we will only get it from
 * the parser/decoder, so here we can set a default (no flip). */
gboolean gst_video_stereo_caps_set_stereo_info (GstCaps * caps,
    GstVideoStereoScheme scheme, GstVideoChannelLayout channel_layout,
    guint frame_type_val, guint frame_layout_val);

These are the main APIs for parsers, decoders and other elements
----------------------------------------------------------
/* Needs to introduce one more enum, GstVideoStereoFrameFlip (this was not
 * in the previous implementation):
 * GST_VIDEO_STEREO_FRAME_FLIP_H_LEFT
 * GST_VIDEO_STEREO_FRAME_FLIP_H_RIGHT
 * GST_VIDEO_STEREO_FRAME_FLIP_V_LEFT
 * GST_VIDEO_STEREO_FRAME_FLIP_V_RIGHT */
void gst_video_stereo_info_init (GstVideoStereoInfo * info);

/* Parses with: sscanf (stereo_info, "%d:%d:%d:%d", &info->channel_layout,
 * &info->frame_type, &info->frame_layout, &info->frame_flip); */
gboolean gst_video_stereo_info_from_string (GstVideoStereoInfo *info,
    const gchar *stereo_info);

/* Returns: g_strdup_printf ("%d:%d:%d:%d", info->channel_layout,
 * info->frame_type, info->frame_layout, info->frame_flip) */
gchar * gst_video_stereo_info_to_string (GstVideoStereoInfo *info);

gboolean gst_video_stereo_info_set_in_caps (GstVideoStereoInfo * info,
    GstCaps *caps);

gboolean gst_video_stereo_info_from_caps (GstVideoStereoInfo * info,
    GstCaps *caps);
How would that Caps field be negotiated ?
(In reply to comment #72) > How would that Caps field be negotiated ? It is just a field like colorimetry. Did you see any problem with that? We are using videometa for handling left/right view in upstream elements which are coming after the decoder. The reason by which we need the caps fields is that for storage formats like MPEG-A, the stero-video-info is kind of codec independent and decoders/parsers have nothing to do with this.
(In reply to comment #73)
> (In reply to comment #72)
> > How would that Caps field be negotiated ?
> 
> It is just a field like colorimetry. Did you see any problem with that?
> We are using videometa for handling left/right view in upstream elements which

error: that should read "downstream elements which are ..."

> are coming after the decoder. The reason by which we need the caps fields is
> that for storage formats like MPEG-A, the stero-video-info is kind of codec
> independent and decoders/parsers have nothing to do with this.
Being in the 3D video field (stereoscopic and autostereoscopic) for some time, I would like to add my input on what I work with on a daily (commercial) basis.

Typical 3D video sources I work with are:
- One file with one video stream, having stereo 3D as either half or full resolution per eye.
- One file with multiple video streams (anywhere from 2-9 in my case), either for stereo or autostereo playback. Last I checked, the qtdemux element was hard coded to 8 max video streams, so we've had to use other containers in some cases.
- Two files with one video stream each, e.g. Video_LeftEye.mp4 and Video_RightEye.mp4, where playback of the two files must be synchronized as if it were one video file. This is common with dual-camera (genlocked, of course) video setups.
- Image sequences (packed and separate files per eye) from a very large/fast SAN (png, tif, dpx etc.) in the case of in-production video (films/TV etc.).
- Live video feed from a camera(s), typically using Blackmagic DeckLink capture cards (could use the decklinksrc element if it had 3D support).
- In-memory sources (video editor, IP stream) that must be played out to a 3D display (could use the decklinksink element if it had 3D support).

Potential input formats of video:

Stereo:
- Over/under
- Left/right
- Frame sequential
- Horizontal/vertical interlaced (not very important on the input side)
- Checkerboard (not very important on the input side)
- Separate video streams per eye

Autostereo (multiview):
- 3x3 matrix packed in one frame. Typical 8 or 9-view displays only have ~1/3rd the resolution of the panel's native format, so there isn't really any quality loss in this video format
- 2D+depth, where the extra views are generated (interpolated) internally, usually with OpenCV or proprietary algorithms
- Separate video streams per eye

---

Potential output formats (support varies wildly by display):

Stereo (half-res per eye):
- Over/under (most used on active/passive 3DTVs and single/dual projector systems; recommended for passive since no scaling is necessary in most cases)
- Left/right (most used on active/passive 3DTVs and single/dual projector systems; recommended for active)
- Horizontal/row interlaced (native format of passive 3DTVs and some single-view autostereo displays; requires no manual 3D setup on the display)
- Vertical/column interlaced (native format of some single-view autostereo displays)
- Checkerboard (alternating half-res left/right eye pixels, mostly used on DLP active 3DTVs)
- Anaglyph (mainly for viewing on 2D monitors; many different possible color mixtures: red/blue, red/cyan, red/green, green/magenta, yellow/blue, and then options for 100% color, 50%, other algorithms etc.)

Stereo (full-res per eye):
- Frame sequential (alternating left/right eye images, usually at 120 Hz as it's the native format of NVIDIA 3D Vision monitors; also supported by some 3DTVs and projectors)
- HDMI 1.4a frame packing; requires physical hardware support in the video sink device/GPU (requires no manual 3D setup on the display, it is detected automatically); supported by Blackmagic DeckLink cards (but not in the decklinksrc/sink elements)
- "Dual-stream" output (usually HD-SDI only, separate physical cables for left/right eye), used by professional/medical displays and supported by Blackmagic DeckLink cards (but not in the decklinksrc/sink elements)

Autostereo (multiview):
- Completely proprietary and display-dependent. Most use a lenticular lens or parallax barrier with 5, 8 or 9 views and require a GPU shader to interleave all required views into one packed frame using special repeating RGB/BGR patterns. The patterns themselves are also sometimes altered to adjust for optimal viewing distance, diminish cross-talk etc., but all non-optical adjustments affect image quality. It would be sufficient to have a GL video sink element with a custom fragment shader option to support most of these displays, but without a secondary texture for the repeating pattern, GPU usage will skyrocket with all the branching/modulo operations needed. Currently most display vendors I've seen are very reluctant to hand over their pixel pattern or optical parameters to allow 3rd party video player development.

---

Most stereo 3D displays cannot auto-detect 3D video and switch modes accordingly (except if HDMI 1.4a is used, or in some cases the display may actually try to use image analysis; some Panasonic models do this), so the display is usually forced into a certain 3D mode by the user.

Sometimes it's required to throw away one of the eyes and only display one of them, in the case of a 2D display or a user who is uncomfortable watching 3D. There should be an option to choose which view (left or right) is used for the 2D image. Also, if the display is in a 3D-only mode, it is not sufficient to simply show one eye in fullscreen 2D; you must show two copies of the same eye as if you were displaying actual 3D. Example: with a 3DTV manually forced to left/right mode, 2D images must be displayed as a side-by-side left/left or right/right image. There is usually no way to query the current mode from a 3D display, except for some that use a DB9 serial control port with a proprietary protocol.

Some displays render 3D video in reverse (the left eye sees the image meant for the right eye, etc.), so we must also have an option to "swap eyes". Some displays also have a swap option. Mirror-based dual-camera rigs can require horizontal/vertical flipping of one eye to get a correct image.

Some dual-camera systems have problems with correct horizontal/vertical convergence, and existing stereo 3D software and 3DTVs have a convergence adjustment option to help with this if it cannot be corrected optically. This would be a nice option to have, but not totally necessary.
bparker, thanks for the extensive comment :) There's also support API added for this to libav, which looks much simpler than everything proposed here. For reference: http://patches.libav.org/patch/44899/
slomo, sure, the libav patch is only for stereo. But as it covers most things, that's probably fine. Maybe we should just ignore the more esoteric multiview variants for now.
Did anything in here ever talk about multiview other than the topic? I think we should do different multiview things later with different APIs... and maybe generalize the concepts in 2.0.
Did we reach any conclusion? What else is needed to get this feature in?
Can we create a wiki page / design draft file that lists all the formats that support stereoscopic / multiview video, with links to online resources and a summary of features? I did something like that for GstToc here: http://cgit.freedesktop.org/gstreamer/gstreamer/tree/docs/design/part-toc.txt#n137 If we get this by the weekend, we can discuss it during the hackfest to move this forward.
There is no conclusion yet. It needs careful consideration and review by multiple people to get this in. I may have an opportunity to work on this a bit in the near future.
Some more information to consider; this would actually be my minimal requirements:

1. It is important to distinguish between view order index and view id. Reason: the view order index determines the decoding/output order of the views within an access unit, while the view id can be anything that the user/encoder generated. So, we normally always get view indices in increasing order: 0, 1, 2, etc. However, the view id can be anything: 2, 0, 1 for the previous AU, 1, 2, 0 for the next AU, etc. I suggest we have an id that maps to the view_id (or layer_id), and a flag that marks the start of an access unit. We could have a voc (view order count) field instead, but a flag marking the start of the access unit would be just fine IMHO. I am not inspired today, so my current name suggestion for that flag would be "FFB" = "First Frame in Bundle", or "BFF" = "Bundle First Field/Frame"; bundle == sequence of views with the same PTS. Why not use a PTS change to detect new access unit (bundle) boundaries? Well, PTS is not accurate and there could be cases where it is never defined(?).

2. Add a means to query downstream whether it will actually care about "multiview" and "stereo3d" buffers. Reason: we don't need to decode all views or layers if we are not going to use them; e.g. it would be enough to stick to the base view/layer. However, because of (1), this means that we are not always going to display view 0, but simply the first decoded view in the access unit, i.e. the one really matching the base stream. The other benefit is that, if downstream doesn't care about extra views, we can avoid a processing step to compose an S3D buffer.

Anyway, in vaapi I currently handle vid/voc (view_id / view_order_index) internally, with the real view_id and a flag for the plug-in element layer stuffed into the vaapi meta for now.
I know we definitely should create a wiki, but here is another comment before I forget about it. :) The model by which we store multiple view components in a single GstVideoBuffer and map the desired view component by id from it is not going to work. This is implied by memory optimizations in the H.264 MVC decoding process: the size of the DPB is not a multiple of the number of views. This means you are not always going to have all view components from the same access unit available at the same time, primarily because that would otherwise require much more memory (video surfaces). So I am not really sure we want to keep all of them around beyond what the DPB handling rules require.
I'll be doing some work on stereoscopic / MVC handling soon
I put a new proposal for stereoscopic and MVC signalling and handling into gst-plugins-base git. You can see it at docs/design/part-stereo-multiview-video.markdown or at https://github.com/thaytan/gst-plugins-base/blob/e35cf6321642225092fd9c0342413bc4a9f38f91/docs/design/part-stereo-multiview-video.markdown I'll be working on implementation in the coming days, and things might change slightly but I think the overall design is viable.
(In reply to comment #85)
> I put a new proposal for stereoscopic and MVC signalling and handling into
> gst-plugins-base git.
> 
> You can see it at docs/design/part-stereo-multiview-video.markdown or at
> 
> https://github.com/thaytan/gst-plugins-base/blob/e35cf6321642225092fd9c0342413bc4a9f38f91/docs/design/part-stereo-multiview-video.markdown
> 
> I'll be working on implementation in the coming days, and things might change
> slightly but I think the overall design is viable.

Nice! I don't remember where I had seen sequential-row-interleaved; it has been more than a year now :) Maybe it was added for completeness...
(In reply to comment #86) > (In reply to comment #85) > > I put a new proposal for stereoscopic and MVC signalling and handling into > > gst-plugins-base git. > > > > You can see it at docs/design/part-stereo-multiview-video.markdown or at > > > > https://github.com/thaytan/gst-plugins-base/blob/e35cf6321642225092fd9c0342413bc4a9f38f91/docs/design/part-stereo-multiview-video.markdown > > > > I'll be working on implementation in the coming days, and things might change > > slightly but I think the overall design is viable. > > Nice! > > Didn't remember where I have seen the sequential-row-interleaved,it has been > more than a year now :) May be added for completeness.. I think the "sequential-row-interleaved" was for handling the temporal interleaving mentioned in ITU-T H264, D.2.25, when the frame_packing_arrangement_type is equal to 5.
If so, that's the 'frame by frame' arrangement. There's no row-interleave, just successive left-right-left-right frames.
(In reply to comment #88)
> If so, that's the 'frame by frame' arrangement. There's no row-interleave, just
> successive left-right-left-right frames.

Aha, right. Sorry for comment 87... I might have seen row-interleaved somewhere else. Better to remove it for now.
Created attachment 304269 [details] [review] video: Add multiview/stereo support Add flags and enums to support multiview signalling in GstVideoInfo and GstVideoFrame, and the caps serialisation and deserialisation. videoencoder: Copy multiview settings from reference input state Add gst_video_multiview_* support API and GstVideoMultiviewMeta meta
Created attachment 304270 [details] [review] playbin: Implement multiview frame-packing overrides Add GstVideoMultiviewFramePacking enum, and the video-multiview-mode and video-multiview-flags properties on playbin. Use a pad probe to replace the multiview information in video caps sent out from uridecodebin. This is a part implementation only - for full correctness, it should also modify caps in caps events, accept-caps and allocation queries.
After a long hiatus, here's some code to look at. So far, this provides for implementing the various frame-packed and stereo modes, and has some placeholders for doing arbitrary MVC - but I think that'll need some more exploration as it's actually implemented. At the moment, I have changes for other modules (libav, ugly, bad) to actually use the new API - this is just the -base pieces.
Created attachment 304854 [details] [review] video: Add multiview/stereo support Add flags and enums to support multiview signalling in GstVideoInfo and GstVideoFrame, and the caps serialisation and deserialisation. videoencoder: Copy multiview settings from reference input state Add gst_video_multiview_* support API and GstVideoMultiviewMeta meta
Created attachment 304855 [details] [review] playbin: Implement multiview frame-packing overrides Add GstVideoMultiviewFramePacking enum, and the video-multiview-mode and video-multiview-flags properties on playbin. Use a pad probe to replace the multiview information in video caps sent out from uridecodebin. This is a part implementation only - for full correctness, it should also modify caps in caps events, accept-caps and allocation queries.
Some minor changes to the implementation. At the moment, I'm still working on what's needed for general (non-stereo) multiview support. The GstVideoMultiviewMeta is only needed for view labelling in that case, so I'm tempted to leave it out for now. For stereo handling, the caps and buffer changes are sufficient for signalling everything needed. For frame packed stereo, I took a different path than some suggested above. Since there's no sensible way to describe all the packed layouts with GstVideoInfo, everything needs to be taught explicitly anyway. So in this design, they are reported as 1 view in the caps / GstVideoInfo and it's up to elements that care to know how to handle them. Existing elements continue to treat them as a single buffer, which is no worse than the status quo.
Created attachment 304872 [details] [review] qtdemux: Add basic support for MPEG-A stereoscopic video The MPEG-A format provides an extension to the ISO base media file format to store stereoscopic content encoded with different codecs like H.264 and MPEG-4:2. The stereo video media information(svmi) atom declares the presence and storage method for the video. Stereo video information for MPEG-A can also be supplied through the 'stvi' atom (ref: ISO/IEC_14496-12, ISO/IEC_23000-11), which is not implemented in this patch. Also missing is support for stereo video encoded as separate video tracks for now. Based on a patch by Sreerenj Balachandran <sreerenj.balachandran@intel.com>
Created attachment 304878 [details] [review] multiview: Add docs and disable the GstVideoMultiviewMeta API for now. Add docstrings and Since markers, and put new API into the docs. Disable the multiview meta for now - it's not needed until MVC support is finalised, and probably needs changing.
Created attachment 304881 [details] [review] qtdemux: Add basic support for MPEG-A stereoscopic video The MPEG-A format provides an extension to the ISO base media file format to store stereoscopic content encoded with different codecs like H.264 and MPEG-4:2. The stereo video media information(svmi) atom declares the presence and storage method for the video. Stereo video information for MPEG-A can also be supplied through the 'stvi' atom (ref: ISO/IEC_14496-12, ISO/IEC_23000-11), which is not implemented in this patch. Also missing is support for stereo video encoded as separate video tracks for now. Based on a patch by Sreerenj Balachandran <sreerenj.balachandran@intel.com>
This all looks pretty great and complete to me! (Apart from the missing signalling for MVC.) I'm still not sure about the GstVideoInfo padding; anonymous unions in structures are not a C89 feature but C11, and they gave us trouble in the past (don't remember the details, sorry).
Meh. I think I should just put the padding back how it was - it doesn't break Linux or Windows where I've tested, and seems likely to not break anywhere else either - the struct layout should come out the same on other compilers.
The other option is to put the new fields inside an explicit union with the padding and require all consumers to use the accessor macros.
Created attachment 304917 [details] [review] multiview: Add docs and disable the GstVideoMultiviewMeta API for now. Add docstrings and Since markers, and put new API into the docs. Disable the multiview meta for now - it's not needed until MVC support is finalised, and probably needs changing. Modify the way the padding is consumed in GstVideoInfo, and consequently require code to use the accessor macros.
Created attachment 304919 [details] [review] qtdemux: Add basic support for MPEG-A stereoscopic video The MPEG-A format provides an extension to the ISO base media file format to store stereoscopic content encoded with different codecs like H.264 and MPEG-4:2. The stereo video media information(svmi) atom declares the presence and storage method for the video. Stereo video information for MPEG-A can also be supplied through the 'stvi' atom (ref: ISO/IEC_14496-12, ISO/IEC_23000-11), which is not implemented in this patch. Also missing is support for stereo video encoded as separate video tracks for now. Based on a patch by Sreerenj Balachandran <sreerenj.balachandran@intel.com>
In the absence of any other input, I'll go with this and land these today, with the body of the GL implementation in -bad to follow.
Pushed the -base/-good/-ugly/-libav changes
Created attachment 305121 [details] [review] gl libs: Add glviewconvert helper object Add API for a helper object that can convert between different stereoscopic video representations, and later do filtering of multiple view streams.
Created attachment 305122 [details] [review] glimagesink: Support multiview/stereoscopic video Support video with multiview info in the caps, transform it to mono anaglyph by default, but allow for configuring other output modes and handoff to the app via the draw signal.
Created attachment 305123 [details] [review] gl: Add glviewconvert, glstereomix and glstereosplit elements Conversion elements for transforming multiview/stereoscopic video
Created attachment 305124 [details] [review] 3dvideo: Add simple gtk example stereoscopic video player
These patches implement the view handling/conversions and anaglyph downmix in the GL plugin. They depend on the GstParentBufferMeta bug