GNOME Bugzilla – Bug 611157
video: API to signal stereoscopic and multiview video
Last modified: 2015-08-16 13:40:48 UTC
Video streams in GStreamer need to be enhanced to support interleaving for stereoscopic video. Right now we have flags for interlaced video only [4]. 3D video can be packaged as interlaced or side by side. In the case of side by side, it can be left/right or top/bottom packing. Left/right is popular for images as it also allows parallel or cross-eyed viewing. Top/bottom is more efficient as one can process the video without strides. I'd like to get comments on how we could support this, in order to have an agreed opinion for a potential GSoC project. http://gstreamer.freedesktop.org/wiki/Video3DSupport
I wonder if we would rather want to have the packing (side by side/over-under/...) in the caps. It could be done similarly to channels and channel-positions for audio.
After more thinking and research I am convinced we need caps extensions: the caps would tell whether it is a multichannel video stream (channels={1,2}) and, in the case of 2, how the frames are packed (channel-layout={mono,separate,stereo-interleaved,stereo-over-under,stereo-left-right}). The flags on the buffers would indicate whether it is the left or right frame for channel-layout=separate (misusing GST_VIDEO_BUFFER_TFF is out of the question, as e.g. the MVC Stereo High profile supports interlaced too).
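For illustration, caps along the lines of this proposal could look like the following sketch on 0.10-era raw caps. The `channels` and `channel-layout` field names are taken from this comment only; they are hypothetical and not part of any released GStreamer API:

```
video/x-raw-yuv, format=(fourcc)I420, width=(int)1280, height=(int)720,
    channels=(int)2, channel-layout=(string)stereo-left-right
```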
I think we can drop the buffer flags and, for now, only support packed video (both views in one buffer). There is now some discussion about new caps here: http://gstreamer.freedesktop.org/wiki/NewGstVideo
Wim was suggesting to use http://cgit.freedesktop.org/~wtay/gstreamer/tree/?h=buffermeta for this, but I don't see how this can help here. We could get this in without the flags, as the industry seems to agree on the packed layouts.
How do you detect that an .avi (or .mov or .flv) demuxer's video output is a 3D video stream? Are 3D tags (or MIME types) already defined for the most popular muxers/demuxers?
For avi/flv you will need to let the user specify that it is e.g. side by side. Yes, that is stupid, as e.g. for avi one could easily introduce a new chunk for it. IMHO there is an extension for mp4/3gp in the works to add that info. Also, if the codec is h264 MVC, it could be a stereo stream.
Any progress on this? Or are we waiting for 0.11 now? ;)
In the ITU spec for H264 (Annex D.2.22 and Annex H) it seems that the stereo video information is coded in SEI; I think that h264parse should be improved to extract this info. ITU spec: http://www.itu.int/rec/T-REC-H.264-201003-I/en I don't know how other codecs like VC1 or VP8 handle 3D video.
I think we should do it in 0.11. Extracting the info e.g. in h264parse and mp4demux would be a good preparatory step, but I don't see how to proceed if we miss the caps details and buffer flags.
(In reply to comment #8)
> I don't know how other codec like VC1 or VP8 handle 3D video.

Stereoscopic profiles were also added to MPEG-2 last year.
(In reply to comment #9)
> I think we should do it in 0.11. Extracting the info e.g. in h264parse and
> mp4demux would be good preparatory step, but I don't see how to proceed if we
> miss the caps details and buffer flags.

The problem is that stereo and multiview buffer flags are not expressive enough, though they could serve as a hint that some more descriptive metadata is attached to the buffer. E.g. for side by side, you also have options to flip one view. Besides, the MVC standard usually talks about "constituent frames" (0 or 1); the actual meaning of those is described in the SEI message. (Just adding myself to Cc: for now, getting back to this later.)
Adding myself to cc. The following bugs might be interesting for you guys (both are related to MVC): https://bugzilla.gnome.org/show_bug.cgi?id=694346 https://bugzilla.gnome.org/show_bug.cgi?id=685215
I have created a new bug for tracking MVC stream parsing support in the h264_videoparser element: https://bugzilla.gnome.org/show_bug.cgi?id=696135 . Any comments regarding Stefan's caps proposal in comment 2?
This proposal is likely completely outdated, we need to re-think this in 1.0 terms. There is already some provision in the new video API, such as having multiple views/fields, but I think more signalling is needed. We need to see what the common variants/needs are, I don't think we can get away with something overly simple. (Note: I didn't actually look at the proposal again at this point)
Okay, I am taking some initiative to continue this work :) I would like to propose the following:

typedef enum {
  GST_VIDEO_VIEW_TYPE_MONO,
  GST_VIDEO_VIEW_TYPE_STEREO,
  GST_VIDEO_VIEW_TYPE_MULTI
} GstVideoViewType;

typedef enum {
  GST_VIDEO_STEREO_SEQUENTIAL_PROGRESSIVE,
  GST_VIDEO_STEREO_SEQUENTIAL_ROW_INTERLEAVED,
  GST_VIDEO_STEREO_PACKED_ROW_INTERLEAVED,
  GST_VIDEO_STEREO_PACKED_COLUMN_INTERLEAVED,
  GST_VIDEO_STEREO_PACKED_SIDE_BY_SIDE,
  GST_VIDEO_STEREO_PACKED_TOP_BOTTOM,
  GST_VIDEO_STEREO_PACKED_CHECK_BOARD_INTERLEAVED,
} GstVideoStereoType;

typedef enum {
  GST_VIDEO_STEREO_TYPE_LEFT_VIEW_FIRST = (1 << 0),
  /* or we can keep separate flags for all combinations, e.g.
     LEFT_FRAME_AS_LEFT_VIEW, TOP_FRAME_AS_LEFT_VIEW etc.:
     unnecessary but more clarity */
  GST_VIDEO_STEREO_TYPE_LEFT_VIEW_FLIPPED = (1 << 2),
  GST_VIDEO_STEREO_TYPE_RIGHT_VIEW_FLIPPED = (1 << 3)
} GstVideoStereoContentType;
Additional to this something will need to be defined to map the different views of GstVideoFrame (i.e. the id in gst_video_frame_map_id()) to the left/right/whatever frame. What's the use case of GST_VIDEO_VIEW_TYPE_MULTI?
(In reply to comment #16)
> Additional to this something will need to be defined to map the different views
> of GstVideoFrame (i.e. the id in gst_video_frame_map_id()) to the
> left/right/whatever frame.
>
> What's the use case of GST_VIDEO_VIEW_TYPE_MULTI?

To identify multiview streams. A stereoscopic stream has only 2 views.
None of these fields or caps are needed, IMO. See also docs/design/part-mediatype-video-raw.txt.

Basically: the views property defines the number of views; by default 1, 2 is stereo, >2 is multiview. With GstVideoMeta you define the different views: you can use the strides and offsets to do side-by-side or top/bottom or interleaved etc. You can access each view (or field of a view in interlaced) with the frame id; see also gst_buffer_get_video_meta_id(). Then you define a new GstMeta to define the meaning of each frame id. By default id 0 is left, id 1 is right (0, 2 for interlaced). The new meta is really only interesting if you do multiview, so it is left undefined for now.

The advantage of this is that it is completely backwards compatible. Color conversion, scaling and display will simply only work on frame 0 as before, until they all become multiview aware.

What is not possible with this is to interleave each view vertically. My thinking is that this requires (a) new pixel format(s).
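The strides/offsets idea above can be sketched in plain C. This is a minimal illustration under assumptions, not the GstVideoMeta API itself (the struct and function names here are hypothetical): one packed side-by-side I420 buffer is described with per-view plane offsets, so mapping view 0 (left) or view 1 (right) needs no copy and the strides stay those of the full packed frame.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-view layout: where each plane of one view starts in
 * the packed buffer, and which stride to walk it with. */
typedef struct {
  size_t offset[3];   /* per-plane start of this view in the buffer */
  size_t stride[3];   /* per-plane stride of the full packed frame  */
} view_layout;

static view_layout
i420_side_by_side_view (size_t width, size_t height, int view_id)
{
  view_layout v;
  size_t y_size = width * height;
  size_t uv_size = (width / 2) * (height / 2);

  /* Plane offsets of the full packed frame (I420: Y, then U, then V). */
  const size_t base[3]   = { 0, y_size, y_size + uv_size };
  const size_t stride[3] = { width, width / 2, width / 2 };

  for (int p = 0; p < 3; p++) {
    /* The right view (id 1) starts half a row further into each plane. */
    v.offset[p] = base[p] + (view_id ? stride[p] / 2 : 0);
    v.stride[p] = stride[p];
  }
  return v;
}
```

For a 640x480 packed frame, view 0 starts at the regular I420 plane offsets, while view 1's Y plane starts 320 bytes in; an element that is not multiview aware and only looks at view 0 simply sees the left half, which is what makes the scheme backwards compatible.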
I would like to share some details about the stereoscopic content available (this is my current understanding :)). So far I have seen two types of stereo videos (for explanation I am taking h264 as the stream and mp4 as the container format):

type 1: This stereo video content storage is based on the MPEG Application Format. To support this in GStreamer we don't need to change anything in the parser or decoder. The stereo video is either a packed format (two views in a single frame) or sequential frames (progressive only). It supports 2 types of packed formats: 1) side_by_side 2) vertical_interleaved. Here left and right views are just like regular frames, but packed as a single frame. This is *NOT* using any h264 stereoscopic extension or multiview extension to encode the video; instead the two views are packed and encoded as a single frame (using regular h264 profiles). So for supporting this (I am using qtdemux as an example):
-- We need to parse the GstVideoStereoType in qtdemux.
-- We can also parse the GstVideoStereoContentType in qtdemux.

type 2: This type is based on the Annex H extension of the h264 spec: more optimized stereo encoding based on stereoscopic profiles. Unfortunately I haven't seen this type of video sample in any container :). We can parse the GstVideoStereoType from qtdemux, but we will get the GstVideoStereoContentType only during stream parsing (SEI payloads), either from h264parse or from decoders supporting it.
There are raw h264 samples available on the web which are encoded with this extension, and all of them are sequential-view streams: meaning "first_frame_is_left_view/second_frame_is_right_view" or "first_frame_is_right_view/second_frame_is_left_view" type. As per the FramePackingArrangement info in the h264 SEI payload, if the stream type is GST_VIDEO_STEREO_PACKED_ROW_INTERLEAVED, GST_VIDEO_STEREO_PACKED_COLUMN_INTERLEAVED, GST_VIDEO_STEREO_PACKED_SIDE_BY_SIDE, GST_VIDEO_STEREO_PACKED_TOP_BOTTOM, or GST_VIDEO_STEREO_PACKED_CHECK_BOARD_INTERLEAVED, we need to do some upsampling for each view based on Figures D-1 to D-11 of the h264 spec, because each view has an SAR equal to the SAR of the frame. Initially we can support side-by-side packed and sequential streams.

Needing more effort: for handling GST_VIDEO_STEREO_PACKED_COLUMN_INTERLEAVED we need some vertical-line-based de-interlacing method, or some element to do that. Likewise for GST_VIDEO_STEREO_PACKED_CHECK_BOARD_INTERLEAVED we need quincunx sampling.
As a reply to comment 18: Hi Wim, sorry, I am a bit confused by the VideoMeta approach. Can you please check type 1 in my comment 19? For streams stored based on the MPEG Application Format (I think most of the available S3D videos are based on this, though I'm not completely sure), we get the view-specific information at the demuxer level, and there is no view information which is codec specific. So we won't get any view-specific information within the decoder or parser. So I think we somehow need to communicate the view-specific information to downstream elements from the demuxer itself. AFAIK it is not possible to add GstVideoMeta to buffers within demuxers! Am I missing something?
(In reply to comment #20)
> ... we will
> get the view_specific_information in demuxer level. And there is no
> view_information which is codec specific.

Ah ok, yes you need something elsewhere then. Either something in the caps or, if it changes for each buffer, some flags or metadata.
I'd put it into the caps for encoded streams (e.g. h264) and let the decoder convert that into useful information in the GstVideoMeta (and a new meta for multiview content).
(In reply to comment #22)
> I'd put it into the caps for encoded streams (e.g. h264) and let the decoder
> convert that into useful information in the GstVideoMeta (and a new meta for
> multiview content).

Are you preferring to append the stereo info to the existing codec-data, or to send it as new codec-data? That would lead to more work in the parser, I think :). I prefer to add it as separate caps fields based on my initial proposal (comment 15), and then decoders/post-processing elements can convert them to GstVideoMeta. I will dig into this a bit more and update later.
New caps field, don't mess with the codec_data :)
Okay :) I will add some utility functions to pbutils.
> I will add some utility functions to pbutils.

If they're not codec-specific they should probably go into libgstvideo.
(In reply to comment #26)
> > I will add some utility functions to pbutils.
>
> If they're not codec-specific they should probably go into libgstvideo.

I think pbutils is the right place, since libgstvideo is for raw video. I will add some patches first.
Created attachment 239990 [details] [review] pbutils: Add utility functions to handle Stereo video streams.
We might also need a new buffer flag (GST_BUFFER_FLAG_STEREO) to handle stereo/mono mixed streams.
Hi, I am not convinced by the mono_stereo flag. Either you know it's mono or stereo, or you don't know and you can't really infer anything. :) Some of the stereo video types are not really masks; e.g. what would top-bottom-half|top-bottom-full mean? Probably use a mask that specifies half resolution instead. But then you also have a possible issue with what e.g. interleaved-row|half-resolution would mean. In the end, why not make the whole thing plain ids like you have for FPA modes? BTW, you also have a gap in the enumerations.
BTW, have you got a chance to download the 2012 MPEG-2 spec with stereo support? You mentioned it in the patch, but marked it as "not implemented yet". I believe they should be similar to H.264 MVC flags.
I like the idea of having a flag mentioning left-view-first, this is symmetric with TFF for interlaced contents and this clearly expresses how to place constituent frames 0 and 1, respectively. Though, I am not convinced this is to be called GstStereoVideoArrangementType. Probably just make it GstStereoVideoFrameFlags?
For multiview, I think the -multi flag is irrelevant because it's up to the downstream element to select the view it wants to render. The decoder will simply produce all decoded frames and the associated view-ids. For that, we simply need to attach some view-id for example.
(In reply to comment #30)
> Hi, I am not convinced by the mono_stereo flag. Either you know that's mono or
> stereo, or you don't know and you can't really infer anything. :)

Until now I have only seen MPEG-A mention stereo/mono mixed streams. The "svmi" (stereo video media information) atom in the "stbl" container gives the information about whether the stream is stereo or mono, how the constituent frames are arranged if it is a stereo frame, etc. We will communicate this info to downstream elements through caps. Once that gets negotiated, we only need to know whether the incoming buffer is stereo or mono, which means the mono_stereo flag is enough, right? Does that make sense?

> Some of the stereo video types are not really masks. e.g. what would
> top-bottom-half|top-bottom-full mean? Probably use a mask that specifies half
> resolution instead. But then, you have also a possible issue with what would
> e.g. interleaved-row|half-resolution mean? In the end, why not make the whole
> thing plain ids like you have for fpa modes? BTW, you also have a gap in the
> enumerations.

The stereo types with the suffix "HALF" indicate that the resolution has been reduced by half, either horizontally or vertically, in order to pack the views into a single frame, which means we need an upsampling operation. I have seen that only MPEG-A's latest amendment has separate specification type values for half/full arrangements; for other specs it is just top_bottom or side_by_side, and based on the description of the upsampling operations (as per 14496-10) we can assume these all belong to the "-HALF" type. We might remove the "top-bottom-half|top-bottom-full" and "side-by-side-half|side-by-side-full" enums. I haven't seen any explanation of interleaved-row-{half,full} anywhere. :)

FPA mode: you mean the frame packing arrangement type in h264? Yup, we can remove "top-bottom-half|top-bottom-full"/"side-by-side-half|side-by-side-full" and make them plain ids starting from zero.
But unfortunately these id values are not unique across all specs :). So we can provide our own unique ids and map them based on the different schemes, like I did in stereo-video-utils.c.
(In reply to comment #31)
> BTW, have you got a chance to download the 2012 MPEG-2 spec with stereo
> support? You mentioned it in the patch, but marked it as "not implemented yet".
> I believe they should be similar to H.264 MVC flags.

I haven't seen any 2012 release of MPEG-2! But there are amendments/corrigenda for MPEG-2:2000. Do you have any link to buy/download 13818-2:2012? (I don't think ISO/IEC published anything for MPEG-2 in 2012; I could be wrong, of course :))
(In reply to comment #35)
> (In reply to comment #31)
> > BTW, have you got a chance to download the 2012 MPEG-2 spec with stereo
> > support? You mentioned it in the patch, but marked it as "not implemented yet".
> > I believe they should be similar to H.264 MVC flags.
>
> I haven't seen any 2012 release of MPEG-2 !!! But there are
> Amendments/corrigendas for mpeg-2:2000 . Do you have any link to buy/download
> the 13818-2:2012 (I don't think ISO/IEC published something for mpeg2 in
> 2012,,i could be wrong also:))?

IIRC, it was scheduled for release in September 2012, thus superseding all amendments. What interests you might be the H.262 Amendment 4, which was consolidated into the "2012" MPEG-2 edition. It's only available to TIES members last I checked, that's probably why. I will try to sort this out internally, or be prepared to expense stuff. :)
(In reply to comment #32)
> I like the idea of having a flag mentioning left-view-first, this is symmetric
> with TFF for interlaced contents and this clearly expresses how to place
> constituent frames 0 and 1, respectively. Though, I am not convinced this is to
> be called GstStereoVideoArrangementType. Probably just make it
> GstStereoVideoFrameFlags?

GstVideoFrame is for raw video, while the StereoVideoArrangement types also describe encoded data :). And theoretically, stereo content in one frame is not really one frame, right? Anyway, I am not insistent; we can change it based on other opinions.
(In reply to comment #36)
> (In reply to comment #35)
> > (In reply to comment #31)
> > > BTW, have you got a chance to download the 2012 MPEG-2 spec with stereo
> > > support? You mentioned it in the patch, but marked it as "not implemented yet".
> > > I believe they should be similar to H.264 MVC flags.
> >
> > I haven't seen any 2012 release of MPEG-2 !!! But there are
> > Amendments/corrigendas for mpeg-2:2000 . Do you have any link to buy/download
> > the 13818-2:2012 (I don't think ISO/IEC published something for mpeg2 in
> > 2012,,i could be wrong also:))?
>
> IIRC, it was scheduled for release in September 2012, thus superseding all
> amendments. What interests you might be the H.262 Amendment 4, which was
> consolidated into the "2012" MPEG-2 edition. It's only available to TIES
> members last I checked, that's probably why. I will try to sort this out
> internally, or be prepared to expense stuff. :)

Okay :) How about ISO/IEC 13818-2:2000/Amd 1:2001, ISO/IEC 13818-2:2000/Amd 2:2007 and ISO/IEC 13818-2:2000/Amd 3:2010? Do you know whether these include Annex L of 13818-2 (which is the part for stereoscopic content, AFAIK)?
(In reply to comment #38)
> (In reply to comment #36)
> > (In reply to comment #35)
> > > (In reply to comment #31)
> > > > BTW, have you got a chance to download the 2012 MPEG-2 spec with stereo
> > > > support? You mentioned it in the patch, but marked it as "not implemented yet".
> > > > I believe they should be similar to H.264 MVC flags.
> > >
> > > I haven't seen any 2012 release of MPEG-2 !!! But there are
> > > Amendments/corrigendas for mpeg-2:2000 . Do you have any link to buy/download
> > > the 13818-2:2012 (I don't think ISO/IEC published something for mpeg2 in
> > > 2012,,i could be wrong also:))?
> >
> > IIRC, it was scheduled for release in September 2012, thus superseding all
> > amendments. What interests you might be the H.262 Amendment 4, which was
> > consolidated into the "2012" MPEG-2 edition. It's only available to TIES
> > members last I checked, that's probably why. I will try to sort this out
> > internally, or be prepared to expense stuff. :)
>
> Okay...:)..How about the ISO/IEC_13818-2:2000/Amd 1:2001, ISO/IEC
> 13818-2:2000/Amd 2:2007 and ISO/IEC 13818-2:2000/Amd 3:2010 . Do you know
> whether these includes the ANNEX-L of 13818-2 (which is the part for
> stereoscopic content AFAIK)

Aha, you are right: http://www.itu.int/rec/T-REC-H.262-201202-T!Amd4 :)
(In reply to comment #33)
> For multiview, I think the -multi flag is irrelevant because it's up to the
> downstream element to select the view it wants to render. The decoder will
> simply produce all decoded frames and the associated view-ids. For that, we
> simply need to attach some view-id for example.

Okay, so there won't be any case of multiple views packed together in a single frame; instead it would always be sequential frames, right? Anyway, stereo-video-utils only handles the stereo views, so it makes sense to remove the -MULTI flag. Thanks.
I can see another use case for more buffer flags: as per MPEG-A, there is a type of stream in which StereoType == GST_STEREO_VIDEO_SEQUENTIAL_VIEW_TRACKS. For these streams there are separate tracks for left-view frames and right-view frames, so I think we somehow need to mark the outgoing buffers as left view or right view based on the track id.
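As a minimal sketch of that marking step (illustrative only: the flag name mirrors the proposal in this thread, the struct is hypothetical, and real GStreamer buffer-flag values differ), a demuxer handling SEQUENTIAL_VIEW_TRACKS streams could tag each outgoing buffer from the track id like this:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single flag bit: set = left view, unset = right view. */
#define VIDEO_BUFFER_FLAG_LEFT_VIEW (1u << 0)

typedef struct { uint32_t flags; } stream_buffer;

static void
tag_view_from_track (stream_buffer * buf, int track_id, int left_track_id)
{
  if (track_id == left_track_id)
    buf->flags |= VIDEO_BUFFER_FLAG_LEFT_VIEW;    /* buffer from left-view track */
  else
    buf->flags &= ~VIDEO_BUFFER_FLAG_LEFT_VIEW;   /* buffer from right-view track */
}
```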
Created attachment 240109 [details] [review] pbutils: Add utility functions to handle Stereo video parameters. This is a slightly cleaned-up version of the previous patch.
Created attachment 240129 [details] [review] qtdemux: Add StereoVideo support for the MPEG-A storage format. The MPEG-A format provides an extension to the ISO base media file format to store stereoscopic content encoded with different codecs like h264 and mpeg4:2. The StereoVideo media information (svmi) atom provides the stereo-video-specific parameters. The StereoVideo information for the MPEG-A format can also be supplied through the 'stvi' atom (ref: ISO/IEC 14496-12, ISO/IEC 23000-11), which is not implemented in this patch. There are a few stereo samples available to buy :) http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=54377
I think it is better to move the qtdemux patch to a new bug. Then we will have four separate bugs to track s3d development: in GStreamer, qtdemux, h264_codec_parser and h264_videoparser. But we need to finalize the pbutils API first.
Created attachment 240130 [details] [review] pbutils: Fix the documentation build warning I noticed a documentation build warning associated with encoding-profile.c. Sorry to add it here; I didn't want to create a separate bug for that :)
Comment on attachment 240130 [details] [review] pbutils: Fix the documentation build warning Thanks.
I think it is better to add the VideoMeta for stereo buffers from the base decoder. Any other suggestions? Otherwise it will lead to code duplication in many decoders. Also, we need to write a couple of mapping functions for the different raw video formats to set the stride and offset for the different views in the VideoMeta. Then the difficult task would be to write the composition element to combine the different views. It seems that Matroska also supports stereo content storage.
stereo-video-utils needs a few more changes plus new APIs, and this needs more thought. Also, it is better to move the stereo-video code from pbutils to gst-libs/gst/video, since we need to parse the stereo info from caps within the decoder to set the VideoMeta. Anyway, pbutils already links against libgstvideo, so moving the code to gst-libs/gst/video is no problem. I will come up with some more patches and proposals later.
Created attachment 241090 [details] [review] video-frame: Add more Stereo specific buffer flags Introducing two new buffer flags, GST_VIDEO_BUFFER_FLAG_STEREO and GST_VIDEO_BUFFER_FLAG_LEFT_VIEW, to handle stereo video buffers. In a stereo/mono mixed stream, the buffer flag _FLAG_STEREO is used to specify whether the frame is stereo or mono. For storage formats like MPEG-A, demuxers are responsible for setting this buffer flag. If the stereo video stream has different tracks for left view and right view, then the buffer flag _FLAG_LEFT_VIEW marks a frame as the left view; if unset, the frame is the right view. Again, demuxers are responsible for setting this buffer flag.
Created attachment 241091 [details] [review] video-frame: Add new Stereo specific frame flags. Introducing new VideoFrame flags GST_VIDEO_FRAME_FLAG_STEREO, GST_VIDEO_FRAME_FLAG_H_FLIPPED and GST_VIDEO_FRAME_FLAG_V_FLIPPED. The _FLAG_STEREO flag indicates that the frame has stereo video content; it is used to identify the type of frame in a stereo/mono mixed stream. The _FLAG_H_FLIPPED flag indicates that the frame's stereo video content has been flipped horizontally. The _FLAG_V_FLIPPED flag indicates that the frame's stereo video content has been flipped vertically.
Created attachment 241092 [details] [review] video: Add StereoVideo support, initial implementation. This includes utility functions to set StereoVideo information on caps, intended mainly for demuxers.
Created attachment 241093 [details] [review] video-stereo: Add APIs to parse stereo video information from caps It provides some structures and APIs to parse the StereoVideoInformation from caps, intended for decoders that set the VideoMeta.
Created attachment 241094 [details] [review] videodecoder: new API to set StereoVideoInformation. This one is a proposal only, just a prototype: a subclass implementation can use the new API gst_video_decoder_set_stereo_info() to set the stereo video info on the base decoder. Upstream elements might not provide all the stereo details, like horizontal_flip and vertical_flip; these need to be parsed and set on the base decoder by the individual decoders. By default, the base decoder will parse the stereo video information from the upstream caps and cache it in a GstStereoVideoInfo.
Created attachment 241096 [details] [review] Add videometa for left/right views (prototype) This is also a proposal, prototype only. video-stereo: add a new API gst_stereo_video_buffer_add_video_meta() to set a VideoMeta on a buffer which has stereo video content. videodecoder: add the VideoMeta to the buffers to handle left/right views just before pushing them downstream.
Created attachment 241097 [details] [review] qtdemux: Add StereoVideo support for the MPEG-A storage format. The MPEG-A format provides an extension to the ISO base media file format to store stereoscopic content encoded with different codecs like h264 and mpeg4:2. The StereoVideo media information (svmi) atom provides the stereo-video-specific parameters. The StereoVideo information for the MPEG-A format can also be supplied through the 'stvi' atom (ref: ISO/IEC 14496-12, ISO/IEC 23000-11), which is not implemented in this patch.
Patches 1, 2, 3 and 4 enable the stereo video support. Patches 5 and 6 are proposals; these patches are a kind of prototype. Attachment 7 adds s3d support in qtdemux (only handling MPEG-A, the svmi header).

Some comments about patches 5 and 6: I have added a method gst_stereo_video_buffer_add_video_meta() to video-stereo.c which is not implemented, just a proposal. I also tried the side-by-side mapping for I420 with this method, but did not add it to the patch since the code is ugly. We need some proper way to map all raw formats against all combinations of GstStereoVideoFrameType; I don't know yet how that would look :)

Suppose the pipeline is something like filesrc ! demuxer ! decoder ! video3dcompositor ! xvimagesink. Then the buffers allocated from the pool already have an associated VideoMeta which may have different strides, so we should handle this case as well. It seems like a large set of combinations is possible: all raw formats should have a mapping for every GstStereoVideoFrameType.
Is it possible for someone to review/push these patches ? Other s3d works are based on this.
Review of attachment 241090 [details] [review]: Why don't we put these flags into a GstMeta?
Review of attachment 241091 [details] [review]:

::: gst-libs/gst/video/video-frame.h
@@ +39,3 @@
 * @GST_VIDEO_FRAME_FLAG_RFF: The video frame has the repeat flag
 * @GST_VIDEO_FRAME_FLAG_ONEFIELD: The video frame has one field
+ * @GST_VIDEO_FRAME_FLAG_STEREO: The video frame has stereo content

Should it be set for non-mixed-mode stereo all the time?

@@ +41,3 @@
+ * @GST_VIDEO_FRAME_FLAG_STEREO: The video frame has stereo content
+ * @GST_VIDEO_FRAME_FLAG_H_FLIPPED: The video frame has flipped horizontally
+ * @GST_VIDEO_FRAME_FLAG_V_FLIPPED: The video frame has flipped vertically

This is different to left-right and bottom-top, right? It is really *flipping*, for whatever reason.
Review of attachment 241092 [details] [review]:

::: gst-libs/gst/video/video-stereo.c
@@ +36,3 @@
+} StereoFrameType;
+
+static StereoFrameType sf_types[] = {

constify

@@ +61,3 @@
+
+const gchar *
+get_stereo_frame_string_from_type (GstStereoVideoFrameType type)

frame_type_to_string() maybe? Also everything should be gst_video_* and GstVideo*

@@ +85,3 @@
+const gchar *
+gst_stereo_video_frame_get_type_string (GstStereoVideoScheme scheme,
+    guint frame_type)

Shouldn't the frame_type be some enum type?

@@ +168,3 @@
+const gchar *
+gst_stereo_video_frame_get_layout_string (GstStereoVideoScheme scheme,
+    guint frame_layout)

Shouldn't the frame_layout be some enum type?

@@ +212,3 @@
+gst_stereo_video_caps_set_stereo_info (GstCaps * caps,
+    GstStereoVideoScheme scheme, GstVideoChannelLayout channel_layout,
+    guint frame_type, guint frame_layout)

Shouldn't the frame_type and frame_layout be some enum type?

::: gst-libs/gst/video/video-stereo.h
@@ +41,3 @@
+  GST_STEREO_VIDEO_SCHEME_ISO_IEC_13818_2,
+  GST_STEREO_VIDEO_SCHEME_UNKNOWN
+} GstStereoVideoScheme;

GstVideo* and gst_video_* everywhere

@@ +53,3 @@
+typedef enum {
+  GST_VIDEO_CHANNEL_LAYOUT_STEREO,
+  GST_VIDEO_CHANNEL_LAYOUT_MONO_STEREO,

Maybe another value for MONO here?

@@ +54,3 @@
+  GST_VIDEO_CHANNEL_LAYOUT_STEREO,
+  GST_VIDEO_CHANNEL_LAYOUT_MONO_STEREO,
+  GST_VIDEO_CHANNEL_LAYOUT_UNKNOWN

And multiview?

@@ +76,3 @@
+ * packed over-under in a single frame.
+ * @GST_STEREO_VIDEO_FRAME_TYPE_PACKED_CHECK_BOARD_INTERLEAVED: 2 views are
+ * packed in a single frame as check-board interleaved (quincunx sampling).

Maybe call the enum value quincunx then, in case there are other similar check-board-like patterns in the future

@@ +109,3 @@
+  GST_STEREO_VIDEO_FRAME_LAYOUT_LEFT_VIEW_FIRST = (1 << 0),
+  GST_STEREO_VIDEO_FRAME_LAYOUT_HORIZONTALLY_FLIPPED = (1 << 2),
+  GST_STEREO_VIDEO_FRAME_LAYOUT_VERTICALLY_FLIPPED = (1 << 3),

These last two flags are also proposed as buffer flags, and here they're for the caps.
Why two places? Can it change frame-by-frame?

@@ +110,3 @@
+  GST_STEREO_VIDEO_FRAME_LAYOUT_HORIZONTALLY_FLIPPED = (1 << 2),
+  GST_STEREO_VIDEO_FRAME_LAYOUT_VERTICALLY_FLIPPED = (1 << 3),
+  GST_STEREO_VIDEO_FRAME_LAYOUT_UNKNOWN = (1 << 4)

Is this a flags type or an enum? For flags, UNKNOWN should be 0.
Review of attachment 241093 [details] [review]:

::: gst-libs/gst/video/video-stereo.c
@@ +69,3 @@
 };

+static StereoFrameLayout sf_layouts[] = {

constify

@@ +114,3 @@
+
+GstStereoVideoFrameLayout
+get_stereo_frame_layout_from_string (const gchar * s)

_to_string() variants too maybe

::: gst-libs/gst/video/video-stereo.h
@@ +133,3 @@
+  /* Caps doesn't have field for flipping flags */
+  gboolean h_flipped;
+  gboolean v_flipped;

Padding missing
Review of attachment 241094 [details] [review]:

::: gst-libs/gst/video/gstvideodecoder.h
@@ +367,3 @@
+void gst_video_decoder_set_stereo_info (GstVideoDecoder *dec,
+    GstStereoVideoInfo *sv_info);

This should probably be part of the GstVideoCodecState
These patches are too much to handle and reason about in one go, IMO. Let's step back and identify 2 cases:

1) compressed frames coming from a demuxer
2) uncompressed frames coming from a decoder

For 2) we should use GstVideoMeta metadata on buffers, and in the caps you have the number of views available. Only stereo is currently defined, and we assume left is view 0, right is view 1.

What's not possible with 2) currently?

- We can't do separate left-right frames arriving in the decoder without decoder support. For this the decoder needs to accumulate 2 frames and then place them in the outgoing buffer with GstVideoMeta. Do we add this to the video decoder base class? The decoder needs to know which frame is left and right.
- We can't do left/right interleaved every pixel, or checkerboard, or anything that is not a rectangular left/right part of the decoded image. For this we would need a new pixel format, or the decoder needs to transform this to something we support.
- We can't do flipping of planes horizontally or vertically. We could add this as flags on the metadata. Horizontal flipping could be done with strides. This would also need support in video sinks or other elements. Maybe we would use separate metadata to define the transform on a video frame?
- Something else?

For 2) to work we need to pass the right info to the decoder, because it is usually the demuxer that knows the layout etc. of the frames. So we need a way to transport this info; the usual way is to do this with caps.

I would like some caps field that is a simple string describing the layout, similar to the colorimetry caps field. The reason is that we don't want to negotiate N fields. Maybe also similar to how interlaced content works? I don't like the idea of passing this info with metadata; our parsers don't deal with metadata well, and I have no idea if the metadata would make it to the decoders.
It also sounds too complicated for what it is:

- in separate frames (flags on buffers define left/right)
- in one frame (which portion is left/right, where is it and how big is it)
- mixed (some frames mono, others stereo, a flag says which it is)

> @GST_STEREO_VIDEO_FRAME_TYPE_SEQUENTIAL_PROGRESSIVE: Frame sequential type.
> @GST_STEREO_VIDEO_FRAME_TYPE_SEQUENTIAL_ROW_INTERLEAVED: Sequential row
> interleaved.

What are these? How are frames transported to the decoder in these methods? I don't like how GstStereoVideoScheme creeps into the API. We should define an API to express 3D video in GStreamer; how to convert to this from any other scheme should live somewhere else and is not related.
I think these cases are handled in the patch sets. Demuxers will set all the parsed information in caps. Decoders set the meta based on the caps from the demuxer and internally parsed information (if any). I did this in the base decoder and provided an API for subclass implementations. All the video content handling should be the duty of the video3dpostprocessing element + videosink. The flipping flags indicate whether the content is flipped or not; the content manipulation is again the duty of the video3dpostprocessing element. Why do we need to accumulate two buffers in decoders? The flags GST_STEREO_VIDEO_FRAME_TYPE_SEQUENTIAL_PROGRESSIVE and GST_STEREO_VIDEO_FRAME_TYPE_SEQUENTIAL_ROW_INTERLEAVED indicate that frames are not packed together. What is the problem here? There is no GstStereoVideoScheme conversion in the patches... is there? GstStereoVideoScheme is basically just a helper for demuxers. Its right place would be pbutils, but I added it to gst-libs/gst/video to avoid a lot of duplication.
IMHO, it might be good to change the bug description to "Add Stereoscopic Video Support" or something like that.
(In reply to comment #58)
> Review of attachment 241090 [details] [review]:
> 
> Why don't we put these flags into a GstMeta?

These fields are similar to the _TFF and _RFF flags, and they need to get mapped during gst_video_frame_map_id(), like the other flags. I feel that this is the correct place.
(In reply to comment #59)
> Review of attachment 241091 [details] [review]:
> 
> ::: gst-libs/gst/video/video-frame.h
> @@ +39,3 @@
> * @GST_VIDEO_FRAME_FLAG_RFF: The video frame has the repeat flag
> * @GST_VIDEO_FRAME_FLAG_ONEFIELD: The video frame has one field
> + * @GST_VIDEO_FRAME_FLAG_STEREO: The video frame has stereo content
> 
> Should it be set for non-mixed-mode stereo all the time?

Yes. Would you like me to add a new _FLAG_MONO?

> @@ +41,3 @@
> + * @GST_VIDEO_FRAME_FLAG_STEREO: The video frame has stereo content
> + * @GST_VIDEO_FRAME_FLAG_H_FLIPPED: The video frame has flipped horizontally
> + * @GST_VIDEO_FRAME_FLAG_V_FLIPPED: The video frame has flipped vertically
> 
> This is different two left-right and bottom-top, right? It is really *flipping*
> for whatever reason

It indicates that the frame content is flipped (it may be the left view or top view, etc.). When the video3dpostproc element maps the frame with gst_video_frame_map_id(), it will check this flag and take the necessary steps if needed.
(In reply to comment #60)
> Review of attachment 241092 [details] [review]:
> 
> frame_type_to_string() maybe?
> 
> Also everything should be gst_video_* and GstVideo*

I was thinking that StereoVideo gives more readability :). Will change this.

> @@ +85,3 @@
> +const gchar *
> +gst_stereo_video_frame_get_type_string (GstStereoVideoScheme scheme,
> +    guint frame_type)
> 
> Shouldn't the frame_type be some enum type?

*NO*. The frame_type is the integer parsed from the encoded data, while the enum value is unique to GStreamer for each type. These APIs are for demuxers, which are meant to parse the data and call this API, because different schemes have different values for the frame type. For example: in GST_STEREO_VIDEO_SCHEME_ISO_IEC_23000_11, 0x00 indicates packed side-by-side; in GST_STEREO_VIDEO_SCHEME_ISO_IEC_14496_10, 0x00 indicates checkerboard interleaving.

> @@ +168,3 @@
> +const gchar *
> +gst_stereo_video_frame_get_layout_string (GstStereoVideoScheme scheme,
> +    guint frame_layout)
> 
> Shouldn't the frame_layout be some enum type?

Same explanation :)

> @@ +212,3 @@
> +gst_stereo_video_caps_set_stereo_info (GstCaps * caps,
> +    GstStereoVideoScheme scheme, GstVideoChannelLayout channel_layout,
> +    guint frame_type, guint frame_layout)
> 
> Shouldn't the frame_type and frame_layout be some enum type?

Same explanation :)

> ::: gst-libs/gst/video/video-stereo.h
> @@ +41,3 @@
> +  GST_STEREO_VIDEO_SCHEME_ISO_IEC_13818_2,
> +  GST_STEREO_VIDEO_SCHEME_UNKNOWN
> +} GstStereoVideoScheme;
> 
> GstVideo* and gst_video_* everywhere
> 
> @@ +53,3 @@
> +typedef enum {
> +  GST_VIDEO_CHANNEL_LAYOUT_STEREO,
> +  GST_VIDEO_CHANNEL_LAYOUT_MONO_STEREO,
> 
> Maybe another value for MONO here?

By default it is mono. Maybe it can change like this:
GST_VIDEO_CHANNEL_LAYOUT_STEREO = 1
GST_VIDEO_CHANNEL_LAYOUT_MONO_STEREO = 2
Any objection?
> @@ +54,3 @@
> +  GST_VIDEO_CHANNEL_LAYOUT_STEREO,
> +  GST_VIDEO_CHANNEL_LAYOUT_MONO_STEREO,
> +  GST_VIDEO_CHANNEL_LAYOUT_UNKNOWN
> 
> And multiview?

I had a multiview flag before and removed it, since we are only thinking about stereo video at the moment. But no objection to adding it if necessary.

> @@ +76,3 @@
> + *     packed over-under in a single frame.
> + * @GST_STEREO_VIDEO_FRAME_TYPE_PACKED_CHECK_BOARD_INTERLEAVED: 2 views are
> + *     packed in a single frame as check-board interleaved (quincunx
> sampling).
> 
> Maybe call the enum value quincunx then, in case there are other similar
> check-board-like patterns in the future

Okay. :)

> @@ +109,3 @@
> +  GST_STEREO_VIDEO_FRAME_LAYOUT_LEFT_VIEW_FIRST = (1 << 0),
> +  GST_STEREO_VIDEO_FRAME_LAYOUT_HORIZONTALLY_FLIPPED = (1 << 2),
> +  GST_STEREO_VIDEO_FRAME_LAYOUT_VERTICALLY_FLIPPED = (1 << 3),
> 
> These last two flags are also proposed as buffer flags, and here they're for
> the caps. Why two places? Can it change frame-by-frame?

I just added it for completeness. I also added a comment in video-stereo.h, maybe you missed it: /* Caps doesn't have a field for flipping flags */. The only spec which supports this at the moment is GST_STEREO_VIDEO_SCHEME_ISO_IEC_14496_10, and we won't get this info from the demuxer; we will only get it when parsing SEI headers.

> @@ +110,3 @@
> +  GST_STEREO_VIDEO_FRAME_LAYOUT_HORIZONTALLY_FLIPPED = (1 << 2),
> +  GST_STEREO_VIDEO_FRAME_LAYOUT_VERTICALLY_FLIPPED = (1 << 3),
> +  GST_STEREO_VIDEO_FRAME_LAYOUT_UNKNOWN = (1 << 4)
> 
> Is this a flags type or an enum? For flags, UNKNOWN should be 0
(In reply to comment #62)
> Review of attachment 241094 [details] [review]:
> 
> ::: gst-libs/gst/video/gstvideodecoder.h
> @@ +367,3 @@
> +void gst_video_decoder_set_stereo_info (GstVideoDecoder *dec,
> +                                        GstStereoVideoInfo *sv_info);
> 
> This should probably be part of the GstVideoCodecState

gst_video_stereo_info_init(), gst_video_stereo_info_from_caps() and gst_video_stereo_buffer_add_video_meta() are only for the base video decoder. No other element needs to use them, not even subclass implementations (as far as I know). The only thing a subclass implementation needs to do is invoke gst_video_decoder_set_stereo_info() if it parsed any stereo video info. So I thought it better to keep a separate structure instead of cluttering the existing video utility structures.
"I would like some caps that is a simple string describing the layout, similar to the colorimetry caps field." So Wim would like to use something like the colorimetry caps field. I think this would be good; I will look into it. Any other suggestions? I will rearrange the APIs based on the feedback.
I am tired of restructuring the code each time :) Okay, so as per the current feedback from Slomo and Wim I would like to propose the following:

Only one helper API for demuxers
------------------------------------------------------
/* The parameters frame_type_val and frame_layout_val are the values parsed
 * from the encoded stream. This API will find the unique enum for these
 * values based on the input GstVideoStereoScheme. The caps field for
 * stereo-info will look like this:
 * stereo-info: "GstVideoChannelLayout:GstVideoStereoFrameType:GstVideoStereoFrameLayout:GstVideoStereoFlip"
 * which means stereo-info is the key and its value is a concatenated string
 * of enums: g_strdup_printf ("%d:%d:%d:%d", channel_layout, frame_type,
 * frame_layout, frame_flip);
 * AFAIK the flip information is codec-specific and we will only get it from
 * the parser/decoder, so here we can set a default (no flip). */
gboolean gst_video_stereo_caps_set_stereo_info (GstCaps * caps,
    GstVideoStereoScheme scheme, GstVideoChannelLayout channel_layout,
    guint frame_type_val, guint frame_layout_val);

These are the main APIs for parsers, decoders and other elements
----------------------------------------------------------
/* Needs to introduce one more enum, GstVideoStereoFrameFlip (this was not
 * in the previous implementation):
 * GST_VIDEO_STEREO_FRAME_FLIP_H_LEFT
 * GST_VIDEO_STEREO_FRAME_FLIP_H_RIGHT
 * GST_VIDEO_STEREO_FRAME_FLIP_V_LEFT
 * GST_VIDEO_STEREO_FRAME_FLIP_V_RIGHT */
void gst_video_stereo_info_init (GstVideoStereoInfo * info);

/* Parses with: sscanf (stereo_info, "%d:%d:%d:%d", &info->channel_layout,
 * &info->frame_type, &info->frame_layout, &info->frame_flip); */
gboolean gst_video_stereo_info_from_string (GstVideoStereoInfo *info,
    const gchar *stereo_info);

/* Returns: g_strdup_printf ("%d:%d:%d:%d", info->channel_layout,
 * info->frame_type, info->frame_layout, info->frame_flip) */
gchar * gst_video_stereo_info_to_string (GstVideoStereoInfo *info);

gboolean gst_video_stereo_info_set_in_caps (GstVideoStereoInfo * info,
    GstCaps *caps);

gboolean gst_video_stereo_info_from_caps (GstVideoStereoInfo * info,
    GstCaps *caps);
How would that Caps field be negotiated ?
(In reply to comment #72) > How would that Caps field be negotiated ? It is just a field like colorimetry. Did you see any problem with that? We are using videometa for handling left/right view in upstream elements which are coming after the decoder. The reason by which we need the caps fields is that for storage formats like MPEG-A, the stero-video-info is kind of codec independent and decoders/parsers have nothing to do with this.
(In reply to comment #73)
> (In reply to comment #72)
> > How would that Caps field be negotiated ?
> 
> It is just a field like colorimetry. Did you see any problem with that?
> We are using videometa for handling left/right view in upstream elements which

error: that should read "downstream elements which are ..."

> are coming after the decoder. The reason by which we need the caps fields is
> that for storage formats like MPEG-A, the stero-video-info is kind of codec
> independent and decoders/parsers have nothing to do with this.
Being in the 3D video field (stereoscopic and autostereoscopic) for some time, I would like to add my input on what I work with on a daily (commercial) basis.

Typical 3D video sources I work with are:
- One file with one video stream, having stereo 3D as either half or full resolution per eye.
- One file with multiple video streams (anywhere from 2-9 in my case), either for stereo or autostereo playback. Last I checked, the qtdemux element was hard coded to 8 max video streams, so we've had to use other containers in some cases.
- Two files with one video stream each, e.g. Video_LeftEye.mp4 and Video_RightEye.mp4, where playback of the two files must be synchronized as if it were one video file. This is common with dual-camera (genlocked, of course) video setups.
- Image sequences (packed and separate files per eye) from a very large/fast SAN (png, tif, dpx etc.) in the case of in-production video (films/TV etc.).
- Live video feed from a camera(s), typically using Blackmagic DeckLink capture cards (could use the decklinksrc element if it had 3D support).
- In-memory sources (video editor, IP stream) that must be played out to a 3D display (could use the decklinksink element if it had 3D support).

Potential input formats of video:

Stereo:
- Over/under
- Left/right
- Frame sequential
- Horizontal/vertical interlaced (not very important on the input side)
- Checkerboard (not very important on the input side)
- Separate video streams per eye

Autostereo (multiview):
- 3x3 matrix packed in one frame. Typical 8 or 9-view displays only have ~1/3rd the resolution of the panel's native format, so there isn't really any quality loss in this video format
- 2D+depth, where the extra views are generated (interpolated) internally, usually with OpenCV or proprietary algorithms
- Separate video streams per eye

---

Potential output formats (support varies wildly by display):

Stereo (half-res per eye):
- Over/under (most used on active/passive 3DTVs and single/dual projector systems; recommended for passive since no scaling is necessary in most cases)
- Left/right (most used on active/passive 3DTVs and single/dual projector systems; recommended for active)
- Horizontal/row interlaced (native format of passive 3DTVs and some single-view autostereo displays; requires no manual 3D setup on the display)
- Vertical/column interlaced (native format of some single-view autostereo displays)
- Checkerboard (alternating half-res left/right eye pixels, mostly used on DLP active 3DTVs)
- Anaglyph (mainly for viewing on 2D monitors; many different possible color mixtures: red/blue, red/cyan, red/green, green/magenta, yellow/blue, and then options for 100% color, 50%, other algorithms etc.)

Stereo (full-res per eye):
- Frame sequential (alternating left/right eye images, usually at 120 Hz as it's the native format of NVIDIA 3D Vision monitors; also supported by some 3DTVs and projectors)
- HDMI 1.4a frame packing; requires physical hardware support in the video sink device/GPU (requires no manual 3D setup on the display, it is detected automatically); supported by Blackmagic DeckLink cards (but not in the decklinksrc/sink elements)
- "Dual-stream" output (usually HD-SDI only, separate physical cables for left/right eye), used by professional/medical displays and supported by Blackmagic DeckLink cards (but not in the decklinksrc/sink elements)

Autostereo (multiview):
- Completely proprietary and display-dependent. Most use a lenticular lens or parallax barrier with 5, 8 or 9 views and require a GPU shader to interleave all required views into one packed frame using special repeating RGB/BGR patterns. The patterns themselves are also sometimes altered to adjust for optimal viewing distance, diminish cross-talk etc., but all non-optical adjustments affect image quality. It would be sufficient to have a GL video sink element with a custom fragment shader option to support most of these displays, but without a secondary texture for the repeating pattern, GPU usage will skyrocket with all the branching/modulo operations needed. Currently most display vendors I've seen are very reluctant to hand over their pixel pattern or optical parameters to allow 3rd party video player development.

---

Most stereo 3D displays cannot auto-detect 3D video and switch modes accordingly (except if HDMI 1.4a is used, or in some cases the display may actually try to use image analysis; some Panasonic models do this), so the display is usually forced into a certain 3D mode by the user.

Sometimes it's required to throw away one of the eyes and only display one of them, in the case of a 2D display or a user who is uncomfortable watching 3D. There should be an option to choose which view (left or right) is used for the 2D image. Also, if the display is in a 3D-only mode, it is not sufficient to simply show one eye in fullscreen 2D; you must show two copies of the same eye as if you were displaying actual 3D. Example: with a 3DTV manually forced to left/right mode, 2D images must be displayed as a side-by-side left/left or right/right image. There is usually no way to query the current mode from a 3D display, except for some that use a DB9 serial control port with a proprietary protocol.

Some displays render 3D video in reverse (the left eye sees the image meant for the right eye, etc.), so we must also have an option to "swap eyes". Some displays also have a swap option. Mirror-based dual-camera rigs can require horizontal/vertical flipping of one eye to get a correct image.

Some dual-camera systems have problems with correct horizontal/vertical convergence, and existing stereo 3D software and 3DTVs have a convergence adjustment option to help with this if it cannot be corrected optically. This would be a nice option to have, but not totally necessary.
bparker, thanks for the extensive comment :) There's also support API added for this to libav, which looks much simpler than everything proposed here. For reference: http://patches.libav.org/patch/44899/
slomo, sure, the libav patch is only for stereo. But as it covers most things, that's probably fine. Maybe we should just ignore the more esoteric multiview variants for now.
Did anything in here ever talk about multiview other than the topic? I think we should do different multiview things later with different APIs... and maybe generalize the concepts in 2.0.
Did we reach any conclusion? What else is needed to get this feature in?
Can we create a wiki page / design draft file that lists all the formats that support stereoscopic / multiview video, with links to online resources and a summary of features? I did something like that for GstToc here: http://cgit.freedesktop.org/gstreamer/gstreamer/tree/docs/design/part-toc.txt#n137 If we get this by the weekend, we can discuss it during the hackfest to move this forward.
There is no conclusion yet. It needs careful consideration and review by multiple people to get this in. I may have an opportunity to work on this a bit in the near future.
Some more information to consider; this would actually be my minimal requirements:

1. It is important to distinguish between view order index and view id. Reason: the view order index determines the decoding/output order of the views within an access unit, while the view id can be anything that the user/encoder generated. So, we normally always get view indices in increasing order: 0, 1, 2, etc. However, the view id can be anything: 2, 0, 1 for the previous AU, 1, 2, 0 for the next AU, etc. I suggest we have an id that maps to the view_id (or layer_id), and a flag that marks the start of an access unit. We could have a voc (view order count) field instead, but a flag marking the start of the access unit would be just fine IMHO. I am not inspired today, so my current name suggestion for that flag would be "FFB" = "First Frame in Bundle", or "BFF" = "Bundle First Field/Frame"; bundle == sequence of views with the same PTS. Why not use a PTS change to detect new access unit (bundle) boundaries? Well, PTS is not accurate and there could be cases where it is never defined(?).

2. Add a means to query downstream whether it will actually care about "multiview" and "stereo3d" buffers. Reason: we don't need to decode all views or layers if we are not going to use them; e.g. it would be enough to stick to the base view/layer. However, because of (1), this means that we are not always going to display view 0, but simply the first decoded view in the access unit, i.e. the one really matching the base stream. The other benefit is that, if downstream doesn't care about extra views, we can avoid a processing step to compose an S3D buffer.

Anyway, in vaapi I currently handle vid/voc (view_id / view_order_index) internally, with the real view_id and a flag for the plug-in element layer stuffed into the vaapi meta for now.
I know we definitely should create a wiki, but here is another comment before I forget about it. :) The model by which we store multiple view components in a single GstVideoBuffer and map the desired view component by id from it is not going to work. This is implied by memory optimizations in the H.264 MVC decoding process: the size of the DPB is not a multiple of the number of views. This means you are not always going to have all view components from the same access unit available at the same time, primarily because that would otherwise require much more memory (video surfaces). So I am not really sure we want to keep all of them around beyond what the DPB handling rules require.
I'll be doing some work on stereoscopic / MVC handling soon
I put a new proposal for stereoscopic and MVC signalling and handling into gst-plugins-base git. You can see it at docs/design/part-stereo-multiview-video.markdown or at https://github.com/thaytan/gst-plugins-base/blob/e35cf6321642225092fd9c0342413bc4a9f38f91/docs/design/part-stereo-multiview-video.markdown I'll be working on implementation in the coming days, and things might change slightly but I think the overall design is viable.
(In reply to comment #85)
> I put a new proposal for stereoscopic and MVC signalling and handling into
> gst-plugins-base git.
> 
> You can see it at docs/design/part-stereo-multiview-video.markdown or at
> 
> https://github.com/thaytan/gst-plugins-base/blob/e35cf6321642225092fd9c0342413bc4a9f38f91/docs/design/part-stereo-multiview-video.markdown
> 
> I'll be working on implementation in the coming days, and things might change
> slightly but I think the overall design is viable.

Nice! I don't remember where I had seen sequential-row-interleaved; it has been more than a year now :) Maybe it was added for completeness...
(In reply to comment #86) > (In reply to comment #85) > > I put a new proposal for stereoscopic and MVC signalling and handling into > > gst-plugins-base git. > > > > You can see it at docs/design/part-stereo-multiview-video.markdown or at > > > > https://github.com/thaytan/gst-plugins-base/blob/e35cf6321642225092fd9c0342413bc4a9f38f91/docs/design/part-stereo-multiview-video.markdown > > > > I'll be working on implementation in the coming days, and things might change > > slightly but I think the overall design is viable. > > Nice! > > Didn't remember where I have seen the sequential-row-interleaved,it has been > more than a year now :) May be added for completeness.. I think the "sequential-row-interleaved" was for handling the temporal interleaving mentioned in ITU-T H264, D.2.25, when the frame_packing_arrangement_type is equal to 5.
If so, that's the 'frame by frame' arrangement. There's no row-interleave, just successive left-right-left-right frames.
(In reply to comment #88)
> If so, that's the 'frame by frame' arrangement. There's no row-interleave, just
> successive left-right-left-right frames.

Aha, right. Sorry for comment 87... I might have seen row-interleaved somewhere else. Better to remove it for now.
Created attachment 304269 [details] [review] video: Add multiview/stereo support Add flags and enums to support multiview signalling in GstVideoInfo and GstVideoFrame, and the caps serialisation and deserialisation. videoencoder: Copy multiview settings from reference input state Add gst_video_multiview_* support API and GstVideoMultiviewMeta meta
Created attachment 304270 [details] [review] playbin: Implement multiview frame-packing overrides Add GstVideoMultiviewFramePacking enum, and the video-multiview-mode and video-multiview-flags properties on playbin. Use a pad probe to replace the multiview information in video caps sent out from uridecodebin. This is a part implementation only - for full correctness, it should also modify caps in caps events, accept-caps and allocation queries.
After a long hiatus, here's some code to look at. So far, this provides for implementing the various frame-packed and stereo modes, and has some placeholders for doing arbitrary MVC - but I think that'll need some more exploration as it's actually implemented. At the moment, I have changes for other modules (libav, ugly, bad) to actually use the new API - this is just the -base pieces.
Created attachment 304854 [details] [review] video: Add multiview/stereo support Add flags and enums to support multiview signalling in GstVideoInfo and GstVideoFrame, and the caps serialisation and deserialisation. videoencoder: Copy multiview settings from reference input state Add gst_video_multiview_* support API and GstVideoMultiviewMeta meta
Created attachment 304855 [details] [review] playbin: Implement multiview frame-packing overrides Add GstVideoMultiviewFramePacking enum, and the video-multiview-mode and video-multiview-flags properties on playbin. Use a pad probe to replace the multiview information in video caps sent out from uridecodebin. This is a part implementation only - for full correctness, it should also modify caps in caps events, accept-caps and allocation queries.
Some minor changes to the implementation. At the moment, I'm still working on what's needed for general (non-stereo) multiview support. The GstVideoMultiviewMeta is only needed for view labelling in that case, so I'm tempted to leave it out for now. For stereo handling, the caps and buffer changes are sufficient for signalling everything needed. For frame packed stereo, I took a different path than some suggested above. Since there's no sensible way to describe all the packed layouts with GstVideoInfo, everything needs to be taught explicitly anyway. So in this design, they are reported as 1 view in the caps / GstVideoInfo and it's up to elements that care to know how to handle them. Existing elements continue to treat them as a single buffer, which is no worse than the status quo.
Created attachment 304872 [details] [review] qtdemux: Add basic support for MPEG-A stereoscopic video The MPEG-A format provides an extension to the ISO base media file format to store stereoscopic content encoded with different codecs like H.264 and MPEG-4:2. The stereo video media information(svmi) atom declares the presence and storage method for the video. Stereo video information for MPEG-A can also be supplied through the 'stvi' atom (ref: ISO/IEC_14496-12, ISO/IEC_23000-11), which is not implemented in this patch. Also missing is support for stereo video encoded as separate video tracks for now. Based on a patch by Sreerenj Balachandran <sreerenj.balachandran@intel.com>
Created attachment 304878 [details] [review] multiview: Add docs and disable the GstVideoMultiviewMeta API for now. Add docstrings and Since markers, and put new API into the docs. Disable the multiview meta for now - it's not needed until MVC support is finalised, and probably needs changing.
Created attachment 304881 [details] [review] qtdemux: Add basic support for MPEG-A stereoscopic video The MPEG-A format provides an extension to the ISO base media file format to store stereoscopic content encoded with different codecs like H.264 and MPEG-4:2. The stereo video media information(svmi) atom declares the presence and storage method for the video. Stereo video information for MPEG-A can also be supplied through the 'stvi' atom (ref: ISO/IEC_14496-12, ISO/IEC_23000-11), which is not implemented in this patch. Also missing is support for stereo video encoded as separate video tracks for now. Based on a patch by Sreerenj Balachandran <sreerenj.balachandran@intel.com>
This all looks pretty great and complete to me! (Apart from the missing signalling for MVC.) I'm still not sure about the GstVideoInfo padding; anonymous unions in structures are not a C89 feature but C11, and they gave us trouble in the past (don't remember the details, sorry).
Meh. I think I should just put the padding back how it was - it doesn't break Linux or Windows where I've tested, and seems likely to not break anywhere else either - the struct layout should come out the same on other compilers.
The other option is to put the new fields inside an explicit union with the padding and require all consumers to use the accessor macros.
Created attachment 304917 [details] [review] multiview: Add docs and disable the GstVideoMultiviewMeta API for now. Add docstrings and Since markers, and put new API into the docs. Disable the multiview meta for now - it's not needed until MVC support is finalised, and probably needs changing. Modify the way the padding is consumed in GstVideoInfo, and consequently require code to use the accessor macros.
Created attachment 304919 [details] [review] qtdemux: Add basic support for MPEG-A stereoscopic video The MPEG-A format provides an extension to the ISO base media file format to store stereoscopic content encoded with different codecs like H.264 and MPEG-4:2. The stereo video media information(svmi) atom declares the presence and storage method for the video. Stereo video information for MPEG-A can also be supplied through the 'stvi' atom (ref: ISO/IEC_14496-12, ISO/IEC_23000-11), which is not implemented in this patch. Also missing is support for stereo video encoded as separate video tracks for now. Based on a patch by Sreerenj Balachandran <sreerenj.balachandran@intel.com>
In the absence of any other input, I'll go with this and land these today, with the body of the GL implementation in -bad to follow.
Pushed the -base/-good/-ugly/-libav changes
Created attachment 305121 [details] [review] gl libs: Add glviewconvert helper object Add API for a helper object that can convert between different stereoscopic video representations, and later do filtering of multiple view streams.
Created attachment 305122 [details] [review] glimagesink: Support multiview/stereoscopic video Support video with multiview info in the caps, transform it to mono anaglyph by default, but allow for configuring other output modes and handoff to the app via the draw signal.
Created attachment 305123 [details] [review] gl: Add glviewconvert, glstereomix and glstereosplit elements Conversion elements for transforming multiview/stereoscopic video
Created attachment 305124 [details] [review] 3dvideo: Add simple gtk example stereoscopic video player
These patches implement the view handling/conversions and anaglyph downmix in the GL plugin. They depend on the GstParentBufferMeta bug