Bug 702004 – qtdemux: add support for the avc3 sample entry format of the AVC file format

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 702004 - qtdemux: add support for the avc3 sample entry format of the AVC file format


Summary:	qtdemux: add support for the avc3 sample entry format of the AVC file format


Status:	RESOLVED FIXED

Product:	GStreamer
Classification:	Platform
Component:	gst-plugins-good
Version:	git master
Hardware:	Other All

Importance:	Normal enhancement
Target Milestone:	1.1.90
Assigned To:	GStreamer Maintainers
QA Contact:	GStreamer Maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2013-06-11 11:54 UTC by A Ashley
Modified:	2013-09-18 16:47 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Patch to add support for avc3 box (8.57 KB, patch) 2013-06-11 11:54 UTC, A Ashley	reviewed	Details \| Review
Added "dash" and "avc3" fourCC codes to qt_type_find (1.91 KB, patch) 2013-06-11 11:55 UTC, A Ashley	committed	Details \| Review
gst_h264_parse_set_caps fails to parse minimum sized avcC atom (1.58 KB, patch) 2013-06-11 11:55 UTC, A Ashley	needs-work	Details \| Review
Test file for AVC3 structure (1.39 MB, video/mp4) 2013-06-14 10:00 UTC, A Ashley		Details
adds support for avc3 sample format (8.94 KB, patch) 2013-06-18 12:36 UTC, A Ashley	committed	Details \| Review
add support for stream-format=avc3 to h264parser (4.15 KB, patch) 2013-08-05 15:02 UTC, A Ashley	needs-work	Details \| Review
add support for stream-format=avc3 to h264parser (10.08 KB, patch) 2013-08-30 13:10 UTC, A Ashley	committed	Details \| Review
exclude avc3 stream format from AV_CODEC_ID_H264 (1.90 KB, patch) 2013-08-30 13:12 UTC, A Ashley	committed	Details \| Review
Addition to patch #253605 (1.21 KB, patch) 2013-08-30 16:32 UTC, A Ashley	committed	Details \| Review

Description A Ashley 2013-06-11 11:54:04 UTC

Created attachment 246499 [details] [review]
Patch to add support for avc3 box

Amendment 2 of ISO/IEC 14496-15 (AVC file format) is defining a new
structure for fragmented MP4 called "avc3". The principal difference
between AVC1 and AVC3 is the location of the codec initialisation
data (e.g. SPS, PPS). In AVC1 this data is placed in the initial MOOV
box (moov.trak.mdia.minf.stbl.stsd.avc1) but in AVC3 this data goes in
the first sample of every fragment (i.e. the first sample in each mdat
box).  The principal reason for avc3 is to make it easier for client
implementations, because it removes the requirement to insert the
SPS+PPS in to the decoder pipeline every time there is a representation
change.

I have produce three patch files that propose changes to gst-plugins-base, gst-plugins-good and gst-plugins-bad.

The change gst-plugins-base to qt_type_find adds detection of the "avc3" brand and is optional.

The change to gst-plugins-good is that meat of the change which adds support for parsing the avc3 box.

The change to gst-plugins-bad is to fix a bug in the assumption of the minimum size of an avcC box.

I will provide a reference MP4 file that uses the avc3 layout.

I can submit these as separate tickets if that would be preferable.

Comment 1 A Ashley 2013-06-11 11:55:01 UTC

Created attachment 246500 [details] [review]
Added "dash" and "avc3" fourCC codes to qt_type_find

This patch adds detection of the "dash" and "avc3" compatible brands
in qt_type_find.

Comment 2 A Ashley 2013-06-11 11:55:50 UTC

Created attachment 246501 [details] [review]
gst_h264_parse_set_caps fails to parse minimum sized avcC  atom

The minimum size of an avcC atom with no SPS or PPS sets is 7 bytes,
but gst_h264_parse_set_caps assumes a minimum size of 8 bytes. This
is a problem when trying to decode MP4 files encoded using the avc3
file structure

Comment 3 Sebastian Dröge (slomo) 2013-06-11 12:40:54 UTC

Can you provide a sample file?

Comment 4 Sebastian Dröge (slomo) 2013-06-11 12:42:56 UTC

Review of attachment 246500 [details] [review]:

::: gst/typefind/gsttypefindfunctions.c
@@ +2972,2 @@
       tip = GST_TYPE_FIND_MAXIMUM;
       variant = "iso-fragmented";

Maybe the avc3 variant should get a different variant string, to be handled differently? What do you think?

Comment 5 Sebastian Dröge (slomo) 2013-06-11 12:43:55 UTC

Review of attachment 246501 [details] [review]:

Maybe this should get a new stream format, "avc3" instead of "avc"?

Comment 6 Sebastian Dröge (slomo) 2013-06-11 12:49:16 UTC

(In reply to comment #5)
> Review of attachment 246501 [details] [review]:
> 
> Maybe this should get a new stream format, "avc3" instead of "avc"?

Especially because this is to be handled completely different to normal AVC (there's no codec_data). So muxers and decoders probably need special handling or conversion. Ideally h264parse would allow to convert between normal AVC and AVC3.

Comment 7 Sebastian Dröge (slomo) 2013-06-11 12:52:35 UTC

Review of attachment 246499 [details] [review]:

::: gst/isomp4/qtdemux.c
@@ +10026,3 @@
+      caps = gst_caps_new_simple ("video/x-h264",
+          "stream-format", G_TYPE_STRING, "avc",
+          "alignment", G_TYPE_STRING, "au", NULL);

And this should be "avc3" then

Comment 8 A Ashley 2013-06-11 13:47:02 UTC

(In reply to comment #6)
> (In reply to comment #5)
> > Review of attachment 246501 [details] [review] [details]:
> > 
> > Maybe this should get a new stream format, "avc3" instead of "avc"?
> 
> Especially because this is to be handled completely different to normal AVC
> (there's no codec_data). So muxers and decoders probably need special handling
> or conversion. Ideally h264parse would allow to convert between normal AVC and
> AVC3.

There is codec_data, but it is only 7 bytes long. It provides the profile and level information, but not the SPS or PPS.

My reason for keeping with the "avc" label for the stream format was to avoid having a difference in the file format from propagating further down the parse / decode chain.

Comment 9 A Ashley 2013-06-11 13:51:34 UTC

(In reply to comment #3)
> Can you provide a sample file?

Yes I am working on this right now. I hope to shortly be able to upload an MP4 file or provide a URL to a hosted version.

Comment 10 Sebastian Dröge (slomo) 2013-06-11 16:06:01 UTC

(In reply to comment #8)
> (In reply to comment #6)
> > (In reply to comment #5)
> > > Review of attachment 246501 [details] [review] [details] [details]:
> > > 
> > > Maybe this should get a new stream format, "avc3" instead of "avc"?
> > 
> > Especially because this is to be handled completely different to normal AVC
> > (there's no codec_data). So muxers and decoders probably need special handling
> > or conversion. Ideally h264parse would allow to convert between normal AVC and
> > AVC3.
> 
> There is codec_data, but it is only 7 bytes long. It provides the profile and
> level information, but not the SPS or PPS.
> 
> My reason for keeping with the "avc" label for the stream format was to avoid
> having a difference in the file format from propagating further down the parse
> / decode chain.

The problem with that is that not everything handling avc1 will be able to handle avc3. So there needs to be distinction between the two, and conversion support in the parser (like it does for byte-stream<->avc already).

Comment 11 A Ashley 2013-06-14 10:00:48 UTC

Created attachment 246787 [details]
Test file for AVC3 structure

A short sequence from "Big Buck Bunny" (creative commons license) using the AVC3 structure.

Comment 12 A Ashley 2013-06-14 10:17:03 UTC

(In reply to comment #10)
> 
> The problem with that is that not everything handling avc1 will be able to
> handle avc3. So there needs to be distinction between the two, and conversion
> support in the parser (like it does for byte-stream<->avc already).

There is a difference between the boxes included in the MP4 file and the encapsulation of H.264 data within an MDAT box. Both avc1 and avc3 use the same NAL encapsulation for sample data in the MDAT box.

The h.264 parser downstream of qtdemux will see the same NAL structure for both avc1 and avc3. The two differences are:
* At the start of a fragment the first sample will have an SPS and PPS in it
* The codec data field will be smaller

Comment 13 Sebastian Dröge (slomo) 2013-06-14 10:33:39 UTC

Yes that's exactly my point, it is different to what elements nowadays expect when they get stream-format=avc. This can break in subtle ways, especially with hardware decoders. The problem I see is not that there's SPS/PPS inside the stream but that the codec_data is short.

As such there should be a different stream-format in the caps and conversion support in the parser.

Comment 14 Tim-Philipp Müller 2013-06-14 11:14:38 UTC

I haven't looked at the details yet, but my impression is that:

1) if we expose this directly as encapsulated, we would need to add a new stream-format for this

2) IMHO we should make the demuxer convert this internally to AVC, i.e. just make it extract the PPS/SPS from the first chunk and send updated caps if needed.

Comment 15 A Ashley 2013-06-18 12:36:14 UTC

Created attachment 247134 [details] [review]
adds support for avc3 sample format

The consensus from the review of patch 246499 is that a different stream type identifier should be used. This patch replaces 246499 and generates an "avc3" stream format caps event.

Comment 16 Tim-Philipp Müller 2013-06-18 12:50:26 UTC

In my opinion the demuxer should convert this to normal AVC (and change caps if needed). This is easier than making h264parse support yet another variant, and I don't see the benefit of exposing this variant if it's likely to remain mp4-specific in the foreseeable future anyway.

Comment 17 A Ashley 2013-06-18 15:14:41 UTC

(In reply to comment #16)
> In my opinion the demuxer should convert this to normal AVC (and change caps if
> needed). This is easier than making h264parse support yet another variant, and
> I don't see the benefit of exposing this variant if it's likely to remain
> mp4-specific in the foreseeable future anyway.

I think that h264parse is already handling both in-band SPS and out-of-band SPS. It already handles MPEG-2 TS based AVC streams (such as those from antenna or satellite broadcasts) that have SPS at regular intervals and ISOBMFF files that typically only have one SPS in the MOOV. Both are perfectly valid AVC streams, I am not sure we can easily say one is "normal" and one is "abnormal". 

Filtering out the SPS and PPS from a stream and converting it to a caps event is not going to be easy because it would require h.264 parsing of the sample data in qtdemux. The gstreamer code base already has h.264 parsing spread across multiple modules, and it is probably not a good idea to make it any worse.

Comment 18 Tim-Philipp Müller 2013-06-18 15:39:30 UTC

I'm not saying one is normal and one is abnormal, but avc3 is basically a mix between those two modes, which will make everything even more confusing, we barely handle those two modes properly. We have now stream-format=avc which means sps/pps are sent out-of-band and nals have no sync code and everything is framed in access units, and stream-format=byte-stream where sps/pps are in the bitstream and nals have sync codes.

If I understood comment #12 correctly, we don't have to parse the sample data (not much anyway), we can assume the sps/pps are in the first sample. It shouldn't be too difficult, even if it's a bit annoying.

It is true that we already have h264 parsing spread across multiple modules. We will eventually consolidate that when the codecparser library moves to -base. It's not clear to me whether that's really an argument for or against doing things as I suggest though. I think what has to be weighed is the complexity of making h264parse handle yet another mode, namely this mixed mode, and the complexity of parsing out the sps/pps and putting them into caps. The latter is considerably simpler, and easier to maintain in the long run IMHO. If there are opportunities for utility functions, they can be put into pbutils with the codec utils.

Comment 19 Sebastian Dröge (slomo) 2013-06-19 09:09:43 UTC

I don't really know what is better of these two options :) If it's really limited to MP4 I probably prefer Tim's suggestion

Comment 20 A Ashley 2013-06-19 15:18:48 UTC

(In reply to comment #18)
> 
> If I understood comment #12 correctly, we don't have to parse the sample data
> (not much anyway), we can assume the sps/pps are in the first sample. It
> shouldn't be too difficult, even if it's a bit annoying.
> 

I think the steps would be something like:
1. Use a NAL parser to find and remove the SPS and PPS NALs in the sample data and build lists of them
2. Patch codec_data to remove any existing SPS entries, PPS entries, the numberOfSequenceParameterSets and the numberOfPictureParameterSets fields
3. Append (numberOfSequenceParameterSets | 0xE0) to codec_data
4. For each SPS NAL: append sequenceParameterSetLength and SPS NAL to codec_data
5. Append numberOfPictureParameterSets to codec_data
6. For each PPS NAL: append pictureParameterSetLength and PPS NAL to codec_data
7. Push new codec_data downstream
8. profit! :)

Comment 21 Sebastian Dröge (slomo) 2013-07-01 07:58:54 UTC

Tim, do you think having this complexity added to the demuxer makes sense? As (almost) all this is already handled inside h264parse already I think I prefer the original approach of exposing it in the caps.

Comment 22 Tim-Philipp Müller 2013-07-01 09:28:43 UTC

I can live with either solution.

Comment 23 Wim Taymans 2013-07-01 09:31:58 UTC

We should try to leave formats in their most original format for as long as possible. It might not always be the least complex option but a good argument for leaving the format as-is is when you want to remux it.

Comment 24 A Ashley 2013-07-23 07:54:37 UTC

So where are we with this feature?

The consensus appears to be that we don't want to reformat AVC3 in to AVC1 structure.

Apart from a rebase of the patches to head, are there any other changes I should be working on to make them acceptable for incorporation?

Comment 25 Sebastian Dröge (slomo) 2013-07-23 08:04:18 UTC

AVC3 should be kept as different stream-format in the caps, and h264parse should be able to convert from/to AVC3.

Comment 26 Sebastian Dröge (slomo) 2013-07-26 10:33:03 UTC

Comment on attachment 247134 [details] [review]
adds support for avc3 sample format

Will push this (without the indention changes in the headers) after we have some support for it in h264parse. Without it's rather useless

Comment 27 A Ashley 2013-08-05 15:02:44 UTC

Created attachment 250888 [details] [review]
add support for stream-format=avc3 to h264parser

Comment 28 Sebastian Dröge (slomo) 2013-08-19 13:49:20 UTC

Review of attachment 250888 [details] [review]:

This only parses avc3 as I see it, but does not do any conversion at all between avc<->avc3 and avc3<->bytestream

Comment 29 A Ashley 2013-08-22 14:53:02 UTC

My understanding of h264parse is that in case of avc->bytestream h264parse will push the SPS and PPS from codec_data downstream at the start of the stream, at intervals controlled by "config-interval" and when there is a codec_data change. 

In the case of avc3->bytstream h264parse detects that there is already SPS/PPS in the stream and sets h264parse->push_codec to FALSE. Therefore avc3->bytstream is already supported, except for the stream type (hence the patch).

In the case of bystream->avc h264parse will generate codec_data caps from the parsed SPS/PPS in the stream. However it does not remove these SPS/PPS from the stream. This means that the current h264parse is actually doing bytestream->avc3 but with a copy of the SPS/PPS in codec_data. Therefore bytestream->avc3 is the same as bytestream->avc except that the codec_data shouldn't have any SPS/PPS in it. My current set of patches don't provide this filtering of the codec_data when outputting in avc3 mode.

Given the current behaviour of h264parse, conversion between avc<->avc3 is complicated. A full conversion from avc3->avc would remove all SPS/PPS from the stream and only provide them via codec_data. Changing h264parse so that avc never has SPS/PPS in-band might break existing software that makes use of this in-band data.

Perhaps we need a new stream type (let's say "avc1") for which we can make rules about SPS/PPS location?

|------------+-------------+-------------------|
|stream_type | SPS in-band | SPS in codec_data |
|------------+-------------+-------------------|
| avc        | maybe       | always            |
|------------+-------------+-------------------|
| avc1       | never       | always            |
|------------+-------------+-------------------|
| avc3       | always      | never             |
|------------+-------------+-------------------|

Comment 30 Sebastian Dröge (slomo) 2013-08-23 13:13:49 UTC

I think it should be sufficient for the conversions to always put PPS/SPS inband for AVC and only convert the codec_data as required. No decoder should fail on that.

Comment 31 A Ashley 2013-08-23 14:26:13 UTC

On that basis, next week I will start working on code to exclude SPS/PPS from codec_data when the output format is avc3.

Comment 32 Sebastian Dröge (slomo) 2013-08-26 07:14:08 UTC

Great, thanks :)

Comment 33 A Ashley 2013-08-30 13:10:18 UTC

Created attachment 253605 [details] [review]
add support for stream-format=avc3 to h264parser

Replaces patch 250888 with the agreed support for stream-format=avc3.
Also adds a unit test for stream-format=avc3

Comment 34 A Ashley 2013-08-30 13:12:22 UTC

Created attachment 253606 [details] [review]
exclude avc3 stream format from AV_CODEC_ID_H264

As part of the changes to support the "avc3" variant of the ISO-BMFF (see bug #702004) a new stream-format has been created (video/x-h264, stream-format="avc3", alignment="au") that requires changes to gstavcodecmap to exclude this format because avdec_h264 expects the SPS and PPS to be in the codec_data.

Comment 35 A Ashley 2013-08-30 13:26:09 UTC

The BBC have kindly provided some test streams for both AVC3 and AVC1 variants:
http://rdmedia.bbc.co.uk/dash/ondemand/bbb/

Comment 36 A Ashley 2013-08-30 16:32:14 UTC

Created attachment 253632 [details] [review]
Addition to patch #253605

By doing a diff between my local branch and a git am of patch #253605 I have discovered that patch #253605 is missing one AVC3 related change that got lost when squashing gst-indent related changes.

I've spent most of the afternoon trying to produce a clean patch to replace #253605 but I'm admitting defeat. I can start with a clean version of gsth264parse.c that produces no changes when gst-indent is used, apply the change given in this patch and then gst-indent wants to re-indent most of the file - but this only happens if I try to squash them to a single patch. Doing them as two patches doesn't cause a gst-indent cataclysm.

Comment 37 Sebastian Dröge (slomo) 2013-09-04 11:36:17 UTC

Thanks, I've merged all this now :)

For squashing changes you could use "git rebase --interactive", and then follow the instructions there (that's what I did now).

Comment 38 A Ashley 2013-09-05 07:47:50 UTC

Thank you for accepting the patches and for the git tip. I expect by the time I finally get to grips with git someone will have invented a new scm :)