GNOME Bugzilla – Bug 702004
qtdemux: add support for the avc3 sample entry format of the AVC file format
Last modified: 2013-09-18 16:47:51 UTC
Created attachment 246499 [details] [review]
Patch to add support for avc3 box
Amendment 2 of ISO/IEC 14496-15 (AVC file format) is defining a new
structure for fragmented MP4 called "avc3". The principal difference
between AVC1 and AVC3 is the location of the codec initialisation
data (e.g. SPS, PPS). In AVC1 this data is placed in the initial MOOV
box (moov.trak.mdia.minf.stbl.stsd.avc1) but in AVC3 this data goes in
the first sample of every fragment (i.e. the first sample in each mdat
box). The principal reason for avc3 is to make it easier for client
implementations, because it removes the requirement to insert the
SPS+PPS in to the decoder pipeline every time there is a representation
I have produce three patch files that propose changes to gst-plugins-base, gst-plugins-good and gst-plugins-bad.
The change gst-plugins-base to qt_type_find adds detection of the "avc3" brand and is optional.
The change to gst-plugins-good is that meat of the change which adds support for parsing the avc3 box.
The change to gst-plugins-bad is to fix a bug in the assumption of the minimum size of an avcC box.
I will provide a reference MP4 file that uses the avc3 layout.
I can submit these as separate tickets if that would be preferable.
Created attachment 246500 [details] [review]
Added "dash" and "avc3" fourCC codes to qt_type_find
This patch adds detection of the "dash" and "avc3" compatible brands
Created attachment 246501 [details] [review]
gst_h264_parse_set_caps fails to parse minimum sized avcC atom
The minimum size of an avcC atom with no SPS or PPS sets is 7 bytes,
but gst_h264_parse_set_caps assumes a minimum size of 8 bytes. This
is a problem when trying to decode MP4 files encoded using the avc3
Can you provide a sample file?
Review of attachment 246500 [details] [review]:
@@ +2972,2 @@
tip = GST_TYPE_FIND_MAXIMUM;
variant = "iso-fragmented";
Maybe the avc3 variant should get a different variant string, to be handled differently? What do you think?
Review of attachment 246501 [details] [review]:
Maybe this should get a new stream format, "avc3" instead of "avc"?
(In reply to comment #5)
> Review of attachment 246501 [details] [review]:
> Maybe this should get a new stream format, "avc3" instead of "avc"?
Especially because this is to be handled completely different to normal AVC (there's no codec_data). So muxers and decoders probably need special handling or conversion. Ideally h264parse would allow to convert between normal AVC and AVC3.
Review of attachment 246499 [details] [review]:
@@ +10026,3 @@
+ caps = gst_caps_new_simple ("video/x-h264",
+ "stream-format", G_TYPE_STRING, "avc",
+ "alignment", G_TYPE_STRING, "au", NULL);
And this should be "avc3" then
(In reply to comment #6)
> (In reply to comment #5)
> > Review of attachment 246501 [details] [review] [details]:
> > Maybe this should get a new stream format, "avc3" instead of "avc"?
> Especially because this is to be handled completely different to normal AVC
> (there's no codec_data). So muxers and decoders probably need special handling
> or conversion. Ideally h264parse would allow to convert between normal AVC and
There is codec_data, but it is only 7 bytes long. It provides the profile and level information, but not the SPS or PPS.
My reason for keeping with the "avc" label for the stream format was to avoid having a difference in the file format from propagating further down the parse / decode chain.
(In reply to comment #3)
> Can you provide a sample file?
Yes I am working on this right now. I hope to shortly be able to upload an MP4 file or provide a URL to a hosted version.
(In reply to comment #8)
> (In reply to comment #6)
> > (In reply to comment #5)
> > > Review of attachment 246501 [details] [review] [details] [details]:
> > >
> > > Maybe this should get a new stream format, "avc3" instead of "avc"?
> > Especially because this is to be handled completely different to normal AVC
> > (there's no codec_data). So muxers and decoders probably need special handling
> > or conversion. Ideally h264parse would allow to convert between normal AVC and
> > AVC3.
> There is codec_data, but it is only 7 bytes long. It provides the profile and
> level information, but not the SPS or PPS.
> My reason for keeping with the "avc" label for the stream format was to avoid
> having a difference in the file format from propagating further down the parse
> / decode chain.
The problem with that is that not everything handling avc1 will be able to handle avc3. So there needs to be distinction between the two, and conversion support in the parser (like it does for byte-stream<->avc already).
Created attachment 246787 [details]
Test file for AVC3 structure
A short sequence from "Big Buck Bunny" (creative commons license) using the AVC3 structure.
(In reply to comment #10)
> The problem with that is that not everything handling avc1 will be able to
> handle avc3. So there needs to be distinction between the two, and conversion
> support in the parser (like it does for byte-stream<->avc already).
There is a difference between the boxes included in the MP4 file and the encapsulation of H.264 data within an MDAT box. Both avc1 and avc3 use the same NAL encapsulation for sample data in the MDAT box.
The h.264 parser downstream of qtdemux will see the same NAL structure for both avc1 and avc3. The two differences are:
* At the start of a fragment the first sample will have an SPS and PPS in it
* The codec data field will be smaller
Yes that's exactly my point, it is different to what elements nowadays expect when they get stream-format=avc. This can break in subtle ways, especially with hardware decoders. The problem I see is not that there's SPS/PPS inside the stream but that the codec_data is short.
As such there should be a different stream-format in the caps and conversion support in the parser.
I haven't looked at the details yet, but my impression is that:
1) if we expose this directly as encapsulated, we would need to add a new stream-format for this
2) IMHO we should make the demuxer convert this internally to AVC, i.e. just make it extract the PPS/SPS from the first chunk and send updated caps if needed.
Created attachment 247134 [details] [review]
adds support for avc3 sample format
The consensus from the review of patch 246499 is that a different stream type identifier should be used. This patch replaces 246499 and generates an "avc3" stream format caps event.
In my opinion the demuxer should convert this to normal AVC (and change caps if needed). This is easier than making h264parse support yet another variant, and I don't see the benefit of exposing this variant if it's likely to remain mp4-specific in the foreseeable future anyway.
(In reply to comment #16)
> In my opinion the demuxer should convert this to normal AVC (and change caps if
> needed). This is easier than making h264parse support yet another variant, and
> I don't see the benefit of exposing this variant if it's likely to remain
> mp4-specific in the foreseeable future anyway.
I think that h264parse is already handling both in-band SPS and out-of-band SPS. It already handles MPEG-2 TS based AVC streams (such as those from antenna or satellite broadcasts) that have SPS at regular intervals and ISOBMFF files that typically only have one SPS in the MOOV. Both are perfectly valid AVC streams, I am not sure we can easily say one is "normal" and one is "abnormal".
Filtering out the SPS and PPS from a stream and converting it to a caps event is not going to be easy because it would require h.264 parsing of the sample data in qtdemux. The gstreamer code base already has h.264 parsing spread across multiple modules, and it is probably not a good idea to make it any worse.
I'm not saying one is normal and one is abnormal, but avc3 is basically a mix between those two modes, which will make everything even more confusing, we barely handle those two modes properly. We have now stream-format=avc which means sps/pps are sent out-of-band and nals have no sync code and everything is framed in access units, and stream-format=byte-stream where sps/pps are in the bitstream and nals have sync codes.
If I understood comment #12 correctly, we don't have to parse the sample data (not much anyway), we can assume the sps/pps are in the first sample. It shouldn't be too difficult, even if it's a bit annoying.
It is true that we already have h264 parsing spread across multiple modules. We will eventually consolidate that when the codecparser library moves to -base. It's not clear to me whether that's really an argument for or against doing things as I suggest though. I think what has to be weighed is the complexity of making h264parse handle yet another mode, namely this mixed mode, and the complexity of parsing out the sps/pps and putting them into caps. The latter is considerably simpler, and easier to maintain in the long run IMHO. If there are opportunities for utility functions, they can be put into pbutils with the codec utils.
I don't really know what is better of these two options :) If it's really limited to MP4 I probably prefer Tim's suggestion
(In reply to comment #18)
> If I understood comment #12 correctly, we don't have to parse the sample data
> (not much anyway), we can assume the sps/pps are in the first sample. It
> shouldn't be too difficult, even if it's a bit annoying.
I think the steps would be something like:
1. Use a NAL parser to find and remove the SPS and PPS NALs in the sample data and build lists of them
2. Patch codec_data to remove any existing SPS entries, PPS entries, the numberOfSequenceParameterSets and the numberOfPictureParameterSets fields
3. Append (numberOfSequenceParameterSets | 0xE0) to codec_data
4. For each SPS NAL: append sequenceParameterSetLength and SPS NAL to codec_data
5. Append numberOfPictureParameterSets to codec_data
6. For each PPS NAL: append pictureParameterSetLength and PPS NAL to codec_data
7. Push new codec_data downstream
8. profit! :)
Tim, do you think having this complexity added to the demuxer makes sense? As (almost) all this is already handled inside h264parse already I think I prefer the original approach of exposing it in the caps.
I can live with either solution.
We should try to leave formats in their most original format for as long as possible. It might not always be the least complex option but a good argument for leaving the format as-is is when you want to remux it.
So where are we with this feature?
The consensus appears to be that we don't want to reformat AVC3 in to AVC1 structure.
Apart from a rebase of the patches to head, are there any other changes I should be working on to make them acceptable for incorporation?
AVC3 should be kept as different stream-format in the caps, and h264parse should be able to convert from/to AVC3.
Comment on attachment 247134 [details] [review]
adds support for avc3 sample format
Will push this (without the indention changes in the headers) after we have some support for it in h264parse. Without it's rather useless
Created attachment 250888 [details] [review]
add support for stream-format=avc3 to h264parser
Review of attachment 250888 [details] [review]:
This only parses avc3 as I see it, but does not do any conversion at all between avc<->avc3 and avc3<->bytestream
My understanding of h264parse is that in case of avc->bytestream h264parse will push the SPS and PPS from codec_data downstream at the start of the stream, at intervals controlled by "config-interval" and when there is a codec_data change.
In the case of avc3->bytstream h264parse detects that there is already SPS/PPS in the stream and sets h264parse->push_codec to FALSE. Therefore avc3->bytstream is already supported, except for the stream type (hence the patch).
In the case of bystream->avc h264parse will generate codec_data caps from the parsed SPS/PPS in the stream. However it does not remove these SPS/PPS from the stream. This means that the current h264parse is actually doing bytestream->avc3 but with a copy of the SPS/PPS in codec_data. Therefore bytestream->avc3 is the same as bytestream->avc except that the codec_data shouldn't have any SPS/PPS in it. My current set of patches don't provide this filtering of the codec_data when outputting in avc3 mode.
Given the current behaviour of h264parse, conversion between avc<->avc3 is complicated. A full conversion from avc3->avc would remove all SPS/PPS from the stream and only provide them via codec_data. Changing h264parse so that avc never has SPS/PPS in-band might break existing software that makes use of this in-band data.
Perhaps we need a new stream type (let's say "avc1") for which we can make rules about SPS/PPS location?
|stream_type | SPS in-band | SPS in codec_data |
| avc | maybe | always |
| avc1 | never | always |
| avc3 | always | never |
I think it should be sufficient for the conversions to always put PPS/SPS inband for AVC and only convert the codec_data as required. No decoder should fail on that.
On that basis, next week I will start working on code to exclude SPS/PPS from codec_data when the output format is avc3.
Great, thanks :)
Created attachment 253605 [details] [review]
add support for stream-format=avc3 to h264parser
Replaces patch 250888 with the agreed support for stream-format=avc3.
Also adds a unit test for stream-format=avc3
Created attachment 253606 [details] [review]
exclude avc3 stream format from AV_CODEC_ID_H264
As part of the changes to support the "avc3" variant of the ISO-BMFF (see bug #702004) a new stream-format has been created (video/x-h264, stream-format="avc3", alignment="au") that requires changes to gstavcodecmap to exclude this format because avdec_h264 expects the SPS and PPS to be in the codec_data.
The BBC have kindly provided some test streams for both AVC3 and AVC1 variants:
Created attachment 253632 [details] [review]
Addition to patch #253605
By doing a diff between my local branch and a git am of patch #253605 I have discovered that patch #253605 is missing one AVC3 related change that got lost when squashing gst-indent related changes.
I've spent most of the afternoon trying to produce a clean patch to replace #253605 but I'm admitting defeat. I can start with a clean version of gsth264parse.c that produces no changes when gst-indent is used, apply the change given in this patch and then gst-indent wants to re-indent most of the file - but this only happens if I try to squash them to a single patch. Doing them as two patches doesn't cause a gst-indent cataclysm.
Thanks, I've merged all this now :)
For squashing changes you could use "git rebase --interactive", and then follow the instructions there (that's what I did now).
Thank you for accepting the patches and for the git tip. I expect by the time I finally get to grips with git someone will have invented a new scm :)