GNOME Bugzilla – Bug 796519
Add AV1 codec parser
Last modified: 2018-11-03 14:25:40 UTC
I just want to let you know that I started work on an AV1 codec parser - it will be called gstav1parser - and I'll try to mimic the style of gstvp9parser.c. This work is part of a GSoC 2018 project, mentored by Intel Media And Audio For Linux.
*** Bug 796518 has been marked as a duplicate of this bug. ***
Excellent! If you have any questions or want opinions on anything feel free to pop into #gstreamer on FreeNode IRC.
Created attachment 372967 [details] [review] AV1 codecparser library preview gstav1parser: Add AV1 codecparser library This is a first preview version of the AV1 codec parser library. The library consists of the two files gstav1parser.c and gstav1parser.h. Additionally, a test case (av1parser.c) is provided. It contains a test sequence taken from the aom testdata set, with one key frame and one inter frame. Feedback is highly appreciated. :) Open questions: * Is the library interface appropriate? * Is it OK to use GNU C specific extensions like compound statement expressions? (checked with #ifdef __GNUC__) * Is the layout of the structs holding AV1 syntax elements sensible? * Does anybody know a good way to generate test data? I investigated the aom source (dump_obu only provides obu type/size information, inspect/AOMAnalyzer analyzes at the tile level). So far I didn't find a tool in the aom source tree that analyzes the bitstream at a level matching the codecparser. This work is part of a GSoC 2018 project, mentored by Intel Media And Audio For Linux. https://bugzilla.gnome.org/show_bug.cgi?id=796519
Review of attachment 372967 [details] [review]: ::: tests/check/libs/av1parser.c @@ +60,3 @@ + bzero (&seq_header, sizeof (seq_header)); + bzero (&frame, sizeof (frame)); + bzero is deprecated; memset is preferred.
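As a minimal sketch of the suggested change (the SeqHeader struct below is a hypothetical stand-in for the parser's real structures, not code from the patch), memset gives the same zeroing behavior as the deprecated bzero while staying portable:

```c
#include <string.h>
#include <stdint.h>

/* Hypothetical stand-in for one of the parsed-header structs in the test. */
typedef struct {
  uint32_t width;
  uint32_t height;
  uint8_t bit_depth;
} SeqHeader;

/* memset is the portable replacement for the deprecated bzero(). */
static void
seq_header_clear (SeqHeader *h)
{
  memset (h, 0, sizeof (*h));
}
```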
All of the other codecparser APIs (h264, vp9 etc.) accept a pointer to data (which needs to be parsed) as the input argument, so why is av1parser designed to accept a BitReader instead of raw data? This asks the user to deal with a BitReader in order to invoke the parsing APIs, which is not an ideal way IMHO.
Review of attachment 372967 [details] [review]: Just a high level review. I agree with the comment that it's uncommon to expose a BitReader in the API. You could have a context that holds one if needed. Though, in general we use the GST_READ* macros, as BitReader is slow (requires a lot of function calls). I haven't looked at the code yet, and haven't looked at the API. Is AV1 like VP8/9, where frames cannot be recovered from the raw data? Or is it more like JPEG or H264 byte-stream? ::: gst-libs/gst/codecparsers/gstav1parser.h @@ +8,3 @@ + * modify it under the terms of the GNU Library General Public + * License as published by the Free Software Foundation; either + * version 2 of the License, or (at your option) any later version. I think GNU Lesser 2.1 should be used in new code. @@ +25,3 @@ + */ + /** + * SECTION:gstav1parser This needs to be added to the doc sections in docs/ @@ +30,3 @@ + * + * For more details about the structures, you can refer to the + * specifications: https://aomediacodec.github.io/av1-spec/av1-spec.pdf Should be a markdown annotation, something like [specification](https://..) @@ +31,3 @@ + * For more details about the structures, you can refer to the + * specifications: https://aomediacodec.github.io/av1-spec/av1-spec.pdf + * Add "Since: 1.16" @@ +52,3 @@ + + +//TODO: Import all Defines from SPEC Use a C style comment, /* */ @@ +143,3 @@ +/** + * GstAV1OBUType: + * Doc/TODO @@ +389,3 @@ + +typedef enum { + GST_AV1_REF_INTRA_FRAME, All the other enums use explicit =. @@ +416,3 @@ + +/** + * _GstAV1OBUHeader: Remove the _ here; you want to refer to the typedef, not the struct. @@ +432,3 @@ + +struct _GstAV1OBUHeader { + guint8 obu_forbidden_bit; We don't usually put reserved bits into parsed structures. @@ +485,3 @@ + * initial_display_delay_minus_1 is not specified for this + * operating point. + * initial_display_delay_minus_1: plus 1 specifies, for operating point i, the number of decoded frames Missing @. 
@@ +592,3 @@ + guint8 separate_uv_delta_q; + guint8 BitDepth; + guint8 NumPlanes; use underscore notation instead of camel case. @@ +724,3 @@ + guint8 itu_t_t35_country_code; + guint8 itu_t_t35_country_code_extention_byte; + // itu_t_t35_payload_bytes - not supported at the moment Without structure padding, you'll need to add this in, as adding it later would be an ABI break. @@ +1134,3 @@ + guint8 UsesLr; + GstAV1FrameRestorationType FrameRestorationType[GST_AV1_MAX_NUM_PLANES]; + guint32 LoopRestorationSize[GST_AV1_MAX_NUM_PLANES]; No camel case please. @@ +1552,3 @@ + guint8 anchor_tile_col; + guint16 tile_data_size_minus_1; + //guint8 coded_tile_data[]; //skipped Would need to reserve that space in the struct for ABI stability. ::: tests/check/meson.build @@ +70,3 @@ ] + Spurious new line.
many thanks for your feedback. For most of them I can just go ahead and implement the changes. I do want to note two things: 1.) About CamelCase - I tried to stick to the AV1 Bitstream Spec - and in my understanding it uses underscore notation for bits read directly from the stream and CamelCase if there is some calculation involved. -> here the question is what is better: sticking to the original notation from the Spec, or following the standard GStreamer layout? 2.) About reserving space in the structs. itu_t_t35_payload_bytes[] is not defined in the Spec - thus it is unclear to me how many bytes to reserve. Quote from the Spec: "Note: The exact syntax of itu_t_t35_payload_bytes is not defined in this specification. External specifications can define the syntax. From a decoder perspective, it suffices to say that decoders should ignore the entire OBU if they don’t understand it. The last byte of the valid content of the data is considered to be the last byte that is not equal to zero. This rule is to prevent the dropping of valid bytes by systems that interpret trailing zero bytes as a padding continuation of the trailing bits in an OBU. This implies that when any payload data is present for this OBU type, at least one byte of the payload data (including the trailing bit) shall not be equal to 0." -> Should I follow the advice and ignore the entire OBU? coded_tile_data[] - its size is variable with a maximum of 64KB per tile list entry (and there are max. 512 tile_list_entries) ... I can't allocate this statically. -> Maybe I should reserve a pointer for this? (this would also need an API function for freeing the space after use)
thanks for your response. I will have to study gstvp9parser and its use case a bit more to get the interface right.
(In reply to Georg Ottinger from comment #7) > many thanks for your feedback. For most of them I can just go ahead and > implement the changes. > > I do want to note two things: > > 1.) About CamelCase - I tried to stick two the AV1 Bitstream Spec - and in > my understanding its using underscore notation for "direct" read bits from > the stream and CamelCase if there is some calculation involved. > > -> here the question is what is better: Sticking to the original notation > from the Spec? Or following the gstreamer standard layout? My personal preference is to respect coding style, you can introduce a namespace for your attribute to differentiate the calculated one. > > 2.) About reserving space in the structs. > itu_t_t35_payload_bytes[] is not defined in the Spec - thus it is unclear to > me how many bytes to reserve. > > Quote from the Spec: "Note: The exact syntax of itu_t_t35_payload_bytes is > not defined in this specification. External specifications can > define the syntax. From a decoder perspective, it suffices to say that > decoders should ignore the entire OBU if they don’t understand it. The last > byte of the valid content of the data is considered to be the last byte that > is not equal > to zero. This rule is to prevent the dropping of valid bytes by systems that > interpret trailing zero bytes as a padding continuation of the trailing bits > in an OBU. This implies that when any payload data is present for this OBU > type, at > least one byte of the payload data (including the trailing bit) shall not be > equal to 0." > > -> Should I follow the advise and ignore the entire OBU? > > coded_tile_data[] - its size is variable with a maximum of 64KB per tile > list entry. (and there are max. 512 of tile_list_entries) ... I can't > allocate this statically. > > -> Maybe I should reserve a pointer for this? 
(this would also need a API > function for freeing the space after use) If the spec does not specify a maximum, dynamic allocation is required, and then a free function is required too.
(In reply to sreerenj from comment #5) > All of the other codecparser APIs(h264, vp9 etc) accept a pointer to > data(which needs to be parsed) as the input argument of APIs, why av1parser > designed to accept BitReader instead of raw data? > This is asking the user to deal with BitReader in order to invoke the > parsing APIs which is not an ideal way IMHO. When coding the test case, using a BitReader as an argument seemed the more straightforward way to me, because I am dealing with a stream of different OBUs with variable sizes. The VP9 codecparser exposes one parsing function (frame_header); AV1 has to deal with a number of different OBUs (frame, frame_header, tile_group, sequence, metadata, tile_list). A BitReader comes in handy because it takes care of the bit-position accounting. If I change the interface to "const guint *data, gsize size", I will lose the information about how many bits/bytes were actually processed and have to handle this one layer above. That said, I have no objections to changing the interface as suggested.
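To sketch the raw-data variant under discussion: a parse function that takes `data`/`size` can still report how much it processed through an out-parameter, so the bit/byte accounting is not lost. Everything below is illustrative; the "OBU" layout is a deliberately simplified toy encoding (one type byte, one size byte), not the real AV1 syntax, and the names are made up:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes mirroring the GstAV1ParserResult idea. */
typedef enum { PARSE_OK = 0, PARSE_ERROR = 1 } ParseResult;

/*
 * Accept raw data plus size and report the number of bytes consumed
 * through an out-parameter, so the caller does not have to track
 * positions itself.  The toy "OBU" here is a one-byte type followed
 * by a one-byte payload length.
 */
static ParseResult
parse_fake_obu (const uint8_t *data, size_t size,
                uint8_t *obu_type, size_t *consumed)
{
  if (size < 2)
    return PARSE_ERROR;
  *obu_type = data[0];
  if ((size_t) data[1] + 2 > size)
    return PARSE_ERROR;
  *consumed = (size_t) data[1] + 2;   /* header + payload */
  return PARSE_OK;
}
```

The caller can then advance its own pointer by `consumed` and call again for the next OBU in the packet.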
Regarding the reserved fields: usually we don't keep fields in public API structures to keep track of "reserved_bits". But for AV1, maybe it is better to keep those as they are for now, so that we can add fields in the same place without breaking ABI in the future (unless other developers have objections). For itu_t_t35_payload_bytes, it is for external specifications; I would prefer to drop this field. What you can do is add GST_PADDING/GST_PADDING_LARGE (for eg: https://cgit.freedesktop.org/gstreamer/gstreamer/tree/gst/gsttask.h#n154) so that we can extend it if needed in the future.
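A sketch of the padding idea (PADDING_SLOTS stands in for GStreamer's GST_PADDING, which reserves four gpointer slots; the struct and field spellings here are illustrative):

```c
#include <stdint.h>

/* Stand-in for GST_PADDING: a handful of reserved pointer slots. */
#define PADDING_SLOTS 4

/*
 * Hypothetical metadata struct with reserved space at the end, so new
 * fields can be appended later without breaking the ABI for existing
 * users of the library.
 */
typedef struct {
  uint8_t itu_t_t35_country_code;
  uint8_t itu_t_t35_country_code_extension_byte;
  void *reserved[PADDING_SLOTS];
} MetadataItut35;
```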
(In reply to Nicolas Dufresne (ndufresne) from comment #6) > Review of attachment 372967 [details] [review] [review]: > > Just a high level review. I agree with the comment that it's uncommon to > expose a BitReader in the API. You could have a context that holds one if > needed. Though, in general we use the GST_READ* macro as BitReader is slow > (requires a lot of function calls). I haven't looked at the code yet, and > haven't looked at the API. > > Is AV1 like VP8/9, which cannot be frames from raw data ? Or is it more like > JPEG or H264 byte-stream ? > AV1 is packetized very similar to vp8/vp9.
(In reply to Georg Ottinger from comment #10) > (In reply to sreerenj from comment #5) > > All of the other codecparser APIs(h264, vp9 etc) accept a pointer to > > data(which needs to be parsed) as the input argument of APIs, why av1parser > > designed to accept BitReader instead of raw data? > > This is asking the user to deal with BitReader in order to invoke the > > parsing APIs which is not an ideal way IMHO. > > When coding the test case for me it seemed the more straight forward way to > use Bitreader as an argument - cause I am dealing here with a Stream of > different OBUs with variable sizes. The VP9 codecparser exposes one parsing > function (frame_header), AV1 has to deal with a number of differen OBUs > (frame, frame_header, tile_group, sequence, metadata, tile_list). BitReader > comes in handy, cause it is taking care about the bit-position accounting. > If I change the interface to "const guint *data, gsize size" - I will loose > the information how many bits/bytes were actually processed and handle this > one layer above. > > That said I have no objections to change the interface as suggested. Please correct me if I'm wrong: IIUC, AV1 is always packetized (one full frame in a buffer with all necessary OBUs in it), right? If so, why can't we do it similar to vp9? Only one API which accepts the input raw data and parses+fills all the necessary structures (sequence, frame etc.) if the corresponding OBUs are present? Or do you think it is possible to have the container formats send only OBUs to the decoder instead of the whole frame (similar to H264 and HEVC)?
> Please Correct me if I'm wrong, > IIUC AV1 is it always packetized (one full frame in a buffer with all > necessary OBUs in it). right? If so, why can't we do it similar to vp9? Only > one API which accepts the input raw data and parse+fill all the necessary > structures (sequece, frame etc) if the corresponding obus are present? > > Or do you think it is possible to have the container formats to send only > OBUs to the decoder instead of the whole frame (similar to H264 and HEVC)? hmm, I am not 100% sure and I must admit that I don't fully understand the packetized concept - but I can sum up what I think AV1 does: AV1 is structured in 1.) temporal units 2.) frame units 3.) OBUs A temporal unit for example might look like this: *Temporal Delimiter OBU *Sequence Header OBU *Frame OBU It is marked by a "Temporal Delimiter OBU" and lasts until the next "Temporal Delimiter OBU" starts. But it can also look like this (for subsequent frames): * Temporal Delimiter OBU * Frame OBU Note: all subsequent temporal units need the original Sequence Header information in order to be decodable. A Frame OBU is basically the sum of a Frame Header OBU and a Tile Group OBU (minus one OBU header), and it is on its own a "frame unit". But a frame unit might also look like this: * Frame Header OBU * 1st Tile Group OBU * ... * Xth Tile Group OBU -> so there are a number of ways in which temporal units and/or frame units might be structured. It might also be possible that multiple frame units reside within one temporal unit. Annex B for example additionally stores the information about how many bytes the actual temporal unit and also the frame unit(s) occupy (in this case the size of the "packet" is known) - but this is only for Annex B; in the low overhead bitstream format (the standard) this is not the case. 
For example the output of dump_obu: ----- $ ./dump_obu ~/SourceCode/aom_testdata/av1-1-b8-01-size-16x16.ivf Temporal unit 0 OBU type: OBU_TEMPORAL_DELIMITER extension: no length: 2 OBU type: OBU_SEQUENCE_HEADER extension: no length: 12 OBU type: OBU_FRAME extension: no length: 169 TU size: 183 OBU overhead: 3 Temporal unit 1 OBU type: OBU_TEMPORAL_DELIMITER extension: no length: 2 OBU type: OBU_FRAME extension: no length: 77 TU size: 79 OBU overhead: 2 File total OBU overhead: 5
> Or do you think it is possible to have the container formats to send only > OBUs to the decoder instead of the whole frame (similar to H264 and HEVC)? About the container format: for example, when storing AV1 in IVF, each temporal unit is put into an IVF frame. Providing a one-stop vp9-like interface would have the following issues: 1.) In order to decode the whole frame, there might still be a dependency on the previous Sequence Header OBU. 2.) The temporal unit can contain more than one frame unit. 3.) The frame unit can be structured in different ways (for example multiple tile groups). 4.) The temporal units might hold additional OBUs like metadata and/or tile list.
(In reply to Georg Ottinger from comment #14) > > Please Correct me if I'm wrong, > > IIUC AV1 is it always packetized (one full frame in a buffer with all > > necessary OBUs in it). right? If so, why can't we do it similar to vp9? Only > > one API which accepts the input raw data and parse+fill all the necessary > > structures (sequece, frame etc) if the corresponding obus are present? > > > > Or do you think it is possible to have the container formats to send only > > OBUs to the decoder instead of the whole frame (similar to H264 and HEVC)? > > hmm, I am not 100% sure and I must admit that I don't fully understand the > packetized concept - but I can sum up what I think AV1 does: > > AV1 is structured in > 1.) temporal units > 2.) frame units > 3.) OBUs > > A temporal unit for example might look like this: > > *Temporal Delimiter OBU > *Sequence Header OBU > *Frame OBU > > it is marked by a "Temporal Delimiter OBU" and lasts before the next > "Temporal Delimiter OBU" starts. > > > but it can also look like this (for subsequent frames): > > * Temporal Delimiter OBU > * Frame OBU > > Note: all subsequent temporal units need the original Sequence Header > information in order to be decodeable. I think we are dealing the same thing in vp9 parser by 1) keeping a void * structure(GstVp9ParserPrivate *) in public header 2) few attributes kept inside public header. Is it different from what you are talking about? > > > A Frame OBU is basically the SUM of a Frame Header OBU and a Tile Group OBU > (minus one OBU header) and it is on its own a "frame unit" > > But a frame unit might also look like this: > * Frame Header OBU > * 1st Tile Group OBU > * ... > * Xth Tile Group OBU > > -> so there are a number of ways how temporal units and/or frame units might > be structured. > > It might also be possible that within a temporal unit multiple frame units > reside. It looks very similar to Superframes in vp9. 
We don't deal superframes in vp9codecparser, but gstreamer-vaapi handle this:https://cgit.freedesktop.org/gstreamer/gstreamer-vaapi/tree/gst-libs/gst/vaapi/gstvaapidecoder_vp9.c#n540 I don't remember why we did it like that :(, may be for simplicity.. > > Annex B for example additionally stores the information how much bytes the > actual temporal unit and also the frame unit(s) occupy. (in this case the > size of the "packet" is known) - but this is only for Annex B - in low > overhead bitstream format (the standard) this is not the case. So I guess the container formats (eg:webm) only will communicate the size of OBUs, won't allow sending sequence header (or any other obu) separately. Also, I don't think we have any example cases to look for this at the moment. > > For example the output of dump_obu: > > ----- > $ ./dump_obu ~/SourceCode/aom_testdata/av1-1-b8-01-size-16x16.ivf > Temporal unit 0 > OBU type: OBU_TEMPORAL_DELIMITER > extension: no > length: 2 > OBU type: OBU_SEQUENCE_HEADER > extension: no > length: 12 > OBU type: OBU_FRAME > extension: no > length: 169 > TU size: 183 > OBU overhead: 3 > Temporal unit 1 > OBU type: OBU_TEMPORAL_DELIMITER > extension: no > length: 2 > OBU type: OBU_FRAME > extension: no > length: 77 > TU size: 79 > OBU overhead: 2 > File total OBU overhead: 5
(In reply to sreerenj from comment #16) > (In reply to Georg Ottinger from comment #14) > > > Please Correct me if I'm wrong, > > > IIUC AV1 is it always packetized (one full frame in a buffer with all > > > necessary OBUs in it). right? If so, why can't we do it similar to vp9? Only > > > one API which accepts the input raw data and parse+fill all the necessary > > > structures (sequece, frame etc) if the corresponding obus are present? > > > > > > Or do you think it is possible to have the container formats to send only > > > OBUs to the decoder instead of the whole frame (similar to H264 and HEVC)? > > > > hmm, I am not 100% sure and I must admit that I don't fully understand the > > packetized concept - but I can sum up what I think AV1 does: > > > > AV1 is structured in > > 1.) temporal units > > 2.) frame units > > 3.) OBUs > > > > A temporal unit for example might look like this: > > > > *Temporal Delimiter OBU > > *Sequence Header OBU > > *Frame OBU > > > > it is marked by a "Temporal Delimiter OBU" and lasts before the next > > "Temporal Delimiter OBU" starts. > > > > > > but it can also look like this (for subsequent frames): > > > > * Temporal Delimiter OBU > > * Frame OBU > > > > Note: all subsequent temporal units need the original Sequence Header > > information in order to be decodeable. > > I think we are dealing the same thing in vp9 parser by 1) keeping a void * > structure(GstVp9ParserPrivate *) in public header 2) few attributes kept > inside public header. > > Is it different from what you are talking about? Agree with Sree, maybe we need to hide the details in a private C file. If we expose too many things to the user, they need to know too many details about the av1 spec. > > > > > > > A Frame OBU is basically the SUM of a Frame Header OBU and a Tile Group OBU > > (minus one OBU header) and it is on its own a "frame unit" > > > > But a frame unit might also look like this: > > * Frame Header OBU > > * 1st Tile Group OBU > > * ... 
> > * Xth Tile Group OBU > > > > -> so there are a number of ways how temporal units and/or frame units might > > be structured. > > > > It might also be possible that within a temporal unit multiple frame units > > reside. > > > It looks very similar to Superframes in vp9. > We don't deal superframes in vp9codecparser, but gstreamer-vaapi handle > this:https://cgit.freedesktop.org/gstreamer/gstreamer-vaapi/tree/gst-libs/ > gst/vaapi/gstvaapidecoder_vp9.c#n540 > > I don't remember why we did it like that :(, may be for simplicity.. > > > > > > Annex B for example additionally stores the information how much bytes the > > actual temporal unit and also the frame unit(s) occupy. (in this case the > > size of the "packet" is known) - but this is only for Annex B - in low > > overhead bitstream format (the standard) this is not the case. > > So I guess the container formats (eg:webm) only will communicate the size of > OBUs, won't allow sending sequence header (or any other obu) separately. > Also, I don't think we have any example cases to look for this at the moment. > > > > > > For example the output of dump_obu: > > > > ----- > > $ ./dump_obu ~/SourceCode/aom_testdata/av1-1-b8-01-size-16x16.ivf > > Temporal unit 0 > > OBU type: OBU_TEMPORAL_DELIMITER > > extension: no > > length: 2 > > OBU type: OBU_SEQUENCE_HEADER > > extension: no > > length: 12 > > OBU type: OBU_FRAME > > extension: no > > length: 169 > > TU size: 183 > > OBU overhead: 3 > > Temporal unit 1 > > OBU type: OBU_TEMPORAL_DELIMITER > > extension: no > > length: 2 > > OBU type: OBU_FRAME > > extension: no > > length: 77 > > TU size: 79 > > OBU overhead: 2 > > File total OBU overhead: 5
+struct _GstAV1Parser {
+  struct {
+    guint8 SeenFrameHeader;
+    guint8 temporal_id;
+    guint8 spatial_id;
+    guint64 obu_start_position;
+    GstAV1Size obu_size;
+    GstAV1OBUType obu_type;
+  } state;
+
+  GstAV1ReferenceFrameInfo ref_info;
+  //References
+  GstAV1SequenceHeaderOBU *seq_header;
+  GstAV1FrameHeaderOBU *frame_header;
+};
+
I have a concern about this. The temporal_id and frame_header may only be related to one frame, so why do we put them in the parser? I suggest we follow some logic like this: 1. If some information only applies to one frame, we can put it in the FrameHeader. 2. If some information applies to multiple frames: if the caller needs to know this information, we can put it in the public header; if the caller does not need to know this information, we can put it in a private C file. thanks
> I have a concern about this. The temproal_id, frame_header may only be > related to one frame, why we put this in parser? > I suggest we follow some logic like this > 1. If some information only applied in on a frame we can put it to > FrameHeader. > 2. If some information applied to multi frame > if caller need knows this information, we can put it to the public. > if the caller does not need to know this information, we can put it > to a private c file > thanks this explanation was very helpful. I should have guessed before that private means private (as in the object-oriented sense)
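The private-state idea can be sketched with the usual opaque-pointer pattern in C (all names below are hypothetical, not the actual gstav1parser API): the public header only forward-declares the struct, and the full definition lives in the .c file, so callers cannot depend on the layout.

```c
#include <stdlib.h>

/* --- public header sketch: the struct is opaque to users --- */
typedef struct _Av1Parser Av1Parser;

Av1Parser *av1_parser_new (void);
int        av1_parser_temporal_id (const Av1Parser *p);
void       av1_parser_free (Av1Parser *p);

/* --- private .c side: per-stream state stays hidden here --- */
struct _Av1Parser {
  int temporal_id;        /* only visible inside the library */
  int spatial_id;
  int seen_frame_header;
};

Av1Parser *
av1_parser_new (void)
{
  /* calloc zero-initializes all the private state. */
  return calloc (1, sizeof (Av1Parser));
}

int
av1_parser_temporal_id (const Av1Parser *p)
{
  return p->temporal_id;
}

void
av1_parser_free (Av1Parser *p)
{
  free (p);
}
```

Fields the caller genuinely needs can then be exposed through accessors like `av1_parser_temporal_id()` instead of through the struct layout.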
Here is some info about AV1s High Level Syntax: https://gstconf.ubicast.tv/permalink/v12586e9be9ee55t8f6p/#start=248 and https://youtu.be/yKEDf5-2sT4?t=172 and slides here: https://gstreamer.freedesktop.org/data/events/gstreamer-conference/2017/Thomas%20Daede%20-%20AV1:%20The%20Quest%20is%20Nearly%20Complete.pdf
I also tried to imagine what possible combinations of OBU compositions the parser should be able to handle: 1. The first frame in a sequence might look like this: * Temp Delimiter OBU * Sequence OBU * Frame OBU 2. Following frames in their simplest form may look like this: * Temp Delimiter OBU * Frame OBU 3. The superframe / B-frame like case may look like this: * Temp Delimiter OBU * Frame 1 OBU * Frame 2 OBU 4. The parallel encoded / individually decodable tile groups case may look like this: * Temp Delimiter OBU * Frame Header OBU * Tile Group 1 OBU ... * Tile Group N OBU 5. Combining cases 3 and 4, I might get a structure similar to this one: * Temp Delimiter OBU * Frame Header 1 OBU * Tile Group 1.1 OBU ... * Tile Group 1.N OBU * Frame Header 2 OBU * Tile Group 2.1 OBU ... * Tile Group 2.N OBU Note: All the above cases may be even more advanced by adding Metadata and/or Tile Info OBUs. -> Should the goal of the codecparser library be one superfunction capable of parsing all variations of OBU compositions (within a temporal unit)?
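A rough sketch of what the outer loop of such a superfunction could look like. This assumes a toy OBU encoding (one type byte plus one size byte) rather than the real OBU header syntax, and the function name is mine; the point is only that one loop can classify all the compositions above by walking OBU to OBU and counting frame units:

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified OBU type codes (values match the AV1 spec enumeration). */
enum { OBU_SEQUENCE_HEADER = 1, OBU_TEMPORAL_DELIMITER = 2,
       OBU_FRAME_HEADER = 3, OBU_TILE_GROUP = 4, OBU_FRAME = 6 };

/*
 * Walk all OBUs of one temporal unit and count how many frame units it
 * contains.  A Frame OBU or a Frame Header OBU starts a new frame unit;
 * Tile Group OBUs belong to the current one.
 */
static int
count_frames_in_tu (const uint8_t *data, size_t size)
{
  size_t pos = 0;
  int frames = 0;

  while (pos + 2 <= size) {
    uint8_t type = data[pos];
    uint8_t obu_size = data[pos + 1];   /* toy size field, one byte */

    if (type == OBU_FRAME || type == OBU_FRAME_HEADER)
      frames++;
    pos += 2 + obu_size;                /* skip header + payload */
  }
  return frames;
}
```

With this shape, cases 1 and 2 yield one frame unit, case 3 and case 5 yield two, and metadata or tile list OBUs simply fall through the classification.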
(In reply to Georg Ottinger from comment #21) > I also try to imagine what possible combinations of OBU compositions the > parser should be able to handle: > > 1. The first frame in a Sequence might look like this: > > * Temp Delimiter OBU > * Sequence OBU > * Frame OBU > > 2. Following frames in its simplest form may look like this > > * Temp Delimiter OBU > * Frame OBU > > 3. The Superframe / B-Frame like case may look like this > > * Temp Delimiter OBU > * Frame 1 OBU > * Frame 2 OBU > > 4. The parallel encoded / tile groups individuall decodable case may look > like this > > * Temp Delimiter OBU > * Frame Header OBU > * Tile Group 1 OBU > ... > * Tile Group N OBU > > 5. combining case 3 and 4 i might get a structure similar to this one: > > * Temp Delimiter OBU > * Frame Header 1 OBU > * Tile Group 1.1 OBU > ... > * Tile Group 1.N OBU > > * Frame Header 2 OBU > * Tile Group 2.1 OBU > ... > * Tile Group 2.N OBU > > Note: All the above cases may be even more advanced by adding Metadata > and/or Tile Info OBUs > > -> Should the goal of the codecparser library be one superfunction capable > of parsing all variations of OBU compositions (within a temporal unit)? The question is whether AV1 allows the existence of individual OBUs (eg: sending only a tile group as an rtp packet)? If the answer is "YES", it may make sense to add individual parsing APIs for each OBUs. But it requires packetization stuff similar to H264. Probably an API for parsing the OBU header, then based on obu header type, user invoke individual APIs. But if the answer is "NO", a single API should be enough IMHO. Not sure whether I'm missing something, but so far everything looks very similar to VP9 and I prefer following the same structure (single API) for AV1 too. Let's wait for suggestions from other developers before proceeding with a patch.
Review of attachment 372967 [details] [review]: ::: gst-libs/gst/codecparsers/gstav1parser.h @@ +439,3 @@ + guint32 obu_size; + guint8 obu_temporal_id; + guint8 obu_spatial_id; I would prefer to use a separate structure to hold the ObuExtendedHeader values in ObuHeader. eg: see the GstH264NalUnit in h264parser, based on the extension type we add GstH264NalUnitExtensionMVC values in GstH264NalUnits
> The question is whether AV1 allows the existence of individual OBUs (eg: > sending only a tile group as an rtp packet)? I think the answer is a "YES" - Thomas Daede explains that a Tile Group OBU is basically the smallest entity to put into a container / RTP packet. His explanation of the AV1 high level syntax (just 3 minutes) is really worth watching: https://gstconf.ubicast.tv/permalink/v12586e9be9ee55t8f6p/#start=248 --- About the AV1 bitstream - I think their starting point was VP9, but it evolved into something more granular (which is not dissimilar to H.264).
Created attachment 373048 [details] [review] gstav1parser: Add AV1 codecparser library Preview 2
Then we probably need an alignment field, e.g. frame vs tile-group ?
Review of attachment 373048 [details] [review]: ::: gst-libs/gst/codecparsers/gstav1parser.c @@ +39,3 @@ +#include "gstav1parser.h" + +#define GST_AV1_DEBUG_HELPER() GST_DEBUG("entering function %s", __func__) IMHO, there's no need to print __func__ because GST_DEBUG already does that for you. By default it shows the file name, line number and function name. So basically you should just print "enter". Nevertheless, this is cool for development, not for merging :D @@ +67,3 @@ +#ifdef __GNUC__ + /*GNU C allows using compound statements as expressions,thus returning calling function on error is possible */ +#define gst_av1_read_bits(br, nbits) ({GstAV1ParserResult _br_retval; guint64 _br_bits=gst_av1_read_bits_checked(br,(nbits),&_br_retval); GST_AV1_EVAL_RETVAL_LOGGED(_br_retval); _br_bits;}) This macro is hard to follow and prone to error. Check https://webassemblycode.com/while0-macros-use/ It should be like:

#define GST_AV1_READ_BITS(br, nbits) \
  do { \
    GstAV1ParserResult _br_retval; \
    guint64 _br_bits = gst_av1_read_bits_checked (br, (nbits), &_br_retval); \
    GST_AV1_EVAL_RETVAL_LOGGED (_br_retval); \
    _br_bits; \
  } while (0)

Though, I don't know if that actually works with this value-returning mechanism. @@ +154,3 @@ + return x; + return y; +} glib already offers the MAX macro: https://developer.gnome.org/glib/stable/glib-Standard-Macros.html#MAX:CAPS Also MIN. @@ +173,3 @@ + + return z; +} This looks like CLAMP: https://developer.gnome.org/glib/stable/glib-Standard-Macros.html#CLAMP:CAPS @@ +600,3 @@ + + if (seq_header->reduced_still_picture_header) { + seq_header->timing_info_present_flag = 0; If the structure is already initialized to 0, it isn't worth setting particular variables to zero. Another option would be not to initialize the structure, but to fill it completely, variable by variable. The issue would be if in the future a new variable is added to the structure. 
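For reference, one portable way around the value-returning problem (a do/while(0) macro cannot yield a value the way a GNU statement expression can) is to pass the destination as a macro argument and return from the calling function on error. Everything below is an illustrative sketch with made-up names, including the toy bit reader; it is not the patch's actual code:

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { AV1_PARSE_OK = 0, AV1_PARSE_ERROR = 1 } Av1Result;

/* Toy MSB-first bit reader so the macro below has something to call. */
typedef struct {
  const uint8_t *data;
  size_t size;          /* in bytes */
  size_t bit_pos;
} BitReader;

static uint64_t
read_bits_checked (BitReader *br, int nbits, Av1Result *res)
{
  uint64_t v = 0;
  while (nbits--) {
    if (br->bit_pos >= br->size * 8) {
      *res = AV1_PARSE_ERROR;
      return 0;
    }
    v = (v << 1) | ((br->data[br->bit_pos / 8] >> (7 - br->bit_pos % 8)) & 1);
    br->bit_pos++;
  }
  *res = AV1_PARSE_OK;
  return v;
}

/*
 * Portable alternative to the ({ ... }) statement expression: the value
 * goes to an lvalue argument, and the macro bails out of the calling
 * function on error.  Works in any C compiler, no GNU extension needed.
 */
#define READ_BITS_OR_RETURN(val, br, nbits)           \
  do {                                                \
    Av1Result _res;                                   \
    (val) = read_bits_checked ((br), (nbits), &_res); \
    if (_res != AV1_PARSE_OK)                         \
      return _res;                                    \
  } while (0)

static Av1Result
parse_two_fields (BitReader *br, uint64_t *a, uint64_t *b)
{
  READ_BITS_OR_RETURN (*a, br, 3);
  READ_BITS_OR_RETURN (*b, br, 5);
  return AV1_PARSE_OK;
}
```

The trade-off is that the macro can only be used in functions returning the result type, which is exactly the pattern the parser functions already follow.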
::: gst-libs/gst/codecparsers/gstav1parser.h @@ +49,3 @@ +G_BEGIN_DECLS + +#define GUINT32_MAX UINT32_MAX There is already G_MAXUINT32: https://developer.gnome.org/glib/stable/glib-Basic-Types.html#G-MAXUINT32:CAPS @@ +130,3 @@ + * + */ + If I recall correctly, this space will break the gtkdoc generation... It is a little bit complex, but it would be nice if you could check the correct generation of the documentation. @@ +1624,3 @@ + +GST_CODEC_PARSERS_API +GstAV1ParserResult gst_av1_parse_obu_header (GstAV1Parser * parser, GstBitReader * br, GstAV1OBUHeader * obu_header, gsize annexb_sz); If GstAV1Parser has to be created on the heap, it can contain its GstBitReader, can't it? That way, if I understand correctly, there would be no need to pass it around as a parameter.
> In that way, if I understand correctly, there wouldn't be no need to pass it > around as a parameter. :) - that was my starting point ... I think things will be clearer after our talk on the 16th/17th.
> + if (seq_header->reduced_still_picture_header) { > + seq_header->timing_info_present_flag = 0; > > If the structure is already initialized by 0, it doesn't worth to set zero > particular variables. I kind of like the idea of leaving this line in place, because it makes the functions look more like the Spec
(In reply to sreerenj from comment #12) > (In reply to Nicolas Dufresne (ndufresne) from comment #6) > > Review of attachment 372967 [details] [review] [review] [review]: > > Is AV1 like VP8/9, which cannot be frames from raw data ? Or is it more like > > JPEG or H264 byte-stream ? > > > AV1 is packetized very similar to vp8/vp9. This is not exactly true anymore; there is an Annex B format somewhat like H.264's own Annex B, so you can just store a "raw" AV1 bitstream in a file and parse it back.
Some thoughts about AV1 Annex B, as far as I understand it. AV1's Annex B provides size information about temporal units and frame units, plus the size of the single OBUs. If Annex B is not used, it is mandatory that OBUs include their obu_size field. So in both modes, "Annex B" and "LOBF" (low overhead bitstream format), the size information of the OBU is present.
Quote from the Spec: "This specification defines a low-overhead bitstream format as a sequence of the OBU syntactical elements defined in this section. When using this format, obu_has_size_field must be equal to 1. For applications requiring a format where it is easier to skip through frames or temporal units, a length-delimited bitstream format is defined in Annex B." As a side note: the requirement that obu_has_size_field be set only showed up in the spec around 14 July 2018.
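To make the size handling concrete: both obu_size (in the low-overhead format) and the Annex B length fields are LEB128-coded. A minimal decoder sketch follows; the helper name is made up for illustration and is not part of the gstav1parser API.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode an unsigned LEB128 value as used for obu_size in the AV1 spec.
 * Each byte contributes its low 7 bits; a clear high bit terminates the
 * value. Returns the number of bytes consumed, or 0 on a truncated or
 * over-long (more than 8 bytes) encoding. Hypothetical helper. */
static size_t
leb128_decode (const uint8_t * data, size_t len, uint64_t * value)
{
  uint64_t v = 0;
  size_t i;

  for (i = 0; i < 8 && i < len; i++) {
    v |= (uint64_t) (data[i] & 0x7f) << (i * 7);
    if (!(data[i] & 0x80)) {
      *value = v;
      return i + 1;
    }
  }
  return 0;
}
```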
Created attachment 373286 [details] [review] gstav1parser: Add AV1 codecparser library Preview 3
I'm confused by the usage of gst_av1_parse_get_first_obu() and gst_av1_parse_get_next_obu(). Is the main intention to deal with the Annex B format? If so, would it be hard to combine the two APIs into a single one for the non-Annex-B case? In h264 and hevc, we use separate APIs for the different internal formats. E.g.: gst_h264_parser_identify_nalu() for NAL aligned (annex-b) and gst_h264_parser_identify_nalu_avc() for packed format.
FYI, I started writing a parser element using this library. Nothing exciting yet, pretty much only the boilerplate so far: https://gitlab.collabora.com/xclaesse/gst-plugins-bad/tree/av1parse
(In reply to Xavier Claessens from comment #35) > FYI, I started writing a parser element using this library. Nothing exciting > yet, pretty much only the boilerplate so far: > https://gitlab.collabora.com/xclaesse/gst-plugins-bad/tree/av1parse thanks for the info - your feedback on whether the library's API is appropriate is appreciated
(In reply to sreerenj from comment #34) > I'm confused with the usage of gst_av1_parse_get_first_obu() and > gst_av1_parse_get_next_obu(). gst_av1_parse_get_first_obu() expects a data chunk (e.g. some kind of packet); this packet can hold one or more OBUs. In the current version of the parser library, the data needs to be OBU-aligned in the case of the low-overhead bitstream format (Section 5), whereas in Annex B the data has to be TemporalUnit-aligned. As an example, a Temporal Unit consists of at least two OBUs: a Frame OBU and a Temporal Delimiter OBU. In this case gst_av1_parse_get_first_obu() returns the Frame OBU, and gst_av1_parse_get_next_obu() would return the Temporal Delimiter OBU. Because both OBUs might be in one packet, gst_av1_parse_get_first_obu() also stores a reference to the packet as well as the current position, so that gst_av1_parse_get_next_obu() only needs *parser and *obu as parameters.
Created attachment 373331 [details] [review] libs: av1parser: gtk-doc generation and gitignore some missing bits
Created attachment 373332 [details] [review] WIP: fix documentation Documentation for gtk-doc https://developer.gnome.org/gtk-doc-manual/stable/documenting_symbols.html.en https://developer.gnome.org/gtk-doc-manual/stable/documenting_syntax.html.en
Georg: I just added the boilerplate to build the documentation (attachment 373331 [details] [review]) -- for autotools only, I didn't test it with meson. attachment 373332 [details] [review] is a work in progress to fix the problems when generating the documentation. Please complete the documentation fixes and merge both patches into yours. Regarding the parser itself, I still have my doubts about using #define to declare functions, I would rather use inlined functions... dunno...
> Regarding the parser itself, I still have my doubts when using #define to > declare functions, I would rather use inlined functions... dunno... I guess you have a point here - I also think that the BitReader functions check boundaries a bit too often. So maybe when tuning the library for speed it would be a good moment to rework the BitReader functions altogether.
Review of attachment 373286 [details] [review]: as this parser needs to allocate memory and free it, perhaps it is worth adding GLib's annotations for auto-cleanup pointers. ::: gst-libs/gst/codecparsers/gstav1parser.c @@ +167,3 @@ +#define gst_av1_bit_reader_skip(br, nbits) if (!gst_bit_reader_skip(br,(nbits))) { \ + GST_LOG("Skip Bit Error at line %d",__LINE__); \ + return GST_AV1_PARSER_SKIPBITS_ERROR; \ inlined function instead? @@ +186,3 @@ + +static gint +gst_av1_helpers_FloorLog2 (guint32 x) As this is an internal helper, I would keep the lower-case format (not CamelCase), without the namespace; for example, _floor_log2(.. @@ +198,3 @@ + +static gint +gst_av1_helper_TileLog2 (gint blkSize, gint target) ditto _tile_log2(.. @@ +207,3 @@ + +static gint +gst_av1_helper_InverseRecenter (gint r, gint v) ditto _inverse_recenter(..
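For reference, the two spec helpers mentioned above are small enough to show in full. This is a sketch using the lower-case naming the review suggests (the names are hypothetical; the behavior follows the FloorLog2()/TileLog2() pseudo-code in the AV1 spec).

```c
#include <stdint.h>

/* FloorLog2(x): position of the highest set bit; -1 for x == 0. */
static int
_floor_log2 (uint32_t x)
{
  int s = 0;

  while (x != 0) {
    x >>= 1;
    s++;
  }
  return s - 1;
}

/* TileLog2(blkSize, target): smallest k such that blkSize << k >= target. */
static int
_tile_log2 (int blk_size, int target)
{
  int k = 0;

  while ((blk_size << k) < target)
    k++;
  return k;
}
```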
> > ::: gst-libs/gst/codecparsers/gstav1parser.c > @@ +167,3 @@ > +#define gst_av1_bit_reader_skip(br, nbits) if > (!gst_bit_reader_skip(br,(nbits))) { \ > GST_LOG("Skip Bit Error at > line %d",__LINE__); \ > return > GST_AV1_PARSER_SKIPBITS_ERROR; \ > > inlined function instead? > the point of the #define is to generate something that can return from the "calling" function. I am not aware of a way to do this with inline functions ...
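To illustrate the point: the macro body is expanded inside the caller, which is what lets the error path return from the parsing function itself, something an inline function cannot do. A do/while(0) wrapper additionally makes the macro safe inside unbraced if/else. All names below are made up for the sketch and are not the actual gstav1parser symbols.

```c
#include <stdio.h>

#define PARSER_OK    0
#define PARSER_ERROR 1

/* Expanded in the caller's body, so the `return` leaves the *calling*
 * parser function, not the macro itself. Hypothetical names. */
#define CHECK_SKIP(ok) do { \
  if (!(ok)) { \
    fprintf (stderr, "skip error at line %d\n", __LINE__); \
    return PARSER_ERROR; \
  } \
} while (0)

/* A toy "parse" function: bails out early when the reader reports failure. */
static int
parse_something (int reader_ok)
{
  CHECK_SKIP (reader_ok);
  /* ... continue parsing ... */
  return PARSER_OK;
}
```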
> I just added the boilerplate to build the documentation (attachment 373331 [details] [review] > [details] [review]) -- when using autotools, I didn't test it with meson > > attachment 373332 [details] [review] [review] is a work in progress to fix the > problems when generating the documentation. > > Please complete the documentation fixes and merge both patches into yours. many thanks for help with the documentation.
Review of attachment 373286 [details] [review]: ::: gst-libs/gst/codecparsers/gstav1parser.c @@ +558,3 @@ + */ +GstAV1ParserResult +gst_av1_parse_get_first_obu (GstAV1Parser * parser, gst_av1_parse_get_first_obu and gst_av1_parse_get_next_obu are standalone functions; they are not related to the parser. Could we split them into a different structure? For example: GstAV1Split* gst_av1_split_new(data, size, use_annexb); gst_av1_split_get_next_obu gst_av1_split_free @@ +1135,3 @@ + break; + default: + return GST_AV1_PARSER_ERROR; For a decoder/parser, if we meet unrecognized things, we usually drop them. This is for forward compatibility. For example, AV1 may add new metadata that is not important to us; instead of returning an error, we'd better ignore it. @@ +1157,3 @@ + /* superres_params() */ + if (seq_header->enable_superres) { + frame_header->use_superres = gst_av1_read_bit (br); please make use_superres a local variable @@ +1216,3 @@ +{ + GST_AV1_DEBUG (); + frame_header->render_and_frame_size_different = gst_av1_read_bit (br); should be local @@ +1240,3 @@ + GST_AV1_DEBUG (); + for (i = 0; i < GST_AV1_REFS_PER_FRAME; i++) { + frame_header->found_ref = gst_av1_read_bit (br); this is another example we can make local. Nobody uses found_ref beyond the function; it is only used to init some other values @@ +2629,3 @@ + +static GstAV1ParserResult +gst_av1_load_reference_frame (GstAV1Parser * parser) could we change it to a more descriptive name,
like gst_av1_copy_info_from_reference @@ +2647,3 @@ + return GST_AV1_PARSER_ERROR; + frame_header->current_frame_id = + ref_info->entry[frame_header->frame_to_show_map_idx].ref_frame_id; consider using ref = &ref_info->entry[frame_header->frame_to_show_map_idx] and ref->ref_frame_id @@ +2789,3 @@ + coded_tile_data_ptr = obu->data + gst_bit_reader_get_pos (&br) / 8; + memcpy (tile_list->entry[tile].coded_tile_data, coded_tile_data_ptr, + tile_list->entry[tile].tile_data_size_minus_1 + 1); the max value of tile_data_size_minus_1 is 2^16; this is very expensive. Could we just record the offset here, so we do not need the memcpy? @@ +2861,3 @@ + for (tile_num = tile_group->tg_start; tile_num <= tile_group->tg_end; + tile_num++) { + tile_group->entry[tile_num].tile_row = using entry = &tile_group->entry[tile_num] and info = frame_header->tile_info will simplify the code @@ +2870,3 @@ + } else { + gint tile_size_minus_1 = gst_av1_bitstreamfn_le (&br, + frame_header->tile_info.tile_size_bytes, &retval); consider returning an error if tile_group->entry[tile_num].tile_size - frame_header->tile_info.tile_size_bytes > sz ::: gst-libs/gst/codecparsers/gstav1parser.h @@ +1460,3 @@ + gboolean allow_warped_motion; + gboolean reduced_tx_set; + gboolean is_filter_switchable; better to make this a function-level temporary variable. A decoder will not use it since they have interpolation_filter. A basic idea: if a variable is only used by one function, we can make it local. @@ +1670,3 @@ + GstAV1ReferenceFrameInfo ref_info; + /*References */ + GstAV1SequenceHeaderOBU *seq_header; Pointing to an external structure is harmful. You do not know the allocation status of the memory; it may be freed by the user. Why do we need both this and "gst_av1_parse_sequence_header_obu (GstAV1Parser * parser, GstAV1OBU * obu, GstAV1SequenceHeaderOBU * seq_header);" at the same time? We'd better choose one side. 1. change this to "GstAV1SequenceHeaderOBU seq_header;" and remove seq_header in gst_av1_parse_sequence_header_obu, or 2.
keep seq_header in gst_av1_parse_sequence_header_obu, and pass seq_header to gst_av1_parse_frame_header_obu
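A rough sketch of what the suggested split-iterator could look like, assuming the low-overhead bitstream format where every OBU carries an obu_size field. All names and layouts here are hypothetical illustrations of the proposal, not the actual gstav1parser API, and Annex B handling is omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Iterator state over a buffer of size-delimited OBUs. */
typedef struct {
  const uint8_t *data;
  size_t size;
  size_t pos;
} Av1Split;

/* A view into one OBU; no data is copied. */
typedef struct {
  uint8_t obu_type;
  const uint8_t *payload;
  size_t payload_size;
} Av1ObuView;

/* Returns 1 and fills *obu on success, 0 at end of data or on error. */
static int
av1_split_next_obu (Av1Split * s, Av1ObuView * obu)
{
  const uint8_t *p;
  size_t left, i, hdr_len, size_len = 0;
  uint64_t payload_size = 0;
  uint8_t header;

  if (s->pos >= s->size)
    return 0;
  p = s->data + s->pos;
  left = s->size - s->pos;

  header = p[0];
  if (header & 0x80)                 /* obu_forbidden_bit must be 0 */
    return 0;
  if (!((header >> 1) & 1))          /* this sketch needs obu_has_size_field */
    return 0;
  hdr_len = 1 + ((header >> 2) & 1); /* obu_extension_flag adds one byte */
  if (hdr_len >= left)
    return 0;

  /* obu_size is LEB128 coded */
  for (i = 0; i < 8 && hdr_len + i < left; i++) {
    uint8_t b = p[hdr_len + i];
    payload_size |= (uint64_t) (b & 0x7f) << (i * 7);
    if (!(b & 0x80)) {
      size_len = i + 1;
      break;
    }
  }
  if (size_len == 0 || hdr_len + size_len + payload_size > left)
    return 0;

  obu->obu_type = (header >> 3) & 0xf;
  obu->payload = p + hdr_len + size_len;
  obu->payload_size = (size_t) payload_size;
  s->pos += hdr_len + size_len + (size_t) payload_size;
  return 1;
}
```

The caller would loop over av1_split_next_obu() until it returns 0, with no parser instance involved, which is the decoupling the review asks for.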
I have a simple loop to parse av1 buffers coming out of matroskademux, basically this:

gst_buffer_map (buffer, &map, GST_MAP_READ);
data = map.data;
size = map.size;
for (result = gst_av1_parse_get_first_obu (self->obuparser, data, 0, size, &obu);
    result == GST_AV1_PARSER_OK;
    result = gst_av1_parse_get_next_obu (self->obuparser, &obu)) {
  framesize += obu.size;
}
gst_buffer_unmap (buffer, &map);

Is it normal that after that loop framesize != size ? There are 5 to 15 bytes remaining.
> Is it normal that after that loop framesize != size ? There are 5 to 15
> bytes remaining. Hi Xavier, I only tested the codec parser library against a limited test set (basically the included automated tests). I am a bit busy right now studying for an exam - but if you could provide me with your input data, I could try reproducing / fixing the issue.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/728.