Bug 660770 - videodecoder, videoencoder: support "partial" frames / slices for raw video

Status: RESOLVED OBSOLETE
Product: GStreamer
Classification: Platform
Component: gst-plugins-base
Version: git master
OS: Other Linux
Importance: Normal enhancement
Target Milestone: git master
Assigned To: GStreamer Maintainers
QA Contact: GStreamer Maintainers
Depends on:
Blocks: 671909
Reported: 2011-10-03 10:27 UTC by Sebastian Dröge (slomo)
Modified: 2018-11-03 11:19 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Sebastian Dröge (slomo) 2011-10-03 10:27:26 UTC
Currently basevideodecoder and basevideoencoder handle everything in units of complete frames. This is suboptimal in some cases where you could already pass "partial" frames to the decoder to reduce decoding latency, e.g. h264 NALs.
Comment 1 Aaron Boxer 2016-05-19 11:55:44 UTC
Does this require significant architectural changes?
Comment 2 Sebastian Dröge (slomo) 2016-05-20 06:16:10 UTC
Probably, yes. It requires exposing this information in the caps somehow, so that elements that support partial frames can negotiate it and you don't feed partial frames to other elements that would then fail. It also requires changes to the parsers and to the videodecoder/videoencoder base classes (which operate on frames); the latter would be API additions.

It's not entirely clear yet what the latter would look like :)
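Just as a sketch, the caps side could look something like this. The feature name "memory:PartialFrame" is made up here for illustration, nothing like it exists in GStreamer today:

#include <gst/gst.h>

static GstCaps *
make_partial_frame_caps (void)
{
  GstCaps *caps = gst_caps_new_simple ("video/x-raw",
      "format", G_TYPE_STRING, "I420",
      "width", G_TYPE_INT, 1920,
      "height", G_TYPE_INT, 1080, NULL);

  /* Hypothetical caps feature: elements that understand partial frames
   * would advertise it; everything else only matches the plain caps and
   * keeps getting full frames. */
  gst_caps_set_features (caps, 0,
      gst_caps_features_new ("memory:PartialFrame", NULL));

  return caps;
}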
Comment 3 Tim-Philipp Müller 2016-05-20 08:00:02 UTC
I think there's only any real benefit to this if it's supported "end-to-end" in some way, yet there are few capture APIs that give you partial frames as output, decoder APIs that give you partial frames as output, encoder APIs that take partial frames as input, or render APIs that take partial frames as input.
Comment 4 Aaron Boxer 2016-05-20 15:44:25 UTC
Thanks. Personally, I am only interested in J2K streaming, where we can have an end-to-end pipeline for partial frames. For tiled encoding, I think there would be a big reduction in latency.

Also, even if the image is not tiled, J2K images can be streamed progressively by resolution, so that lower resolutions can be decoded and displayed first. Resolution then improves progressively as higher resolutions are sent.
Comment 5 Aaron Boxer 2016-05-20 15:46:31 UTC
Of course, there are "other fish to fry", so perhaps we should put this on the back burner.
Comment 6 Aaron Boxer 2016-06-30 20:02:47 UTC
I am currently looking into Ultra Low Latency JPEG 2000.

This mode of J2K encodes and decodes video at the stripe level (a J2K stripe has a height of 4 pixels).

This can be used in broadcast, for example, to get sub-frame latency.

I think a case could be made to support partial frames, at least for JPEG 2000.

Also, there is a new low-latency ISO standard called JPEG-XS:

https://jpeg.org/items/20150709_press.html


I would be interested in working on this.
Comment 7 Edward Hervey 2016-07-01 07:56:07 UTC
Great that people are interested in this ! My thoughts:

We might not actually need new caps, but instead a new GstMeta/API/CapsFeature.

You still have regular buffers, but the backing GstMemory is progressively filled (or read).

You have a new GstMeta/API to allow progressive reading/writing of the backing memory.

If you don't support the meta, doing a _map() will block until all the content has been written/read by the previous element supporting the stride API.

If you do support the meta, you can then do progressive READ/WRITE.

As an element supporting the meta, you can push the buffer downstream as soon as there's at least one "slice" in it and you have properly filled the meta.

The meta could contain information such as:
* slice height
* slices available (that can be mapped by downstream)
* total number of slices
* method to do slice-based map (read/write)

for backward compatibility the memory should have fallback map implementations to wait for all slices to be available before returning the full map.
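Roughly, such a meta might look like the following. All names and fields are invented here to make the list above concrete; none of this is existing GStreamer API:

#include <gst/gst.h>

typedef struct _GstSliceMeta GstSliceMeta;

/* Hypothetical meta carrying the slice bookkeeping described above. */
struct _GstSliceMeta {
  GstMeta meta;

  guint slice_height;   /* height of one slice, in lines */
  guint n_slices;       /* total number of slices in the frame */
  guint n_available;    /* slices already written and mappable */

  /* Slice-based map: waits until slice @idx is available, then maps just
   * that region.  Elements that don't know the meta fall back to the
   * plain _map(), which waits for the whole frame. */
  gboolean (*map_slice)   (GstBuffer *buf, guint idx, GstMapInfo *info,
                           GstMapFlags flags);
  void     (*unmap_slice) (GstBuffer *buf, guint idx, GstMapInfo *info);
};

A downstream element that recognizes the meta would then loop map_slice()/unmap_slice() as n_available grows, instead of doing one blocking full-frame map.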
Comment 8 Aaron Boxer 2016-07-01 17:47:19 UTC
(In reply to Edward Hervey from comment #7)
> Great that people are interested in this ! My thoughts:
> [...]

Cool. This sounds like a relatively low-risk approach, where the existing interfaces don't change.
Comment 9 Guillaume Desmottes 2018-03-21 11:26:21 UTC
I'm interested in moving this feature forward.

The OMX encoder on the zynqultrascaleplus can produce encoded frames in multiple slices, and the decoder can start decoding as soon as it receives the first slice.

We made it work using a few hacks so that subframes don't pile up in the base classes and buffers have the right metadata (ts, flags).
See the top 3 commits of https://gitlab.collabora.com/cassidy/gst-omx/commits/subframes-hack
It's pretty different from Edward's suggestion, as we have one buffer per slice here.
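For illustration, the shape of the "one buffer per slice" model is roughly this (an assumption mirroring the approach described above, not code from that branch; the marker-flag convention in particular is made up here):

#include <gst/gst.h>

static GstFlowReturn
push_slices (GstPad *srcpad, GPtrArray *slices, GstClockTime pts)
{
  guint i;

  for (i = 0; i < slices->len; i++) {
    GstBuffer *slice = gst_buffer_ref (g_ptr_array_index (slices, i));
    GstFlowReturn ret;

    /* Every slice carries the frame's timestamp; the last one is flagged
     * so downstream knows the frame is complete. */
    GST_BUFFER_PTS (slice) = pts;
    if (i == slices->len - 1)
      GST_BUFFER_FLAG_SET (slice, GST_BUFFER_FLAG_MARKER);

    ret = gst_pad_push (srcpad, slice);
    if (ret != GST_FLOW_OK)
      return ret;
  }

  return GST_FLOW_OK;
}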

I kind of like the idea of progressive read/write though. Maybe that's something that could be used with memory-fence buffers as well?

So I guess the first question here is which approach we should use: one buffer per subframe/slice, or the progressive meta suggested above?
I'll have to think more about how our cases could be implemented with the latter, and I'll be happy to do so if we agree that's the proper way to go.
Comment 10 Nicolas Dufresne (ndufresne) 2018-03-21 15:17:30 UTC
(In reply to Edward Hervey from comment #7)
> Great that people are interested in this ! My thoughts:
> 
> We might not actually need new caps, but instead a new
> GstMeta/API/CapsFeature.
> 
> You still have regular buffers, but the backing GstMemory is progressively
> filled (or read).
> 
> You have a new GstMeta/API to allow progressive reading/writing of the
> backing memory.
> 
> If you don't support the meta, doing a _map() will block until all the
> content has been written/read by the previous element supporting the stride
> API.

I don't like the idea of hacking the GstMemory/GstBuffer map to implement implicit fencing. It's just a bad idea: prone to stalls, hard to interrupt, and it gets complicated with multiple GstMemory objects. Mapping a buffer should be deterministic. I'd say, if downstream does not support that, reconstruct locally and push a full frame.

I don't think this applies to encoded data, which has an unfixed size. For raw data, this model applies to decoders that would allocate one buffer per frame, but if we have one buffer per slice, it kind of falls apart. It would be nice to see what is being done in the field, e.g. JPEG 2000 and openh264 have that notion, the ALG OMX stack, etc.

> 
> If you do support the meta, you can then do progressive READ/WRITE.
> 
> As an element supporting the meta, you can push the buffer downstream as
> soon as there's at least one "slice" in it and you have properly filled the
> meta.

No need to wait really, you could push as soon as the timestamps are known.

> 
> The meta could contain information such as:
> * slice height
> * slices available (that can be mapped by downstream)
> * total number of slices
> * method to do slice-based map (read/write)
> 
> for backward compatibility the memory should have fallback map
> implementations to wait for all slices to be available before returning the
> full map.

The use of the singular "memory" is bogus. Also, I'm not a big fan of these map/unmap functions; the ones in GstVideoMeta haven't served very well, and most of the time they get implemented to hide a bug in the offsets array.
Comment 11 Guillaume Desmottes 2018-05-30 14:30:26 UTC
So to summarize, we currently have 3 suggested ways to implement subframes/slices:

a) Add a meta and/or caps feature, have the GstMemory be progressively filled (Edward's suggestion), and have map() block until the operation is complete.

Pro:
- Shouldn't break compat with unmodified elements

Cons:
- Kind of hacky
- Mapping is no longer deterministic

b) Use a GstBuffer for each slice/subframe, like I did in my gst-omx hack.

Pro:
- Close to GStreamer's current memory model

Con:
- May create a big overhead and not scale if we have loads of small slices

c) Add a new "parcel" API to exchange data, as suggested by Nicolas at the GStreamer conference: https://gstconf.ubicast.tv/videos/linux-explicit-dma-fences-in-gstreamer/#slide

Pro:
- Cleaner approach
- Could be used to implement explicit DMA fences as well

Con:
- New way to exchange data in GStreamer, making the framework even more complex


Did I miss anything?
What would be the next step to move this work forward?
Comment 12 Edward Hervey 2018-05-31 08:48:22 UTC
(In reply to Guillaume Desmottes from comment #11)
> So to summarize we currently have 3 suggested ways to implement
> subframes/slices:
> 
> a) Adding a meta and/or caps feature, have the GstMemory being progressively
> filled (Edward's suggestion) and map() blocking until the operation is
> complete.
> 
> Pro:
> - Shouldn't break compat with unmodified elements
> 
> Cons:
> - Kind of hacky

  Hacky in what sense?

> - Mapping is no longer deterministic

  It never was deterministic? There's no guarantee you'll get the map immediately (because it requires doing hardware fetches, for example), or at all for that matter, anywhere in GStreamer.

  Nothing prevents you from having an additional mapping API (as is done with other meta systems). And with the proper capsfeature one can avoid providing such buffer types.

(In reply to Nicolas Dufresne (ndufresne) from comment #10)
> (In reply to Edward Hervey from comment #7)
> > Great that people are interested in this ! My thoughts:
> > 
> > We might not actually need new caps, but instead a new
> > GstMeta/API/CapsFeature.
> > 
> > You still have regular buffers, but the backing GstMemory is progressively
> > filled (or read).

  Make that: the backing *memories*.

> > 
> > You have a new GstMeta/API to allow progressive reading/writing of the
> > backing memory.
> > 
> > If you don't support the meta, doing a _map() will block until all the
> > content has been written/read by the previous element supporting the stride
> > API.
> 
> I don't like the idea of hacking the GstMemory/GstBuffer map to implement
> implicit fencing. It's just a bad idea: prone to stalls, hard to interrupt,
> and it gets complicated with multiple GstMemory objects. Mapping a buffer
> should be deterministic. I'd say, if downstream does not support that,
> reconstruct locally and push a full frame.

  The point of having explicit map in GStreamer *is* to support fencing and cache invalidation if needed (in addition to allowing different memory handlers).

  With the capsfeature, upstream can decide what it wants to do. If all of downstream supports that feature => yay. If downstream doesn't => reconstruct.
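As a sketch of that decision, reusing the made-up "memory:PartialFrame" feature name from the earlier example (none of this is existing API):

#include <gst/gst.h>

static gboolean
downstream_supports_partial (GstPad *srcpad)
{
  /* Ask the peer what it accepts and look for the (hypothetical)
   * partial-frame caps feature on any of the structures. */
  GstCaps *caps = gst_pad_peer_query_caps (srcpad, NULL);
  gboolean supported = FALSE;
  guint i;

  for (i = 0; caps != NULL && i < gst_caps_get_size (caps); i++) {
    GstCapsFeatures *f = gst_caps_get_features (caps, i);

    if (gst_caps_features_contains (f, "memory:PartialFrame")) {
      supported = TRUE;
      break;
    }
  }

  if (caps != NULL)
    gst_caps_unref (caps);

  /* FALSE => reconstruct full frames locally before pushing. */
  return supported;
}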

> 
> I don't think this applies to encoded data, which has an unfixed size. For
> raw data, this model applies to decoders that would allocate one buffer per
> frame, but if we have one buffer per slice, it kind of falls apart. It would
> be nice to see what is being done in the field, e.g. JPEG 2000 and openh264
> have that notion, the ALG OMX stack, etc.
> 
> > 
> > If you do support the meta, you can then do progressive READ/WRITE.
> > 
> > As an element supporting the meta, you can push the buffer downstream as
> > soon as there's at least one "slice" in it and you have properly filled the
> > meta.
> 
> No need to wait really, you could push as soon as the timestamps are known.

  You'd still want to do some minor validation/checks before pushing downstream, like whether the data *can* be handled.

> 
> > 
> > The meta could contain information such as:
> > * slice height
> > * slices available (that can be mapped by downstream)
> > * total number of slices
> > * method to do slice-based map (read/write)
> > 
> > for backward compatibility the memory should have fallback map
> > implementations to wait for all slices to be available before returning the
> > full map.
> 
> The use of the singular "memory" is bogus. Also, I'm not a big fan of these
> map/unmap functions; the ones in GstVideoMeta haven't served very well, and
> most of the time they get implemented to hide a bug in the offsets array.

  There's no obligation to make the extra API a *map*-like API. If a direct-access method can be provided, that can also be an option. The only requirement is to have a fallback map-based system.
Comment 13 Guillaume Desmottes 2018-05-31 08:49:16 UTC
(In reply to Aaron Boxer from comment #6)
> I am currently looking into Ultra Low Latency JPEG 2000.
> [...]

Hi Aaron. Do you know any lib implementing this low latency mode? It doesn't seem supported by openjpeg.
Comment 14 Nicolas Dufresne (ndufresne) 2018-05-31 12:37:01 UTC
The Cisco H264 encoder and decoder implement slice-level encoding/decoding.

Btw, my explicit fence proposal was just a derivative of Edward's proposal. It's basically a way to make the blocking "map" call cancellable. The reason is that upstream may deliver a buffer prior to receiving the associated input. The buffer may end up in an error state if the originating buffer was dropped or the previous operation failed. At this level of non-determinism, you have to be able to cancel the operation when going to the NULL state or flushing.
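A minimal sketch of what such a cancellable wait could look like, using plain GLib primitives; the SliceFence type and its fields are invented here for illustration:

#include <glib.h>

typedef struct {
  GMutex lock;
  GCond cond;
  gboolean ready;      /* all slices have been written */
  gboolean cancelled;  /* flush or NULL-state transition requested */
} SliceFence;

/* Blocks until the data is ready, or until the fence is cancelled.
 * Returns FALSE on cancellation so the caller can bail out instead of
 * stalling forever. */
static gboolean
slice_fence_wait (SliceFence *f)
{
  gboolean ok;

  g_mutex_lock (&f->lock);
  while (!f->ready && !f->cancelled)
    g_cond_wait (&f->cond, &f->lock);
  ok = f->ready;
  g_mutex_unlock (&f->lock);

  return ok;
}

/* Called on flush / shutdown: wakes up any blocked waiter. */
static void
slice_fence_cancel (SliceFence *f)
{
  g_mutex_lock (&f->lock);
  f->cancelled = TRUE;
  g_cond_broadcast (&f->cond);
  g_mutex_unlock (&f->lock);
}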
Comment 15 Guillaume Desmottes 2018-05-31 14:16:26 UTC
(In reply to Nicolas Dufresne (ndufresne) from comment #14)
> Cisco H264 encoder and decoder implements slice level encoding decoding.

I've looked at openh264 but didn't find the slice-level encoding API. EncodeFrame() encodes the whole frame and returns all the slices at once.
Do you have any pointers about how it's supposed to work?
Comment 16 GStreamer system administrator 2018-11-03 11:19:37 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-plugins-base/issues/53.