Bug 706774 – qtdemux: handle multiple sample descriptions per track

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 706774 - qtdemux: handle multiple sample descriptions per track


Summary:	qtdemux: handle multiple sample descriptions per track


Status:	RESOLVED DUPLICATE of bug 607471

Product:	GStreamer
Classification:	Platform
Component:	gst-plugins-good
Version:	git master
Hardware:	Other All

Importance:	Normal normal
Target Milestone:	git master
Assigned To:	GStreamer Maintainers
QA Contact:	GStreamer Maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2013-08-26 01:10 UTC by Matej Knopp
Modified:	2018-08-23 06:50 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Patch (5.92 KB, patch) 2013-08-26 01:10 UTC, Matej Knopp	needs-work	Details \| Review
Updated patch for master (7.43 KB, patch) 2013-12-01 20:29 UTC, Matej Knopp	needs-work	Details \| Review
Updated patch for current master (5.98 KB, patch) 2014-09-28 12:08 UTC, Matej Knopp	rejected	Details \| Review
decodebin3-parse: do not overwrite GstStream caps after it is sent (2.48 KB, patch) 2017-07-16 07:45 UTC, Thiago Sousa Santos	none	Details \| Review
decodebin3: avoid using gst_stream_get_caps for negotiation purposes (5.76 KB, patch) 2017-08-25 08:20 UTC, Thiago Sousa Santos	none	Details \| Review

Description Matej Knopp 2013-08-26 01:10:48 UTC

Created attachment 253082 [details] [review]
Patch

Apparently there can be multiple sample description per track, meaning that different samples can be in different format. GStreamer doesn't handle this currently, which can result in decoders getting buffers in format they can't handle.

Attached patch makes sure that samples in other formats than specified in first sample description are ignored, with one exception. If first sample description is JPEG and there is another sample description after it, the second sample description is used as it is potentially a video track (first jpeg being still image). This is hacky, ideally we should expose the image and handle format switches somehow, but this is enough to get basic playback of this insanity working.

Sample video: https://s3.amazonaws.com/MatejK/Samples/qtdemux-multiple-sample-descriptions.mov
Note that the audio track starts at 11s, so when testing make sure the multiqueues can handle that :)

Comment 1 Sebastian Dröge (slomo) 2013-08-26 08:25:49 UTC

This should probably be handled by just sending a new caps event with the new caps of the next sample instead. Currently this would fail negotiation but once proper support for renegotiation in decodebin is implemented this is going to work.

Comment 2 Matej Knopp 2013-08-26 10:15:09 UTC

This is going to be tricky to support for transcoding / remuxing.

Comment 3 Matej Knopp 2013-12-01 20:29:00 UTC

Created attachment 263251 [details] [review]
Updated patch for master

In case someone needs it.

Comment 4 Matej Knopp 2013-12-01 20:29:54 UTC

Also I'm not sure that renegotiation is going to solve this. I don't know how this is supposed to work in quick time. Basically there are multiple kinds of samples per track, are you supposed to keep the decoder open, or re-open it every time sample format is different?

Comment 5 Sebastian Dröge (slomo) 2014-02-04 12:13:53 UTC

You would need to create a new stream group for every caps change, and basically work like hlsdemux or dashdemux for example.

Comment 6 Matej Knopp 2014-09-28 12:08:45 UTC

Created attachment 287289 [details] [review]
Updated patch for current master

Comment 7 Edward Hervey 2017-04-12 13:38:54 UTC

A bunch of commits went in today to support multiple stsd. This might be worth updating

Comment 8 Matej Knopp 2017-04-15 11:07:18 UTC

I don't supposed you tested your commits with the file I've provided?

Comment 9 Thiago Sousa Santos 2017-07-04 18:17:32 UTC

This file causes a race condition in decodebin3. It has 2 stsd entries: jpeg and svq3. Only the first frame uses jpeg all the rest of the clip is svq3 so this is what happens:

1) qtdemux creates and pushes a stream start with a GstStream object with caps=jpeg, then pushes caps and the jpeg buffer
2) decodebin3 receives on its probe the stream-start and stores the GstStream
3) decodebin3 receives the jpeg caps event

In parallel qtdemux moves to the second frame and updates the GstStream to have svq3 caps.

The issue arises from the fact that decodebin3 uses gst_stream_get_caps to create the decoder and not the caps from the event, because it assumes they are the same. What happens is that it can receive the jpeg caps event after the GstStream has been set to svq3 so it will create a svq3 decodere to handle the caps event and the next buffer and it all fails.

Replacing gst_stream_get_caps to use the event's caps makes it work reliably but I still need to understand a bit more of decodebin3 to decide if this is the right approach. Also, this makes using gst_stream_get_caps not useful for negotiation purposes.

Comment 10 Thiago Sousa Santos 2017-07-16 07:45:39 UTC

Created attachment 355697 [details] [review]
decodebin3-parse: do not overwrite GstStream caps after it is sent

Otherwise decodebin3 might use an updated caps while it is still dealing
with the previous format, leading to not-negotiated errors.

To overcome this, GstStream is treated as immutable and when a new caps
is received, the parser will make a copy of GstStream before updating it
and pushing downstream. This ensures the events are properly serialized
with the stream, avoiding the race condition on reading the caps from
gststream in decodebin3.

Comment 11 Sebastian Dröge (slomo) 2017-07-16 07:56:54 UTC

See also https://bugzilla.gnome.org/show_bug.cgi?id=777572 about GstStream updating

Comment 12 Thiago Sousa Santos 2017-07-16 08:11:13 UTC

According to https://bugzilla.gnome.org/show_bug.cgi?id=777572#c1, then decodebin3 should not use gst_stream_get_caps but should rely only on the GstCaps event for the format negotiations. At the same time, if demuxers are creating a new GstStream when something changes I feel that it is safe to use the GstStream caps for the same purposes of the GstCaps event. Isn't that so?

(Feel free to move this discussion to the other bug if you feel it is more appropriate).

Comment 13 Thiago Sousa Santos 2017-08-11 18:56:14 UTC

What should we do about this? Update decodebin3 to use Caps event instead of getting caps from stream event? Anything else left?

Comment 14 Sebastian Dröge (slomo) 2017-08-14 07:00:41 UTC

Is that enough? That seems required in any case.

Comment 15 Thiago Sousa Santos 2017-08-25 08:20:15 UTC

Created attachment 358385 [details] [review]
decodebin3: avoid using gst_stream_get_caps for negotiation purposes

The information might not be reliable as multiple elements in the
pipeline can modify it causing race conditions. Use the caps event that
should be always correct.

Comment 16 Edward Hervey 2017-08-28 08:10:21 UTC

Don't push this patch until the full picture is figured out.

Comment 17 Edward Hervey 2018-08-23 06:39:31 UTC

Review of attachment 287289 [details] [review]:

This has already been supported since 2016.

Comment 18 Edward Hervey 2018-08-23 06:44:23 UTC

So I generally agree on the fundamental problem which is that whoever creates the collection and streams can updated them *separately* from what actual data flow is currently happening at the multiqueue level.

Furthermore I'm working on adding more collection/stream emission in upstream elements (demuxers, adaptive demuxers, ...) for which this will be even more TRUE.

==> So yes, decodebin3's handling of caps should be done using the *actual* caps travelling through multiqueue.

Regarding the "multiple stsd entries" in qtdemux, this corresponds to "stream variants". i.e. it's the same "stream" per-se (from a human audio/visual point of view) but the nature of it (codec, resolution,...) can change throughout time.

It's a bit like adaptive streaming. The bitrate, resolution, framerate,... can change throughout time, but from a human perspective it's still the same "stream". (Note that DASH uses this multiple stsd-entry system extensively for exactly that purpose).

I'm working on an API extension of GstStreamCollection to be able to specify such streams and variants. I'll keep this updated.

Comment 19 Edward Hervey 2018-08-23 06:50:41 UTC


*** This bug has been marked as a duplicate of bug 607471 ***