GNOME Bugzilla – Bug 706774
qtdemux: handle multiple sample descriptions per track
Last modified: 2018-08-23 06:50:41 UTC
Created attachment 253082 [details] [review] Patch Apparently there can be multiple sample description per track, meaning that different samples can be in different format. GStreamer doesn't handle this currently, which can result in decoders getting buffers in format they can't handle. Attached patch makes sure that samples in other formats than specified in first sample description are ignored, with one exception. If first sample description is JPEG and there is another sample description after it, the second sample description is used as it is potentially a video track (first jpeg being still image). This is hacky, ideally we should expose the image and handle format switches somehow, but this is enough to get basic playback of this insanity working. Sample video: https://s3.amazonaws.com/MatejK/Samples/qtdemux-multiple-sample-descriptions.mov Note that the audio track starts at 11s, so when testing make sure the multiqueues can handle that :)
This should probably be handled by just sending a new caps event with the new caps of the next sample instead. Currently this would fail negotiation but once proper support for renegotiation in decodebin is implemented this is going to work.
This is going to be tricky to support for transcoding / remuxing.
Created attachment 263251 [details] [review] Updated patch for master In case someone needs it.
Also I'm not sure that renegotiation is going to solve this. I don't know how this is supposed to work in quick time. Basically there are multiple kinds of samples per track, are you supposed to keep the decoder open, or re-open it every time sample format is different?
You would need to create a new stream group for every caps change, and basically work like hlsdemux or dashdemux for example.
Created attachment 287289 [details] [review] Updated patch for current master
A bunch of commits went in today to support multiple stsd. This might be worth updating
I don't supposed you tested your commits with the file I've provided?
This file causes a race condition in decodebin3. It has 2 stsd entries: jpeg and svq3. Only the first frame uses jpeg all the rest of the clip is svq3 so this is what happens: 1) qtdemux creates and pushes a stream start with a GstStream object with caps=jpeg, then pushes caps and the jpeg buffer 2) decodebin3 receives on its probe the stream-start and stores the GstStream 3) decodebin3 receives the jpeg caps event In parallel qtdemux moves to the second frame and updates the GstStream to have svq3 caps. The issue arises from the fact that decodebin3 uses gst_stream_get_caps to create the decoder and not the caps from the event, because it assumes they are the same. What happens is that it can receive the jpeg caps event after the GstStream has been set to svq3 so it will create a svq3 decodere to handle the caps event and the next buffer and it all fails. Replacing gst_stream_get_caps to use the event's caps makes it work reliably but I still need to understand a bit more of decodebin3 to decide if this is the right approach. Also, this makes using gst_stream_get_caps not useful for negotiation purposes.
Created attachment 355697 [details] [review] decodebin3-parse: do not overwrite GstStream caps after it is sent Otherwise decodebin3 might use an updated caps while it is still dealing with the previous format, leading to not-negotiated errors. To overcome this, GstStream is treated as immutable and when a new caps is received, the parser will make a copy of GstStream before updating it and pushing downstream. This ensures the events are properly serialized with the stream, avoiding the race condition on reading the caps from gststream in decodebin3.
See also https://bugzilla.gnome.org/show_bug.cgi?id=777572 about GstStream updating
According to https://bugzilla.gnome.org/show_bug.cgi?id=777572#c1, then decodebin3 should not use gst_stream_get_caps but should rely only on the GstCaps event for the format negotiations. At the same time, if demuxers are creating a new GstStream when something changes I feel that it is safe to use the GstStream caps for the same purposes of the GstCaps event. Isn't that so? (Feel free to move this discussion to the other bug if you feel it is more appropriate).
What should we do about this? Update decodebin3 to use Caps event instead of getting caps from stream event? Anything else left?
Is that enough? That seems required in any case.
Created attachment 358385 [details] [review] decodebin3: avoid using gst_stream_get_caps for negotiation purposes The information might not be reliable as multiple elements in the pipeline can modify it causing race conditions. Use the caps event that should be always correct.
Don't push this patch until the full picture is figured out.
Review of attachment 287289 [details] [review]: This has already been supported since 2016.
So I generally agree on the fundamental problem which is that whoever creates the collection and streams can updated them *separately* from what actual data flow is currently happening at the multiqueue level. Furthermore I'm working on adding more collection/stream emission in upstream elements (demuxers, adaptive demuxers, ...) for which this will be even more TRUE. ==> So yes, decodebin3's handling of caps should be done using the *actual* caps travelling through multiqueue. Regarding the "multiple stsd entries" in qtdemux, this corresponds to "stream variants". i.e. it's the same "stream" per-se (from a human audio/visual point of view) but the nature of it (codec, resolution,...) can change throughout time. It's a bit like adaptive streaming. The bitrate, resolution, framerate,... can change throughout time, but from a human perspective it's still the same "stream". (Note that DASH uses this multiple stsd-entry system extensively for exactly that purpose). I'm working on an API extension of GstStreamCollection to be able to specify such streams and variants. I'll keep this updated.
*** This bug has been marked as a duplicate of bug 607471 ***