GNOME Bugzilla – Bug 607471
decodebin3: Race with caps changing in stream
Last modified: 2018-11-03 14:40:46 UTC
While playing the following file, I do not see the video, only a still image:
http://samples.mplayerhq.hu/archive/container/mov/mov+svq3+aac++animatrix_2_program_640-sample.mov

gst-launch then quits with the following error:

$ gst-launch playbin2 uri=file:///$PWD/mov+svq3+aac++animatrix_2_program_640-sample.mov
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstPulseSinkClock
ERROR: from element /GstPlayBin2:playbin20/GstURIDecodeBin:uridecodebin0/GstDecodeBin2:decodebin20/GstJpegDec:jpegdec0: Failed to decode JPEG image
Additional debug info:
gstjpegdec.c(1261): gst_jpeg_dec_chain (): /GstPlayBin2:playbin20/GstURIDecodeBin:uridecodebin0/GstDecodeBin2:decodebin20/GstJpegDec:jpegdec0:
Error #70: Unsupported marker type 0x%02x
Execution ended after 3505994408 ns.
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...

----

FWIW, I noticed that the stream topology that gets posted looks like this:

CONTAINER: video/quicktime
AUDIO: audio/mpeg, mpegversion=(int)4, framed=(boolean)true, codec_data=(buffer)1210, rate=(int)44100, channels=(int)2
AUDIO: audio/x-raw-int, endianness=(int)1234, signed=(boolean)true, width=(int)16, depth=(int)16, rate=(int)44100, channels=(int)2
UNKNOWN: image/jpeg, width=(int)640, height=(int)272, framerate=(fraction)15/1
VIDEO: video/x-raw-yuv, format=(fourcc)I420, width=(int)640, height=(int)272, framerate=(fraction)15/1
The error is from the libjpeg library.
Could jpegparse help here (bug #583098)?
*** Bug 608495 has been marked as a duplicate of this bug. ***
Created attachment 152830 [details] [review] fix bogus error message. I have a fix for the nonsense error (fix in jpegdec). With it, the error becomes: Error #69: Unsupported marker type 0xfb This is also what jpegparse finds :) 0xfb is one of the reserved JPEG markers. I've tried to strip them in jpegparse, but for now I hit the end of the block.
Comment on attachment 152830 [details] [review] fix bogus error message. Ah, I've been wondering about that. Feel free to push this fix.
Comment on attachment 152830 [details] [review] fix bogus error message. commit a9f5bbe1ffbe5c09ecb7ecff478587ee0a09dfec Author: Stefan Kost <ensonic@users.sf.net> Date: Tue Feb 2 13:41:03 2010 +0200 jpeg: don't directly access message, some message have args This caused bogus messages, such as reported in bug #607471.
Could it be that we select the wrong stream here?

FOUND TAG : found by element "qtdemux0".
video codec: JPEG still images
FOUND TAG : found by element "qtdemux0".
audio codec: MPEG-4 AAC audio
maximum bitrate: 128000
bitrate: 160000

mp4info crashes on the clip :/

If I play it with seek test app I get no streams to switch
./tests/examples/seek/seek 16 file:///home/ensonic/Videos/mov+svq3+aac++animatrix_2_program_640-sample.mov

mplayer reports
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x862a680]multiple edit list entries, a/v desync might occur, patch welcome
ID_VIDEO_ID=0
[lavf] Video stream found, -vid 0
ID_AUDIO_ID=1
[lavf] Audio stream found, -aid 1
ID_AUDIO_ID=2
[lavf] Audio stream found, -aid 2
VIDEO: [SVQ3] 640x272 24bpp 24.000 fps 0.0 kbps ( 0.0 kbyte/s)
The problem here is that this file contains 2 stsd entries (in the same track), the first one being jpg, the second is svq3. This is probably confusing qtdemux.
From the qtff spec I gathered the following: a track can have multiple stsd entries, meaning that the track has data in multiple formats. The "Sample-to-chunk" atom identifies which format applies to which buffers. In this file, the first and last buffers of this 'problematic' track are jpeg; the others are svq3.
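To make the sample-to-chunk lookup concrete, here is a rough Python sketch (not qtdemux code) that expands an stsc table into one sample description id per sample. The `StscEntry` type, the function name and the example table are hypothetical. The table only imitates the shape described above (first and last buffers in one format, the rest in another) and is not read from the actual file.

```python
from typing import List, NamedTuple


class StscEntry(NamedTuple):
    first_chunk: int             # 1-based index of the first chunk using this entry
    samples_per_chunk: int       # number of samples in each of those chunks
    sample_description_id: int   # 1-based index into the stsd table


def description_id_per_sample(entries: List[StscEntry], total_chunks: int) -> List[int]:
    """Return the stsd entry id that applies to every sample, in order."""
    ids: List[int] = []
    for i, entry in enumerate(entries):
        # An entry applies up to the chunk just before the next entry's first_chunk,
        # or up to the last chunk of the track for the final entry.
        if i + 1 < len(entries):
            last_chunk = entries[i + 1].first_chunk - 1
        else:
            last_chunk = total_chunks
        n_chunks = last_chunk - entry.first_chunk + 1
        ids.extend([entry.sample_description_id] * (n_chunks * entry.samples_per_chunk))
    return ids


# Hypothetical table: one chunk in format 1, three chunks in format 2, one chunk in format 1.
table = [StscEntry(1, 1, 1), StscEntry(2, 4, 2), StscEntry(5, 1, 1)]
ids = description_id_per_sample(table, total_chunks=5)
```

With this table the first and last samples map to stsd entry 1 and everything in between maps to entry 2, which is the pattern that trips up a demuxer assuming one format per track.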
Created attachment 153018 [details] [review] qtdemux: Avoid buffers from multiple stsd entries quicktime permits the use of multiple codecs on the same trak (multiple stsd entries), this patch makes qtdemux skip buffers that are not from the selected stsd entry. (It always selects the first one currently) Fixes #607471
Created attachment 153019 [details] [review] qtdemux: Refactor stsd parsing Use offsets based on the stsd entry start position and not on the stsd atom start position. Making the code reusable to parsing multiple stsd entries.
Created attachment 153020 [details] [review] qtdemux: Select the predominant stsd entry Peeks at the first stsc entries and selects the stsd entry with the most occurrences in that interval. Fixes #607471
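The selection idea can be sketched roughly as follows in Python. This is only an illustration of "count chunks per stsd entry over the first few stsc entries", not the code from the attachment. The function name, the `peek` window size and the fallback to entry 1 for an empty table are all assumptions.

```python
from collections import Counter
from typing import List, Tuple

# Each stsc entry is (first_chunk, samples_per_chunk, sample_description_id),
# with first_chunk and sample_description_id both 1-based.
Stsc = Tuple[int, int, int]


def predominant_description_id(stsc: List[Stsc], total_chunks: int, peek: int = 8) -> int:
    """Pick the stsd entry id covering the most chunks in the first `peek` stsc entries."""
    counts: Counter = Counter()
    for i, (first_chunk, _samples_per_chunk, desc_id) in enumerate(stsc[:peek]):
        # The run of chunks for this entry ends where the next entry starts.
        if i + 1 < len(stsc):
            next_first = stsc[i + 1][0]
        else:
            next_first = total_chunks + 1
        counts[desc_id] += next_first - first_chunk
    if not counts:
        return 1  # assumption: fall back to the first stsd entry
    return counts.most_common(1)[0][0]


# Hypothetical table: entry 2 covers three chunks, entry 1 only two.
choice = predominant_description_id([(1, 1, 1), (2, 4, 2), (5, 1, 1)], total_chunks=5)
```

Counting chunks rather than samples follows the later description in this thread ("picking the stsd entry that has most chunks"). Counting samples instead would be a one-line change.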
Those patches fix the issue by picking the stsd entry that has most chunks. It doesn't play correctly yet because of the empty entries in the edit list that aren't handled yet.
Pushed some patches for the empty edit list handling in #345830. They need some review, and testing is always welcome. As if that weren't enough, this file has yet another tricky issue: the two audio traks are complementary. The shorter one fills the first 11 secs that are empty (through the use of edit lists) on the longer audio trak. This shorter trak is classified by qtdemux as a preview. I haven't really thought about how we can make this work. It surprises me that we haven't found a way (maybe qtff doesn't allow it) to identify which streams are previews or which should be played together.
What's the progress on this?
I think we never reached a conclusion on how to handle the above case.
(In reply to comment #16) > I think we never reached a conclusion on how to handle the above case.
This bug has an assigned developer but has not received activity in almost a year. Is the assigned person still working on this?
I've never seen any other file with 2 stsd entries again ever since this bug. The main issue here is that each stsd has a different codec, and the audio stream has a single stsd. Is it relevant to fix this? Has someone seen files like this around?
FWIW, since I filed the bug, I've not seen any other file with this problem.
(In reply to comment #14)
> Pushed some patches for the empty edit lists handling in #345830
>
> They need some review and testing is always welcome.
>
>
> As it wasn't already enough, this file has yet another tricky issue:
>
> The two audio traks are complimentary, the smaller ones fills the starting
> 11secs that are empty (by the use of the edit lists) on the longer audio trak.
> This smaller trak is classified by qtdemux as a preview.
>
> Haven't really thought on how we can make this work. It surprises me that we
> haven't found a way (maybe qtff doesn't allow it) to identify what streams are
> previews or what should be played together.

There is a way to check that in the Track Header 'tkhd' atom, following the qtff spec here:
> https://developer.apple.com/library/mac/documentation/QuickTime/QTFF/QTFFChap2/qtff2.html#//apple_ref/doc/uid/TP40000939-CH204-BBCEIDFA

Starting from byte 9:
> Flags
> Three bytes that are reserved for the track header flags. These flags indicate how the track is used in the movie. The following flags are valid (all flags are enabled when set to 1).
> Track enabled
> Indicates that the track is enabled. Flag value is 0x0001.
> Track in movie
> Indicates that the track is used in the movie. Flag value is 0x0002.
> Track in preview
> Indicates that the track is used in the movie’s preview. Flag value is 0x0004.
> Track in poster
> Indicates that the track is used in the movie’s poster. Flag value is 0x0008.

Concerning the multiple stsd entries, I don't see where the problem is. If we store these descriptions in an array of structs with their predefined caps (in the 'qtdemux_parse_trak' function), and store in the QTDemuxSample struct the needed index as specified in the Sample-to-Chunk atoms, shouldn't this do the trick?
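As a rough illustration of the flag check quoted above, the following Python sketch decodes the tkhd flags from a raw atom. It assumes the buffer starts with the usual 8-byte atom header (32-bit size plus the 'tkhd' type), followed by one version byte and the three flag bytes. The function name and the hand-built test atom are hypothetical. The flag values are the ones quoted from the spec.

```python
import struct

TRACK_ENABLED = 0x0001
TRACK_IN_MOVIE = 0x0002
TRACK_IN_PREVIEW = 0x0004
TRACK_IN_POSTER = 0x0008


def parse_tkhd_flags(atom: bytes) -> dict:
    """Decode the track header flags from the start of a 'tkhd' atom."""
    if len(atom) < 12:
        raise ValueError("tkhd atom too short")
    _size, kind = struct.unpack(">I4s", atom[:8])
    if kind != b"tkhd":
        raise ValueError("not a tkhd atom")
    # atom[8] is the version byte; the three bytes after it are the flags.
    flags = int.from_bytes(atom[9:12], "big")
    return {
        "enabled": bool(flags & TRACK_ENABLED),
        "in_movie": bool(flags & TRACK_IN_MOVIE),
        "in_preview": bool(flags & TRACK_IN_PREVIEW),
        "in_poster": bool(flags & TRACK_IN_POSTER),
    }


# Hand-built header only (no track payload): version 0, flags 0x000007.
example = struct.pack(">I4s", 12, b"tkhd") + bytes([0]) + (0x000007).to_bytes(3, "big")
info = parse_tkhd_flags(example)
```

Whether a track flagged "in preview" but not "in movie" should be treated as preview-only is a policy decision for the demuxer. The sketch only exposes the bits.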
FWIW I have a branch here that uses all stsd entries and pushes a new caps when a different format is required: https://cgit.freedesktop.org/~thiagoss/gst-plugins-good/log/?h=qtdemux-stsd-all-entries The new decodebin3 will support these caps changes and it already (kind of*) works with the above patches applied. * There is a race to be fixed.
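The "push new caps when a different format is required" behaviour can be sketched roughly like this in Python. It is a simplified model rather than the code in the branch: events are plain tuples instead of GStreamer events, and the function name and caps strings are illustrative assumptions.

```python
from typing import Dict, Iterable, Iterator, Tuple


def events_with_caps_changes(
    samples: Iterable[Tuple[int, bytes]],
    caps_by_id: Dict[int, str],
) -> Iterator[Tuple[str, object]]:
    """Yield ("caps", caps) before the first buffer and whenever the stsd entry id changes."""
    current_id = None
    for desc_id, payload in samples:
        if desc_id != current_id:
            # Format switch: announce the new caps before pushing the buffer.
            yield ("caps", caps_by_id[desc_id])
            current_id = desc_id
        yield ("buffer", payload)


# Illustrative caps strings and payloads, not taken from the sample file.
caps = {1: "image/jpeg", 2: "video/x-svq, svqversion=3"}
stream = [(1, b"jpeg-0"), (2, b"svq3-0"), (2, b"svq3-1"), (1, b"jpeg-1")]
events = list(events_with_caps_changes(stream, caps))
```

Downstream then has to be able to renegotiate at each caps event, which is the part the old decodebin cannot do and decodebin3 is meant to handle.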
I might push this as it shouldn't interfere with the standard 1 format per stsd trak. This is required to support part of the DVB DASH profile.
What happens with the old decodebin/playbin?
Current: keeps using the first stsd entry and pushes svq3 to a jpeg decoder
ERROR Failed to decode JPEG image for file:///home/thiagoss/media/mov+svq3+aac++animatrix_2_program_640-sample.mov

With those patches: detects the change of format and pushes a new caps event that is rejected by the jpeg decoder. (Decodebin3 is able to react to this and plugs a new decoder)
ERROR debug information: qtdemux.c(5650): gst_qtdemux_loop (): /GstPlayBin:playbin/GstURIDecodeBin:uridecodebin0/GstDecodeBin:decodebin0/GstQTDemux:qtdemux0:
streaming stopped, reason not-negotiated
Didn't we fix those issues (multiple stsd entries) ?
The multiple stsd part is fixed, but there is a race in how decodebin3 handles the sequence of caps events. Because the first stsd only has 1 frame, something ends up comparing caps from the second stsd with decoders from the first, and that fails with not-negotiated.
Remaining issue is indeed in decodebin3, renaming
Comment on attachment 153018 [details] [review] qtdemux: Avoid buffers from multiple stsd entries Fixed since 2016
Comment on attachment 153019 [details] [review] qtdemux: Refactor stsd parsing Fixed since 2016
Comment on attachment 153020 [details] [review] qtdemux: Select the predominant stsd entry Fixed since 2016
(Copying my latest comment from Bug #706774, which I'll mark as a duplicate of this one.)

So I generally agree on the fundamental problem, which is that whoever creates the collection and streams can update them *separately* from the actual data flow currently happening at the multiqueue level. Furthermore, I'm working on adding more collection/stream emission in upstream elements (demuxers, adaptive demuxers, ...), for which this will be even more true.

==> So yes, decodebin3's handling of caps should be done using the *actual* caps travelling through multiqueue.

Regarding the "multiple stsd entries" in qtdemux, this corresponds to "stream variants", i.e. it's the same "stream" per se (from a human audio/visual point of view), but its nature (codec, resolution, ...) can change over time. It's a bit like adaptive streaming: the bitrate, resolution, framerate, ... can change over time, but from a human perspective it's still the same "stream". (Note that DASH uses this multiple stsd-entry system extensively for exactly that purpose.)

I'm working on an API extension of GstStreamCollection to be able to specify such streams and variants. I'll keep this updated.
*** Bug 706774 has been marked as a duplicate of this bug. ***
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-plugins-good/issues/20.