GNOME Bugzilla – Bug 752603
qtdemux: Unable to play streaming MP4 (H264+AAC) file from VLC
Last modified: 2018-06-29 14:07:17 UTC
Hi, I'm streaming video from VLC to HTML5 <video>. Firefox/Windows, Chrome (Windows/Linux/Android) and IE11 all play the stream fine. VLC and MPlayer, too. But Firefox/Linux, which uses GStreamer, doesn't; it scans the first megabyte and then gives up. gst-play-1.0 also does not play it. I saved a small excerpt: http://storage.sesse.net/srf1.mp4 klump:~> LC_ALL=C gst-play-1.0 http://storage.sesse.net/srf1.mp4 Now playing http://storage.sesse.net/srf1.mp4 Prerolling... Redistribute latency... ERROR This file is invalid and cannot be played. for http://storage.sesse.net/srf1.mp4 ERROR debug information: qtdemux.c(4970): gst_qtdemux_process_adapter (): /GstPlayBin:playbin/GstURIDecodeBin:uridecodebin0/GstDecodeBin:decodebin0/GstQTDemux:qtdemux0: sample data crosses atom boundary Reached end of play list. The file is generated from VLC, with the ffmpeg mux and these settings: --sout '#transcode{vcodec=h264,vb=3000,acodec=mp4a,ab=256,channels=2,fps=50}:std{access=http{mime=video/mp4},mux=ffmpeg{mux=mp4},dst=:9094}' --sout-avformat-options '{movflags=empty_moov+frag_keyframe+default_base_moof}'
FWIW, it's the default_base_moof (which is needed for other clients) that does it. qtdemux simply doesn't understand the flag (0x20000), seems to get the offsets wrong and ends up feeding garbage into the decoders. PS: TF_DURATION_IS_EMPTY has two conflicting definitions in qtdemux_types.h and atoms.h: 0x100000 and 0x010000, respectively. I believe the latter is correct, although I also don't think it matters (it's only used in some debug output).
(In reply to Steinar H. Gunderson from comment #1) > FWIW, it's the default_base_moof (which is needed for other clients) that > does it. qtdemux simply doesn't understand the flag (0x20000), seems to get > the offsets wrong and ends up feeding garbage into the decoders. > > PS: TF_DURATION_IS_EMPTY has two conflicting definitions in qtdemux_types.h > and atoms.h: 0x100000 and 0x010000, respectively. I believe the latter is > correct, although I also don't think it matters (it's only used in some > debug output). Do you mind refering to the spec and proposing a fix. We do care about these debug log, as they are the way to analyze (the dumps) the stream.
I have no idea about the spec, sorry. All I'm doing is looking at what the ffmpeg encoder (movenc.c) does. It seems that in qtdemux_parse_tfhd, if TF_BASE_DATA_OFFSET is not set (and possibly if said 0x20000 flag is set?), then *base_offset should not just be left alone at -1; it should be set to qtdemux->moof_offset. If I do this, the files seem to play just fine from disk, although still not from the network.
Looking at the network play (with the example file I gave): There's an mdat atom at offset 2646. next_entry_size() goes through and finds that the first sample is from one of the audio streams, at offset 532 (size 668): 0:00:00.371909880 23451 0x7f6b5c03d6d0 LOG qtdemux qtdemux.c:4493:next_entry_size:<qtdemux0> Finding entry at offset 2646 0:00:00.371913285 23451 0x7f6b5c03d6d0 LOG qtdemux qtdemux.c:6358:qtdemux_parse_samples:<qtdemux0> parsing samples for stream fourcc mp4a, pad audio_0 0:00:00.371917434 23451 0x7f6b5c03d6d0 DEBUG qtdemux qtdemux.c:6369:qtdemux_parse_samples:<qtdemux0> parsing up to sample 0 0:00:00.371920466 23451 0x7f6b5c03d6d0 LOG qtdemux qtdemux.c:4519:next_entry_size:<qtdemux0> Checking Stream 0 (sample_index:0 / offset:532 / size:668) 0:00:00.371924603 23451 0x7f6b5c03d6d0 LOG qtdemux qtdemux.c:6358:qtdemux_parse_samples:<qtdemux0> parsing samples for stream fourcc mp4a, pad audio_1 0:00:00.371928504 23451 0x7f6b5c03d6d0 DEBUG qtdemux qtdemux.c:6369:qtdemux_parse_samples:<qtdemux0> parsing up to sample 0 0:00:00.371931521 23451 0x7f6b5c03d6d0 LOG qtdemux qtdemux.c:4519:next_entry_size:<qtdemux0> Checking Stream 1 (sample_index:0 / offset:6613 / size:655) 0:00:00.371935517 23451 0x7f6b5c03d6d0 LOG qtdemux qtdemux.c:6358:qtdemux_parse_samples:<qtdemux0> parsing samples for stream fourcc mp4a, pad audio_2 0:00:00.371939507 23451 0x7f6b5c03d6d0 DEBUG qtdemux qtdemux.c:6369:qtdemux_parse_samples:<qtdemux0> parsing up to sample 0 0:00:00.371942272 23451 0x7f6b5c03d6d0 LOG qtdemux qtdemux.c:4519:next_entry_size:<qtdemux0> Checking Stream 2 (sample_index:0 / offset:13256 / size:614) 0:00:00.371947621 23451 0x7f6b5c03d6d0 LOG qtdemux qtdemux.c:6358:qtdemux_parse_samples:<qtdemux0> parsing samples for stream fourcc avc1, pad video_0 0:00:00.371951432 23451 0x7f6b5c03d6d0 DEBUG qtdemux qtdemux.c:6369:qtdemux_parse_samples:<qtdemux0> parsing up to sample 0 0:00:00.371954249 23451 0x7f6b5c03d6d0 LOG qtdemux qtdemux.c:4519:next_entry_size:<qtdemux0> Checking Stream 3 (sample_index:0 / offset:19176 / size:37203) 0:00:00.371957801 23451 0x7f6b5c03d6d0 LOG qtdemux qtdemux.c:4528:next_entry_size:<qtdemux0> stream 0 offset 532 demux->offset :2646 But since that's behind the current demux offset, it just gives up: 0:00:00.371961429 23451 0x7f6b5c03d6d0 DEBUG qtdemux qtdemux.c:4542:next_entry_size:<qtdemux0> There wasn't any entry at offset 2646 The first time next_entry_size() returns -1, it's a soft error, but then rapidly it just gives up parsing the stream. If I hack so that it simply skips samples before the demux offset, it goes through to parse the entire mdat atom (in Firefox, I even get a brief moment of audio), and then segfaults as it gets some corruption on trying to find the next one.
The confusion as of where the mdat atom ends seems to be about (in gst_qtdemux_process_adapter()) if (G_LIKELY (demux->todrop < demux->mdatleft)) { which should be a <= instead, I'd guess? And then some fault in skipping over the excess data that I don't understand. (It's hard to follow the interactions of todrop, neededbytes and mdatneeded when there's not a single comment explaining their use.)
OK, disregard my last comment; it's probably wrong. But there really seems to be something wrong in recovering the offset after the mdat chunk is done parsing; it tries to parse from offset 123904 (which is the length of the mdat chunk plus 24) instead of 126026 (which is where the mdat chunk actually ends; it starts at offset 2646).
Again, what does the SPEC says ?
I don't know. Why are you asking me what the spec says? I've never seen the spec in my life, surely you must be in a much bryter position to answer that question. I am merely trying to help debug why your code cannot stream (or even demux) the file in question. If this is not appreciated, let me know and I will stay silent.
It's much appreciated :)
I finally found the right standard. It is called ISO/IEC 14496-12:2012, and you can get it from: http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html Page 56 (68 in the PDF) says: 0x020000 default-base-is-moof: if base-data-offset-present is zero, this indicates that the base-data-offset for this track fragment is the position of the first byte of the enclosing Movie Fragment Box. Support for the default-base-is-moof flag is required under the ‘iso5’ brand, and it shall not be used in brands or compatible brands earlier than iso5. NOTE The use of the default-base-is-moof flag breaks the compatibility to earlier brands of the file format, because itsets the anchor point for offset calculation differently than earlier. Therefore, the default-base-is-moof flag cannot be set when earlier brands are included in the File Type box. The same page also confirms that 0x10000 (not 0x100000) is the right value for the TF_DURATION_IS_EMPTY flag.
OK, so a few observations: - The issue with sample offsets being behind the demux offsets is because samples are supposed to be relative to moof_offset (the standard says they are relative to the “file” and not any given chunk, but in a fragmented stream, “file” really is defined by the moof box). However, moof_offset isn't always properly set; it is set if moof is encountered in gst_qtdemux_loop_state_header(), but not in gst_qtdemux_process_adapter(). - In the case marked “data atom emptied; resuming atom scan” it seems it just forgets to actually drop the excess data. demux->todrop and demux->neededbytes is reduced, but demux->offset isn't increased and gst_adapter_flush isn't called. This together with the previously mentioned TF_BASE_DATA_OFFSET support makes gst-play-1.0 actually play my live streams from HTTP, although it still doesn't fix the Firefox case.
Ah, with Iceweasel (that actually uses gstreamer 1.0 and not 0.10, unlike upstream Firefox builds), it works great. Woo!
Created attachment 308163 [details] [review] qtdemux: handle default-base-is-moof flag I think this is what the reporter was meaning. It plays in pull mode but seems to wait 50s before starting to play due to some segment/timestamp issue. Also still doesn't play in push mode.
Created attachment 308174 [details] [review] qtdemux: store the moof-offset also for push mode This fixes it for push-mode. Now only left with the timestamping issue
Looks like sensible change to me.
I'll be posting my own patch shortly; just some red tape to get through first. Probably tomorrow.
(In reply to Steinar H. Gunderson from comment #16) > I'll be posting my own patch shortly; just some red tape to get through > first. Probably tomorrow. What's wrong with Thiago's patches ?
They're correct, but incomplete.
Created attachment 308202 [details] [review] qtdemux: fix playback of certain fragmented MP4 files
Here's mine. With this, I can stream such files from HTTP, including from my own live server. There's one functional difference from Thiago's patches, and that is that when an mdat chunk is done, it actually drops the data instead of just thinking it dropped it (without this, I end up reading junk after the first mdat chunk is done).
Review of attachment 308202 [details] [review]: ::: hei/gst-plugins-good1.0-1.4.5/./gst/isomp4/qtdemux.c @@ +4819,3 @@ + if (!demux->moof_offset) { + demux->moof_offset = demux->offset; + } This will only get the offset for the first moof. If you have a sequence of moof/mdat in your stream it will likely fail fetching the right offsets.
Well, moof_offset is set to 0 after the fragment has been processed, as far as I can see? The other (existing) code that sets moof_offset has the same check, which is why I copied it. If it's correct or not is hard for me to say, but I've tested this code with long files with lots of moofs, so it does work in practice.
Can you share such files? Or perhaps instructions on how to create those? Do you plan on addressing the timestamping issue still unsolved for the file you attached?
I can't see any timestamping issue; it only buffers for 5–6 seconds for me (and much less in Firefox). As for the flags needed to share the files, I've already given the flags required in the very first message in the post. But to save you some time, here's one with (as far as I can see) 15 moof boxes: http://home.samfundet.no/~sesse/srf1-2.mp4
Hi, Is there anything missing for this patch to be merged?
There is still a timestamping issue left that is likely not reproducible with firefox but is with gst-play-1.0. Also your patch doesn't apply on git master because the extra fix not covered in my two patches was already fixed upstream. Maybe we should just merge those 2 patches and keep the bug open until someone fixes the timestamping issue, this is the most tricky part and might not be safe to do it so close to the 1.6 release.
OK, I really cannot reproduce any buffering issue in pull mode with gst-play-1.0. Maybe there's something different with my setup that happens to be random. I think you should merge your two patches, because they are correct and should cause no harm. If the moof_offset fix is already in, everything should at least be fine for Firefox.
Merged the 2 patches, only missing the timestamp fixes now. commit e0878d632591f42f2414204de387273120bc5da6 Author: Thiago Santos <thiagoss@osg.samsung.com> Date: Sun Jul 26 12:07:56 2015 -0300 qtdemux: store the moof-offset also for push mode It will be used in some cases for getting the correct offsets from trun atoms. https://bugzilla.gnome.org/show_bug.cgi?id=752603 commit bb336840c0b0b02fa18dc4437ce0ded3d9142801 Author: Thiago Santos <thiagoss@osg.samsung.com> Date: Sun Jul 26 02:09:24 2015 -0300 qtdemux: handle default-base-is-moof flag Handle the flag from the tfhd that signals the base offset to start from the moof atom https://bugzilla.gnome.org/show_bug.cgi?id=752603
These patches break DASH playback in very interesting ways: (gst-play-1.0:2661): GStreamer-CRITICAL **: pullrange on pad qtdemux1:sink but it was not activated in pull mode
Created attachment 309327 [details] [review] qtdemux: only look for more samples in moofs in pull-mode For playback of some fragmented formats with qtdemux it will try to look for the next moof after finishing one but it is only possible for pull-mode. For playback of streaming fragmented formats such as DASH it should just not try to look for another moof but instead wait for more data. https://bugzilla.gnome.org/show_bug.cgi?id=752602
Comment on attachment 309327 [details] [review] qtdemux: only look for more samples in moofs in pull-mode Merged as commit 41a4b683900e1b81f804848c40f8d920a0478507
These patches also broke the DASH CENC support in qtdemux. As these went onto master before the CENC support, I have added a fix to the CENC ticket (https://bugzilla.gnome.org/show_bug.cgi?id=705991)
Thiago, is there anything left we should fix here before 1.5.90 or 1.6.0?
(In reply to Sebastian Dröge (slomo) from comment #33) > Thiago, is there anything left we should fix here before 1.5.90 or 1.6.0? Don't think so, the remaining changes are more delicate to handle these chained mp4 fragments. Let's do it after 1.6.
Created attachment 326816 [details] [review] qtdemux: rework segment event pushing Instead of always keeping a safe segment (start=0) event from the beginning, delay the creation of this event to when we really know the timestamp of the first sample. This is important to properly start fragmented streams that we might join in the middle or to play isolated fragment files that might have an advanced tfdt. https://bugzilla.gnome.org/show_bug.cgi?id=752063
Created attachment 326817 [details] [review] qtdemux: offset edts segments by the min timestamp of the stream Otherwise if the stream is starting at timestamp=X it would wait 'X' to start playing. https://bugzilla.gnome.org/show_bug.cgi?id=752063
You've quoted the wrong bug number in both of those patches.
Created attachment 326858 [details] [review] qtdemux: rework segment event pushing Instead of always keeping a safe segment (start=0) event from the beginning, delay the creation of this event to when we really know the timestamp of the first sample. This is important to properly start fragmented streams that we might join in the middle or to play isolated fragment files that might have an advanced tfdt.
Created attachment 326859 [details] [review] qtdemux: offset edts segments by the min timestamp of the stream Otherwise if the stream is starting at timestamp=X it would wait 'X' to start playing.
Review of attachment 326858 [details] [review]: ::: gst/isomp4/qtdemux.c @@ +5890,3 @@ + QtDemuxStream *stream = demux->streams[n]; + if (stream->n_samples) + res = MIN (QTSAMPLE_PTS (stream, &stream->samples[0]), res); Is it safe to assume that the first sample has the smallest PTS? What about a MOOF that starts (in display order) with a B frame? For example BBIBBP (display order) will appear in the stream as IBBPBB (decode order) with the PTS of the I greater than the PTS of the B. This is not an issue for DASH: Each Media Segment shall contain one or more whole self-contained movie fragments. A whole, self-contained movie fragment is a movie fragment (‘moof’) box and a media data (‘mdat’) box that contains all the media samples that do not use external data references referenced by the track runs in the movie fragment box
Good points. Is there any common heuristic on how many samples to read to get the minimum timestamp or do we need to look over all samples? I should also test this with streams that have an initial gap at the beginning.
Thiagoss, take the minimum of the first sample DTS or PTS ? That should solve it. Would be great to get those patches updated.
Created attachment 372102 [details] [review] qtdemux: rework segment event pushing Instead of always keeping a safe segment (start=0) event from the beginning, delay the creation of this event to when we really know the timestamp of the first sample. This is important to properly start fragmented streams that we might join in the middle or to play isolated fragment files that might have an advanced tfdt.
Created attachment 372103 [details] [review] qtdemux: offset edts segments by the min timestamp of the stream Otherwise if the stream is starting at timestamp=X it would wait 'X' to start playing.
Created attachment 372104 [details] [review] qtdemux: do not update segment.stop is it is not a valid time Otherwise it overflows and starts having a meaningful and wrong value.
Patches updated. Validate shows no regression and DASH and local (push and pull) modes playback and seeking working as usual. Any other tests we should run or shall we merge?
lgtm
Break unit tests actually. qtdemux.test_qtdemux_duplicated_moov qtdemux.test_qtdemux_stream_change Marking as blocker
This should have fixed it. commit ae7b531902bded9f58bff8afa41508663b02b10e Author: Thiago Santos <thiagossantos@gmail.com> Date: Mon May 28 11:01:42 2018 -0700 qtdemux: mark segment as sent after pushing when moov is received Otherwise we would try to send it a second time if the same moov is received or in any other situation that might trigger segment sending. https://bugzilla.gnome.org/show_bug.cgi?id=752603
Review of attachment 372102 [details] [review]: ::: gst/isomp4/qtdemux.c @@ +1057,3 @@ + gst_event_set_seqnum (newsegment, qtdemux->segment_seqnum); + qtdemux->need_segment = FALSE; + This is sending the movie segment as track segments, completely disregarding edit lists.
Should we just remove the idea of having a movie segment and always rely on stream segments? The idea of movie segments was already there, in some places we sent the movie, in others the stream segment which is not ideal. I guess the whole back in forth on this is people working on different use cases with different sets of tests. Shall we just unite all the tests first so we avoid breaking other use cases?
(In reply to Thiago Sousa Santos from comment #51) > Should we just remove the idea of having a movie segment and always rely on > stream segments? The idea of movie segments was already there, in some > places we sent the movie, in others the stream segment which is not ideal. No, we should not; at least while we continue to use GstSegment for handling edit lists (which is the least bad solution I know until we get negative timestamps in GstBuffer, which will not happen in gst1). This is a very long response, but I hope it serves to clarify the confusion around segments and edit lists in qtdemux. The movie segment and track segments serve different purposes in the current design of qtdemux and should not be mixed carelessly. The movie segment (the one in qtdemux->segment) specifies the stream time range the user wants to play. It's modified when the user seeks. It affects all tracks. For correct edit list handling this segment SHOULD NOT be emitted downstream directly. Note this segment does not tell us how to map buffer time to stream time (it can't, as the mapping is usually different for each track). It just serves to express the wish of the user to play a certain stream time range, at a certain playback rate. Buffer time in qtdemux does not account for edit lists. It comes almost straight out from the tables found in MP4, converted to nanoseconds. CTS becomes GstBuffer.pts and (media) DTS becomes GstBuffer.dts. These timestamps are referred collectively in the spec as "media time". As surprising as it may sound, the frames in a MP4 track are not intended to be decoded or played directly in the order they are declared in the frame tables. Instead, a MP4 track is constructed with one or more edits (also called "segments", but that word is extremely overloaded, so I'll avoid it in this context) that are defined in a edit list (elst.edts box) inside the trak atom of the moov. Each edit references a range of "media time" that ought to be played. Edits are played in the order they are defined in the edit list. Take a look at the following illustrative edit list. In practice edit lists with several media edits like this are pretty rare -- usually edit lists are used just to set a small offset between media time and stream time, but looking at the complex case helps to understand their design and the mechanisms involved. duration | media_time | rate ----------------------------- 5 s | 30 s | 1.0 3 s | 10 s | 1.0 5 s | 30 s | 1.0 Assuming that every frame is 1 second long, this is the succession of frames that should be played for this file, in media time PTS: 30 31 32 33 34 10 11 12 30 31 32 33 34 Edit lists allow to skip frames that are coded in the file, play them several times, play portions of them in an order different to the one they are coded and even play them at different speeds. This dance of frames inside the tracks should be transparent for the user, who should see in their player the following (stream time) timestamps in the progress bar when playing the movie: 00 01 02 03 04 05 06 07 08 09 10 11 12 The reported duration of the track should be 13 seconds (the result from summing the duration of the edits), regardless of the duration of the frames that are actually coded in the mdat. (Note this duration may be bigger or smaller than the duration of the coded frames, since edits can be repeated and not all coded frames need to be included in an edit.) In qtdemux edit lists are accounted for by emitting a GstSegment for every edit. The previous file -- in absence of qtdemux bugs -- would output the following sequence of segments and buffers when played from the beginning: GstSegment: time=0 base=0 start=30 stop=35 Buffer: pts=30 Buffer: pts=31 Buffer: pts=32 Buffer: pts=33 Buffer: pts=34 GstSegment: time=5 base=5 start=10 stop=13 Buffer: pts=10 Buffer: pts=11 Buffer: pts=12 GstSegment: time=8 base=8 start=30 stop=35 Buffer: pts=30 Buffer: pts=31 Buffer: pts=32 Buffer: pts=33 Buffer: pts=34 For each buffer we can run the following formulas to translate buffer PTS or buffer DTS from the "media timeline" to stream time (timestamps that should be displayed in the application) and running time (timestamps that run since playback started and are used for synchronization). stream_time = (B.timestamp - S.start) * ABS (S.applied_rate) + S.time running_time = (B.timestamp - (S.start + S.offset)) / ABS (S.rate) + S.base In this case, since the file was played from the beginning without pauses stream_time equals running_time, but that's not the case when seeking. The demuxer should emit appropriate segments for each track in order to output the correct frames with the correct stream_time and running_time. Let's look at an example of seeking in the file with the edit list shown before. It's important to note that the seek request is handled by the whole demuxer, not by its individual tracks; therefore the timestamps refer to movie time (stream time of the tracks), not buffer time of any track. seek GstSegment: start=6 stop=11 The expected sequence of segments and buffers that qtdemux should emit in response to the seek is this: GstSegment: time=6 base=0 start=11 stop=13 Buffer: pts=11 (stream time PTS=6, running time PTS=0) Buffer: pts=12 (stream time PTS=7, running time PTS=1) GstSegment: time=8 base=2 start=30 stop=32 Buffer: pts=30 (stream time PTS=8, running time PTS=2) Buffer: pts=31 (stream time PTS=9, running time PTS=3) Buffer: pts=32 (stream time PTS=10, running time PTS=4) (This assumes all frames are sync frames for simplicity... additional frames need to be prepended to the first buffer if it's not a sync frame.) At any point in time, qtdemux->segment should contain the current seek segment, whilst for each track stream->segment contains the current segment used to map the frames in the current edit from buffer time to stream time and running time. Therefore stream->segment should change and a new segment event should be emitted downstream not only when the user performs a seek, but also when a edit is finished but there is another one following. ## What are edit lists used for, really? In reality, edit lists are rarely used for such complex editing tasks like in the example before. Correct edit list handling adds a lot of complexity to demuxers in exchange for obscure features that are rarely used. Despite this, there is a small use case when edit list were deemed fit and used by many, if not most, MP4 files: having correct timestamps when using B-frames. The problem goes like this: imagine you have a video track with three frames like this: . +---+ +---+ +---+ . | I |-->| B |<--| P | . +---+ +---+ +---+ PTS 0 1 2 In the example above, B has a dependency on a frame that is after it in presentation order. Therefore, we need to encode the frames in a different order so that the decoder is able to decode the frames successfully: IPB. This reorder is accomplished in MP4 with the DTS timestamps, but there are several restriction we need to take into account: a) The DTS of the first frame is always zero. This is due to the way frames are coded in MP4 tables (DTS is computed by summing the duration of the previous frames in the table). b) For every frame, PTS >= DTS. You can't show a frame before it has been decoded, after all. Indeed, in most files PTS is coded as an unsigned offset from DTS in MP4 tables. Newer versions of MP4 allow to use a signed offset and therefore break this rule, but that is only intended for coding non-displaying frames (where a negative PTS is used to signal they should never be displayed) and in other cases it must be corrected with a cslg box -- which will be explained later. With these restrictions, this is the best we can get -- at least without mangling with the frame durations: . +---+ +---+ +---+ . | I |-->| B |<--| P | . +---+ +---+ +---+ PTS 1 2 3 DTS 0 2 1 Note that we no longer have any frame at PTS=0... we had to shift all the PTS to satisfy coding dependencies. Most of the time the user will not notice any difference (it's just one frame duration), but that makes it an imperfect solution; we would expect movies to play the first video frame at 0:00, not at 0:01. The most commonly supported way to fix this issue is with a simple edit list, like this: duration | media_time | rate ----------------------------- 3 s | 1 s | 1.0 This is the most common kind of edit list, referred unofficially as "basic edit list". Once applied, stream time PTS=0 (as shown in the player UI) corresponds to the first frame (who has buffer PTS=1). ## Empty edits It's also possible to specify that in a certain stream time range there will be no frames. This is called an empty edit, and it's coded by specifying media_time = -1. The most common use of empty edits is to offset a track in such a way that stream time is greater (not lesser) than unedited media time. This is useful for instance to fix A/V synchronization, especially when the audio and video tracks don't have exactly the same length. As an example, imagine we have initially an audio track like this: duration | media_time | rate ----------------------------- 100 s | 0 s | 1.0 If we wanted to discard the first 2 seconds of the audio track we would just edit media_time like in the previous example: duration | media_time | rate ----------------------------- 98 s | 2 s | 1.0 On the other hand, if we want to displace the audio 2 seconds ahead we need to insert an empty edit. The first 2 seconds of the movie will have no sound, then the audio track will start playing from the first frame. The resulting track is 102 seconds long. duration | media_time | rate ----------------------------- 2 s | -1 | 1.0 100 s | 0 s | 1.0 Currently qtdemux emits GstSegments for empty edits too. This should be unecessary, as empty edits, by definition, contain no frames. If you look at them, beware of their values: For empty edits, start and stop refer to stream time, not buffer time (after all... there is no buffer time to map in this case since there are no frames). ## Edit list support In qtdemux full edit list support is targeted only in pull mode, since that scheduling mode is much more appropriate for the kind of dance performed by advanced edit lists. Support for advanced edit lists (those with more than one non-empty edit) is quite limited in most players. Basic edit lists have much better support. Commonly used muxers use them to make sure the first video frame starts at stream time PTS=0 as explained before. qtdemux allows only basic edit lists in push mode. ## Bonus section: About cslg_shift The iso4 brand (introduced in ISO 14496-12:2012) adds a new box (cslg: composition to decode timeline mapping) that introduces an alternative way to solve the same PTS=0 issue. It's rarely used though, as basic edit lists have better player support, but you may still see cslg_shift in many places in qtdemux, so it's a good idea to know how it works. Starting at the iso4 brand, you can use ctts version 1 which allows to use signed composition offsets, therefore allowing to code a PTS < DTS. Therefore you can code the following in the MP4 sample tables: . +---+ +---+ +---+ . | I |-->| B |<--| P | . +---+ +---+ +---+ PTS 0 1 2 DTS 0 2 1 But there is a catch: The coded DTS is used as buffer DTS in gstreamer, just as before; the PTS on the other hand needs to be adjusted so that buffer PTS >= buffer DTS. This is done by adding cslg.compositionToDTSShift ("cslg_shift" in qtdemux) to every PTS value. /* timestamp is the DTS */ #define QTSAMPLE_DTS(stream,sample) (QTSTREAMTIME_TO_GSTTIME((stream), (sample)->timestamp)) /* timestamp + offset + cslg_shift is the outgoing PTS */ #define QTSAMPLE_PTS(stream,sample) (QTSTREAMTIME_TO_GSTTIME((stream), (sample)->timestamp + (stream)->cslg_shift + (sample)->pts_offset)) In this example cslg_shift needs to be 1 second, as that's the lowest value we need to increment PTS so that the PTS >= DTS rule becomes true again. But of course, we still want the first frame to be presented at running time and stream time zero. Therefore, we need to take cslg_shift into account in our GstSegment computation: GstSegment.start and GstSegment.stop -- that is, the fields that refer to buffer time -- must have `cslg_shift` added. GstSegment: time=0 base=0 start=1 stop=4 cslg and edit lists are not exclusive to each other. The media_time specified in edit lists refers to PTS as it's coded in the file. For instance, this example could have the following super simple edit list that performs no adjustment of PTS, since that is already taken care of by cslg: duration | media_time | rate ----------------------------- 3 s | 0 s | 1.0 ## qtdemux in MSE vs qtdemux in a full player This is mostly unrelated to the previous points, but still something that has to be taken into account when working with segments in a demuxer. A typical player (simplified) looks like this: src -> queue -> qtdemux -> decoder -> sink In this setup, qtdemux is responsible not only for demuxing, but also for handling seeks. That makes sense usually because it's the element that has the frame tables and therefore knows where to look for the frames in the file. In DASH this stops being true. In this case often separated fragmented MP4 files are used, so qtdemux itself cannot know where to locate a frame outside the current frame. Instead, dashdemux or another similar element that knows how to locate those files (e.g. by parsing an MPD file) handles those seek requests by downloading a different file and feeding it to qtdemux, who will only be responsible for the seek inside the fragment. In MSE (Media Source Extensions) the separation of responsibilities is even bigger. There is no dashdemux or anything like that. Parsing the MPD file (or any other similar manifest) and downloading fragments is done entirely by a JavaScript application. This JS application sends those fragments to a demuxer (qtdemux or matroskademux) using a the SourceBuffer API. The demuxer is responsible for demuxing the received fragments into frames (GstSample objects) with correct stream time timestamps. Those frames end up stored in a series of data structures that are read later by the media player -- which is handled in a completely separate GStreamer pipeline. The demuxer is not involved in playback and has no role in handling seeks at all. The running time generated by qtdemux in this set-up is meaningless. AppendPipeline (demuxer): JavaScript source -> qtdemux -> browser PlaybackPipeline (MediaPlayer): browser internal APIs -> decoder -> sink The API used to demux frames in MSE is very slim. There is `appendBuffer()` which feeds some bytes and `abort()` (which flushes the current fragment). That's all. The demuxer is responsible for demuxing whatever fragments it receives into correct frames. The JS application may feed the fragments in any order as long as a moov has been parsed first (e.g. a fragment spanning [10, 20) seconds may be demuxed before a fragment spanning [0, 10) s if the former finished downloading faster). This is something that is unlikely to happen in non-MSE playback but should be contemplated for this kind of applications. Since gst_segment_to_stream_time_full() can handle positions outside [start, stop), emitting a GstSegment that starts at a non-zero position can be fine as long as frames received later with a PTS < start are emitted too and stream time is correct.
So, investigating a bit more the recent regressions in dash validate tests, this particular patch seems to be breaking reverse playback on dash streams. Applying the patch from Bug #796480 leads to the failure described here: https://bugzilla.gnome.org/show_bug.cgi?id=796480#c3 and this failure already exists when applying only the segment rework patch from this bug, and reverting the patch that broke dash test cases all together (ie. from Bug #793058). You can checkout my branch with that exact scenario: https://github.com/thiblahute/gst-plugins-good/tree/qtdemux_dash_reverse_failure . With that branch many test case from the `adaptivedemux` testsuite are failing because we use `gst_segment_to_running_time_full` on a segment with rate < 0 and no stop time (see comment from #796480) and it exposes the segment rework patch broke that use case. Just running ` gst-validate-launcher adaptive.dash.playback.full_live_rewind.DASHIF_livestream_testpic_2s` allows to see the issue. Reopenning and keeping as blocker.
Created attachment 372626 [details] [review] qtdemux: rework segment event pushing, again This patch aims at fixing the recent regressions in the adaptive test suite. All segment pushing in push mode is now done with gst_qtdemux_check_send_pending_segment(), which is idempotent and handles both edit lists cases and cases where the upstream TIME segments have to be sent directly. Fragmented files that start with a non-zero tfdt are also taken into account, but their handling has been vastly simplified: now they are handled as implicit default seeks so there is no need to extend the GstSegment formulas as was being done before. qtdemux->segment.duration is no longer modified when upstream_format_is_time, respecting in this way the durations provided by dashdemux and fixing bugs in reverse playback tests where mangled durations appeared in the emitted segments.
Review of attachment 372626 [details] [review]: ::: gst/isomp4/qtdemux.c @@ -6600,3 @@ -static GstClockTime -gst_qtdemux_streams_get_first_sample_ts (GstQTDemux * demux) Note this function was already returing the minimum tfdt, since QTSAMPLE_DTS <= QTSAMPLE_PTS and by definition, the tfdt is the DTS of the first sample in a track fragment. @@ +9531,3 @@ " extends to %" GST_TIME_FORMAT + " past the end of the declared movie duration %" GST_TIME_FORMAT + " movie segment will be extended", segment_number, Note the message has been changed but the behavior was already in place before. The log message was stating something different than what was actually being done. And indeed, this behavior makes more sense: the duration specified in the edit list is reliable (it has to be, otherwise the track would be cut), whereas other fields such as mvhd.duration and tkhd.duration are supposed to be computed in the same way by the muxer, but their values are merely informative (the spec does not demand using any of these for clipping the movie). Also there is no point in trusting unknown muxers to write correct values there when we can compute them by ourselves. @@ -13787,3 @@ - /* set duration in the segment info */ - gst_qtdemux_get_duration (qtdemux, &duration); - if (duration) { Note: In this if `duration` was always true. It's impossible for gst_qtdemux_get_duration() to return zero; when there is no known duration it returns GST_CLOCK_TIME_NONE.
The attached patch passes `meson tests`, `gst-validate` and `gst-validate adaptive`, putting aside some unrelated, preexisting/flaky timeouts. 314/359 gst-plugins-bad / elements_mxfmux FAIL 29.45 s validate.dash.playback.change_state_intensive.dash_exMPD_BIP_TC1: Timeout 'Application timed out: 30.0 secs' validate.rtsp2.playback.change_state_intensive.tron_en_ge_aac_h264_ts: Timeout 'Application timed out: 30.0 secs' adaptive.dash.playback.full_live_rewind.DASHIF_livestream_testpic_2s: Timeout 'Application timed out: 30.0 secs' adaptive.dash.playback.seek_end_live.DASHLIVE_manifest_e_mp4a_300: Timeout 'Application timed out: 30.0 secs' adaptive.dash.playback.seek_end_live.DASHIF_livestream_testpic_2s: Timeout 'Application timed out: 30.0 secs'
qtdemux.c:1759:7: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement] GstClockTime ts = gst_util_get_timestamp (); ^~~~~~~~~~~~ cc1: all warnings being treated as errors
(In reply to Edward Hervey from comment #57) > qtdemux.c:1759:7: error: ISO C90 forbids mixed declarations and code > [-Werror=declaration-after-statement] > GstClockTime ts = gst_util_get_timestamp (); > ^~~~~~~~~~~~ > cc1: all warnings being treated as errors I didn't introduce that code and it felt wrong to fix it in this unrelated patch.
Review of attachment 372626 [details] [review]: ::: gst/isomp4/qtdemux.c @@ +1754,3 @@ case GST_EVENT_SEEK: { + qtdemux->received_seek = TRUE; Yes you did introduce this line
(In reply to Alicia Boya García from comment #56) > adaptive.dash.playback.full_live_rewind.DASHIF_livestream_testpic_2s: > Timeout 'Application timed out: 30.0 secs' > adaptive.dash.playback.seek_end_live.DASHLIVE_manifest_e_mp4a_300: Timeout > 'Application timed out: 30.0 secs' > adaptive.dash.playback.seek_end_live.DASHIF_livestream_testpic_2s: Timeout > 'Application timed out: 30.0 secs' Those worked totally fine until the following commits landed: https://ci.gstreamer.net/view/testing/job/GStreamer-master-validate-adaptive/340/
(In reply to Edward Hervey from comment #59) > Review of attachment 372626 [details] [review] [review]: > > ::: gst/isomp4/qtdemux.c > @@ +1754,3 @@ > case GST_EVENT_SEEK: > { > + qtdemux->received_seek = TRUE; > > Yes you did introduce this line So I should have done the code style fix in unrelated code just because it happened to be near?
Review of attachment 372626 [details] [review]: ::: gst/isomp4/qtdemux.c @@ +1758,1 @@ #ifndef GST_DISABLE_GST_DEBUG These variable declaration were at the right place (at the beginning of the block). You added a line before it which causes the error. The line you added can go below the variable declaration without any problem. Also please reply in the reviewing tool itself so that we can follow the flow.
Review of attachment 372626 [details] [review]: ::: gst/isomp4/qtdemux.c @@ +1758,1 @@ #ifndef GST_DISABLE_GST_DEBUG OK, I see now, will update the patch.
(In reply to Edward Hervey from comment #60) > (In reply to Alicia Boya García from comment #56) > > adaptive.dash.playback.full_live_rewind.DASHIF_livestream_testpic_2s: > > Timeout 'Application timed out: 30.0 secs' > > adaptive.dash.playback.seek_end_live.DASHLIVE_manifest_e_mp4a_300: Timeout > > 'Application timed out: 30.0 secs' > > adaptive.dash.playback.seek_end_live.DASHIF_livestream_testpic_2s: Timeout > > 'Application timed out: 30.0 secs' > > Those worked totally fine until the following commits landed: > > https://ci.gstreamer.net/view/testing/job/GStreamer-master-validate-adaptive/ > 340/ I'm not familiar with GStreamer's jenkins so I don't know how to interpret that log. What is "340"? Do all commits get a test run or only some of them and therefore a test run tests an arbitrary range of commits? How are flakes handled? Is that output some kind of regression tracking showing that the timeouts were introduced in "qtdemux: Properly handle edit list in push mode"? How does that work?
(In reply to Alicia Boya García from comment #64) > (In reply to Edward Hervey from comment #60) > > https://ci.gstreamer.net/view/testing/job/GStreamer-master-validate-adaptive/ > > 340/ > > I'm not familiar with GStreamer's jenkins so I don't know how to interpret > that log. What is "340"? It's just a run number. > Do all commits get a test run or only some of them > and therefore a test run tests an arbitrary range of commits? since this is executed on an actual DASH server, we don't want to overhit it. So it's only executed every 24 hours. The list of new commits (compared to the previous run) are mentioned on that page. > > How are flakes handled? flakes ? > > Is that output some kind of regression tracking showing that the timeouts > were introduced in "qtdemux: Properly handle edit list in push mode"? How > does that work? The 49 new failures (at the bottom of the page) were introduced by one (or more) of the commits listed on that page.
the last 100runs : https://ci.gstreamer.net/job/GStreamer-master-validate-adaptive/test/?width=800&height=600&failureOnly=true The build only becomes "fatal" if more than 3 tests failed (to cope with *some* tests sometimes failing, as you can see at the beginning of the grap).
Created attachment 372659 [details] [review] qtdemux: rework segment event pushing, again This patch aims at fixing the recent regressions in the adaptive test suite. All segment pushing in push mode is now done with gst_qtdemux_check_send_pending_segment(), which is idempotent and handles both edit lists cases and cases where the upstream TIME segments have to be sent directly. Fragmented files that start with a non-zero tfdt are also taken into account, but their handling has been vastly simplified: now they are handled as implicit default seeks so there is no need to extend the GstSegment formulas as was being done before. qtdemux->segment.duration is no longer modified when upstream_format_is_time, respecting in this way the durations provided by dashdemux and fixing bugs in reverse playback tests where mangled durations appeared in the emitted segments.
(In reply to Edward Hervey from comment #65) > > How are flakes handled? > > flakes ? Tests that sometimes fail, sometimes pass. (In reply to Edward Hervey from comment #60) > (In reply to Alicia Boya García from comment #56) > > adaptive.dash.playback.full_live_rewind.DASHIF_livestream_testpic_2s: > > Timeout 'Application timed out: 30.0 secs' > > adaptive.dash.playback.seek_end_live.DASHLIVE_manifest_e_mp4a_300: Timeout > > 'Application timed out: 30.0 secs' > > adaptive.dash.playback.seek_end_live.DASHIF_livestream_testpic_2s: Timeout > > 'Application timed out: 30.0 secs' > > Those worked totally fine until the following commits landed: > > https://ci.gstreamer.net/view/testing/job/GStreamer-master-validate-adaptive/ > 340/ Maybe. They don't happen every time. They used not to happen every time. Should this patch be rejected until all the flakes are completely eliminated?
Created attachment 372702 [details] [review] qtdemux: Don't send EOS during upstream reverse playback Upstream driving elements such as dashdemux often do reverse playback by feeding qtdemux with the fragments containing the requested playback range in reverse order. But the requested playback range stop may be somewhere in the middle of a fragment. In that case, a naive pts >= segment.stop condition may declare end of segment prematurely when demuxing this first fragment. This used not to happen because there were places in moov parsing where segment.stop was overwritten to GST_CLOCK_TIME_NONE even if upstream_format_is_time -- resulting in this case in a segment with rate < 0 and stop == -1 and hence not triggering the EOS check, but that was likely an accident. This patch modifies the EOS check to take this case into account, not sending EOS when upstream_format_is_time if rate < 0. This fixes adaptive.dash.playback.seek_end_live.DASHIF_livestream_testpic_2s
On closer inspection, `seek_end_live` failures were actually reproducible. They are fixed by the above patch. `adaptive.dash.playback.full_live_rewind.DASHIF_livestream_testpic_2s` most of the time does not get stuck, it's just borderline slow. That has not changed before or after the patch. Other, more occasional, errors seem to be this bug or variations thereof, which also predates these patches: https://bugzilla.gnome.org/show_bug.cgi?id=796607 The results of `meson test` don't change after this additional patch.
Review of attachment 372659 [details] [review]: In overall this patch makes a lot of sense to me. ::: gst/isomp4/qtdemux.c @@ -2369,3 @@ /* map segment to internal qt segments and push on each stream */ if (demux->n_streams) { - if (demux->fragmented) { In the new code path this checks become `demux->upstream_format_is_time` (inverted) - we agreed this is the right thing to do as per Bug #796480. @@ +3955,3 @@ gint64 base_offset, running_offset; guint32 frag_num; + GstClockTime min_decode_time_ns = GST_CLOCK_TIME_NONE; Just call it `min_dts`? @@ +4068,3 @@ + if (stream) { + GstClockTime decode_time_ns = Just `dts` or so - ClockTime already implies NS. You can also just do min_dts = MIN(min_dts, QTSTREAMTIME_TO_GSTTIME (stream, decode_time)); @@ +4132,2 @@ pssh_node = qtdemux_tree_get_sibling_by_type (pssh_node, FOURCC_pssh); } Just move L4070 right after that line removing the `if` I think. @@ +4159,3 @@ + * parsed... which is OK -- some apps (mostly tests) expect a segment to + * be emitted after a moov, and we can emit a second segment anyway for + * special cases like this. */ Agreed. @@ -4532,3 @@ gst_buffer_unref (moov); qtdemux->got_moov = TRUE; - if (!qtdemux->fragmented && !qtdemux->upstream_format_is_time) { What makes it possible to remove that block? (Sorry, it is not clear to me). @@ -6017,3 @@ GList *iter; - gst_qtdemux_push_pending_newsegment (qtdemux); Why can you remove that like this? It is totally unclear to me @@ +13753,3 @@ + GstClockTime duration; + /* set duration in the segment info */ + gst_qtdemux_get_duration (qtdemux, &duration); You need to check that `duration != 0` here as before I think.
Review of attachment 372659 [details] [review]: This patch makes sense to me and the approach taken to unify segment handling in push mode sounds great. I found minor details to be addressed and have some question of changes I didn't fully understand.
Review of attachment 372702 [details] [review]: How is EOS going to be sent then?
Review of attachment 372659 [details] [review]: ::: gst/isomp4/qtdemux.c @@ +4068,3 @@ + if (stream) { + GstClockTime decode_time_ns = The name is for comparison with another variable, `decode_time` which stores the starting decode time of a fragment in track timescale. This is the same value, converted to nanoseconds. Therefore I think the name makes sense, but I have no strong feelings for or against. @@ -4532,3 @@ gst_buffer_unref (moov); qtdemux->got_moov = TRUE; - if (!qtdemux->fragmented && !qtdemux->upstream_format_is_time) { In the new code `need_segment` is only relevant for push mode and this is a pull-mode function, so it's not necessary. Later in this review is explained where this assignment had an effect. @@ -6017,3 @@ GList *iter; - gst_qtdemux_push_pending_newsegment (qtdemux); After my patch there is no longer a gst_qtdemux_push_pending_newsegment() function. There is gst_qtdemux_check_send_pending_segment(), but it's only intended to be used in push mode. I just have not found any use for it in the new code... and not a good one in the old code either. This function (like other _loop variants) is called in pull mode only. As far as I know, before edit lists in fragmented files were supported in qtdemux, the intent of this function call was to ensure the default segment was propagated downstream here instead. gst_qtdemux_push_pending_newsegment() only did something if `need_segment` was TRUE. [Of course, that condition is pretty recent, but back then qtdemux->pending_newsegment would serve the same purpose.] On non fragmented files `need_segment` would be FALSE because of the old `if (!qtdemux->fragmented && !qtdemux->upstream_format_is_time)` you saw before and it would do nothing. Currently this call is unnecessary. The way segments work in pull mode is as follows: First, note upstream_format_is_time is always FALSE in pull-mode. Since qtdemux is driving the pipeline and calling getrange() with file offsets it makes no sense for the element upstream to send a TIME segment. Therefore, there is no "direct segment propagation" case in pull mode, edit lists are always mapped to GstSegment's (if the file does not contain an `edts` atom this is not a problem, as a fallback one is created automatically). stream->segment_index tracks what edit list segment we are currently in. It's initialized to -1 (no segment). stream->time_position tracks what is the current movie time (i.e. time after applying edit lists) we are on. It's initialized to zero and grows as frames are demuxed. When gst_qtdemux_prepare_current_sample() finds segment_index == -1, it calls gst_qtdemux_find_segment() to find the segment containing stream->time_position and activates it (updates segment_index and pushes a new GstSegment downstream). Later segment_index is resetted to -1 in gst_qtdemux_advance_sample() when the end of an edit list segment is found. @@ +13753,3 @@ + GstClockTime duration; + /* set duration in the segment info */ + gst_qtdemux_get_duration (qtdemux, &duration); gst_qtdemux_get_duration() never returns 0. If qtdemux->duration == 0 it returns GST_CLOCK_TIME_NONE.
Review of attachment 372702 [details] [review]: EOS has to be sent by upstream in this case; but that was the case before the patch too. Upstream knows better in this case what should be demuxed: different elements could do it in different ways (e.g. feeding an entire fragment vs feeding only parts containing specific frames), qtdemux would rather not guess that since the decode time of this fragment is zero and this is the last frame in decoding order of the first/last GOP in the fragment you want an EOS. Upstream, who has been iterating through fragments or GOPs knows much better when the loop finishes and can just send an EOS that will be propagated.
Hi guys, Since I don't think this has much with my original bug to do anymore (Firefox doesn't even use GStreamer for demuxing MP4 anymore), do you think you could set someone else as a reporter? 75 emails about a product you don't use is getting a bit much. :-)
(In reply to Steinar H. Gunderson from comment #76) > Hi guys, > > Since I don't think this has much with my original bug to do anymore > (Firefox doesn't even use GStreamer for demuxing MP4 anymore), do you think > you could set someone else as a reporter? 75 emails about a product you > don't use is getting a bit much. :-) Hm, it's possible for us to remove anyone from the CC list *except* the original reporter. I think only you can remove yourself. Are you able to edit the CC list in the upper-right hand field? You should hopefully be able to select yourself and Remove selected CCs. The rest of us don't see your name on the list and can't remove you. I checked some other bugs that I reported and I'm able to see and myself, but I'm not positive what powers you have....
No. I can _add_ myself to the Cc list (through a checkbox), but I don't see myself in the list, and I can't change the reporter.
Reporter are immutable. But you can change your email preferences.
Created attachment 372775 [details] [review] qtdemux: rework segment event pushing, again This patch aims at fixing the recent regressions in the adaptive test suite. All segment pushing in push mode is now done with gst_qtdemux_check_send_pending_segment(), which is idempotent and handles both edit lists cases and cases where the upstream TIME segments have to be sent directly. Fragmented files that start with a non-zero tfdt are also taken into account, but their handling has been vastly simplified: now they are handled as implicit default seeks so there is no need to extend the GstSegment formulas as was being done before. qtdemux->segment.duration is no longer modified when upstream_format_is_time, respecting in this way the durations provided by dashdemux and fixing bugs in reverse playback tests where mangled durations appeared in the emitted segments.
Thanks for that, all the validate tests should pass now (modulo races :-)) Attachment 372702 [details] pushed as 2c39430 - qtdemux: Don't send EOS during upstream reverse playback Attachment 372775 [details] pushed as 025a430 - qtdemux: rework segment event pushing, again