GNOME Bugzilla – Bug 793058
qtdemux: add support for edit lists for fragmented files in push mode
Last modified: 2018-06-06 15:59:27 UTC
MediaSource Extensions requires support for basic edit lists in MP4: https://www.w3.org/TR/mse-byte-stream-format-isobmff/#iso-init-segments > The user agent must support setting the offset from media composition time to movie presentation time by handling an Edit Box (edts) containing a single Edit List Box (elst) that contains a single edit with media rate one. This edit may have a duration of 0 (indicating that it spans all subsequent media) or may have a non-zero duration (indicating the total duration of the movie including fragments). Unfortunately, qtdemux does not currently support this when working with fragmented files in push mode, which is what is used in MSE. Currently there is this condition that rejects them: https://github.com/ntrrgc/gst-plugins-good/commit/8bcc733ceca037b1d680ddb21c0317b6f85fab19
Did you try lifting that and seeing how it works? Can you provide samples or ways to test the misbehavior your are seeing?
When doing so in the current version it seems to work OK in my brief tests. On the other hand, commenting the if over this pending patch https://bugzilla.gnome.org/show_bug.cgi?id=778426 causes a regression in files without an edit list because the segment is cleared without being set then again. It should not delete the default segment if the file does not contain edit lists. Regarding test files, here you have the ones I'm using: With edit list: http://yt-dash-mse-test.commondatastorage.googleapis.com/unit-tests/media/car-20120827-86.mp4 Without edit list: http://yt-dash-mse-test.commondatastorage.googleapis.com/unit-tests/media/car-20120827-85.mp4 With the edit list the stream time of the PTS of the first frame should be zero. The stream time of the DTS of the first frame should be negative.
This is what I get when trying to play it on push mode locally: gst-launch-1.0 pushfilesrc location=/home/thiagoss/media/car-20120827-86.mp4 ! qtdemux name=d d.video_0 ! queue ! fakesink silent=false sync=true -v |grep fakesink |grep chain /GstPipeline:pipeline0/GstFakeSink:fakesink0: last-message = chain ******* (fakesink0:sink) (156 bytes, dts: 0:00:00.000000000, pts: 0:00:00.041711111, duration: 0:00:00.041711111, offset: -1, offset_end: -1, flags: 00004040 discont tag-memory , meta: none) 0x7f850000a350 Without the edit list this is how it goes: gst-launch-1.0 pushfilesrc location=/home/thiagoss/media/car-20120827-86.mp4 ! qtdemux name=d d.video_0 ! queue ! fakesink silent=false sync=true -v |grep fakesink |grep chain /GstPipeline:pipeline0/GstFakeSink:fakesink0: last-message = chain ******* (fakesink0:sink) (156 bytes, dts: 0:00:00.000000000, pts: 0:00:00.041711111, duration: 0:00:00.041711111, offset: -1, offset_end: -1, flags: 00004040 discont tag-memory , meta: none) 0x7f850000a350 So I guess the problem is that the pts for the edit-list is slightly off. Right? Just confirming this is the same you see on your MSE tests.
This uses a type of ELST we don't handle yet, this is the core of the problem: " For example, if the composition time of the first composed frame is 20, then the edit that maps the media time from 20 onwards to movie time 0 onwards, would read: Entry‐count = 1 Segment‐duration = 0 Media‐Time = 20 Media‐Rate = 1 " So, the spec has this special case that we don't handle yet.
Unfortunately I'm unable to replicate the behavior I saw in #2 anymore. When I remove the `if ()` segment.stop is set to a very low value and no frames are pushed. But you seem not to have the same problem, as I can see frames in your output. I'll base my answer in theory, supposing I had a GStreamer build where this worked like one day it did. (In reply to Thiago Sousa Santos from comment #3) > > So I guess the problem is that the pts for the edit-list is slightly off. > Right? Just confirming this is the same you see on your MSE tests. The PTSs shown there are perfectly fine. And there are no difference with and without edit list in that console output because you are reading buffer time, which in qtdemux corresponds to unedited track time (i.e. "media time" in MP4 parlance). The difference comes into play in the GstSegment emitted before the frames, as that declares the mapping from buffer time to streaming time. Without edit list: ntrrgc@homura❯ gst-launch-1.0 -v pushfilesrc location=car-20120827-86.mp4 ! qtdemux ! fakesink silent=false |egrep "segment|chain" |head -n2 /GstPipeline:pipeline0/GstFakeSink:fakesink0: last-message = event ******* (fakesink0:sink) E (type: segment (17934), GstEventSegment, segment=(GstSegment)"GstSegment, flags=(GstSegmentFlags)GST_SEGMENT_FLAG_NONE, rate=(double)1, applied-rate=(double)1, format=(GstFormat)GST_FORMAT_TIME, base=(guint64)0, offset=(guint64)0, start=(guint64)0, stop=(guint64)18446744073709551615, time=(guint64)0, position=(guint64)0, duration=(guint64)18446744073709551615;";) 0x561f0df5daa0 /GstPipeline:pipeline0/GstFakeSink:fakesink0: last-message = chain ******* (fakesink0:sink) (156 bytes, dts: 0:00:00.000000000, pts: 0:00:00.041711111, duration: 0:00:00.041711111, offset: -1, offset_end: -1, flags: 00004040 discont tag-memory , meta: none) 0x7f58e800a400 With edit list: ntrrgc@homura❯ gst-launch-1.0 -v pushfilesrc location=car-20120827-86.mp4 ! qtdemux ! fakesink silent=false |egrep "segment|chain" |head -n3 /GstPipeline:pipeline0/GstFakeSink:fakesink0: last-message = event ******* (fakesink0:sink) E (type: segment (17934), GstEventSegment, segment=(GstSegment)"GstSegment, flags=(GstSegmentFlags)GST_SEGMENT_FLAG_NONE, rate=(double)1, applied-rate=(double)1, format=(GstFormat)GST_FORMAT_TIME, base=(guint64)0, offset=(guint64)0, start=(guint64)0, stop=(guint64)18446744073709551615, time=(guint64)0, position=(guint64)0, duration=(guint64)18446744073709551615;";) 0x13beeb0 /GstPipeline:pipeline0/GstFakeSink:fakesink0: last-message = event ******* (fakesink0:sink) E (type: segment (17934), GstEventSegment, segment=(GstSegment)"GstSegment, flags=(GstSegmentFlags)GST_SEGMENT_FLAG_NONE, rate=(double)1, applied-rate=(double)1, format=(GstFormat)GST_FORMAT_TIME, base=(guint64)0, offset=(guint64)0, start=(guint64)41711110, stop=(guint64)18446744073709551615, time=(guint64)0, position=(guint64)41711110, duration=(guint64)18446744073709551615;";) 0x7fb9880051f0 /GstPipeline:pipeline0/GstFakeSink:fakesink0: last-message = chain ******* (fakesink0:sink) (156 bytes, dts: 0:00:00.000000000, pts: 0:00:00.041711111, duration: 0:00:00.041711111, offset: -1, offset_end: -1, flags: 00004040 discont tag-memory , meta: none) 0x7f58e800a400 The first GstSegment is superfluous. Ideally qtdemux should not emit it, but since it's not followed by any GstBuffers, it's mostly harmless. The important one is the second GstSegment. Notice start=41711110 ns, whereas it was zero without edit lists. Buffer timestamps are still the same, but when doing the math to get stream time (see https://gstreamer.freedesktop.org/documentation/design/synchronisation.html#stream-time), we get different results: stream_time = (B.timestamp - S.start) * ABS (S.applied_rate) + S.time Without edit lists: stream time PTS = (41711111 ns - 0 ns) * 1 + 0 ns = 41711111 ns stream time DTS = (0 ns - 0 ns) * 1 + 0 ns = 0 ns With edit lists: stream time PTS = (41711111 ns - 41711110 ns) * 1 + 0 ns = 1 ns stream time DTS = (0 ns - 41711110 ns) * 1 + 0 ns = -41711110 ns Notice how the movie presentation starts close to zero whereas without the edit list there is the equivalent time interval of a frame without image. Unfortunately, it does not start exactly at zero because of rounding errors inherent to GStreamer design*. * This is a consequence of applying edits by performing addition of fixed point nanoseconds (first frame unedited track time PTS lossly converted to fixed point in order to set GstBuffer.pts + edit start lossly converted to fixed point in order to set GstSegment.start). Unfortunately, since GstBuffer.dts is unsigned, qtdemux cannot handle edits by itself, they have to be applied by GstSegment transformation from (unsigned) buffer time to stream time.
(In reply to Thiago Sousa Santos from comment #4) > This uses a type of ELST we don't handle yet, this is the core of the > problem: > > " > For example, if the composition time of the first composed frame is 20, then > the edit that maps the > media time from 20 onwards to movie time 0 onwards, would read: > Entry‐count = 1 > Segment‐duration = 0 > Media‐Time = 20 > Media‐Rate = 1 > " > > So, the spec has this special case that we don't handle yet. I assume you are referring to `segment-duration = 0`, which is what I'm having trouble with. Is that the case? If that's the case... well, we are supposed to support it already. See this comment in qtdemux_parse_segments(): /* zero duration does not imply media_start == media_stop * but, only specify media_start.*/ Maybe there is a (different) bug somewhere.
It is a different thing. If there is only one entry and duration is 0 and media-rate is 1 (sounds like a very specific thing). It is a special case from the spec that we don't handle. Check the iso 14496-12 from 2015 on the edts/elst sections. (It is freely available).
I'm well aware of that section of the spec. The comment is from the code that parses that box in the branch where duration == 0.
Created attachment 369399 [details] [review] qtdemux: handle pts offset shifts indicated at elst elst atom has a special case from ISO/IEC 14496-12:2015 that says that a if it has a single entry with media_duration=0, then media-start actually means the amount to offset the PTS so that DTS and PTS are correct according to the stream. Implementation reuses the setup done for cslg atoms.
This patch should parse the elst with the special case correctly. Maybe the segment handling could still be improved to not have stop=GST_CLOCK_TIME_NONE but this is enough to get the DTS and PTS properly set. Could you give this a try? It was done on top of the patch for bug #778426
(In reply to Thiago Sousa Santos from comment #10) > This patch should parse the elst with the special case correctly. Maybe the > segment handling could still be improved to not have > stop=GST_CLOCK_TIME_NONE but this is enough to get the DTS and PTS properly > set. > > Could you give this a try? It was done on top of the patch for bug #778426 I'm currently working on a test battery for qtdemux edit list handling. My first goal is to determine the current status, but when I get it nearer completion, I'll start applying patches and seeing how many issues can be fixed. I'll test this patch too.
Review of attachment 369399 [details] [review]: r- I don't understand how this is supposed to work. I have added comments in the points that look the most suspicious to me. ::: gst/isomp4/qtdemux.c @@ +129,3 @@ #define QTSAMPLE_DTS(stream,sample) (QTSTREAMTIME_TO_GSTTIME((stream), (sample)->timestamp)) /* timestamp + offset + cslg_shift is the outgoing PTS */ +#define QTSAMPLE_PTS(stream,sample) (QTSTREAMTIME_TO_GSTTIME((stream), (sample)->timestamp + (stream)->cslg_shift + (stream)->elst_shift + (sample)->pts_offset)) QTSAMPLE_PTS() is used to set buffer PTS. By adding elst_shift you are introducing an inconsistency into qtdemux: So far buffer PTS in qtdemux is always track presentation time. You need to calculate stream time to get movie presentation time, where edits are applied. With this change, you would change that definition for push mode streams with basic edit lists. It would become track presentation time + media_time of the edit. It's not even movie presentation time, you would need to subtract elst_shift for that instead*. *and that would be problematic because an edit list can cause negative movie decode and presentation time... which cannot be handled in buffer time because GstBuffer.{pts,dts} are unsigned. @@ +352,3 @@ + /* elst */ + guint64 elst_shift; + The comment does not explain what that field is about. If I've understood correctly the code it would be something like this. /* In push mode, when there is an edit list with a single, non-empty edit with duration=0, this field stores the media_start of that edit in trak timescale. In other cases, this field is zero. */ @@ +4793,3 @@ stream->segment.rate = rate; stream->segment.start = start + QTSTREAMTIME_TO_GSTTIME (stream, + stream->cslg_shift + stream->elst_shift); Why do you shift buffer time above if you then shift the segment.start in the same direction? Stream time will reverse the elst_shift that you added, returning (unedited) track presentation time.
(In reply to Alicia Boya García from comment #12) > Review of attachment 369399 [details] [review] [review]: > > r- > > I don't understand how this is supposed to work. I have added comments in > the points that look the most suspicious to me. > > ::: gst/isomp4/qtdemux.c > @@ +129,3 @@ > #define QTSAMPLE_DTS(stream,sample) (QTSTREAMTIME_TO_GSTTIME((stream), > (sample)->timestamp)) > /* timestamp + offset + cslg_shift is the outgoing PTS */ > +#define QTSAMPLE_PTS(stream,sample) (QTSTREAMTIME_TO_GSTTIME((stream), > (sample)->timestamp + (stream)->cslg_shift + (stream)->elst_shift + > (sample)->pts_offset)) > > QTSAMPLE_PTS() is used to set buffer PTS. By adding elst_shift you are > introducing an inconsistency into qtdemux: > > So far buffer PTS in qtdemux is always track presentation time. You need to > calculate stream time to get movie presentation time, where edits are > applied. > > With this change, you would change that definition for push mode streams > with basic edit lists. It would become track presentation time + media_time > of the edit. It's not even movie presentation time, you would need to > subtract elst_shift for that instead*. AFAIK this code is for both push and pull mode to get correct PTS and DTS alignment. Initial PTS should be 0 if DTS could be negative but it can't so we shift PTS forward (together with segment.start) and keep initial DTS as 0 to achieve the same effect. elst_shift is only non-zero on the scenarios outlined on the spec: """ A non‐empty edit may insert a portion of the media timeline that is not present in the initial movie, and is present only in subsequent movie fragments. Particularly in an empty initial movie of a fragmented movie file (when there are no media samples yet present), the segment_duration of this edit may be zero, whereupon the edit provides the offset from media composition time to movie presentation time, for the movie and subsequent movie fragments. It is recommended that such an edit be used to establish a presentation time of 0 for the first presented sample, when composition offsets are used. """ Do you have a sample file that meets this criteria but fails? I'd be happy to look at it. The missing check is if the first segment is empty but that could be added, maybe. > > *and that would be problematic because an edit list can cause negative movie > decode and presentation time... which cannot be handled in buffer time > because GstBuffer.{pts,dts} are unsigned. > > @@ +352,3 @@ > + /* elst */ > + guint64 elst_shift; > + > > The comment does not explain what that field is about. If I've understood > correctly the code it would be something like this. > > /* In push mode, when there is an edit list with a single, non-empty edit > with duration=0, this field stores the media_start of that edit in trak > timescale. In other cases, this field is zero. */ > > @@ +4793,3 @@ > stream->segment.rate = rate; > stream->segment.start = start + QTSTREAMTIME_TO_GSTTIME (stream, > + stream->cslg_shift + stream->elst_shift); > > Why do you shift buffer time above if you then shift the segment.start in > the same direction? Stream time will reverse the elst_shift that you added, > returning (unedited) track presentation time. Because the goal here is not to offset the start time (and have emptiness on the beginning) but to have PTS be shifted forwarded because we can't shift DTS backwards because it is unsigned. Maybe I should have left better comments around the whole elst_shift scenario. I can improve the patch in that regard.
(In reply to Thiago Sousa Santos from comment #13) > AFAIK this code is for both push and pull mode That's my bad, qtdemux_parse_segments() is used in both modes indeed. > AFAIK this code is for both push and pull mode to get correct PTS and DTS > alignment. Initial PTS should be 0 if DTS could be negative but it can't so > we shift PTS forward (together with segment.start) and keep initial DTS as 0 > to achieve the same effect. That's not what we want or what the edit list states. Most edit lists, including the provided test file, shift backwards, so that movie PTS (=stream time PTS in gstreamer) starts at zero and movie DTS (=stream time DTS in gstreamer) starts negative. > > elst_shift is only non-zero on the scenarios outlined on the spec: > > """ > A non‐empty edit may insert a portion of the media timeline that is not > present in the initial movie, and > is present only in subsequent movie fragments. Particularly in an empty > initial movie of a fragmented > movie file (when there are no media samples yet present), the > segment_duration of this edit may be > zero, whereupon the edit provides the offset from media composition time to > movie presentation time, > for the movie and subsequent movie fragments. It is recommended that such an > edit be used to > establish a presentation time of 0 for the first presented sample, when > composition offsets are used. > """ I think you are misunderstanding there the word "fragments" as edits. That's not it, it actually refers to MP4 fragments (moof+mdat boxes that can extend a moov and are usually used in DASH and MSE). What the spec is stating: * Edits can include frames from the fragments too. In other words, edit lists can be used with fragmented files (like the provided example). * Fragmented files are divided into a non-fragmented part and one or more fragments. Fragments have different structures for the frame index than the non-fragmented part, so usually, for simplicity and orthogonality, fragmented files have no frames at all in the non-fragmented part, which serves only as a header. In that case, it's useful to set the edit segment_duration to 0, which shall be interpreted as "until the end of the complete movie, including fragments". * It's recommended that you use an edit list so that your movie PTS starts at zero. > Do you have a sample file that meets this criteria but fails? I'd be happy > to look at it. The missing check is if the first segment is empty but that > could be added, maybe. I already linked you a test file above. If you demux it with your patch you'll see PTS is being shifted forwards, instead of backwards. You can also check that the file uses segment_duration=0 indeed, which is usual though not a requirement (i.e. a fragmented file may specify the (non-zero) duration of the movie with fragments in elst.segment_duration and that would still qualify as a completely fine basic edit list). http://yt-dash-mse-test.commondatastorage.googleapis.com/unit-tests/media/car-20120827-86.mp4 Some time ago I wrote a document about the MP4 format, fragments, edit lists and why they are necessary. I have extended that last section and added some notes about how they are implemented in GStreamer too. I think it would be helpful if you gave it a look. If something does not seem to make sense, you can ask me in IRC. https://github.com/ntrrgc/media-notes/wiki/MP4-Notes#track-edit-list
(In reply to Alicia Boya García from comment #2) > When doing so in the current version it seems to work OK in my brief tests. > > On the other hand, commenting the if over this pending patch > https://bugzilla.gnome.org/show_bug.cgi?id=778426 causes a regression in > files without an edit list because the segment is cleared without being set > then again. It should not delete the default segment if the file does not > contain edit lists. I have checked again and it's quite the opposite, the segment is not emitted in the file WITH edit list (car-20120827-86.mp4) There actually two bugs here playing together: a) Lack of support for duration = 0 in edit lists. b) Lack of support for edit lists in fragmented files. Removing the conditional fixes b), but a) remains. (In reply to Alicia Boya García from comment #5) > Notice how the movie presentation starts close to zero whereas without the > edit list there is the equivalent time interval of a frame without image. > > Unfortunately, it does not start exactly at zero because of rounding errors > inherent to GStreamer design*. > > * This is a consequence of applying edits by performing addition of fixed > point nanoseconds (first frame unedited track time PTS lossly converted to > fixed point in order to set GstBuffer.pts + edit start lossly converted to > fixed point in order to set GstSegment.start). Unfortunately, since > GstBuffer.dts is unsigned, qtdemux cannot handle edits by itself, they have > to be applied by GstSegment transformation from (unsigned) buffer time to > stream time. In this case this is not a rounding error, it's a side effect of a) that causes addition to occur with GST_CLOCK_TIME_NONE, effectively subtracting 1 nanosecond, and through several hops it ends up in the emitted GstSegment.
Created attachment 370391 [details] [review] [PATCH] qtdemux: Allow edit lists on fragmented files on push mode
How does this patch handle this? """ A non‐empty edit may insert a portion of the media timeline that is not present in the initial movie, and is present only in subsequent movie fragments. Particularly in an empty initial movie of a fragmented movie file (when there are no media samples yet present), the segment_duration of this edit may be zero, whereupon the edit provides the offset from media composition time to movie presentation time, for the movie and subsequent movie fragments. It is recommended that such an edit be used to establish a presentation time of 0 for the first presented sample, when composition offsets are used. """ Also, just because the PTS on a buffer is non-zero it doesn't mean that the first PTS for the running time is going to be non-zero as well. Shifting both the buffer PTS and segment start by the same amount makes it effectively be 0 on running time point of view.
(In reply to Thiago Sousa Santos from comment #17) > How does this patch handle this? > > """ > A non‐empty edit may insert a portion of the media timeline that is not > present in the initial movie, and > is present only in subsequent movie fragments. Particularly in an empty > initial movie of a fragmented > movie file (when there are no media samples yet present), the > segment_duration of this edit may be > zero, whereupon the edit provides the offset from media composition time to > movie presentation time, > for the movie and subsequent movie fragments. It is recommended that such an > edit be used to > establish a presentation time of 0 for the first presented sample, when > composition offsets are used. > """ It doesn't, because that's a different problem (a). See this patch for that: qtdemux: edits with duration = 0 don't work on files without a mehd https://bugzilla.gnome.org/show_bug.cgi?id=794858 https://bug794858.bugzilla-attachments.gnome.org/attachment.cgi?id=370389 > Also, just because the PTS on a buffer is non-zero it doesn't mean that the > first PTS for the running time is going to be non-zero as well. That's correct, but I see no problem with the patch. > Shifting both the buffer PTS and segment start by the same amount makes it > effectively be 0 on running time point of view. But that's not what we want with edit lists. Edit list should shift running time too. An edit list can be used not only to displace the track in movie presentation (it's current, usual use case), but to compose a movie with slices of the track (its original purpose). You could for instance set media_time = 5 minutes and would expect the frame with track PTS (CTS) = 5 minutes to be played immediately and be rendered to the user as 0:00:00.
Created attachment 372384 [details] [review] qtdemux: Allow edit lists on fragmented files on push mode Fragmented files often use elst.duration=0 which before ee78825eaef2c5fffac7d6c5526fe18cec6b3eef was wrongly interpreted as having no frames. Since that issue has now been fixed, there is no reason to disable edit lists in fragmented files. This commit enables them, therefore producing correct stream time for files containing edit lists.
I've rebased the patch and added an explanation to the commit message.
Attachment 372384 [details] pushed as c2a0da8 - qtdemux: Allow edit lists on fragmented files on push mode
This broke several tests in `validate.dash.playback.*`. reopenning.
Alicia, any chance you could look at the issues ?
(In reply to Edward Hervey from comment #23) > Alicia, any chance you could look at the issues ? I'm currently investigating the issue. The most immediate cause of the assertion error is that in the dashdemux case qtdemux is leaving stream->segment with uninitialized format (zeroed) when a "reused stream" is created. The assertion was not triggered before the patch because the `if (fragmented)` branch avoided emitting segments for reused streams. I'm still reading related code and trying to figure out the best solution for that issue.
We really need this to be fixed ASAP. Builds have been failing for 2 weeks now. Alicia if you need help, don't hesitate to ask.
(In reply to Edward Hervey from comment #25) > We really need this to be fixed ASAP. Builds have been failing for 2 weeks > now. Alicia if you need help, don't hesitate to ask. I don't know when I'll have a solution so be free to revert whatever you want in the meantime
Closed through: https://bugzilla.gnome.org/show_bug.cgi?id=796480