GNOME Bugzilla – Bug 735666
design doc : trick mode handling in demuxers (SKIP)
Last modified: 2018-11-03 12:22:26 UTC
I wrote a document detailing more how fast trick modes could be handled in demuxers. The document is here (will keep it updated based on comments): http://cgit.freedesktop.org/~bilboed/gstreamer/tree/docs/design/part-demuxer-skip.txt?h=design
Some random comments... Option 0) - You seem to assume here that there's a different keyframe every seek (which might not be true if you use 500ms segments, play 2x rate and there's only a keyframe every 1+ seconds). If that is not the case you would play the same segment multiple times, each time a bit longer. Also KEYUNIT flag says nearest keyframe. There needs to be some logic in the application to prevent the above, and I don't see how the application can properly prevent that - With every seek I think you add an error to the actual playback rate that is the time between receiving SEGMENT_DONE and the pipeline being prerolled again after the next seek. The pipeline will go to PAUSED after SEGMENT_DONE, and then offset the base time by the time it stayed in PAUSED. Do you mean that with "potentially causes delay" and the "potential issue" in the cons? I think these are not just potential but real problems :) Proposal) - I think there's a problem here as you jump from one keyframe to the next all the time. If a) the keyframe distance is not similar all the time it will look rather weird to the eye as you jump completely different distances, and if b) the keyframe distance is too small for the playback rate you could request too much data from upstream and just play a fraction of the duration of each keyframe. I don't know what to do about a), but for b) the demuxer should probably have some logic to skip keyframes based on how many frames it will actually output for the current keyframe. If it's less than a frame or N frames, maybe it should instead output one frame or N frames and skip the next K keyframes to compensate for that. (I just saw that you mentioned this at the end, but I think this should be considered from the beginning) - Position reporting will be weird. It would run a bit in real time, then jump and run again a bit in real time. The alternative would be to scale the stream time by the rate, which would however then report the wrong position for a frame. Not sure what is better. - For reverse playback I think with your proposal the position would go forwards a bit, then jump backwards, and then go forwards a bit. Maybe instead the position should go backwards all the time, and the frames in each segment should be reordered like with how we currently implement reverse playback? (does this happen automatically already if you just set rate=-1.0?) - I think the segments are wrong overall. You currently set segment.time to the already played duration, but for the running time this has no effect. It should be segment.base. segment.time is only an offset for the stream time, but I think the stream time would already be good if you keep segment.time=segment.start all the time. Possible improvements: - Keyframe-only and disable-audio should probably be new seek flags instead of magically doing that. I think the application wants control over that.
(In reply to comment #1) > Some random comments... > > Option 0) > Do you mean that with "potentially causes delay" and the "potential issue" in > the cons? I think these are not just potential but real problems :) That's indeed what I meant :) Option 0 is really "what could we do to get closer to trick modes without any code modification". It's not meant as a solution, more a discussion/brainstorming. > > Proposal) > - I think there's a problem here as you jump from one keyframe to the next all > the time. If a) the keyframe distance is not similar all the time it will look > rather weird to the eye as you jump completely different distances, and if b) > the keyframe distance is too small for the playback rate you could request too > much data from upstream and just play a fraction of the duration of each > keyframe. I don't know what to do about a), but for b) the demuxer should > probably have some logic to skip keyframes based on how many frames it will > actually output for the current keyframe. If it's less than a frame or N > frames, maybe it should instead output one frame or N frames and skip the next > K keyframes to compensate for that. > (I just saw that you mentioned this at the > end, but I think this should be considered from the beginning) Right. I'll put it at the top. The goal was to simplify the explanation, else it would have been a tad too complex. > > - Position reporting will be weird. It would run a bit in real time, then jump > and run again a bit in real time. The alternative would be to scale the stream > time by the rate, which would however then report the wrong position for a > frame. Not sure what is better. There are two constraint that maybe weren't 100% clear with this whole proposal, which is: 1) Reducing as much as possible upstream bandwith/bitrate. Ideally you want to end up pulling from upstream (over a certain window of time) at the same bitrate as the normal playback rate. This is to avoid issues with network connections (limited bandwidth) but also local storage (limited i/o disks). 2) Reducing as much as possible seeking/skipping from upstream. Assume there's a more or less big latency introduced when "skipping" upstream (such as on HTTP, but also on slow local storage). > > - For reverse playback I think with your proposal the position would go > forwards a bit, then jump backwards, and then go forwards a bit. Maybe instead > the position should go backwards all the time, and the frames in each segment > should be reordered like with how we currently implement reverse playback? > (does this happen automatically already if you just set rate=-1.0?) The problem with that is the latency (and memory usage) introduced by reversing all frames (you need to wait for the next beginning of a GOP). > > - I think the segments are wrong overall. You currently set segment.time to > the already played duration, but for the running time this has no effect. It > should be segment.base. segment.time is only an offset for the stream time, but > I think the stream time would already be good if you keep > segment.time=segment.start all the time. True, I wonder why I got that wrong. > > > > Possible improvements: > - Keyframe-only and disable-audio should probably be new seek flags instead of > magically doing that. I think the application wants control over that. Right, so that's a bigger issue (deciding what technique to use). This proposal was discussing *one* way of doing trick mode with various constraints in mind. If you have very fast i/o and low latency, you could use another method.
(In reply to comment #2) > > - Position reporting will be weird. It would run a bit in real time, then jump > > and run again a bit in real time. The alternative would be to scale the stream > > time by the rate, which would however then report the wrong position for a > > frame. Not sure what is better. > > There are two constraint that maybe weren't 100% clear with this whole > proposal, which is: > 1) Reducing as much as possible upstream bandwith/bitrate. Ideally you want > to end up pulling from upstream (over a certain window of time) at the same > bitrate as the normal playback rate. This is to avoid issues with network > connections (limited bandwidth) but also local storage (limited i/o disks). > 2) Reducing as much as possible seeking/skipping from upstream. Assume > there's a more or less big latency introduced when "skipping" upstream (such as > on HTTP, but also on slow local storage). Understood, but unrelated to my comment :) What I just mean is that the stream time is going to behave weird. Either you'll have jumps (e.g. 1s, 1.1s, 1.2s, 5s, 5.1s, ...) or you have wrong position reporting (e.g. 1s, 2s actually being 1.05s, 3s actually being 1.1s, ..., 5s being 5s, ...) We should consciously choose which kind of behaviour we want here for the stream time and document that so it's implemented the same in every demuxer. > > - For reverse playback I think with your proposal the position would go > > forwards a bit, then jump backwards, and then go forwards a bit. Maybe instead > > the position should go backwards all the time, and the frames in each segment > > should be reordered like with how we currently implement reverse playback? > > (does this happen automatically already if you just set rate=-1.0?) > > The problem with that is the latency (and memory usage) introduced by > reversing all frames (you need to wait for the next beginning of a GOP). Ack > > - I think the segments are wrong overall. You currently set segment.time to > > the already played duration, but for the running time this has no effect. It > > should be segment.base. segment.time is only an offset for the stream time, but > > I think the stream time would already be good if you keep > > segment.time=segment.start all the time. > > True, I wonder why I got that wrong. Because segments are confusing for everybody :) > > Possible improvements: > > - Keyframe-only and disable-audio should probably be new seek flags instead of > > magically doing that. I think the application wants control over that. > > Right, so that's a bigger issue (deciding what technique to use). > > This proposal was discussing *one* way of doing trick mode with various > constraints in mind. > > If you have very fast i/o and low latency, you could use another method. Keyframe-only or disable-audio could maybe also be controlled via QoS, but maybe it should be an application decision in the end.
Created attachment 292524 [details] [review] segment: Add new skip flags for clarifying trick mode playback.
Created attachment 292525 [details] [review] playback-test: Support new skip seek flags
Created attachment 292526 [details] [review] avviddec: Implement SKIP and new SKIP_KEY_UNITS_ONLY flags When playing non-1.0 playback, respect any skip flags set in the incoming segment.
Here's some related patches for review - adding a couple of new SKIP flags, and implementing the video SKIP flags in the gst-libav video decoder. Starting with the decoders is the simplest place, and already helps a lot on the CPU, although obviously not on the disk reads. On top of this would be handling the flags in the demuxers to reduce the data read/pushed out where possible, and to implement the skipping in the audio decoders (and push gap events so things still preroll).
Any opinions on these new skip flags, or should I just push them?
> - You seem to assume here that there's a different keyframe every seek (...). > KEYUNIT flag says nearest keyframe. There needs to be some logic in the > application to prevent the above, and I don't see how the application can > properly prevent that GST_SEEK_FLAG_SNAP_{BEFORE|AFTER} were added to help with that in an application-driven scenario (simulated trick-modes using fast successive seeks whilst keeping the pipeline in paused state).
About the API: it seems to me there are multiple different ways of doing SKIP/trick-mode playback, and we will want to be able to support all of them. I think the question is if we think we will be able to squeeze all the things we'll need in the near future into flags, or if we don't want something more explicit instead which is easier to extend later. This could be either some kind of GstTrickModeContext helper object that can then be added/gotten from the seek event, or some kind of gst_event_seek_set_trick_mode_options (....) with option/gtype/value style extendability. Different trick mode "modes" we'll want to be able to support: (1) automatic (chosen by whoever drives the pipeline, source and/or demuxer, perhaps based on the rate and other things) (2) I-frame only (continuous) (possible extra parameters: min-distance, for higher-rate FF/FR) (3) I-frame only (single-shot) (i.e. EOS right after; useful for snapshotting and simulated app-managed FF/FR) (4) Ref frames only / skip B-frames (avcodec has separate options for these) (5) play-skip style trickmodes, as described by Edward's proposal and part-trickmodes.txt (I can imagine some parameters one might want to influence the behaviour of this in future). Possibly with subvariants that limit the input datarate and/our output datarate (Edward's proposal is to limit it to the rate of the original stream on both counts, but one might want to allow higher input/output rates for better trick mode performance if it is known that the decoders can handle more or there's enough bandwidth and processing power on the input side to handle more). - any of the above combined with additional options such as reducing post-processing quality or somesuch - any of the above with or without audio Most of the above could be modelled with flags, but e.g. the min-distance parameter seems very useful to have in practice, and I think it would be difficult to squeeze that into flags, for example. Then there's the question what the SKIP flag alone is supposed to mean. I think we should not try to run before we can walk, and let SKIP default to the lowest common denominator (which could just be automatic/whatever mode). We don't even support I-frame only SKIP mode anywhere yet! I also wonder if we should default to no-audio by default, which seems like a more sensible default to me (and what device vendors seem to do in practice.)
And couple of random comments about the proposal doc: - I think this should just be an additional section(s) in the existing part-trickmodes.txt. One section per trick mode as identified above, or somesuch. Can label it/them as draft if needed. > This concentrates mostly on container formats that have > full visibility of the stream (i.e. contain an index). - N.B. an index does not in practice entail "full visibility" of the stream, it often does not even index all keyframes > The difference is that this proposal tries to: > * Reduce processing complexity downstream (data outputted is always > sent with a forward rate of 1.0). > * Avoid any changes in downstream elements > * Reduce amount of changes required in demuxers and other elements. While I see how this can be useful, it is not clear to me that this should be our target (I realise you are just listing differences here of course). Requiring additional code in various elements for new features seems fine to me. We should make sure our system allows people to implement the best possible trick mode support, not aim for the easiest fix to get something working with the existing code base with the smallest amount of changes. This just as a general comment, not mean to be an argument against the proposal, which just details implementation details of a trick mode type we'll want to support in any case imho. I don't know if the proposal will actually be able to minimise the amount of changes required in demuxers, it's going to be non-trivial to implement this in practice whilst keeping the restrictions you want. The demuxer will need to be able to adapt to real-time performance of the pipeline, esp. in terms of being able to read+process input data fast enough (there will be extra latencies and overhead caused by incomplete/missing indices etc.). The rationale for always outputting data with a forward rate of 1.0 here is so that the short playback sections get played at 1.0 speed, right? I think for the other non play-skip trick modes I listed in comment #10 we would want to pass through the original segment with the requested rate though. > The global idea is to: > 1) Output at most the same average amount of data downstream to be > decoded in realtime (There are still <fps> frames decoded for each > walltime seconds, regardless of the rate). > 2) Fetch from upstream the same average bitrate. This ensures that if > the file was readable at normal rate from a local or remote storage, > it can still be read at high rate. These seem prescriptive in a certain way now, and as I mentioned above I think a case can be made for a variant of this skip-play mode without such restrictions. It makes sense in some cases of course, and certain specs have strict compliciance requirements along those lines, but making this more restrictive mode the default might arguably be a conflict with the hitherto defined meaning of SKIP (in part-trickmodes.txt) and one might have to skip much more than one would when reading the file sequentially if the goal is not to exceed the original input bitrate ever). > Option 0 : Achievable currently without any modification This seems like a slightly more complicated variant of what is used in practice by many vendors (emulated trick mode by doing timed KEY_UNIT seeks while pipeline stays in PAUSED state - this has the advantage that it is self-regulating in the sense that it will work as fast as the source and demuxer/parser/decoders can work in practice on a given file/stream). > [Option 0] Cons: > * Logic needs to be (re)implemented in every application We could provide helper API for this in some way or another, be it in GstPipeline, playbin or something separate like a "pipeline manager" object (which would be useful for other things too), or just a bunch of utility functions like _{start,stop,change}_simulated_trick_play(). > * Delay caused by back/forth between element and application for > every segment, could potentially cause delays. > * Application doesn't know optimal location of keyframes, so can't > push out only the requested amount of data (chunks played out will > be of variable length). We could make the position of keyframes available to the application of course, but the problem is that often the demuxer doesn't know itself for formats where there's no index (and one needs to support those, otherwise the whole thing is not very interesting). In practice one would often have to make a blind seek to some position and see what falls out. > * There's a potential issue where the "next" position the > application requests is not beyond the next keyframe, resulting in > the demuxer pushing a very big segment again. This would require > the application to detect and handle such cases gracefully. * application needs to do 'cycle detection', it's possible a seek to N+pos yields N again as nearest keyframe location. SNAP seek flags would help with this, but are not implemented everywhere yet. The delay thing is a problem in any case btw, also for plain old reverse playback at 1.0 rate (when there's no wait for the application to issue another seek/instruction), because we don't necessarily have enough buffering for a whole GOP to be cached downstream of the demuxer, so the demuxer is often blocked for some time before being able to process more data. > Proposal : Move logic in demuxers > > In order to avoid the various "Cons" from option 0, the proposed way > forward is to move the logic of figuring out which segments to be > played back into the demuxers themselves and activate that mode if the > application sends a seek with the SKIP flag enabled. Ok, so the proposal is to interpret the SKIP flag to activate this particular approach to trick modes (which involves playing back short chunks). I think we should default to an easier lowest-common-denominator fallback like I-frame only by default, or let the demuxer/source decie, but have a way to select this skip-play kind of mode. (Also, reportedly this kind of trick mode is not generally perceived as very pleasant, which might be another reason to not default to it). > 2) Demuxer figures out the initial optimal initial previous and next > keyframe for Pos. N.B. The demuxer might not know where the (actual) next keyframe is. It is entirely possible and often the case that an index does not index every keyframe. The logic described will still work fine of course. > [Possible variants and improvements] > 2) Global speedup calculation > If the keyframe interval is non-constant... Out of curiosity, do we have empirical data about whether this is a rare case or the normal case? > 3) Disable audio-track IMHO this should be the default. Cool stuff btw, have you implemented it anywhere yet? :)
I don't think every possible trick mode desire can be represented by flags - but the one's I want to implement can be :) I avoided trying to define any further because of KISS, but also because flags are the only thing that can be easily added to GstSegment in a backwards-compatible way. GstSegments are statically allocatable. There's some padding, but the ability to add new fields is limited - we could easily fill the padding with incorrectly thought out API. Sticking to flags is comparitively safe. For adding a lot of trick-mode description stuff in the future, I was thinking it would be better to define a new flag 'GST_SEEK_FLAG_TRICKMODE_EXTENDED' and 'GST_SEGMENT_FLAG_TRICKMODE_EXTENDED' that would indicate the presence of extra fields in the GstStructure attached to seek and segment events respectively, with a boxed GstTrickModeSettings struct containing all that extended info. That flags might not even be needed - just the presence of the 'trickmode-settings' field in the event would be sufficient for a function call like gboolean gst_seek_event_parse_trickmode_settings() to return TRUE or FALSE. Done that way, the extra info is not in GstSegment, and doesn't affect the padding. Elements that know how to do fancier trickmode stuff can just check for the info and then act on it when present, and store it alongside their existing GstSegment(s) as needed.
Created attachment 295448 [details] [review] segment: Add new skip flags for clarifying trick mode playback. Add GST_SEEK_FLAG_TRICKMODE_KEY_UNITS and GST_SEEK_FLAG_TRICKMODE_NO_AUDIO, and rename GST_SEEK_FLAG_SKIP to GST_SEEK_FLAG_TRICKMODE (with backwards compat define). Do the same for the corresponding SEGMENT flags.
Created attachment 295449 [details] [review] playback-test: Support new skip seek flags Support the new SEEK_TRICKMODE_KEY_UNITS and SEEK_TRICKMODE_NO_AUDIO flags added to core
Created attachment 295451 [details] [review] avviddec: Implement SKIP and new SKIP_KEY_UNITS_ONLY flags When playing non-1.0 playback, respect any skip flags set in the incoming segment.
Updated all the attachments with some simple renaming of the flags constants to clarify them. These 3 flags cover most of what Tim asked for above, with easy future expansion for other cases. So far, we've used 9 bits of the GST_SEEK_FLAGS_* bits, so there's lots left.
Comment on attachment 295448 [details] [review] segment: Add new skip flags for clarifying trick mode playback. Looks good to me. One issue though: the define ensures backwards-compatibility for our C API, but I don't think it ensure the same for our g-i / python API. Not that I know why anyone would be using a flag that's not implemented yet, but perhaps we should keep it as part of the enum and set it to the same value. +1 for the FLAG_TRICKMODE* naming instead of FLAG_SKIP_, it makes things more discoverable for users IMHO and the names are less confusing then (e.g. SKIP_KEY_UNIT_ONLY - sounds like it means it should only decode delta frames). Please add a "(Since 1.6)" at the end of the @NEWFLAG: ... gtk-doc line. Otherwise good to push IMO.
Created attachment 295460 [details] [review] segment: Add new skip flags for clarifying trick mode playback. Add GST_SEEK_FLAG_TRICKMODE_KEY_UNITS and GST_SEEK_FLAG_TRICKMODE_NO_AUDIO, and rename GST_SEEK_FLAG_SKIP to GST_SEEK_FLAG_TRICKMODE (with backwards compat define). Do the same for the corresponding SEGMENT flags.
Pushed the core change.
Created attachment 295470 [details] [review] playback-test: Support new skip seek flags Support the new SEEK_TRICKMODE_KEY_UNITS and SEEK_TRICKMODE_NO_AUDIO flags added to core
Created attachment 295471 [details] [review] audio: Add fallback in audio sink for trickmode no-audio mode Add a last-ditch fallback in the audio sink to fill any incoming audio buffers with silence during trick-mode plays
The idea with this most recent patch is to make sure that SEEK_TRICKMODE_NO_AUDIO works in all cases - even with decoders that don't yet know not to decode audio. In that case, the sink will discard the received samples and render out some silence instead. There may be a more efficient way of getting the sinks to do that, but it's not clear to me, and I think the overhead of painting some silence is low enough to not care.
Created attachment 295472 [details] [review] avviddec: Implement SKIP and new SKIP_KEY_UNITS_ONLY flags Respect any skip flags set in the incoming segment.
Comment on attachment 295471 [details] [review] audio: Add fallback in audio sink for trickmode no-audio mode I think this is good to have as ultimate fallback. I don't think silence_fill should be an error, it's weird if that happens, but we can just move on, as the sink will fill in silence anyway most likely. I think it would also be nice to log to the PERFORMANCE debug category, 'cause we really want to shortcut/skip things earlier in the pipeline I think.
Created attachment 295560 [details] [review] baseaudiosink: drop GAP buffers, or all buffers in trickmode no-audio mode Make the base audio sink throw away buffers marked GAP, or all incoming buffers when performing a trick play with GST_SEGMENT_TRICKMODE_NO_AUDIO flag set, and make sure to start the ringbuffer when that happens so the clock starts running.
Replaced the audiobasesink patch with one that simply discards buffers and makes sure the ringbuffer is running, instead of painting silence samples.
Created attachment 295610 [details] [review] audiobasesink: drop GAP buffers, or all buffers in trickmode no-audio mode Make the base audio sink throw away buffers marked GAP, or all incoming buffers when performing a trick play with GST_SEGMENT_TRICKMODE_NO_AUDIO flag set, and make sure to start the ringbuffer when that happens so the clock starts running. Preserve the timing calculations when rendering, so state is all updated the same, but just don't render samples.
Created attachment 295612 [details] [review] audiodecoder: Where possible, skip decode for GST_SEGMENT_FLAG_TRICKMODE_NO_AUDIO If we have timestamps on input buffers and are in trickmode no-audio mode, then don't pass anything to the subclass for decode and simply send gap events downstream
Review of attachment 295612 [details] [review]: Actually, this patch sucks. Forward playback seems to work OK, but for reverse playback, the GAP events would need to be sent in reverse order like buffers are.
Comment on attachment 295610 [details] [review] audiobasesink: drop GAP buffers, or all buffers in trickmode no-audio mode Pushing the base sink patch.
Created attachment 296176 [details] [review] audiobasesink: Re-work GAP buffer and trick-mode handling In trickmode no-audio mode, or when receiving a GAP buffer, discard the contents and render as a GAP event instead. Make sure when rendering a gap event that the ring buffer will restart on PAUSED->PLAYING by setting the eos_rendering flag. This mostly reverts commit 8557ee and replaces it. The problem with the previous approach is that it hangs in wait_preroll() on a PLAYING-PAUSED transition because it doesn't commit state properly.
Created attachment 296177 [details] [review] audiodecoder: Where possible, skip decode for GST_SEGMENT_FLAG_TRICKMODE_NO_AUDIO If we have timestamps on input buffers and are in trickmode no-audio mode, then don't pass anything to the subclass for decode and simply send gap events downstream Only for forward playback for now - reverse requires accumulating GAP events and pushing out in reverse order.
(In reply to comment #31) > Created an attachment (id=296176) [details] [review] > audiobasesink: Re-work GAP buffer and trick-mode handling > > In trickmode no-audio mode, or when receiving a GAP buffer, > discard the contents and render as a GAP event instead. > > Make sure when rendering a gap event that the ring buffer will > restart on PAUSED->PLAYING by setting the eos_rendering flag. > > This mostly reverts commit 8557ee and replaces it. The problem > with the previous approach is that it hangs in wait_preroll() > on a PLAYING-PAUSED transition because it doesn't commit state > properly. It feels like this change is related to bug #736655. Which is having problems with GAP events and pre-rolling of the audio sink (and maybe base sink in general?), also not prerolling again after PLAYING->PAUSED->PLAYING transitions when the last thing was a GAP event. Can you check?
Yes, this will probably help with that issue too by not requiring preroll when changing PLAYING->PAUSED->PLAYING when handling a GAP buffer or event - but still requires that streamsynchroniser sends a GAP event downstream for the short audio stream
Cool, SKIP mode is very useful in embed platform. We implement it in tricky way (implement it only in demuxer). But I wander can demuxer handle QOS? Video decoder can handle QOS and can drop B or drop BP. Can demuxer handle QOS to only output key frame to reduce file fetch? This will achieve SKIP automatically. The reason of the proposal is application hard to decide which speed need SKIP mode. Little resolution clip needn't SKIP even high rate playback. Large resolution clip need SKIP even low rate playback.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gstreamer/issues/73.