GNOME Bugzilla – Bug 640610
basesink: QoS events are wrong in live pipelines
Last modified: 2019-01-06 03:01:27 UTC
We have two problem cases:

1. "videotestsrc is-live=1 ! xvimagesink"
If one looks at the QoS events this generates, "diff" is always > 0, so the sink thinks all frames are late. Also, "proportion" is always just slightly above 1, but in fact my computer is very fast and could probably do much more processing, so "proportion" should be close to 0.

2. "videotestsrc is-live=1 ! element-that-drops-framerate-and-duration ! xvimagesink"
That magical element could be a network using RTP. Since the sink doesn't know the framerate or duration, the "stop" time is always NONE, so the only thing that prevents frames from being dropped is "max-lateness", which is often not enough if there is enough processing happening. Also, in the QoS event, the "proportion" computation is completely off, especially if it knows the duration of some frames but not others.

I think the cause of these two problems is that we ignore the compute latency. There should be a global variable somewhere that is added to the latency of every live pipeline, instead of ignoring the problem like we currently do.
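For reference, here is a minimal sketch (using the GStreamer 1.x API, so an assumption for the 0.10 pipelines quoted above) of how one can observe the "diff" and "proportion" values being discussed, via a pad probe on an upstream element's src pad; the function name is illustrative:

#include <gst/gst.h>

/* Pad probe that prints the QoS event fields discussed above. */
static GstPadProbeReturn
print_qos_event (GstPad * pad, GstPadProbeInfo * info, gpointer user_data)
{
  GstEvent *event = GST_PAD_PROBE_INFO_EVENT (info);

  if (GST_EVENT_TYPE (event) == GST_EVENT_QOS) {
    GstQOSType type;
    gdouble proportion;
    GstClockTimeDiff diff;
    GstClockTime timestamp;

    gst_event_parse_qos (event, &type, &proportion, &diff, &timestamp);
    /* diff > 0 means the buffer arrived late at the sink; proportion is
     * the rate adjustment the sink suggests to upstream elements. */
    g_print ("QoS: proportion=%f diff=%" G_GINT64_FORMAT "\n",
        proportion, diff);
  }

  return GST_PAD_PROBE_OK;
}

Attach it with gst_pad_add_probe (srcpad, GST_PAD_PROBE_TYPE_EVENT_UPSTREAM, print_qos_event, NULL, NULL), since QoS events travel upstream from the sink.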
Comments regarding the first pipeline:
* It's just above 1.0 because videotestsrc doesn't report the image creation time (compute latency).
* For a live pipeline with both consumers and producers (once you take latency into account), the ideal target is indeed 1.0. 1.0 means the buffers are always arriving "on time" at the sinks (not too early, not too late).
* If videotestsrc were to take into account the average compute latency (after some buffers) and report that, you'd end up with a perfect 1.0 ... But you don't want to report that compute latency! If you have a CPU usage spike all of a sudden, you *want* to drop to catch up. If you start reporting the CPU usage as latency... you'll never drop :)
Created attachment 179362 [details] [review] basesink: calculate rstop when get_times() can't The solution to the second problem IMO is to calculate the stop time by using the previous start time as reference.
For the record, I dislike any solution that estimates the current framerate. Maybe if it's unknown, basesink should just never drop buffers and let upstream elements do it based on QoS events?
The problem with having proportion tend towards 1.0 in a live pipeline is that upstream elements can't know if they're able to do extra processing or not..
Also, I didn't mean report the compute latency; I meant allow for some compute latency before we start dropping frames... Currently it does that in secret (by allowing 1/fps of compute latency), but without a known fps this fails. We should have an acceptable compute latency instead.
And how would this "compute latency" be calculated? Should each encoder/decoder be aware of how much maximum latency the algorithm would take for the specified settings? The only way to find that out is to actually measure different configurations, make some guesses and write some heuristics. Do we expect everyone writing codecs to do that?

People would either put random values, or calculate this latency on-the-fly. And if this latency is calculated on-the-fly, as Edward mentioned, there might be spikes, at which point you would be dropping buffers unnecessarily.

Supposing an extreme case where the "compute latency" is calculated by perfect heuristics based on profile, level, framesize, etc., and there's a very low framerate: chances are this "compute latency" would be way below the real limit (second / fps). If something else in the system slows things down we again might end up dropping buffers unnecessarily.

My proposed patch doesn't have any of those issues.
No, my idea is that the "acceptable compute latency" is an arbitrary number decided by the application, which knows how much is acceptable and how much is not. If the CPU can't keep up, frames will be dropped (and as early as possible in the pipeline if encoders/decoders use the QoS events).

Edward also came up with another simple case which always drops frames:
videotestsrc is-live=True ! jpegenc ! jpegdec ! xvimagesink
Comment on attachment 179362 [details] [review] basesink: calculate rstop when get_times() can't This is unrelated to the issue at hand. Even with valid framerate and buffer durations we would still drop frames.
(In reply to comment #8) > (From update of attachment 179362 [details] [review]) > This is unrelated to the issue at hand. Even with valid framerate and buffer > durations we would still drop frames. Ok, created bug #640665 then, to make it the issue at hand there.
Been pondering this some more. From part-latency.txt: "The latency is the time it takes for a sample captured at timestamp 0 to reach the sink."

The fact is that elements that report latencies can only report latencies for which they are certain (in the case of audio/video sources, how big the capture buffers are; in the case of decoders, how many buffers they will need to buffer up before they can decode them; in the case of rtpjitterbuffer, how long it will delay the packets, ...). Those are *known* latencies (which can be updated, but there are determined/explicit formulas for calculating them based on the input). They do *not* depend on cpu speed/availability.

If you had an element that was doing some processing backed with a dsp with guaranteed/constant processing time based on input, you *could* report that latency (audio processing algorithms come to mind here).

The reason we don't add the processing time is that we can't know for sure whether that processing time is too much or not (you might be sharing the cpu with other processes doing cpu-intensive tasks, the codec might be going through a high processing phase, ...).

What we *should* end up with is:

  sync_time = running_timestamp + reported_delay_latency (+ expected_processing_latency)

where reported_delay_latency is what the latency GstQuery currently returns.

The problem is: how do we calculate that expected_processing_latency over which we need to start dropping? Elements can't figure that out on their own (except in rare circumstances), since it would require figuring out the processing complexity based on input data and available cpu speed. Therefore we do need to offer a way for pipeline users to provide an expected_processing_latency.

This will be somewhat tricky, since we'll need to ensure not to *exceed* the minimum of reported maximum latencies, else we'll end up with live sources under-running. Or maybe we shouldn't care and have application developers insert queue elements to compensate for that.

Something like gst_bin_set_additional_min_latency() could do it. And it would increase the min_latency in bin_query_latency_done().
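To make the proposal concrete, here is a rough sketch of how an application might use the suggested call; gst_bin_set_additional_min_latency() was never added to GStreamer, so this is purely hypothetical and only illustrates the idea:

/* Hypothetical API from the proposal above; it does not exist in GStreamer. */
gst_bin_set_additional_min_latency (GST_BIN (pipeline), 20 * GST_MSECOND);

/* The sink would then synchronise against:
 *   sync_time = running_timestamp + reported_delay_latency
 *               + expected_processing_latency
 * and only start dropping buffers once that deadline is exceeded. */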
(In reply to comment #10)
> The Fact is that elements that report latencies can only report latencies for
> which they are certain (in the case of an audio/video sources how big the
> capture buffers are, in the case of decoders how many buffers they will need to
> buffer up before they can decode them, in the case of rtpjitterbuffer how long
> it will delay the packets,...).
> Those are *known* latencies (which can be updated, but there are
> determined/explicit formulas for calculating them based on the input). They do
> *not* depend on cpu speed/availability.

But of course, in order to calculate those latencies the framerate is needed, which in the rtp example it is not.

> If you had an element that was doing some processing backed with a dsp with
> guaranteed/constant processing time based on input, you *could* report that
> latency (audio processing algorithms come to mind here).

Just to bring this a bit closer to reality: the DSP is not a magic box; just like any other CPU, it cannot know beforehand exactly how much time a certain input needs to be processed. The best you can say (if the DSP is running a real-time OS) is that _after_ the processing you would receive the response after exactly a certain amount of time. However, by _you_ I mean the GPP, which of course runs linux, which is not real-time, and there are tons of things that could delay such a response.

IOW: there's no way to report that latency.
(In reply to comment #11)
> (In reply to comment #10)
> > The Fact is that elements that report latencies can only report latencies for
> > which they are certain (in the case of an audio/video sources how big the
> > capture buffers are, in the case of decoders how many buffers they will need to
> > buffer up before they can decode them, in the case of rtpjitterbuffer how long
> > it will delay the packets,...).
> > Those are *known* latencies (which can be updated, but there are
> > determined/explicit formulas for calculating them based on the input). They do
> > *not* depend on cpu speed/availability.
>
> But of course in order to calculate those latencies, the framerate is needed,
> which in the rtp example, it's not.

If you have latencies that can be calculated from the framerate, then I guess you have to know the framerate. Or if it's something like RTP, I guess you have to set an arbitrary deadline and drop frames if they are late according to that deadline.

> > If you had an element that was doing some processing backed with a dsp with
> > guaranteed/constant processing time based on input, you *could* report that
> > latency (audio processing algorithms come to mind here).
>
> Just to bring this a bit closer to reality. The DSP is not a magic box, just
> like any other CPU, it cannot know beforehand exactly how much time a certain
> input need to be processed. The best you can say (if the DSP is running a
> real-time OS), is that _after_ the processing you would receive the response
> after exactly a certain amount of time. However, by _you_ I mean the GPP, which
> of course runs linux, which is not real-time and there's tons of things that
> could delay such response.
>
> IOW; there's no way to report that latency.

For the processing/transfer latency related to the DSP, I think this is just like any other processing latency, and the application (maybe the platform?) should just use a higher value for the minimum_latency.

Otherwise I think Edward's approach is sound. I would call it processing_deadline or max_processing_time though; otherwise it's quite confusing if you don't understand the internals of GStreamer's latency processing.
Yes, deadline is a better name.
And how do you calculate this "processing_deadline" if you have no duration? This deadline is essentially the rate at which the buffers are coming from the network; this buffer must finish processing before the next one comes. Of course, you never know when that is going to be. The best you can do is say "the amount of time it will take for the next buffer to arrive is probably around the amount of time this buffer took to arrive". That is the most accurate "processing_deadline"; anything else would be more unreliable.
No no, the deadline is not computed. It's arbitrarily decided by the application. And I would say it can be more than the "duration" of one frame (assuming that some frames are easier to decode than others).
(In reply to comment #15)
> No no, the deadline is not computed. It's arbitrarily decided by the
> application. And I would say it can be more than the "duration" of one frame
> (assuming that some frames are easier to decode than others).

And what if the application doesn't set it (e.g. gst-launch)? It would be 0, right? Or what if the application gets it wrong (2 times the duration)? Everything would go to hell if the processing is too heavy, right?
I would suggest having a default that's not zero, maybe 50ms or something, and maybe we could have an environment variable to change the default for the whole system/session. If the application sets it too high, it will just queue up more frames before playing them, so instead of dropping 1 frame out of 2 if you have half the necessary speed, it would drop 2 out of 4 (for example).
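Purely as an illustration of that idea (no such environment variable or helper exists in GStreamer; both names are made up), the default could be resolved like this:

#include <gst/gst.h>

/* Hypothetical helper: resolve the default processing deadline, allowing
 * an environment variable to override it for the whole system/session. */
static GstClockTime
get_default_processing_deadline (void)
{
  const gchar *env = g_getenv ("GST_PROCESSING_DEADLINE_MS");

  if (env != NULL)
    return g_ascii_strtoull (env, NULL, 10) * GST_MSECOND;

  /* Fall back to the 50 ms suggested above. */
  return 50 * GST_MSECOND;
}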
Created attachment 302899 [details] [review] basesink: Account for max-lateness in QoS event

The QoS event contains the diff between the target time on the clock and the actual time the buffer was received in the sink. When the diff is negative, an element like fpsdisplaysink assumes that the buffer was dropped. The problem is that it is only dropped by the sink if the diff exceeds the configured max-lateness. Subtracting this amount from the diff fixed fpsdisplaysink for the live case where buffers arrived on target or slightly after, while all being rendered.
This is a tentative fix; I'm looking for input on it.
I don't think that's correct. The buffers *are* too late, and the diff shows how much they are too late.
Created attachment 302900 [details] [review] basesink: Add processing deadline

Proposing a different approach from Nicolas's, which matches what I suggested a couple years ago. In this approach, we just add an arbitrary number to the latency. My original idea was to do that in GstBin, but it may make more sense to do it in each sink, so they can change the default value.

The processing deadline is the acceptable amount of time to process the media in a live pipeline before it reaches the sink. This is on top of the algorithmic latency that is normally reported by the latency query.

This should make pipelines such as "v4lsrc ! xvimagesink" not claim that all frames are late in the QoS events. Ideally, this should replace max_lateness for most applications.
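For illustration, here is a minimal, self-contained sketch of how an application could use what this patch adds, assuming the "processing-deadline" property on basesink; the pipeline string and the 30 ms value are only examples:

#include <gst/gst.h>

int
main (int argc, char **argv)
{
  GstElement *pipeline, *sink;
  GstBus *bus;
  GstMessage *msg;

  gst_init (&argc, &argv);

  /* Any live source and sync'ing sink would do here. */
  pipeline = gst_parse_launch (
      "v4l2src ! videoconvert ! xvimagesink name=sink", NULL);
  sink = gst_bin_get_by_name (GST_BIN (pipeline), "sink");

  /* Allow up to 30 ms of processing time between the live source and the
   * sink; this is added to the pipeline latency instead of making every
   * frame look late in the QoS events. */
  g_object_set (sink, "processing-deadline", (guint64) (30 * GST_MSECOND),
      NULL);

  gst_element_set_state (pipeline, GST_STATE_PLAYING);

  bus = gst_element_get_bus (pipeline);
  msg = gst_bus_timed_pop_filtered (bus, GST_CLOCK_TIME_NONE,
      GST_MESSAGE_ERROR | GST_MESSAGE_EOS);
  if (msg != NULL)
    gst_message_unref (msg);

  gst_element_set_state (pipeline, GST_STATE_NULL);
  gst_object_unref (bus);
  gst_object_unref (sink);
  gst_object_unref (pipeline);
  return 0;
}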
We could expose dropped/rendered counts in the QoS event then, since diff will never tell if the buffer was rendered or not. Actually, even with all these proposals, the dropped/rendered count will be wrong in fpsdisplaysink, since we might be running with sync=false.
Created attachment 314690 [details] [review] basesink: Add processing deadline Updated version of the patch which posts a WARNING message if the processing deadline exceeds what the pipeline can buffer. Do you think we should merge this? If not, why?
Review of attachment 314690 [details] [review]:

::: libs/gst/base/gstbasesink.c

@@ +545,3 @@
+ * for processing the buffer. This is added to the latency of live
+ * pipelines.
+ */

Since: 1.8
Maybe explain when you would want to use that, especially compared to max-lateness.

@@ +1404,3 @@
+ * Maximum amount of time (in nanoseconds) that the pipeline can take
+ * for processing the buffer. This is added to the latency of live
+ * pipelines.

How would a subclass decide on a good value for this? Based on how much time it thinks it will take to get buffers from render() to the display? Or prepare() to render()?

@@ +1406,3 @@
+ * pipelines.
+ *
+ * This function is usually called by subclasses.

Since: 1.8

@@ +1437,3 @@
+ * the processing deadline.
+ *
+ * Returns: the processing deadline

Since: 1.8
(In reply to Sebastian Dröge (slomo) from comment #24)
> @@ +1404,3 @@
> + * Maximum amount of time (in nanoseconds) that the pipeline can take
> + * for processing the buffer. This is added to the latency of live
> + * pipelines.
>
> How would a subclass decide on a good value for this? Based on how much time
> it thinks it will take to get buffers from render() to the display? Or
> prepare() to render()?

Maybe we shouldn't expose it; it's mostly meant for subclasses that want to override the "processing-deadline" property. The value is really meant to be selected by the application as an acceptable deadline. My original idea was to set this and default max-lateness to 0, but I'm worried it would have unintended side-effects.
Coming back to this. After giving it more thought, I think we should default this to 0 and have GstVideoSink set:
- max-lateness = 5ms
- processing-deadline = 15ms

This gives it some time to do decoding, but also gives it a bit of leeway. And it shouldn't modify the behaviour of existing pipelines, except that they will report a bit more latency and give a warning if they have zero queuing. This also makes the frame submission times a lot smoother in the cases where there is no latency, so it's much likelier that it will block on the clock.
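Roughly, the defaults proposed here would look something like this in a video sink's instance init; this is only a sketch, MyVideoSink is a placeholder type, and the setters assume the API added by the basesink patch:

/* Sketch of the proposed GstVideoSink defaults; MyVideoSink is a
 * hypothetical subclass used only for illustration. */
static void
my_video_sink_init (MyVideoSink * self)
{
  GstBaseSink *basesink = GST_BASE_SINK (self);

  /* Allow up to 15 ms of processing time upstream of the sink... */
  gst_base_sink_set_processing_deadline (basesink, 15 * GST_MSECOND);
  /* ...and keep 5 ms of leeway before a frame is actually dropped. */
  gst_base_sink_set_max_lateness (basesink, 5 * GST_MSECOND);
}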
Created attachment 361967 [details] [review] basesink: Add processing deadline Same as previous version, except that it sets the default to 0.
Created attachment 361968 [details] [review] videosink: Set processing deadline to 15ms

This roughly corresponds to one frame at 60fps, and leaves 5ms of max_lateness so as not to change the existing behaviour.
Created attachment 372972 [details] [review] basesink: Add processing deadline

The processing deadline is the acceptable amount of time to process the media in a live pipeline before it reaches the sink. This is on top of the algorithmic latency that is normally reported by the latency query.

This should make pipelines such as "v4lsrc ! xvimagesink" not claim that all frames are late in the QoS events. Ideally, this should replace max_lateness for most applications.
Comment on attachment 372972 [details] [review] basesink: Add processing deadline Attachment 372972 [details] pushed as a7f9c80 - basesink: Add processing deadline
Attachment 361968 [details] pushed as 28e5f81 - videosink: Set processing deadline to 15ms
The test_pipeline_reset_start_time test in core has been failing since the basesink commit went in.

gst/gstpipeline.c:622:F:pipeline tests:test_pipeline_reset_start_time:0: Pipeline position is not at least 50millisecond (reported 35339525 nanoseconds)
Ok, let's mark as blocker to make sure we don't forget.
So it fails because the pipeline used now has 20ms of latency, which means that over 55ms the pipeline position will only be 35ms. Though, I think that the default for processing-deadline is wrong here: fakesink should not introduce 20ms of latency, especially since this value originally comes from the VideoSink. We should leave the default at 0, and then VideoSink can set it to 20ms or something in line with the framerate, and AudioSink would set it to something in line with latency-time.
I agree we should make the default 0 to keep existing behaviour, but it's still weird that it's affecting the test pipeline that isn't live.
I went back to 1.14 and ran:

GST_DEBUG="basesink:5" gst-launch-1.0 fakesrc do-timestamp=1 sizetype=fixed sizemax=4096 datarate=$((4096*100)) ! fakesink silent=0 sync=1 2>&1 | grep latency

0:00:00.022531958 28899 0x557046588520 DEBUG basesink gstbasesink.c:1168:gst_base_sink_query_latency:<fakesink0> we are ready for LATENCY query
0:00:00.022548676 28899 0x557046588520 DEBUG basesink gstbasesink.c:1210:gst_base_sink_query_latency:<fakesink0> latency query: live: 1, have_latency 1, upstream: 0, min 0:00:00.000000000, max 99:99:99.999999999

Notice "live: 1". Olivier's patch didn't change that behaviour. It's just that now we have 20ms instead of 0ms.
I didn't realize exactly how the latency code worked in basesink; I've fixed it now... The reason I pushed the change to basesink for everyone is for the sake of pipelines like udpsrc ! videodecoder ! udpsink: before this fix, those pipelines end up dropping all packets unless QoS is involved. I've also added some unit tests for the processing-deadline:

commit 4f3aea316acd38ce498dd7130db76e91a76490d7
Author: Olivier Crête <olivier.crete@collabora.com>
Date: Mon Jul 16 11:51:05 2018 -0400

    sink: Only add processing latency if upstream is live

    Only add it if upstream is live, otherwise leave the latency at 0.

    https://bugzilla.gnome.org/show_bug.cgi?id=640610

commit f79b871a6a400c411313eb82a71b3a3475c0ac99
Author: Olivier Crête <olivier.crete@collabora.com>
Date: Mon Jul 16 11:50:36 2018 -0400

    pipeline tests: Add test for processing latency
Just for my own understanding: this now delays the output of every sink that syncs against the clock by an extra 20ms if upstream is live, right? E.g.

gst-launch-1.0 v4l2src num-buffers=10 ! fakesink silent=false sync=true processing-deadline=3000000000 -v

takes 3 seconds to get started. Is this a good idea in terms of backwards compatibility? Or did I misunderstand something?
Only if you have a live source upstream. In that case it does add 20ms, but previously the real latency was actually controlled by max-lateness, so you would think there was no latency, but that was not actually true. And if you enabled QoS, it would drop every packet, making it unusable. Yes, it's a small change in behaviour, but I'd argue that the previous behaviour just didn't work at all.