GNOME Bugzilla – Bug 744106
Maximum latency handling broken everywhere
Last modified: 2015-10-01 13:38:01 UTC
I think we have a problem with the maximum latency. Let's first summarize what it means: the maximum latency is the maximum amount of time that the data flow is allowed to be blocked without overflowing fixed-size buffers in upstream elements and losing data.

Now what we see in various elements is:

1) minimum and maximum latency are set to the same value, assuming that maximum latency means the maximum latency this element produces
2) minimum latency is set correctly, maximum to GST_CLOCK_TIME_NONE, as the element in question has no fixed-size buffer (like most filters, decoders, etc.)
3) the element's minimum latency is added to the upstream maximum latency if it is not NONE
4) the element's maximum latency is set to the minimum latency if the upstream maximum latency is NONE

I think what would be correct is that the minimum latency is added to the maximum latency if the upstream maximum latency is not NONE, and if it is NONE, the minimum latency should not influence the maximum latency. Also, every element that has a fixed-size buffer should add the maximum latency defined by that buffer to the upstream maximum latency if it is not NONE, or set its own if the upstream one is NONE. However, this also means that *all* queues that have a maximum limit set would have to do this.

So the problems I see right now are that elements are doing everything wrong in various combinations, queues are not setting the maximum latency (and often can't), and base classes provide min/max latency setters but do wrong things internally (a combination of 3) and 4) most often, apparently). And to make things perfect, the maximum latency is currently only used to create a GST_ELEMENT_ERROR() in GstBin if it ever becomes smaller than the minimum latency. Due to all the confusion about what maximum latency is, this causes unnecessary error messages now.

I see two ways forward here:

1) We fix everything to work as described above (assuming this is correct), at least all code under our control. This would also mean that base class behaviour would change. This all is arguably an ABI change, but as everything is broken right now anyway it can't become worse ;)
2) We deprecate the maximum latency and don't post an error message anymore. It's not used in useful ways currently, and there's too much code using it wrong in various ways.

Ideas? Did I misunderstand anything? :) If we decide on a solution here I'm ok with implementing it.
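A minimal sketch of the combination rule proposed above, as a hypothetical helper (the function name and own_min parameter are illustrative, not code from any element or base class):

#include <gst/gst.h>

/* Sketch of the rule proposed above: the element's own minimum latency is
 * added to the upstream minimum, and only added to the upstream maximum
 * when that maximum is not GST_CLOCK_TIME_NONE; a NONE maximum is left
 * untouched instead of being replaced by the minimum. */
static void
combine_latency (GstClockTime own_min, GstClockTime *min, GstClockTime *max)
{
  *min += own_min;
  if (GST_CLOCK_TIME_IS_VALID (*max))
    *max += own_min;
  /* else: upstream max is NONE ("no fixed-size buffer"), leave it as-is */
}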
I think your understanding of the maximum latency is not correct; what you're describing is the min latency. I understand it as the maximum amount of buffers (in time) that the element can accumulate. Each element is meant to just add its min and max to the result of the upstream query; for the max, NONE means "infinite" and is an absorbing value.

1) For an element like an encoder (or deinterlace) that has a fixed number of buffers inside of it, min=max=n. This element will not produce buffers before it has n data inside of it, and will block the input as soon as it reaches n data.
2) For a queue with min-threshold-*=max-size-*=0 that can grow without bounds, min=0, max=NONE.
3) For an element with no latency (filters, like audioconvert/audioresample), min=max=0.

I understand that the max latency is just a check to verify that you have enough queuing in your pipeline. For example, if you have the following pipelines:

zerolatencysource ! audiodecoder_with_no_latency ! zerolatencysink
zerolatencysource ! videodecoder_with_b_frames ! zerolatencysink

Then the min and max latency from the audio sink query will be 0, but for the video sink, min=max=latency-of-the-decoder. Then GstBin will check that MIN(max) >= MAX(min), and that check will fail unless you add a queue in the audio branch, because the video takes "latency-of-decoder" to get to the sink, so the audio has to be delayed by the same amount, but since there are no queues on the audio branch, there is nowhere to store that data.
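A minimal sketch of that "just add your contribution" rule (the function and the own_min/own_max parameters are hypothetical, not taken from any element):

#include <gst/gst.h>

/* Sketch of the rule described above: forward the latency query upstream,
 * then add this element's own [min, max] contribution to the result.
 * A maximum of GST_CLOCK_TIME_NONE means "infinite" and absorbs any
 * addition. */
static gboolean
forward_latency_query (GstPad *sinkpad, GstQuery *query,
    GstClockTime own_min, GstClockTime own_max)
{
  gboolean live;
  GstClockTime min, max;

  if (!gst_pad_peer_query (sinkpad, query))
    return FALSE;

  gst_query_parse_latency (query, &live, &min, &max);

  min += own_min;
  if (GST_CLOCK_TIME_IS_VALID (max) && GST_CLOCK_TIME_IS_VALID (own_max))
    max += own_max;
  else
    max = GST_CLOCK_TIME_NONE;  /* "infinite" is absorbing */

  gst_query_set_latency (query, live, min, max);
  return TRUE;
}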
I just re-read the design doc. I think we need to see why this is an impossible case:

sink1: [20 - 20]
sink2: [33 - 40]

MAX (20, 33) = 33
MIN (20, 40) = 20 < 33 -> impossible

The sink1 and sink2 queries could be in one graph or in two completely separate pipelines in the same bin; it would not matter. Basically, what this query says is that from sink1's point of view, the pipeline won't be able to hold on to more than 20ms of multimedia data, while on sink2 it can hold up to 40ms of data. What the min latency says is that to synchronize these streams, both sinks need to reach a target latency of 33ms. So sink1 has no means to synchronize both streams, since it cannot hold on to enough data to compensate for the other sink's delay. At least that's what the doc seems to be trying to say.

In this regard, max latency has nothing to do with blocking; it's in fact the amount of data that can be buffered over the whole pipeline. That includes the latency of course, and that's a good explanation of why, in an element's contribution to latency, the minimum contribution should never be bigger than the maximum.

If we take x264enc with its latest patch, we would have:

livesrc [10 - 10] ! x264 [5 - infinity] ! tee name=t t. ! sink1 [5 - 5] t. ! sink2 [18 - 25]

Previously, x264 would have set [5 - 5], which would have given a resulting [33, 20], an impossible situation. Now with the patch, we get [33, infinity], a properly working pipeline. Now sadly I don't know the internal behaviour of x264, but if that is true, it means x264 has an infinite input queue. Great, since it solves the issue.

One important fact about the x264 element though is that it uses the GstVideoEncoder base class. This class has an interesting limitation: it blocks input while pushing output. Considering we are live, input and output flows have very similar speeds, hence it is likely that after the minimum latency has been reached, each input buffer will produce an output buffer (I voluntarily leave interlacing aside here). What this means is that, whatever the internal implementation details of x264, the element will never accumulate an infinite amount of buffers. It will in fact block. The result is that sink1 will have accumulated 5ms, the livesrc 10ms and x264 5ms (maybe plus 1 if we are lucky). This is 20ms, but we need to hold on to 33ms; it's in fact a situation that is impossible to synchronize. In this regard, what would have failed here before will now fail differently with the current change.

So in this case, deprecating the max latency, which would serve for sending an error "please add a queue in your pipeline", would in fact be expressed as a pipeline with a massive amount of data being dropped, because there is nowhere to store it while delaying the playback.
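A tiny sketch of the per-sink check described above, as a hypothetical helper (not GstBin's actual code):

#include <gst/gst.h>

/* Sketch of the check described above: the target latency is the largest
 * per-sink minimum, and it must not exceed the smallest per-sink maximum,
 * otherwise the configuration is impossible without additional queuing. */
static gboolean
latency_configuration_possible (const GstClockTime *sink_min,
    const GstClockTime *sink_max, guint n_sinks)
{
  GstClockTime target = 0;                   /* MAX over sink minimums */
  GstClockTime limit = GST_CLOCK_TIME_NONE;  /* MIN over sink maximums */
  guint i;

  for (i = 0; i < n_sinks; i++) {
    target = MAX (target, sink_min[i]);
    if (GST_CLOCK_TIME_IS_VALID (sink_max[i]))
      limit = GST_CLOCK_TIME_IS_VALID (limit) ?
          MIN (limit, sink_max[i]) : sink_max[i];
  }

  /* sink1 [20 - 20], sink2 [33 - 40]: target = 33, limit = 20 -> FALSE */
  return !GST_CLOCK_TIME_IS_VALID (limit) || target <= limit;
}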
I think our main disagreement here in your example would be what the video decoder does with the latency query. We agree that the source reports [0,0], right? And the decoder has minimum latency X, the *maximum* time it will possibly delay a frame due to reordering (not the minimum time, because we take this value to tell the sink to wait at least that much... and if it was the minimum, almost all frames would be late).

Now the disagreement is about the maximum latency. Does the decoder have a maximum latency of -1, or of Y>=X?

a) My original reasoning was that it would be -1, because the decoder itself does not care about how much latency downstream will add, while e.g. the audio source cares a lot because it continuously fills a ringbuffer that will just overflow if downstream adds too much latency (i.e. waits too long before unblocking). However the decoder will have at least X of internal buffering (how would it otherwise delay by X?), and thus would increase the maximum latency by X as it adds some buffering to the pipeline. So in my case the decoder will change the maximum latency in the query to X, i.e. telling downstream [X,X].

b) I think what you say is that the decoder has maximum latency Y>=X, because it has some internal buffering that would allow everything to be delayed more. So it would tell downstream a maximum latency of Y, overall giving [X,Y]. If it had -1 in your understanding, it would return [X,-1] because -1 would mean that it has an infinitely large buffer internally.

Now what seems odd to me is that in your case b) the meaning of maximum latency is different in the source than it is in the decoder. In the source it means the maximum amount of time that downstream can block until data gets lost. In the decoder it means the maximum amount of time the decoder can buffer things before it blocks (and not the maximum amount of time until data gets lost, because that's infinity for this specific decoder).

In my case the maximum latency always has the meaning of the maximum time that downstream can delay until data gets lost. However, any element that has internal buffering will add this amount of buffering to it (which is not the maximum latency, but how much it *improves*, i.e. increases, the maximum latency). A decoder in my understanding would have a maximum latency by itself if it can't block downstream indefinitely without losing data, which would mean that the decoder would *reduce* the maximum latency by that amount. And in general such an element will have its own internal buffering, so it would increase the maximum latency in the query by the amount of buffering, and decrease it by its own maximum latency.

I'm not sure which of the two versions is correct, but in both cases we would need to change a lot of code :) I prefer your version because it is simpler, but OTOH the two different meanings of maximum latency seem odd to me.
My understanding would also mean that we need a third parameter for e.g. gst_video_decoder_set_latency(): the minimum latency, the internal buffer size and the maximum latency.
It's not yet what I mean. The latency as designed (and as documented) is something that can only make sense per sink. There is no per-element blocking/dropping of buffers; blocking or dropping is only the effect. What I use to describe the parameters in gst_video_decoder_set_latency() and similar methods is to call them the "contribution" to the global latency.

Also, the decoder and the source latency are exactly the same thing if you stop describing them in terms of blocks and drops. Take v4l2src. The latency is generally configured to 1 frame (because we don't know and we want to give it some time, unless the driver says otherwise, but that's recent). It cannot be zero, since you have to store the frame entirely before delivering it. If it was more than zero, the only effect would be that synchronization with other srcs would not be right. This is the same idea with audio buffers: we will fill an entire fixed-size audio chunk before we let it go, which causes latency. The maximum latency of v4l2src is the size of the capture queue we allocate, so it's actually configurable. In your terms, if this buffer is bigger, then downstream can delay the rendering longer. Downstream being able to block is the effect of having a bigger queue inside the src. When the pipeline starts delaying more, then buffers start to get lost. Depending on the chosen algorithm to drop buffers, you may end up not rendering a single frame, because they are all late.

For the v4l2 decoder, the min latency is how many buffers the driver will accumulate before it lets the first buffer out. It's actually the codec observation size. It imposes a minimum amount of buffering. Decoders could be smarter, but this is what they nearly all do. It remains that the min latency has to be the worst case, so I think this is the right value and we agree. The decoder maximum latency, in the way I described it, is exactly the same as for v4l2src, except that it's the total number of frames that can be held inside the decoder. If the base class was not preventing it, it would be expressed in terms of the maximum between the input and the output queue size. Though we can't, since the decoder base class prevents this parallelism. That's why all decoders using that base class must set [X - X]. A decoder without latency would not contribute; as the query is a sum, it's natural that these values are [0 - 0], not [0 - infinity/-1].

Now what I'm wondering is whether we aren't disagreeing on the meaning of -1. I always assumed -1 is infinity, not "I don't know". An element that doesn't know should pick something sensible, or assume it's not contributing any latency (in simple filter cases, where you get a buffer, transform it and let it go). "I don't know" means "I don't contribute to the latency in any way", which is nice, because all it has to do is nothing (just forward the query). An element like a decoder can't pretend to contribute to the min latency without contributing a max latency of at least the same value; that would be a lie. But considering "I don't know" as infinite is to me a bigger lie.
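As a concrete illustration of the [X - X] contribution described above, a hypothetical decoder subclass that can hold back up to 2 frames at 30 fps might report its contribution like this (the function and the numbers are made up for the example):

#include <gst/video/gstvideodecoder.h>

/* Hypothetical example: a decoder that holds back at most 2 frames at
 * 30 fps reports that amount as both its minimum and maximum latency,
 * per the reasoning above (the base class serializes input and output,
 * so the element cannot buffer more than that). */
static void
my_decoder_update_latency (GstVideoDecoder *decoder)
{
  GstClockTime delay = gst_util_uint64_scale (2, GST_SECOND, 30);

  gst_video_decoder_set_latency (decoder, delay, delay);
}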
OK, I got it. Both are actually the same because all buffering in live pipelines is leaky, even if the element just blocks. And every element can only have a positive impact or 0 on the max latency. So what Olivier says makes most sense as it is simpler. I'll fix everything accordingly tomorrow, especially the base classes. Will put it in a branch for review :) Sorry for being a bit short, on my phone right now. Had the insight while driving ;)
I think at least the videoencoder/decoder base classes are doing the right thing with the latency query - if upstream reported max latency -1, it is replaced by the encoder or decoder's max latency (if any), otherwise total_max_latency += local_max_latency. And there are appropriate checks in gst_{de|en}coder_set_latency that max >= min.

The only real problem I see is that queue uses -1 to mean 'infinite' for max latency, and every other user of the query interprets it as 'no info yet' - since the default value for max latency in the query is -1. That is, there's no way when handling a latency query for an element to differentiate between an upstream queue with max-size-time=0 (which can buffer infinitely and therefore sets max = -1) and an upstream which didn't override the maximum latency value - but why would that ever happen? Any element that handles the latency query should update the max latency if it's -1, right? It seems ambiguous to pass a latency query upstream and have max returned as -1, since we can't know whether we can override that value.

An alternative is that only source elements are allowed to override a max_latency=-1 in a query - as it hasn't been set yet - and that all downstream elements then need to assume -1 means upstream can do infinite buffering. The sticky point with that is queues which have max-buffer-bytes/buffers, but max-size-time=0 - at the moment that would make queue report infinite max latency (infinite buffering capability), but obviously that isn't true. Queue can't make a sensible estimate of max-latency in that case, and it should probably warn or something in live pipelines if it ever hits the bytes/buffers limit before it hits the time limit.

The end result is:
* A pipeline can't necessarily estimate max latency - in which case it should pick min-latency and hope for the best in terms of compatibility of latency in different chains
* There's an ambiguity between 2 different interpretations of max-latency=-1: queue uses it to mean 'infinite buffering, therefore infinite max-latency' and everywhere else uses it to mean 'no one set this to anything sensible, so you should assume max-latency = min-latency'
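For reference, a sketch of the max-latency combination behaviour described in this comment, as a hypothetical helper (not the actual base-class code):

#include <gst/gst.h>

/* Sketch of the behaviour described above: if upstream reported
 * max = -1 (GST_CLOCK_TIME_NONE) it is replaced by the element's own
 * max latency (if any), otherwise the two are added. */
static GstClockTime
combine_max_latency (GstClockTime upstream_max, GstClockTime own_max)
{
  if (!GST_CLOCK_TIME_IS_VALID (upstream_max))
    return own_max;          /* treat -1 as "no info yet" and replace it */
  if (!GST_CLOCK_TIME_IS_VALID (own_max))
    return upstream_max;     /* element contributes no max of its own */
  return upstream_max + own_max;
}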
I think that doesn't matter. Only the source should override -1, anything else just keeps it as infinity. If any element has infinite max latency, the complete chain has it. Which is currently handled wrong in a few places.
@jan: I think if an element can't know its latency, it should just return FALSE to the query, as it's not usable in a live pipeline.
And for the rest, we could consider adding a new GST_CLOCK_TIME_INFINITY (that secretly maps to NONE for backward compatibility). Would help readability.
(In reply to comment #6)
> OK, I got it. Both are actually the same because all buffering in live
> pipelines is leaky, even if the element just blocks. And every element can only
> have a positive impact or 0 on the max latency.

That statement is actually wrong, bringing us back to square one. E.g. consider an unbound queue followed by a leaky queue. The unbound queue would report -1=infinity as maximum latency, while the leaky queue would bring that down to its own limit again. So we definitely need something to distinguish blocking and leaky queueing for the latency reporting, instead of just having a single number.

Now Nicolas said that blocking or leaking does not make a difference, and I disagree with that. In the above example we should all agree that an unbound queue would increase the maximum latency to infinity. However, if we afterwards have a leaky queue that can only hold 1s, our effective maximum latency is 1s. If we configure a latency >1s, the leaky queue will start leaking and things go wrong. But if instead we have a non-leaky queue with a 1s limit after the unbound queue, we still have an effective maximum latency of infinity. So at least this (and that our queues don't report latency properly) needs to be fixed somehow. What Olivier said in comment #2 would only be correct if we always assume that nothing but a source has leaky buffering, but that's already not true with our audio sinks. And we definitely need to document the expected behaviour in more detail in the documentation.

Maybe we can just consider it the normal case that elements don't have leaky buffering, in which case we can let the base classes and elements behave like Olivier said. And elements that actually have leaky buffering will have to override the latency query handling. Specifically this would mean that sources always set whatever latency they have, and especially don't set -1 as maximum latency unless they really have an unbound buffer internally. And that filter/decoder/encoder/etc. base classes do:

> if (upstream_max == -1 || own_max == -1)
>   max = -1;
> else
>   max = upstream_max + own_max;

Because if either upstream or the element itself has infinite buffering, the buffering will be infinite. And otherwise we just add our own limited amount of (non-leaky!) buffering to the upstream value (and own_max >= own_min, always... so many elements will set them to the same value). If an element has leaky buffering, it would instead do something like

> max = MIN (upstream_max, own_max)

in the overridden latency query handling. Sounds like a plan, makes sense?

For Jan's problem, I think a source can consider the query content unset and just set whatever it wants in there without reading anything. But there actually is a problem further downstream if upstream does not introduce any latency and keeps the values unset (i.e. [0,-1]). Maybe downstream elements can assume that [0,-1] means that both are unset? And everything else means both are set? However, a live source that does not set any latency would seem quite weird, and maybe this is really a case like Olivier mentioned in comment #9.
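A sketch of those two rules as hypothetical helpers (the names are illustrative; note that MIN (x, GST_CLOCK_TIME_NONE) would also work numerically, since NONE is the largest guint64 value, but the intent is spelled out explicitly here):

#include <gst/gst.h>

/* Sketch of the non-leaky rule proposed above: NONE (-1) means infinite
 * buffering and absorbs the addition. */
static GstClockTime
combine_max_nonleaky (GstClockTime upstream_max, GstClockTime own_max)
{
  if (!GST_CLOCK_TIME_IS_VALID (upstream_max) ||
      !GST_CLOCK_TIME_IS_VALID (own_max))
    return GST_CLOCK_TIME_NONE;       /* infinite buffering somewhere */
  return upstream_max + own_max;
}

/* Sketch of the leaky rule proposed above: a leaky element caps the
 * maximum latency at its own (finite) limit. */
static GstClockTime
combine_max_leaky (GstClockTime upstream_max, GstClockTime own_max)
{
  if (!GST_CLOCK_TIME_IS_VALID (upstream_max))
    return own_max;
  return MIN (upstream_max, own_max);
}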
I think queues that have min/max set in bytes/buffers should try to use the convert query to transform that into time; if that fails, then they should just fail the latency query, which means that they aren't fit for use in a live pipeline. That would probably break the current usage of queues in decodebin, but I think it's already broken, as it can cause the pipeline to just drop every buffer if they queue for longer than the declared latency, and it provides no jitterbuffering at all in these cases anyway, as all the queues will never fill up higher than the min. My "hack" in these cases is to implement the latency query in the application and set a value higher than min (but lower than max). I guess the queues (multiqueue?) could instead have an "error-above-bytes" and "error-above-buffers", which would emit an error message: if the pipeline can't preroll without buffering more than the limited number of buffers or bytes, then it should fail instead of just blocking forever or randomly dropping things.

I admit I hadn't thought of leaky queues, as I have yet to hear a good use-case for them; I generally feel that QOS event handling is what you want instead. That said, having them reset the max-latency to their max latency is probably right, as it means nothing will ever be buffered upstream of a leaky queue.

I think GstBaseSrc should set the default to [0,0] for "instantaneous" live sources (videotestsrc, udpsrc, etc.), as they never buffer anything and create buffers "right now". Sources with a fixed latency should set [n,n], etc.

That said, I'd really like to see Wim's input as he designed this latency thing in the first place and put the max=-1 in basesrc.
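A sketch of the convert-query idea for a byte-limited queue (a hypothetical helper; a real queue element would also have to handle its buffers limit and fail the latency query when this returns FALSE):

#include <gst/gst.h>

/* Sketch of the suggestion above: a queue limited in bytes asks upstream
 * to convert that limit into time via the CONVERT query, and gives up
 * (fails the latency query) when no conversion is possible. */
static gboolean
queue_max_latency_from_bytes (GstPad *sinkpad, guint64 max_size_bytes,
    GstClockTime *max_latency)
{
  gint64 time = -1;

  if (!gst_pad_peer_query_convert (sinkpad, GST_FORMAT_BYTES,
          (gint64) max_size_bytes, GST_FORMAT_TIME, &time) || time < 0)
    return FALSE;               /* upstream can't express bytes as time */

  *max_latency = (GstClockTime) time;
  return TRUE;
}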
All this seems to make sense now, except for the max=-1 in basesrc. Let's wait for Wim's comment. Meanwhile:

> max = MIN (upstream_max, own_max)

My impression was that we should completely ignore the upstream max, and I can't see why it makes a difference if upstream_max is smaller than our own max.
Maybe I should mention something:

... ! decoder ! queue ! sink

Just because the queue thinks it can accumulate N ms of data (and hence contributes that to the max latency) doesn't mean it will actually be possible for the queue to be filled. Buffer pools might have a maximum number of buffers that prevents this. I think it's all linked together. I wonder what the behaviour of the latency query/message from the decoder should be (regardless of whether the pool is its own or downstream's, the decoder decides the allocation and pool).
(In reply to comment #12)
> That said, I'd really like to see Wim's input as he designed this latency thing
> in the first place and put the max=-1 in basesrc.

Same here, and the max=-1 in basesrc is one of the things that currently seem wrong to me. Which is why I CCd Wim on this bug from the beginning :) I'll talk to him tomorrow. About the queues I agree, and also otherwise we seem to agree? Regarding leaky queues... audioringbuffer is also one, and we have it in every audio source and sink. v4l2src also internally has a leaky queue (it allocates a maximum number of buffers; if downstream is blocked for more than that, it can't capture anything for a while and behaves like a queue that leaks towards upstream).

(In reply to comment #13)
> All this seems to make sense now, except for the max=-1 in basesrc. Let's wait
> for Wim's comment. Meanwhile:
>
> > max = MIN (upstream_max, own_max)
>
> My impression was that we should completely ignore the upstream max, and I
> can't see why it makes a difference if upstream_max is smaller than our own max.

Why should we completely ignore the upstream max? In which cases? If any element has unlimited buffering? Maybe, maybe not. I'm not entirely sure. Consider the case of

> unbound-queue ! leaky-queue ! some-element ! unbound-queue

where leaky-queue has X max latency. Now if some-element blocks for more than X, things will go wrong. But some-element would have no information about that in your case, because the unbound queue in the beginning would make the max latency infinite. So some-element could just decide to wait >X on its sinkpad before unblocking. An example of such an element could be any aggregator-based element, which also has other sinkpads and on the other sinkpads has a min latency >X. There would be no way to detect inside the element that the latency configuration is impossible (aggregator has max<min detection code like GstBin), and also GstBin would have no way to know this, as it will get an infinite maximum latency for this part of the pipeline.

(In reply to comment #14)
> Maybe I should mention something:
>
> ... ! decoder ! queue ! sink
>
> Just because the queue thinks it can accumulate N ms of data (and hence
> contributes that to the max latency) doesn't mean it will actually be possible
> for the queue to be filled. Buffer pools might have a maximum number of buffers
> that prevents this. I think it's all linked together. I wonder what the
> behaviour of the latency query/message from the decoder should be.

Yes, the allocation query with the min/max buffers count seems to somehow be connected to the latency too. But I think it's a separate problem that can be solved independently. Basically the max buffers count could limit the maximum latency of the element, depending on what other buffering mechanisms the element has.
Wim confirmed this on IRC, so I'll get it all fixed later and make part-latency.txt a bit more detailed :)
<wtay> slomo, yeah, I saw it. popcorn time
<wtay> slomo, I think the only problem is that the default for basesrc should be 0,0 instead of 0,-1
<slomo> wtay: that might be one change, but how should overall be the expected behaviour? -1 meaning infinite maximum latency, meaning that usually elements just pass that through... unless they themselves have leaky queueing behaviour like audioringbuffer, in which case they would reset it to their queue size? also do you agree that all our queues should do something with the latency query somehow? :)
<wtay> slomo, yes, -1 is infinity and stays that way unless you have a leaky queue that actually throws away things
<wtay> slomo, queues should do something with the query. I think queue does (but only when time limits are set, bytes and buffer limits can't be taken into account)
<slomo> they could be taken into account if upstream properly implements the convert query i guess :)
<wtay> probably
<slomo> ok, i think then it all makes sense now and needs some clarification in part-latency.txt :) i'll do that later and fix all the code that is currently doing weird things
<slomo> thanks
This should all be fixed now in our modules, including a documentation update. Following commits in core, and others in the other modules. What's missing is using the CONVERT query in queue, and doing things in the other queues (multiqueue, queue2). Will create new bugs for that. Also opusdec is a bit weird, yet another bug will be created :)

commit 4a5ce862a2a36bd0ec20fe63a1b92731bc2ae205
Author: Sebastian Dröge <sebastian@centricular.com>
Date:   Wed Feb 11 13:41:56 2015 +0100

    Improve and fix LATENCY query handling

    This now follows the design docs everywhere.

    https://bugzilla.gnome.org/show_bug.cgi?id=744106

commit ee8d67ef2c1ef968eddc8e71d6ac5f83688c125b
Author: Sebastian Dröge <sebastian@centricular.com>
Date:   Wed Feb 11 12:20:39 2015 +0100

    design/part-latency: Add more details about min/max latency handling

    These docs missed many details that were not obvious and because of that
    handled in a few different, incompatible ways in different elements and
    base classes.

    https://bugzilla.gnome.org/show_bug.cgi?id=744106