GNOME Bugzilla – Bug 791285
gstaggregator with start-time-selection=first uses incorrect base_time when waiting for data on sink pads
Last modified: 2018-11-03 12:43:30 UTC
Created attachment 365067 [details] [review] possible solution Hi I'm testing farstream rtp conference with audiomixer. | farstream participant 1 | -> | audiomixer | -> | farstream participant 2| It was find out that in case of jitterbuffer mode=none audiomixer drops incomming audio packets almost since it start. As far as I understand it happens when audiomixer (gstaggregator) moves to live mode and waits for some time if there are no data on input pads. The problem is that in case gstaggregator's property start-time-selection is set to 'first' function gst_aggregator_wait_and_check() calculates wake time the following way: time = base_time + start + latency; where: start - a time when the next buffer has to be sent (i.e it is pts of buffer has to be sent next) latency - latency of element base_time - time when element goes to PLAYING mode. I assume that the problem is that for aggregator with start-time-selection=first base_time is not equal with the time when the element produces the first buffer pts=0.00000000. I.e. we have some time lag between moments when aggregator moves to state PLAYING and the moment when it receives the first buffer on it's sinks and produces the first buffer pts=0.00000000 over it's src pad. I have introduced some traces and I see that time between sending aggregated buffer pts 0.10000000 (the last buffer which sink pad had received this buffer) and starts mixing and sending aggregated buffer pts 0.12000000 performs without wait: 0:00:01.763980665 45 0x56416f77eed0 DEBUG aggregator gstaggregator.c:690:gst_aggregator_wait_and_check:<output_mixer_0> possibly waiting for clock to reach 2887:08:36.661652285 (base 2887:08:36.161652285 start 0:00:00.120000000 latency 0:00:00.380000000 current 2887:08:36.895968312) as you can see the current time is more than wake time so there is no delay between sending aggregated buffer pts 0.10000000 and sending empty buffer pts 0.12000000 My proposal is to set gstaggregator's base_time to the time when it produces first buffer on it's src pad. Infoutunately audiomixers's source code is bit difficult for me. Could you please review my patch and let me know if it can broke something else?
Created attachment 365068 [details] audiomixer traces
Review of attachment 365067 [details] [review]: So no, you cannot change base_time in one element and not the other. All elements in a pipeline must share the same base time. To affect the base time, am element need to delay the transition to playing state, which may not be possible. Yet, I still need to red more about this issue before proposing any proper solution.
I wonder if it would be much work to make a small unit test for this?
Tim, is your question addressed to me?
It was more of a nudge really, but yes :) I think it would help get it fixed.
Created attachment 365249 [details] [review] unit test I have prepared some test where I check volume level on audiomixer's src. It reproduces the problem: 0:00:05.036677055 153 0x56288753dca0 DEBUG aggregator gstaggregator.c:678:gst_aggregator_wait_and_check:<audiomixer> possibly waiting for clock to reach 2953:10:33.360612164 (base 2953:10:33.128403831 start 0:00:00.200875000 latency 0:00:00.031333333 current 2953:10:38.142885803) 0:00:05.036748629 153 0x56288753dca0 DEBUG aggregator gstaggregator.c:678:gst_aggregator_wait_and_check:<audiomixer> possibly waiting for clock to reach 2953:10:33.370612164 (base 2953:10:33.128403831 start 0:00:00.210875000 latency 0:00:00.031333333 current 2953:10:38.142956734) please note that I have not checked code of the test for memory leaks and coding style rules. Is it possibe to solve the problem by storing absolute time of audiomixer first buffer and waiting incoming buffers from this time instead of base time?
(In reply to Tim-Philipp Müller from comment #5) > It was more of a nudge really, but yes :) I think it would help get it fixed. do you need something else from my side?
No, I don't think so. Many thanks for that, I think that's very helpful.
A small detail about the test, it introduces a delay of 5s, but the normally live sources shouldn't keep buffers for that long, so if there is an actual 5s delay, we also expect the buffer timestamp to reflect about 5s in running-time.
(In reply to Nicolas Dufresne (stormer) from comment #9) > A small detail about the test, it introduces a delay of 5s, but the normally > live sources shouldn't keep buffers for that long, so if there is an actual > 5s delay, we also expect the buffer timestamp to reflect about 5s in > running-time. in my case I had 2 pipelines with rtp cooperation between them, I assume that in this case delay in few seconds is more or less realistic scenario. I implemented test case such way because I didnt know how to reproduce this problem in test envirnment without delay. There is only 2 major differences between the issue in unit test and my test case with rtp: in both cases audiomixer produces the silence, but in rtp case audio dissapears permanently
"disapears permanantly" probably means the audio is always late, AudioAggregator is a bit too silent about dropping audio. If you have two stream, with both having timestamp (running-time) that starts at 0, but one is arriving 5 seconds later, it means you have 5 second latency in one of the streams, and you have to enable enough buffering in the aggregator (see latency property).
(In reply to Nicolas Dufresne (stormer) from comment #11) > "disapears permanantly" probably means the audio is always late, > AudioAggregator is a bit too silent about dropping audio. > > If you have two stream, with both having timestamp (running-time) that > starts at 0, but one is arriving 5 seconds later, it means you have 5 second > latency in one of the streams, and you have to enable enough buffering in > the aggregator (see latency property). not sure I understand you correctly. In may case I have two stream: 1) first is incoming to audiomixer.sink_0 2) second is outgoing, it produces from audiomixer.src I assumed that they will be synchronized by audiomixer's 'start-time- selection=first' option. Also I expected that 'latency' (of audiomixer) is used for 'smooth' mixing of audio in case of multiple incoming streams (sink_0, sink_1 ...) Am I wrong?
The start-time is use to decide when to stop dropping incoming buffers and what timestamp should the first buffer (on src pad) should have. So if you set it to 0, all buffers will be included. An possibly unwanted side effect is that if you have a source that produce it's first buffer 5s after we switched to playing state, then it will produce 5s of silence in a burst. Them start-time mode first comes into play. With that mode, the mixer should set it's start time to the first timestamp that arrives. It only consider 1 buffer on one pad, and make this the first timestamp of the output. When you want to let you live pipeline run for a while (e.g. let it stabilize), you can set a start time in the future, the mixer will then drop buffers until they are at least that time, and then will start mixing. The first output produce will be at least that time. In your test, you introduce 5 second delay on the first buffer, that means it's 5 second late. audiotestsrc never drops anything, so for a while, all buffers will be late compare to the already processed stream (the one that had defined the start-time, but being first). Unlike a real use case of latency, it should eventually catch up.
all I understand from your message is that audiomixer is not out-of-the-box stuff for live audiomixing :( audiomixer in live mode is on the edge of real time and pts time, and it tries to map incomming data pts to real time (=outcoming pts). I see the following facts that is unclear for me: 1) If audiomixer receives first incomming buffer which is already delayed why it produces it? And audiomixer processes few first buffers the same way. It drops data starting from ~120ms (my tests case) 2) In case of start-time mode first when audiomixer produces the first output buffer (in my case pts=0) why delay between time of producing next output buffers is significantly less than theirs durations? I assume in live mode pst timestamps of output buffers should correlate somehow with real time.
I'm not sure I understand 1 and 2 here. In live mode, audiomixer no longer waits for upstream data. It calculate when is the deadline for data to be produce in order for that data to be on-time at the sink. If start-time is 0, it will setup the timer immediatly when we go to playing (at least this what it should do, I think this is buggy). In start-time first, it will setup the timer when the first buffer comes in, base on when that buffer need to be mixed and ready to go to reach the sink on time. And then with start-time set, we set to timeout base on that, no need to look at the buffers. All this is base on the declared latency of upstream segment. In live mode, we pop data from the internal queues and mix them. If a source is having a huge latency (like in your test, 5s) that is not declared, then that source will be ignored from the mixing. The unit test helps understand, but does not seem valid. With RTP, if you have this case, you can increase the latency on rtpjitterbuffer element. What is not out-of-the box is this latency, we don't have a dynamic latency mode in GStreamer.
(In reply to Nicolas Dufresne (stormer) from comment #15) I'm sorry, my lack of gstreamer competence make it difficult to explain in a good way > I'm not sure I understand 1 and 2 here. In live mode, audiomixer no longer > waits for upstream data. Most likley I have misundesrtanding here. Could you please describe what kind of upstream data do you mean? I expected that audiomixer should wait for downstream buffers. (I have some upstream/downstream picture in my mind https://gstreamer.freedesktop.org/documentation/design/overview.html#introduction) And in live mode audiomixer waits for downstream buffers also, but just for some time. > ...It calculate when is the deadline for data to be > produce in order for that data to be on-time at the sink. Definitely we define the term 'on time' different way. According audiomixer's logic output buffer has to be sent before the moment of time: base_time + next_output_buffer_pts + latency, where base_time is the absolute time when audiomixer moved its state to PLAYING It means that as soon audiomixer is started it tries to produce aggregated buffers from absolute base_time moment till the current moment of absolute time, and it pushes this aggregated buffers as fast as it can. That leads to bursting of buffers with silent right after audiomixer receives the first buffer. I.e audiomixer produces on its src pad the following buffers: buffer 1 pts 0.00 duration 0.02 at real time t (buffer has data from the sink_0 pad) buffer 2 pts 0.02 duration 0.02 at real time t+0.0001ms(buffer has data from the sink_0 pad) buffer 3 pts 0.04 duration 0.02 at real time t+0.0001ms(buffer has data from the sink_0 pad) buffer 4 pts 0.06 duration 0.02 at real time t+0.0002ms(buffer has data from the sink_0 pad) buffer 5 pts 0.08 duration 0.02 at real time t+0.0002ms(buffer has data from the sink_0 pad) buffer 6 pts 0.10 duration 0.02 at real time t+0.0002ms(buffer has data from the sink_0 pad) buffer 7 pts 0.12 duration 0.02 at real time t+0.0003ms(buffer has data from the sink_0 pad) buffer 8 pts 0.14 duration 0.02 at real time t+0.0003ms(buffer has silence, because there is no data on sink_0 pad, it has no arrived yet) buffer 9 pts 0.14 duration 0.02 at real time t+0.0004ms(buffer has silence, because there is no data on sink_0 pad, it has no arrived yet) ... (I tried to explain it in my previous message item 1) I assume that 'on time' should be calculated the following way: first_incoming_buffer_time + next_output_buffer_pts + latency, where first_incoming_buffer_time is absolute time when the first buffer appeared on the one of mixer's sink pads. I expect that audiomixer will provide aggregated packet each X milliseconds (where X is about output-buffer-duration property), I assume that audiomixer's src pad should push buffers the following way: buffer 1 pts 0.00 duration 0.02 at real time t (buffer has data from the sink_0 pad) buffer 2 pts 0.02 duration 0.02 at real time t+0.02 (buffer has data from the sink_0 pad, because it waited enougth time ~0.02 ms to receive it) buffer 3 pts 0.04 duration 0.02 at real time t+0.04 (buffer has data from the sink_0 pad, because it waited enougth time ~0.02 ms to receive it) buffer 4 pts 0.06 duration 0.02 at real time t+0.06 (buffer has data from the sink_0 pad, because it waited enougth time ~0.02 ms to receive it) buffer 5 pts 0.08 duration 0.02 at real time t+0.08 (buffer has data from the sink_0 pad, because it waited enougth time ~0.02 ms to receive it) buffer 6 pts 0.10 duration 0.02 at real time t+0.10 (buffer has data from the sink_0 pad, because it waited enougth time ~0.02 ms to receive it) buffer 7 pts 0.12 duration 0.02 at real time t+0.12 (buffer has data from the sink_0 pad, because it waited enougth time ~0.02 ms to receive it) buffer 8 pts 0.14 duration 0.02 at real time t+0.14 (buffer has data from the sink_0 pad, because it waited enougth time ~0.02 ms to receive it) Could you please clarify which of two behaviors is correct? > In live mode, we pop data from the internal queues and mix them. If a source > is having a huge latency (like in your test, 5s) that is not declared, then > that source will be ignored from the mixing. In my test you can see that the first messages which are late are not ignored by the mixer, as well as few next messages (~5). That's what I tried to explain in item 2 > ...but does not seem valid. With RTP, if you have this case, you can increase > the latency on rtpjitterbuffer element. What is not out-of-the box is this > latency, we don't have a dynamic latency mode in GStreamer. I expected that rtpjitterbuffer's latency shows what is a maximum time rtpjitterbuffer can keep incoming message before it has to be produced. It is needed for fix some possible issues with message reordering and bursting which can happend during sending messages over UDP. Im not sure I can calculate and set latency for RTP and audiomixer, because the moment when pipeline is ready to receive messages is not the moment when it receives the first incoming RTP message. Also it is looks bit strange for me that it is not enougth just prepare pipeline and wait while it start to receive data.
Buffers travel from upstream to downstream in our terminology. source ! mixer ! sink Source is upstream to the mixer.
And base_time is the time when the GstPipeline went to playing, not the mixer itself. Element cannot change it.
(In reply to Sergey Mamonov from comment #16) > > ...It calculate when is the deadline for data to be > > produce in order for that data to be on-time at the sink. > > Definitely we define the term 'on time' different way. According > audiomixer's logic output buffer has to be sent before the moment of time: > > base_time + next_output_buffer_pts + latency, > where base_time is the absolute time when audiomixer moved its state to > PLAYING > It means that as soon audiomixer is started it tries to produce aggregated > buffers from absolute base_time moment till the current moment of absolute > time, and it pushes this aggregated buffers as fast as it can. That leads to > bursting of buffers with silent right after audiomixer receives the first > buffer. I.e audiomixer produces on its src pad the following buffers: This burstring will not happen if the data is not late. That's beacause we so "+ some_latency" which is queried upstream. In RTP case, this is usually something provided by rtpjitterbuffer (and configurable). > > I assume that 'on time' should be calculated the following way: > first_incoming_buffer_time + next_output_buffer_pts + latency, > where first_incoming_buffer_time is absolute time when the first buffer > appeared on the one of mixer's sink pads. No, time is always in relation to the clock. The calculation is a bit more complex then looking at the PTS. 1) PTS -> running-time (see gst_segment_to_running_time()) 2) OnTime if: running-time + some_latency <= clock_time_now > > In live mode, we pop data from the internal queues and mix them. If a source > > is having a huge latency (like in your test, 5s) that is not declared, then > > that source will be ignored from the mixing. > > In my test you can see that the first messages which are late are not > ignored by the mixer, as well as few next messages (~5). > That's what I tried to explain in item 2 > > > > ...but does not seem valid. With RTP, if you have this case, you can increase > > the latency on rtpjitterbuffer element. What is not out-of-the box is this > > latency, we don't have a dynamic latency mode in GStreamer. > > I expected that rtpjitterbuffer's latency shows what is a maximum time > rtpjitterbuffer can keep incoming message before it has to be produced. It > is needed for fix some possible issues with message reordering and bursting > which can happend during sending messages over UDP. No, rtpjitterbuffer don't have an upper limit by default unless you set drop-on-latency property to TRUE. It's kind of a missing feature to have some proper limits. drop-on-latency is a bit too agressive for most use cases. The rtpjitterbuffer buffer latency will have an impact on the amount of data that will be buffered by the jitterbuffer though. It should reflect the transmission delay plus some acceptable time cause by network jitter (e.g. wifi burst). This element is also responsible for setting proper timestamp that correlates to the pipeline clocks. I'm pretty sure that some of the timestamp mode are incompatible with the audiomixer (and any code that would like to do live synch). > > Im not sure I can calculate and set latency for RTP and audiomixer, because > the moment when pipeline is ready to receive messages is not the moment when > it receives the first incoming RTP message. > Also it is looks bit strange for me that it is not enougth just prepare > pipeline and wait while it start to receive data. It's a very difficult problem to implement dynamic jitterbuffer. No one in the GStreamer community have proposed a solution. To be fair, audiomixer works for my use case, but I use the default slave mode, not the none mode that you seem to use.
> 1) PTS -> running-time (see gst_segment_to_running_time()) > 2) OnTime if: running-time + some_latency <= clock_time_now Thanks it is clear for me that base_time should not been modified but latency has to be increased. But I still have misunderstanding how and where it should be done(internally by jitterbuffer logic, internally in audiomixer logic or externally by some application code) > > I expected that rtpjitterbuffer's latency shows what is a maximum time > > rtpjitterbuffer can keep incoming message before it has to be produced. It > > is needed for fix some possible issues with message reordering and bursting > > which can happend during sending messages over UDP. > > No, rtpjitterbuffer don't have an upper limit by default unless you set > drop-on-latency property to TRUE. It's kind of a missing feature to have > some proper limits. drop-on-latency is a bit too agressive for most use > cases. I understand rtpjitterbuffer latency description https://gstreamer.freedesktop.org/data/doc/gstreamer/head/gst-plugins-good/html/gst-plugins-good-plugins-rtpjitterbuffer.html#GstRtpJitterBuffer--latency absolutely different way. > It's a very difficult problem to implement dynamic jitterbuffer. No one in > the GStreamer community have proposed a solution. To be fair, audiomixer > works for my use case, but I use the default slave mode, not the none mode > that you seem to use. slave mode works fine for me as well, but i have silence in modes node/synced The main question for me is when audiomixer starts working in my case it has all information for making a conclusion that 'latency' set incorrectly and some bursting and pissible permanent silence can happen. It also can calculate some 'correction' value for fixing this situation. Do you have any ideas what should audiomixer perform in such situation: 1) to do nothing and to work as previously (reject this ticket) 2) to report incorrect latency with some WARNING/INFO trace 3) to try to fix this latency internally (may be this functionality should be enabled by some property flag) 4) your option?
About the latency on jitterbuffer buffer, it is at most. But then it can block on downstream, meanwhile the jitterbuffer will grow (max latency has no limit, see min/max latency in the design doc) If you think you can implement 2) please go ahead. It would be really helpful. I've implemented 3) for another case, video encoders with variable input framerate, but I never got any traction with this. https://bugzilla.gnome.org/show_bug.cgi?id=675761
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gstreamer/issues/260.