GNOME Bugzilla – Bug 725167
opusdec PLC doesn't seem to work as well as Chrome
Last modified: 2015-08-16 13:38:08 UTC
I have noticed that PLC in opusdec seems far worse than in Chrome - it does make some difference but not as much as it could. I think this is to do with the code in gstopusdec.c line 427 that fixes the samples at 120. The documentation for the opus codec (see http://www.opus-codec.org/docs/html_api-1.1.0/group__opus__decoder.html#ga7d1111f64c36027ddcb81799df9b3fc9) states: In the case of PLC (data==NULL) or FEC (decode_fec=1), then frame_size needs to be exactly the duration of audio that is missing, otherwise the decoder will not be in the optimal state to decode the next incoming packet. For the PLC and FEC cases, frame_size must be a multiple of 2.5 ms. I think gstreamer has this information from the GAP_EVENT that is processed in gstaudiodecoder.c, and is passed to gstopusdec.c in the 0 size buffer that is passed in for PLC. I think that using the correct number for samples may improve things. I will try to find a way to get the timings from the GstBuffer and use them to set the correct size of samples in the PLC case (where the size of the buffer is 0) in order to create a patch.
Also think I may have spotted another related problem. The code currently checks if the last_buffer was missing by checking if it's not NULL: if (dec->last_buffer) { /* normal delayed decode */ ... } else { /* FEC reconstruction decode */ ... } I think that the contract says the buffer will be 0 size for PLC so it should be checking the size of dec->last_buffer as it does earlier when deciding if to decode using buffer or last_buffer. gst_buffer_get_size seems to return 0 if the buffer is null - so I think it should be safe to change it.
Having looked into this further there is an issue with using the gap events in that it seems that opus doesn't like the samples to represent exactly one lost packet (in this case 20ms). The duration from the gap events is quite variable when using a real rtpbin - sometimes much less than 1 lost packet (in my case it's sometimes less than 1ms). I will continue to look into it!
Imho, if the gap is less than one packet, you should just ignore it (and get desynchronized a bit), if it is more,than you can just call into libopus multiple times ?
(In reply to comment #3) > Imho, if the gap is less than one packet, you should just ignore it (and get > desynchronized a bit), if it is more,than you can just call into libopus > multiple times ? If you look at the opus codec API docs: In the case of PLC (data==NULL) or FEC (decode_fec=1), then frame_size needs to be exactly the duration of audio that is missing, otherwise the decoder will not be in the optimal state to decode the next incoming packet. It seems like it doesn't go badly wrong if you pass in different times, but I am trying to get the PLC/FEC to work as well as it can. I think if the timing was based on the RTP timestamps I think it could be pretty exact. From looking at the jitterbuffer code it seems that the DTS is based on the times received so has jitter. Though I am still trying to work things through. I was wondering if another alternative was to use PTS as I understand it's got the smoothed out values which might give a value much closer to the difference in RTP timestamps.
RTP doesn't carry DTS information, so the DTS is meaningless. rtpjitterbuffer should probably set it to the PTS or just set it to GST_CLOCK_TIME_NONE
That's helpful as it confirms what I was thinking. Currently the GstRTPPacketLost event raised in gstrtpjitterbuffer.c takes the timestamp from a timer that I think is always created using expected DTS when the packet would be received. As far as I can tell the events only contain one timestamp value so there is no PTS available for the packet lost case. In the opus depayloader the code creates the gap event using this timestamp so it's filling in the timestamp of the gap event with a DTS value. I can see 2 different ways this could be fixed: - Supplying PTS in the packet loss and gap events - Attempting to calculate the PTS for the packet within the audio decoder so the audio to cover the gap starts straight after the last decoded sample from the last packet. Does that make sense or is there a better way to fix this? I think that fixing the opus decoder to match the API is a good idea as well - which I have hacked something for but it's not good at the moment :( (In reply to comment #5) > RTP doesn't carry DTS information, so the DTS is meaningless. rtpjitterbuffer > should probably set it to the PTS or just set it to GST_CLOCK_TIME_NONE
GstRTPPacketLost normally contains a PTS, not a DTS (there is no DTS in the RTP case at all). Anyway, for all audio cases, DTS==PTS all the time, so the difference is meaningless there.
It seems to me that rtpbin sets the DTS to the receive time with some adjustment, but then calculates the PTS. I've traced through the code and the packet loss events and they are created with the expected receive time - which is similar to DTS. Also the code has a comment that states: gstrtpjitterbuffer.c /* take the DTS of the buffer. This is the time when the packet was * received and is used to calculate jitter and clock skew. We will adjust * this DTS with the smoothed value after processing it in the * jitterbuffer and assign it as the PTS. */ It seems to me that the main thing that doesn't end up in DTS is the clock-skew calculations. In the rtpjitterbuffer.c the PTS is set: /* do skew calculation by measuring the difference between rtptime and the * receive dts, this function will return the skew corrected rtptime. */ item->pts = calculate_skew (jbuf, rtptime, dts); I cannot see any code in that would set DTS to PTS after this unless DTS isn't set (which it is almost always for my use case). If as you assert DTS==PTS then perhaps when we push the buffer DTS should be unset or set to PTS.
I made some changes to the logging so I could see the values used in my pipeline: 0:00:22.050750000 ^[[335m40732^[[00m 0x7fab6c037370 ^[[37mDEBUG ^[[00m ^[[00m rtpjitterbuffer gstrtpjitterbuffer.c:2349:pop_and_push_next:<rtpjitterbuffer3>^[[00m Pushing buffer 3271, dts 0:00:06.122462478, pts 0:00:05.728122767 You can see the buffer it's pushing has a few 100ms difference between dts and pts.
Created attachment 276871 [details] [review] Patch to gst-plugins-base that updates calculations for timestamps during packet loss to give a better estimate
Created attachment 276872 [details] [review] Patch for gst-plugins-good that gives rtptime exact times during packet loss
Created attachment 276873 [details] [review] Patch for gst-plugins-bad that gives the opus codec the correct timings for PLC These 3 patches combine to allow exact timings to be provided into the codec when packets are lost. Without this the codec has no idea of how long to correct for and it turns out that this gives much improved packet loss concealment. In real life tests with WebRTC we found that opus audio went from unintelligible to pretty good with the same level of packet loss.
BTW sorry this took so long I have had the patches for a while but no time to upload them! Please note that the patches contain one TODO on some code that I didn't know how to deal with and you may want to review which debug statements are required. I'm happy to make any changes you want. I am also not sure if the approach is correct, but I couldn't find another way to give exact timings for PLC.
Review of attachment 276873 [details] [review]: The depayloader part is not opus specific at all ::: ext/opus/gstrtpopusdepay.c @@ +94,3 @@ static gboolean +gst_rtp_opus_depay_packet_lost (GstRTPBaseDepayload * depayload, + GstEvent * event) This probably all belongs in GstRtpBaseDepayload in gst-plugins-base, it's not Opus specific at all.
Review of attachment 276871 [details] [review]: ::: gst-libs/gst/audio/gstaudiodecoder.c @@ +808,2 @@ /* normalize to bool */ + dec->priv->agg = !!res; Please don't make unrelated changes. and run your commits through the commit hook that uses gst-indent. "! !res" is what gst-indent wants. @@ +1816,3 @@ + audio ends (though can make a reasonable guess based on the presumption + of constant ptime) */ + if (!GST_CLOCK_TIME_IS_VALID (dec->priv->out_ts)) { You probably don't meant to use "!" here.
Review of attachment 276872 [details] [review]: I'm not convinced that this patch is correct, removing the call to apply_offset() looks quite suspect. And actually, why do you pass the duration as rtptime and then convert it back to nanoseconds in the depayloader instead of just using the regular duration here? ::: gst/rtpmanager/gstrtpjitterbuffer.c @@ +1507,3 @@ apply_offset (GstRtpJitterBuffer * jitterbuffer, GstClockTime timestamp) { + GstClockTime offsetTime = timestamp; I'm not sure why you just rename the variable here, this is not useful or needed.
Just wanted to chime in on this bug report. The problem with the OPUS decoder always producing 120ms concealment frames regardless of the length actual gap event is really limiting the audio quality for low latency applications where late/lost frames are frequent. BR, TOmmy
Indeed, there seems to be remnants of NULL buffer vs zero sized buffers checks needing fixing. About the timestamps, since opusenc will encode samples in chunks of well known times (multiples of 2.5 ms), you should get gaps that also match these, unless something messes with the timestamps between opusenc and opusdec. Does RTP not send proper timestamps in this case ?
Created attachment 294503 [details] [review] alternative patch for frame size determintion on FEC/PLC cases Here's another possible patch. I think (not 100% sure) that the duration should always be taken from the missing data. This patch also keeps track of any leftover samples when the number of missing samples is not a multiple of the minimum frame size. This leftover is carried over non missing frames, which I'm not sure is the best thing to do (as opposed to re-zeroing it when an actual packet is received). Doing it this way will desync less over time I think. Tested with: gst-launch-1.0 uridecodebin uri=file://path/to/audio.flac ! audioconvert ! audioresample ! opusenc frame-size=X inband-fec=1 ! identity drop-probability=0.1 ! opusdec use-inband-fec=1 plc=1 ! pulsesink X being 2, 5, 10, and using either use-inband-fec=1 and plc=1. This required a change to identity to send gap events when dropping, which I'll commit shortly. It sounds better than the original but I don't know how it compares to Chrome.
Created attachment 301992 [details] [review] New patch for PLC frame size calculation I wrote a new patch based on the idea from Vincent Penquerc'h . It does essentially the same as his patch, with the following differences: (1) The calculations are done with the durations instead of the sample count. The criterion for Opus' PLC to work is that frame lengths must be a multiple of 2.5ms, so it makes sense to do the calculations directly in the time domain. (2) The patch had a bug that caused decoding errors. Since the plc_samples value got rounded (and not just truncated), the leftover samples count could become negative. Furthermore, PLC intervals which are smaller than 2.5ms were not being handled properly. The new patch fixes these issues.
I'd like to push my patch for inclusion in 1.6. We run the Opus PLC all the time in our system, and it works well. We saw no stability issues or timestamp problems with it.
Can you cleanup the patchset, anything that is not relevant should be marked as obsolete. In my quick lookup for reviews, when I see patches marked "need-work" that has not be obsoleted by another patch, I skip that bug assuming it's not ready yet.
I honestly don't understand enough about what the other patches exactly do or achieve to be able to clean them up. It seems to me as if they are fixing a related, but distinct issue. Perhaps they should be handled in a separate bugzilla entry.
Comment on attachment 301992 [details] [review] New patch for PLC frame size calculation Merged, please re-open if something else is needed! commit a595874aac5d8b7a5c628ef34d749d558c791d5f Author: Carlos Rafael Giani <dv@pseudoterminal.org> Date: Mon Apr 20 15:04:56 2015 +0200 opusdec: Fix PLC frame size calculations Previously, PLC frames always had a length of 120ms, which caused audio quality degradation and synchronization errors. Fix this by calculating an appropriate length for the PLC frame. The length must be a multiple of 2.5ms. Calculate a multiple of 2.5ms that is nearest to the current PLC length. Any leftover PLC length that didn't make it into this frame is accumulated for the next PLC frame. https://bugzilla.gnome.org/show_bug.cgi?id=725167