GNOME Bugzilla – Bug 651615
[vorbisenc] Too small jitter tolerance
Last modified: 2011-10-19 08:46:59 UTC
Hello, After upgrading from Ubuntu 10.10 (Maverick) to 11.04 (Natty), I'm finding that files transcoded by apps from FLAC to Ogg Vorbis won't play on my Android phone. Both banshee and soundconverter are showing this problem; files I've transcoded by hand with oggenc or gst-launch (with the following command) aren't: gst-launch-0.10 filesrc location=foo.flac ! flacdec ! audioconvert ! vorbisenc quality=0.6 ! oggmux ! filesink location=foo.ogg I've created two Vorbis files of Pink Floyd's "Time" for comparison, one that was transcoded with gst-launch and the other with soundconverter: http://www.barbuto.org/Time.works.ogg http://www.barbuto.org/Time.doesntwork.ogg Though they both have the same sampling/bitrate settings, the latter is slightly larger. I initially submitted a bug report with Android (http://code.google.com/p/android/issues/detail?id=17260), but it's increasingly looking like a gstreamer issue. I submitted a bug report with Ubuntu as well: https://bugs.launchpad.net/ubuntu/+source/gstreamer0.10/+bug/790522 The Ubuntu changelog can be found at http://packages.ubuntu.com/natty/gstreamer0.10-plugins-base I upgraded gstreamer to 0.10.34 using the PPA at https://launchpad.net/~gstreamer-developers/+archive/ppa, but it didn't fix the problem.
The doesntwork version has discontinuities for some reason: $ ogginfo Time.doesntwork.ogg [...] WARNING: sequence number gap in stream 1. Got page 23 when expecting page 22. Indicates missing data. WARNING: discontinuity in stream (1) WARNING: discontinuity in stream (1) [...] Could you get a debug log from soundconverter with GST_DEBUG=vorbisenc:5,flacdec:5,ogg*:5 ? Also, does the same happen if you transcode it with gst-launch and plug an audiorate element before vorbisenc?
Plugging in audiorate didn't make a difference, the file played fine. The debug log from soundconverter is located at http://www.barbuto.org/debug.log.gz
There are also many discontinuities detected by vorbisenc, which will result in exactly this behaviour. But using audiorate before vorbisenc should fix this then. Could you also create a new debug log with audiorate before vorbisenc and GST_DEBUG=vorbisenc:5,flacdec:5,ogg*:5,audiorate:5
Done, it's located at http://www.barbuto.org/gst-launch-audiorate.debug.log.gz
Created attachment 193463 [details] [review] Relax overly-tight jitter tolerances in gstvobisenc vorbisenc currently reacts in a rather draconian fashion if input timestamps are more than 1/2 sample off what it considers ideal. If data is 'too late' it truncates buffers, if it is 'too soon' it completely shuts down encode and restarts it. This is causing vorbisenc to produce corrupt output when encoding data produced by sources with bugs that produce a smple or two of jitter (eg, flacdec)
Created attachment 193464 [details] [review] Correct sample number rounding resulting in timestamp jitter flacdec converts the src timestamp to a sample number, uses that internally, then reconverts the sample number to a timestamp for the output buffer. Unfortunately, sample numbers can't be represented in an integer number of nanoseconds, and the conversion process was truncating rather than rounding, resulting in sample numbers and output timestamps that were often off by a full sample. This corrects the time->sample convesion
This looks like the same bug as Fedora bug #722667 (redhat bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=722667). That bug has a test case and patches. I'm duplicating my comments and patchs here: It is a combination of a bug in gstflacdec and a bad decision in gstvorbisenc. gstflacdec is converting input buffer timestamp to sample number, using that internally, then converting sample number back to timestamp for the output buffer. Unfortunately, it's using gst_util_uint64_scale_int to convert timestamp to sample number (which always rounds by truncating) and since sample numbers aren't usually expressible in an integer number of nanoseconds, the output buffer timestamp is off by a full sample about half the time. gstvorbisenc, on the other side, is hardwired to tolerate only a half-sample of jitter in the input buffer; if it sees more, it treats the input as a discontinuous stream and resets the encode process. If time 'went backwards' due to jittering back a sample, it truncates the buffer as well. The combination of these two behaviors is obviously catastrophic to FLAC->Vorbis transcodes, and worse, the corruption happens without any warning whatsoever. proposed patch 1/2: Patch to the vorbis encoder element to relax jitter action threshold from .5 samples to 3 samples. proposed patch 2/2: corrects the bug in flacdec that was causing a full sample of timestamp jitter The patches are tested here; either one fixes the specific reported bug. The flacdec patch is for a real bug obviously, and I think the vorbisenc patch should be put in place to back away a bit from a potentially bad design decision. I'm not sure what the vorbisenc code thinks it's doing with respect to discontinuous streams, but I am sure it should not be mistriggering on inputs with very minor timestamp jitter.
*** Bug 656211 has been marked as a duplicate of this bug. ***
Some thoughts about the patches patch 193463: We use different tolerances everywhere unfortunately. Some encoders use a multiple milliseconds value, some multiple or one sample, some don't allow any tolerance at all, some allow one "frame" of difference. In general your patch is correct though, especially the part with the negative difference. We should reach some consensus here before pushing this though, something like MAX(N milliseconds, M samples) probably makes most sense for some values of N and M ;) patch 193464: This one is correct too but could you check in the convert_src and convert_sink functions if any of the other conversions should use correct rounding too? From a short look I'd say all of them should be changed.
"We should reach some consensus here before pushing this though, "something like MAX(N milliseconds, M samples) probably makes most sense for "some values of N and M ;) Your framework, do as you see fit :-) "could you check in the convert_src "and convert_sink functions if any of the other conversions should use correct "rounding too? I did; none of the others should be changed.
commit 9cbe7c1403de3d774dc25bc41fadcd1391de952e Author: Monty Montgomery <cmontgom@redhat.com> Date: Thu Jul 21 17:16:26 2011 -0400 vorbisenc: Relax overly-tight jitter tolerances in gstvobisenc vorbisenc currently reacts in a rater draconian fashion if input timestamps are more than 1/2 sample off what it considers ideal. If data is 'too late' it truncates buffers, if it is 'too soon' it completely shuts down encode and restarts it. This is causingvorbisenc to produce corrupt output when encoding data produced by sources with bugs that produce a smple or two of jitter (eg, flacdec) commit 799c8e3d04456ce0b22c03de66d20d0a1a599643 Author: Monty Montgomery <cmontgom@redhat.com> Date: Thu Jul 21 17:23:28 2011 -0400 flacdec: Correct sample number rounding resulting in timestamp jitter flacdec converts the src timestamp to a sample number, uses that internally, This corrects the time->sample convesion The jitter tolerance will be unified by using the audio encoder base class that is going to be in gst-plugins-base soonish.
FWIW, as mentioned above, audioencoder now takes care of such jitter tolerance, and vorbisenc has been ported to it.