GNOME Bugzilla – Bug 666545
mad: number of samples read is less than expected
Last modified: 2012-02-18 19:56:28 UTC
Created attachment 203896 [details]
Relevant files to reproduce/highlight this issue

When attempting to read MP3 or AAC files using playbin2, the amount of data returned is one buffer short. For AAC files, the missing buffer is at the start of the audio (first audio buffer). For MP3 files, the missing buffer is at the end of the audio (last audio buffer). The "audio-sink" property of playbin2 is set to an appsink, from which the data is retrieved. No such issues are observed when reading a WAV file.

The default buffer size supplied by playbin2 has been used. During testing the buffer sizes were 1152 samples for MP3 and 1024 samples for AAC.

Attached are the audio files I tested with:

sine_noisy_both_ends.m4a: 1 second of an 800 Hz sine wave with 1024 samples of random noise at the start and 1024 samples of random noise at the end. The noise can be heard when using Windows Media Player. This file was created on Windows 7 using Media Foundation.

sine_noisy_both_ends.mp3: Output of converting sine_noisy_both_ends.m4a using a utility called AAC to MP3 converter. This conversion was also done on Windows 7.

Execute the following gst-launch command:

    gst-launch-0.10 -v playbin2 uri=file:///tmp/sine_noisy_both_ends.m4a video-sink=fakesink text-sink=fakesink

You will NOT hear the noise at the beginning but you will hear the noise at the end.

When testing with the MP3 file, this issue cannot be clearly observed using gst-launch because the MP3 file generated by the converter contains a lot of zero padding at both the start and the end of the signal. I am attaching the plot "mf_gst_for_mp3.tif" to illustrate this. The data read in on Linux (in red) has been padded with 1152 zero samples at the end to give it the same length as the data read in using Media Foundation. The signals overlap exactly for the last 1152 samples.

The workflow being used is:

1. Create playbin2 and appsink elements and assign the appsink to the "audio-sink" property of playbin2.
2. Move the state of the pipeline (playbin2) to PAUSED. I do this so that I can query the negotiated caps for properties such as number of channels and bit depth, which I report to the user.
3. Seek to the desired location in the stream using gst_element_seek_simple. I use the flags GST_SEEK_FLAG_FLUSH and GST_SEEK_FLAG_ACCURATE.
4. Move the state of the pipeline (appsink) to PLAYING. Use the GstAppSink API functions to pull the buffers.

No such issues are observed when attempting to read WAV files.
For the AAC part: GStreamer's default AAC decoder uses libfaad. I could not see anything wrong after a first check of the plugin, though dumping its output to a raw file shows the first thousand or so samples to be missing. So I downloaded libfaad (1), and the same issue happens with its command line tool: the first few samples are cut off. Manually selecting ffdec_aac in place of faad in the same gst pipeline gets it right, and noise is heard at both ends. So it's a bug in libfaad.

(1) http://www.audiocoding.com/downloads.html
For the MP3 part, I'm not sure how to tell if there really is an issue. I have converted the mp3 file to wav with both ffmpeg and gstreamer, and both files show the exact same length (0:00:01.071020409 seconds, at 44100 Hz). It is possible that both gstreamer and ffmpeg drop the same amount of data though. Is the mp3 file you included supposed to contain more data than this?

FWIW, playing with gst, I hear noise at the beginning and end, but that might be expected since you said you added padding at the end.

Last, mad (the mp3 decoding library used by gst) reports decoding errors at the end of the stream. Whether that is because of the padding, or because of invalid data before the padding was added, I do not know.
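One way to make the length comparison above reproducible is to count the frames in each converted WAV file directly. The following is a minimal sketch using Python's standard `wave` module; the helper name `wav_frame_info` is hypothetical and not part of GStreamer or ffmpeg, and it assumes the converted files are plain PCM WAV.

```python
import wave


def wav_frame_info(path):
    """Return (frame_count, sample_rate, duration_seconds) for a PCM WAV file.

    A "frame" here is one sample per channel, so frame_count is the
    per-channel sample count that the comments in this bug compare.
    """
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
    return frames, rate, frames / rate
```

Running this on the WAV produced by each decoder and comparing the first element of the result shows whether one of them is short by a whole buffer.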
Created attachment 204061 [details] WAV files containing samples from MP3 file sine_noisy_both_ends.mp3 read on media foundation and gstreamer
Hello Vincent,

I am computing the expected number of samples using the duration and sample rate. As per your measurements above, the expected total number of samples should be 47232. Using Media Foundation, I get exactly this many samples. Using gstreamer, I get a total of 46080 samples, which is 1152 samples (1 buffer size) less.

I am attaching two WAV files which contain the samples read in from the MP3 file using Media Foundation and gstreamer.

Please do let me know if you require any additional information.

Dinesh

(In reply to comment #2)
> For the MP3 part, I'm not sure how to tell if there really is an issue.
> I have converted the mp3 file to wav with both ffmpeg and gstreamer, and both
> files show the exact same length (0:00:01.071020409 seconds, at 44100 Hz).
> It is possible that both gstreamer and ffmpeg drop the same amount of data
> though.
> Is the mp3 file you included supposed to contain more data than this ?
>
> FWIW, playing with gst, I hear noise at the beginning and end, but that might
> be expected since you said you added padding at the end.
>
> Last, mad (the mp3 decoding library used by gst) reports decoding errors at the
> end of the stream. Whether it is because of the padding, or because of invalid
> data before that padding was added, I do not know.
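The arithmetic behind these numbers can be checked with a short sketch. It uses only figures quoted in this bug (the duration, the 44100 Hz rate, the 1152-sample MP3 buffer size, and the 46080 samples read via gstreamer); the function names are hypothetical helpers introduced for illustration.

```python
SAMPLES_PER_MP3_BUFFER = 1152  # MP3 buffer size reported in this bug


def expected_samples(duration_seconds, sample_rate):
    """Expected per-channel sample count for a given duration and rate."""
    return round(duration_seconds * sample_rate)


def buffer_shortfall(expected, actual, buffer_size=SAMPLES_PER_MP3_BUFFER):
    """Return (whole_buffers_missing, leftover_samples)."""
    return divmod(expected - actual, buffer_size)


expected = expected_samples(1.071020409, 44100)
print(expected)                           # 47232
print(expected // SAMPLES_PER_MP3_BUFFER) # 41 buffers expected
print(buffer_shortfall(expected, 46080))  # (1, 0): exactly one buffer missing
```

The shortfall is an exact multiple of the buffer size with no remainder, which is consistent with one whole MP3 frame being dropped rather than a partial truncation.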
Ah, I see the issue. A straightforward pipeline with mad (the mp3 decoder) gets me the right amount of samples, but if I insert mp3parse beforehand, I do get 1152 samples less, so it's a bug in mp3parse, which I will now be hunting.
And it is a bug in mad, or in the format. I am not quite sure which, and I did not really look deeper than http://www.mars.org/mailman/public/mad-dev/2001-May/000262.html. Stuffing extra zero bytes after the last buffer "fixes" the missing samples.

So the MP3 part of the bug is fixed, and the AAC part is apparently a bug in faad, so please file a bug with faad for the AAC part. Not sure what to set the bug status to here.

commit 30e29b6fdbbcd28fc8d70e1146abd54ef0065aa3
Author: Vincent Penquerc'h <vincent.penquerch@collabora.co.uk>
Date:   Thu Dec 22 15:23:54 2011 +0000

    mad: helpfully bodge the last buffer to let mad decode the last frame

    If http://www.mars.org/mailman/public/mad-dev/2001-May/000262.html
    is to be believed, the last buffer must be followed by a number of
    0 bytes in order for the last frame to be decoded (at least in some
    cases).

    Doing so seems to work here, fixing a missing 1152 samples when
    using mp3parse before mad (not using mp3parse would yield the
    correct amount of samples, if there's extra non-MP3 data after
    (eg, tag data)).
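The idea behind the workaround can be illustrated with a minimal sketch. This is not the code from the commit: the function name and the `is_last` flag are hypothetical, and the guard size of 8 bytes is an assumption (it matches the value of libmad's `MAD_BUFFER_GUARD` as far as I know; the number of bytes the commit actually appends is not stated in this bug).

```python
GUARD_BYTES = 8  # assumption: chosen to match libmad's MAD_BUFFER_GUARD


def pad_last_buffer(data, is_last, guard=GUARD_BYTES):
    """Append zero bytes to the final buffer of an MP3 stream.

    Buffers in the middle of the stream are returned unchanged. Only the
    last one gets trailing zeros, so the decoder sees data following the
    final frame and can decode it instead of waiting for more input.
    """
    if not is_last:
        return data
    return data + bytes(guard)
```

When mp3parse is not in the pipeline, any trailing non-MP3 data (such as tag data) plays the same role as the zero bytes here, which would explain why the plain mad pipeline returned the right number of samples.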
Marking as fixed, at least for the MP3 part. The AAC part is apparently a faad bug, so gst should start working automagically when built against a fixed faad. Please file a bug against faad with your example and analysis.