GNOME Bugzilla – Bug 620323
mpegaudioparse: Add support for LAME tags and adjust segments based on the padding information from it
Last modified: 2018-11-03 14:42:23 UTC
Currently, GStreamer is ignoring the LAME tags in MP3s during gapless playback. These tags contain information about the padding at the start and end of the files, which should be skipped to get perfect gapless playback of these files. Without using these tags, there'll always be a tiny silence or glitch between songs. Here's an example of the LAME tag in a file which should play back gaplessly:

$ eyeD3 --lametag Track01.mp3
Track01.mp3 [ 3.77 MB ]
Encoder Version   : LAME3.98r
LAME Tag Revision : 0
VBR Method        : Variable Bitrate method2 (mtrh)
Lowpass Filter    : 18500
Radio Replay Gain : -2.0 dB (Set automatically)
Encoding Flags    : --nspsytune --nssafejoint
ATH Type          : 4
Bitrate (Minimum) : 192
Encoder Delay     : 576 samples
Encoder Padding   : 960 samples
Noise Shaping     : 1
Stereo Mode       : Joint
Unwise Settings   : False
Sample Frequency  : 44.1 kHz
MP3 Gain          : 0 (+0.0 dB)
Preset            : V2
Surround Info     : None
Music Length      : 3.77 MB
Music CRC-16      : 07F9
LAME Tag CRC-16   : E5BF

The encoder delay and encoder padding fields describe how many samples were added at the start and end of the audio to complete the frames. Some more information is available on this page: http://gabriel.mp3-tech.org/mp3infotag.html
Yes, this would definitely be nice if it was implemented :) Shouldn't be too much work to add support for this in the mad plugin.
I think there's a bug for this elsewhere already too. Ideally we shouldn't have to implement this for each decoder individually.
As far as I know these tags apply specifically to MP3 files only, as a way of getting around limitations in the format. I had a bit of a look in the mad source, and it looks like the delay and padding are stored in tag->lame.start_delay and tag->lame.end_padding. xmms uses these tags to play the files gaplessly by:

data->samples_to_skip = lame->start_delay;
data->samples_to_play = ((guint64) xmms_xing_get_frames (data->xing) * 1152ULL)
    - lame->start_delay - lame->end_padding;
> Shouldn't be too much work to add support for this in the mad plugin

Shouldn't this rather be implemented in mpegaudioparse instead?
As these tags say how many samples should be dropped from the first and last frame, this has to be implemented in the decoders somewhere. Or: extract them in mpegaudioparse and then find some way to forward this information downstream (hm, would it be possible to drop these samples with a carefully created newsegment event? Probably).
I was assuming that the samples to be dropped are fewer than one frame, so maybe the parser could tweak the first newsegment event (that should be quite easy anyway, shouldn't it?) and send an update for the last one before pushing out the last frame or so.
Yes, I guess adding this to the segments of the parser will work (and yes, it's less than one frame). Who wants to add lame tag parsing and the segment adjustments to mpegaudioparse? :)
Trying it now
mpegaudioparse hangs out in -good these days.
It seems mpegaudioparse can now parse the lame tag / encoder delay. What would be needed to get this working?
I have been developing a plugin for MP3 playback using mpg123 to decode the frames, with GstAudioDecoder as the base class, and relied on mpegaudioparse for seeking. This has worked well, except for the gapless part, which has turned out to be fairly difficult to achieve. I have tried out several ideas, but they all fail when seeking occurs.

One big problem I have encountered is that the number of padding samples indicated by the LAME tag sometimes exceeds the frame size. Example: frame size 1152 samples, encoder padding 1941 samples. I looked at the mpg123 code, which can do gapless internally (I will refer to that later), and what it apparently does is estimate the total length of the MP3 and subtract the padding length from that. Once decoding reaches the calculated length, playback stops. However, I could be wrong, and maybe I misinterpreted the number 1941.

Either way, one possibility would be to somehow truncate the start and end of the segment, as Sebastian Dröge suggested above. That leaves one question open, though: do sinks that have their sync property set to FALSE still respect the start/end times of the segment? If not, the gap would return with sync=FALSE.
Sorry, I accidentally pressed the commit button too early :) Continuing here:

If I am wrong, and the LAME padding is always shorter than one frame, then perhaps mpegaudioparse could send the following in-band events downstream to help the decoder:

(1) an event containing the delay and padding information from the LAME tag,
(2) an event informing downstream that the frame that is about to be pushed is the first one,
(3) an event informing downstream that the frame that is about to be pushed is the last one.

(2) and (3) should respect seeking; for example, if I seek to the beginning of the song, (2) should be sent again. This way, the decoder could decode the frame as usual and then, with the information from (1), throw away the first N samples if it is the first frame, or shorten its duration if it is the last frame. In a pre_push() function, the timestamps could then be corrected by the length of the delay. If the padding can be more than one frame, then I don't know what could be done.

As for mpg123, it does have internal support for gapless. If enabled, it automatically throws away the null samples. However, that does not work well with seeking using GstAudioDecoder, because every time I seek, a flush event is generated, and as a consequence I have to reset mpg123 (= recreate the decoder handle). It loses all gapless information. This is because mpg123 is being used without letting it handle seeking itself - mpegaudioparse is doing that now. So mpg123 is oblivious to the seeking. Possible solutions would be to either feed the gapless information back into mpg123 (requiring modifications to the mpg123 library), or rewrite the plugin from scratch using GstElement as the base class (and that is something I really don't want to do unless it is absolutely necessary). Therefore, I don't think we could use mpg123's internal gapless support, or any decoder's internal support, for that matter.
Related bug: https://bugzilla.gnome.org/show_bug.cgi?id=704520
Still applies in 1.x
*** Bug 719867 has been marked as a duplicate of this bug. ***
Created attachment 298715 [details] [review]
Patch for mpegaudioparse to exclude padding samples from segment

With this patch, gapless playback can be achieved. It adjusts the start, stop, and duration fields of the segment before a segment event is sent downstream. This way, padding samples will be clipped. Depends on the patch in Bug 740196.
Review of attachment 298715 [details] [review]:

::: gst/audioparsers/gstmpegaudioparse.c
@@ +1139,3 @@
+   * calculations, because otherwise, the base parser's duration would be set
+   * twice (here and in gst_mpeg_audio_parse_adjust_segment) */
+  if (((mp3parse->encoder_delay == 0) && (mp3parse->encoder_padding == 0)) || (mp3parse->rate != -1)) {

Maybe just have a single function that sets these things?

@@ +1501,3 @@
+    segment->start = delay_in_ns;
+    segment->stop = duration - padding_in_ns;
+    segment->duration = duration - delay_in_ns - padding_in_ns;

This should maybe also adjust the segment.time and segment.base. You want to drop samples, but should the first non-dropped sample have running/stream time 0 or N?
"Maybe just have a single function that sets these things?"

I don't understand. What would you group into a function? The places which call gst_base_parse_set_duration() differ considerably. I do not see how they could be combined into one function.
Created attachment 299214 [details] [review]
Patch for mpegaudioparse to exclude padding samples from segment, v2

Decoder delay was not being applied to the padding value properly, and the segment base and time fields were not set.
Created attachment 300237 [details] [review]
Patch for mpegaudioparse to exclude padding samples from segment, v3

Patch updated to make use of the gst_base_parse_set_padding function in baseparse.
Any news on this? It's a feature that's really needed...
Created attachment 310584 [details] [review]
Drop first frame if it's a Xing or VBRI info tag

Here is a patch that drops Xing/VBRI info tags. Without this patch, the frame containing the info tag is "played", producing a short gap of zeroes in front of the actual audio data. The patch depends on attachment 300237 [details] [review].
Created attachment 310586 [details] [review]
Fix delay samples computation

Here is another patch which avoids an extra zero sample being played after the end of an MP3. I do not really know why this "+1" is necessary, but the output of the multifilesink element clearly gave me an extra 0 sample at the end. You can actually hear it as a click when playing a sine sample in a gapless loop, and there is no click with this patch applied. The patch depends on attachment 300237 [details] [review].
bump for this - I'm reading on LWN about all the cool things happening with GStreamer... but can't we have something basic like this patch applied? I use a non-GStreamer audio player (Audacious, which uses mpg123) to play back music only because of this issue.
It's not a simple patch, and there were discussions about how to implement it properly. For the Opus codec it is now implemented; for a legacy codec like MP3, someone will have to implement it similarly now. That is, doing these kinds of things via segment adjustments is not ideal, and there is now a GstMeta (GstAudioClippingMeta) that parsers can put on the encoded buffers for this and that decoders should use to remove the additional samples. Encoders should also put it on their output if they can. Any help with that would be appreciated :)
*** Bug 764267 has been marked as a duplicate of this bug. ***
Looked into GstAudioClippingMeta a bit. To correctly apply the clipping to each frame we need to know in which frame we are relative to the start or end of the stream, and at the moment we can only detect if we are in the first frame (offset=0) or in the last one (offset + bpf == xing_bytes).

A solution would be to buffer frames so that all the delay/padding samples fit in them; if we detect a first/last frame, we can attach the GstAudioClippingMeta to the affected buffered frames. On the mpg123 side we would need to make sure that we can link input frames with decoded ones, but that seems possible with some work. Any idea how to buffer frames in mpegaudioparse/GstBaseParse?

Alternative: only support one frame of delay/padding for now; that would at least reduce the gap. Thoughts?
Since mpegaudioparse knows the total number of frames (it is contained in the Xing header), it could keep track of the current frame (= the frame that contains the current playback position). Also, from the number of frames, mpegaudioparse computes the playback length. By using the LAME tag information and the total number of frames, it becomes possible to figure out the number of the last frame. This takes care of cases where the padding length in the LAME tag exceeds one frame length. Also, this way, it becomes possible to know which one is the last frame without any buffering. Does anybody see problems with this approach?

Of course this is only applicable to MP3 data which is finite, which means no web streams, for example. But I'd argue that LAME tags make little sense for these anyway (well, at least for the last frame).
(In reply to Carlos Rafael Giani from comment #28) > Since mpegaudioparse knows the total number of frames (it is contained in > the Xing header), it could keep track of the current frame (= the frame that > contains the current playback position). How would that work with seeking? > Also, out of the number of frames, > mpegaudioparse computes the playback length. By using the LAME tag > information and the total number of frames, it becomes possible to figure > out the number of the last frame. This takes care of cases where the padding > length in the LAME tag exceeds one frame length. Also, this way, it becomes > possible to know which one is the last frame without any buffering. The problem I see is that we need to know when we are in the second to last frame (we need to attach two GstAudioClippingMeta to the last two frames). With frames having variable frame size we can't guess based on the file offset.
My proposal indeed relies on constant frame sizes. However, from what I've seen in the mpegaudioparse source, the frame size depends on the MPEG version and layer numbers (see gstmpegaudioparse.c:733). Can these change in the middle of a bitstream? With constant frames, seeking is also not really a problem of course, because then it is easily possible to figure out the current frame number from the current position.
> My proposal indeed relies on constant frame sizes. However, from what I've seen in the mpegaudioparse source, the frame size depends on the MPEG version and layer numbers (see gstmpegaudioparse.c:733). Can these change in the middle of a bitstream?

It also depends on the bitrate, and that usually changes every frame. That's why we have Xing/VBRI tags to detect the duration and have a seek table despite variable-size frames. You can see the frame sizes using "GST_DEBUG=mpegaudioparse:5".
Ah, but you are talking about byte-level seeking. I was referring to operations in the time domain. It is my impression that the number of *samples* per frame is constant, but that in VBR MP3s, this fixed number of samples can be encoded with a varying bitrate. So, if downstream requests a time seek, then it should work. This does break down if downstream does a byte seek, yes. There is still the option to constrain the segment, though. So far, this one still seems like the least problematic option to me.
> Ah, but you are talking about byte-level seeking. I was referring to operations in the time domain. It is my impression that the number of *samples* per frame is constant, but that in VBR MP3s, this fixed number of samples can be encoded with a varying bitrate. So, if downstream requests a time seek, then it should work.

Ah, yes, samples are constant. But afaik time-based seeking just uses the Xing seek table and interpolation and thus isn't exact. If we are one frame off we might clip away the real signal.
(In reply to Christoph Reiter (lazka) from comment #33)
> Ah, yes, samples are constant. But afaik time-based seeking just uses the
> Xing seek table and interpolation and thus isn't exact. If we are one frame
> off we might clip away the real signal.

Then I'd stick with the segment-based approach. It works well (it is in use in production here already), does not need any additional buffering, and can clip exactly, because it relies on Xing/LAME/VBRI information. (For example, mpg123 also relies on the duration computed from Xing/LAME/VBRI data for trimming the padding samples, and it works flawlessly.) The baseparse patch needs cleanup, though.
Any news on this feature?
bump! ;-)
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-plugins-good/issues/31.