After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 759778 - flacparse: large padding headers cause high memory consumption
flacparse: large padding headers cause high memory consumption
Status: RESOLVED OBSOLETE
Product: GStreamer
Classification: Platform
Component: gst-plugins-good
git master
Other Linux
: Normal normal
: git master
Assigned To: Reynaldo H. Verdejo Pinochet
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2015-12-22 15:02 UTC by Carlos Rafael Giani
Modified: 2018-11-03 15:06 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
skip padding metadata blocks (4.15 KB, patch)
2015-12-29 18:23 UTC, Reynaldo H. Verdejo Pinochet
none Details | Review
skip padding metadata blocks (5.02 KB, patch)
2015-12-30 06:32 UTC, Reynaldo H. Verdejo Pinochet
none Details | Review
skip padding metadata blocks (5.44 KB, patch)
2015-12-30 06:47 UTC, Reynaldo H. Verdejo Pinochet
reviewed Details | Review

Description Carlos Rafael Giani 2015-12-22 15:02:46 UTC
Some FLAC files have large padding headers. According to the FLAC spec, padding headers do not contain valid data and mainly exist as a reserve space for future metadata modifications/additions. Yet, flacparse accumulates incoming buffers until all headers are parsed, including padding headers, and then feeds all this data in one go to downstream. As a result, playback can take considerably longer to start, especially on embedded devices. Memory usage also greatly increases, and can lead to out-of-memory errors on such devices. It would be good if flacparse ignores/discards padding headers and _not_ feed them to downstream. Better yet, since the size of the header is known from the start, just skip it entirely (at least in pull mode).

Test files can be generated with the following gst-launch line:

gst-launch-1.0 audiotestsrc wave=sine num-buffers=30 samplesperbuffer=44100 ! "audio/x-raw, rate=44100, channels=2" ! flacenc padding=16777215 ! filesink location=test-padding.flac

Note the padding=16777215 property. 16777215 is the maximum size flacenc will allow for a padding block. Values larger than that cause libflac to apply padding in a different, undetermined way.
Comment 1 Sebastian Dröge (slomo) 2015-12-28 12:11:02 UTC
Should be possible to let flacparse directly skip over padding. Nowadays it's possible for baseparse to skip more than the available data.

Do you want to provide a patch for this?
Comment 2 Tim-Philipp Müller 2015-12-28 15:00:19 UTC
The question is: should flacparse really skip over such padding and effectively strip it off?

Or, in other words, should:

 filesrc ! flacparse ! filesink

result in a different file in case of a perfectly-formed input file?

I'm not so sure about that.
Comment 3 Reynaldo H. Verdejo Pinochet 2015-12-28 22:57:15 UTC
What about making it property-set conditional? If the user
explicitly ask for padding headers to be stripped then 
filesrc ! flacparse ! filesink not being an identity seems
like a sane compromise - if at compromise at all.
Comment 4 Tim-Philipp Müller 2015-12-28 23:05:11 UTC
I think making this an opt-in property is the worst of all options, that doesn't work well in a decodebin/playbin context, leading to the reported issue. We can probably/hopefully do something smarter that always works right :)
Comment 5 Carlos Rafael Giani 2015-12-29 12:03:44 UTC
If anything, it should be opt-out. As the spec says, the padding header is mostly useful for when mastering FLAC data, so skipping it is preferable in playback situations. To me, this means that aside from pipelines where both in- and output are FLAC and the user wants to preserve the headers, there aren't any use cases where you want to keep the padding header.

The remaining question is if such FLAC-editing pipelines would still use decodebin. I think it is a possibility. It is however still possible to intercept the newly created elements during the autoplugging procedure and set such a "skip-padding" property to FALSE. So, I'd add a note to flacparse about this issue.
Comment 6 Reynaldo H. Verdejo Pinochet 2015-12-29 18:23:11 UTC
Created attachment 318009 [details] [review]
skip padding metadata blocks

Drafted. ~Working only if modded  as:

- skipsize = framesize;
+ skipsize = framesize - map.size;

As it is, it seems to fail to find any valid data after
the skip. Still working on it, maybe I'm over-skipping or
failing to set the parsing state accordingly (maybe it's
the last metadata block?).  Should be enough to confirm
the approach is correct though. All comments welcome ;)
Comment 7 Reynaldo H. Verdejo Pinochet 2015-12-30 06:32:16 UTC
Created attachment 318021 [details] [review]
skip padding metadata blocks

I was indeed missing proper parsing state handling. This one works
as expected.
Comment 8 Reynaldo H. Verdejo Pinochet 2015-12-30 06:47:24 UTC
Created attachment 318022 [details] [review]
skip padding metadata blocks

And somehow I botched the getter for the new prop. Fixed now.
Comment 9 Sebastian Dröge (slomo) 2015-12-30 08:50:03 UTC
Comment on attachment 318022 [details] [review]
skip padding metadata blocks

Looks good to me, opinions on the property and default setting?
Comment 10 Carlos Rafael Giani 2015-12-30 08:54:28 UTC
I agree with the default setting (it is precisely the opt-out I mentioned earlier). I do not think the "Skipping %u bytes of padding metadata" log should use the warning level, since warnings should indicate that something unexpected, undefined, incorrect happened that might lead to trouble. Perhaps use INFO instead of WARNING.
Comment 11 Reynaldo H. Verdejo Pinochet 2015-12-30 09:19:25 UTC
(In reply to Carlos Rafael Giani from comment #10)
> [..]
> earlier). I do not think the "Skipping %u bytes of padding metadata" log
> should use the warning level, since warnings should indicate that something
> unexpected, undefined, incorrect happened that might lead to trouble.
> Perhaps use INFO instead of WARNING.

I understand your concern but IMHO I'm abusing the level for a good
reason: The parser is discarding valid data. The warning is signaling a
switchable (default) action that might lead to (sic) *problems* afterwards ->
The stream being stripped of a - valid - metadata block the user might
actually be relying on.
Comment 12 Carlos Rafael Giani 2015-12-30 09:28:47 UTC
But this is what the INFO level is there for. WARNING is for *abnormal* behavior. Also, as said, the spec itself states that the padding header does not contain any valid data. Stripping it is perfectly OK in the spec. The only situation where it might cause an issue is if the user wants to preserve the size of the padding header for some reason (typically during editing). Worst case, the padding header is stripped, metadata is expanded later, and the encoder/editor has to perform more I/O work (because it has to shift the succeeding audio blocks further outwards to make room for the extra metadata), and perhaps a new padding block is generated. But no information is ever actually *lost*.
Comment 13 Tim-Philipp Müller 2015-12-30 09:43:32 UTC
Ingnoring memory usage for now, where does all the processing delay/overhead actually occure? In the parser? Why is that, because of the time it takes to read the data from a slow medium? Processing overhead in the parser? Processing overhead in the decodersink(?)?

I think we're just going for the easy stop-gap solution here, I'm not convinced yet that it's the best solution.
Comment 14 Sebastian Dröge (slomo) 2015-12-30 09:51:50 UTC
The memory usage comes from putting the whole padding header into a buffer at once.
Comment 15 Carlos Rafael Giani 2015-12-30 10:00:42 UTC
It also causes measurable delays in flacdec on embedded devices (the PC is probably too fast to notice). Also, if log levels are turned up high enough, the *entire* buffer is logged, causing even bigger memory usage, and much higher delays. Worst of all, it is totally pointless - it logs contents which are meaningless anyway. I am not talking about memdumps here, but about lines like:

"flacdec gstflacdec.c:250:gst_flac_dec_set_format:<flacdec0> sink caps: audio/x-flac, channels=(int)2, framed=(boolean)true, rate=(int)44100, streamheader=(buffer)< 7f464c41430100000 ....."
Comment 16 Tim-Philipp Müller 2015-12-31 14:55:20 UTC
> Also, if log levels are turned up high enough,
> the *entire* buffer is logged, causing even bigger memory usage, and much
> higher delays. Worst of all, it is totally pointless - it logs contents
> which are meaningless anyway. I am not talking about memdumps here, but
> about lines like (snip)

This is a general and separate problem, and it requires a general solution, see bug #760027.
Comment 17 GStreamer system administrator 2018-11-03 15:06:55 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-plugins-good/issues/248.