Bug 727802 – md5 checksum is missing from flac files encoded with encodebin

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 727802 - md5 checksum is missing from flac files encoded with encodebin


Summary:	md5 checksum is missing from flac files encoded with encodebin


Status:	RESOLVED DUPLICATE of bug 792985

Product:	GStreamer
Classification:	Platform
Component:	gst-plugins-base
Version:	1.2.3
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	1.10.1
Assigned To:	Luis de Bethencourt
QA Contact:	GStreamer Maintainers

URL:
Whiteboard:

Depends on:
Blocks:	707963

Reported:	2014-04-08 08:52 UTC by Christophe Fergeau
Modified:	2018-02-05 00:46 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
only plug a parser if needed (2.20 KB, patch) 2016-10-13 11:02 UTC, Vincent Penquerc'h	none	Details \| Review
fix header rewriting (3.77 KB, patch) 2016-10-21 14:52 UTC, Vincent Penquerc'h	none	Details \| Review
fix header rewriting (4.79 KB, patch) 2016-10-24 11:20 UTC, Vincent Penquerc'h	none	Details \| Review
fix header rewriting (5.35 KB, patch) 2016-10-24 13:52 UTC, Vincent Penquerc'h	none	Details \| Review
fix header rewriting (5.38 KB, patch) 2016-10-25 10:21 UTC, Vincent Penquerc'h	committed	Details \| Review

Description Christophe Fergeau 2014-04-08 08:52:00 UTC

Sound-Juicer uses encodebin to encode the files it rips. A bug report was filed in fedora that the flac files it generates don't contain the stream md5 checksum which is used to verify that the file is not corrupt. This is https://bugzilla.redhat.com/show_bug.cgi?id=961881
flac --test produces the error:
WARNING, cannot check MD5 signature since it was unset in the STREAMINFO

I've investigated this a bit, the MD5 signature is written by libFLAC in update_metadata_() in src/libFLAC/stream_encoder.c using seek() and write() vfuncs, which properly call back into gstflacenc.c. The seeking seems to work, but the data never seems to get to the file sink to get written to disk.



I've been able to reproduce this using 'encoding' from gstreamer source:
(wav file came from http://www.nch.com.au/acm/11k16bitpcm.wav linked from https://en.wikipedia.org/wiki/WAV but any random file should do the trick).
 
$ gst-plugins-base/tests/examples/encoding -f audio/x-flac -a audio/x-flac  -o /home/teuf/output.flac file:///home/teuf/11k16bitpcm.wav

After that: (metaflac comes with the 'flac' package)

$ metaflac --list --block-number=0 /home/teuf/output.flac
METADATA block #0
  type: 0 (STREAMINFO)
  is last: false
  length: 34
  minimum blocksize: 4608 samples
  maximum blocksize: 4608 samples
  minimum framesize: 0 bytes
  maximum framesize: 0 bytes
  sample_rate: 11025 Hz
  channels: 1
  bits-per-sample: 16
  total samples: 0
  MD5 signature: 00000000000000000000000000000000



If instead I do things manually with:

$ gst-launch-1.0 filesrc location=/home/teuf/11k16bitpcm.wav ! wavparse ! flacenc ! filesink location=/home/teuf/output.flac

$ metaflac --list --block-number=0 /home/teuf/output.flac
METADATA block #0
  type: 0 (STREAMINFO)
  is last: false
  length: 34
  minimum blocksize: 4608 samples
  maximum blocksize: 4608 samples
  minimum framesize: 214 bytes
  maximum framesize: 7700 bytes
  sample_rate: 11025 Hz
  channels: 1
  bits-per-sample: 16
  total samples: 152267
  MD5 signature: 7ad725dd59f9d1f4bf6d50cf80aee75d

The minimum/maximum framesize should be written by libFLAC in the same way (calls to seek/write vfuncs when the FLAC stream is finished). libFLAC also tries to write seek tables when the stream is finished, but I haven't checked whether they are impacted by this bug or not.

I've run the 'encoding' test with the 1.2 git branch, the gst-launch test has been done with the fedora 20 packages (gstreamer 1.2.3)

Comment 1 Sebastian Dröge (slomo) 2014-04-08 08:56:24 UTC

This sounds like something inside encodebin causes the seeking to the beginning at the end of the track to fail (or to be not possible at all).

Comment 2 Sebastian Dröge (slomo) 2014-05-03 14:49:38 UTC

Can be reproduced with
> ... ! flacenc ! flacparse ! filesink

Problem here is that flacparse claims to do the seeks for rewriting the headers properly... but actually doesn't and also drops the data we send after seeking (as they are incomplete flac frames, it's just the few bytes that need to be rewritten).

So multiple problems here:
1) flacparse (and baseparse) should not claim to support seeking in BYTES format if they actually don't completely (flacparse supports it, but only if complete frames arrive
2) encodebin should not plug parsers unless required
3) Also that it plugs flactag here although not required is suspicious (but flactag does not break anything here). encodebin should detect that flacenc can do the tagging already

1) and 3) are more collateral problems, 2) is the real problem. Or one could implement proper seeking there but that seems rather complicated and not too useful.

Comment 3 Sebastian Dröge (slomo) 2014-05-06 20:57:18 UTC

2) can be fixed probably by having encodebin not plug parsers if no muxer/formatter is used or if they don't care about parsing

3) can probably be fixed by moving the formatters (which are only used if no muxer is used) from after the streamcombiner to between streamsplitter and streamcombiner... and only put them into those branches that don't have a GstTagSetter element

Comment 4 Luis de Bethencourt 2014-05-06 21:04:15 UTC

I can work on this :)

Comment 5 Sebastian Dröge (slomo) 2014-09-12 11:42:39 UTC

Luis?

Comment 6 Luis de Bethencourt 2014-10-24 12:56:11 UTC

I just retested this and it is still happening. Will add it to my ToDo list.

Sorry I hadn't looked more into it before.

Comment 7 Luis de Bethencourt 2014-10-24 13:21:17 UTC

Further reports from Sound Juicer users about this being a blocker for some. Will bump it up in the priority list.

Comment 8 Vincent Penquerc'h 2016-10-13 11:02:38 UTC

Created attachment 337578 [details] [review]
only plug a parser if needed

Is this what you had in mind ? I'm not super clear whether success/failure of linking without the parser is quite what's right here, but I don't see another way to tell if a parser is "needed".

This makes the encodebin flac output have the MD5, and forcing the "need parser" path has the previous no-MD5 behavior.

Comment 9 Sebastian Dröge (slomo) 2016-10-20 10:44:25 UTC

Review of attachment 337578 [details] [review]:

I don't think this is a good idea. We should always plug a parser if one exists, no matter what. It will add additional information, etc that will be useful for muxing.


Instead of this, we should rather fix flacparse to not drop MD5 :)

Comment 10 Vincent Penquerc'h 2016-10-20 14:07:12 UTC

Hmm, I might have misunderstood your comment 2 then.
I'll look at flacparse instead :)

Comment 11 Vincent Penquerc'h 2016-10-21 14:52:19 UTC

Created attachment 338192 [details] [review]
fix header rewriting

Turns out to need several changes. One of which is exposing gst_base_parse_drain so I can use it in flacparse (otherwise, we can get the last buffer reordered after the seek to header, which hilarious consequences). Is that acceptable ? I'll post the core patch later if so.

Comment 12 Sebastian Dröge (slomo) 2016-10-22 08:04:24 UTC

Review of attachment 338192 [details] [review]:

Not sure if this is the right direction to go

::: gst/audioparsers/gstflacparse.c
@@ +797,3 @@
+  if (flacparse->byte_mode) {
+    *skipsize = 0;
+    framesize = map.size;

This would mean that absolutely no parsing at all would happen if upstream has a BYTE segment, e.g. filesrc ! flacparse. That doesn't seem right.

Also even after a segment update we shouldn't stop parsing IMHO

@@ +1783,3 @@
+        gst_base_parse_set_min_frame_size (GST_BASE_PARSE (flacparse), 0);
+        /* we must drain any pending data before the seek */
+        gst_base_parse_drain (parse);

You probably want to do something with the return value here. Also you can get segment events that are only updates, not actual seeks.

Comment 13 Vincent Penquerc'h 2016-10-24 09:12:33 UTC

How can you tell whether you'll get frames or random unframed stuff ? Maybe detect whether you're going backwards, and only skip parsing when you did ?

Comment 14 Sebastian Dröge (slomo) 2016-10-24 10:12:48 UTC

best to check flacenc there. ideally we would only get framed data... but who knows

Comment 15 Vincent Penquerc'h 2016-10-24 10:20:41 UTC

We get a few buffers, 16 bytes, 5 bytes, 6 bytes, falling in the middle of the header. It seems likely other encoders also do this piecemeal type of patching.

Comment 16 Sebastian Dröge (slomo) 2016-10-24 10:23:31 UTC

We could remember where/how the headers were for this case :)

Comment 17 Vincent Penquerc'h 2016-10-24 10:26:00 UTC

OK, I'll do that.

Comment 18 Vincent Penquerc'h 2016-10-24 11:20:12 UTC

Created attachment 338337 [details] [review]
fix header rewriting

Comment 19 Sebastian Dröge (slomo) 2016-10-24 11:44:02 UTC

Review of attachment 338337 [details] [review]:

::: gst/audioparsers/gstflacparse.c
@@ +1603,3 @@
     res = GST_BASE_PARSE_FLOW_DROPPED;
   } else {
+    if (flacparse->rewriting_header) {

After rewriting the header it might in theory continue with the normal frames

@@ +1788,3 @@
+        gst_base_parse_set_min_frame_size (GST_BASE_PARSE (flacparse), 0);
+        /* we must drain any pending data before the seek */
+        gst_base_parse_drain (parse);

Maybe baseparse should already do that, but this still is problematic for SEGMENT events that are just updates and no seek

Comment 20 Vincent Penquerc'h 2016-10-24 13:52:41 UTC

Created attachment 338344 [details] [review]
fix header rewriting

Comment 21 Vincent Penquerc'h 2016-10-24 13:53:34 UTC

I don't test for the data crossing the header/data boundary in that patch. It can arguably mean either.

Comment 22 Sebastian Dröge (slomo) 2016-10-25 10:02:05 UTC

Review of attachment 338344 [details] [review]:

It can mean either what? If non-audio data is received after a seek after the header, that would be an error

::: gst/audioparsers/gstflacparse.c
@@ +1788,3 @@
+      if (segment->format == GST_FORMAT_BYTES) {
+        /* we must pass every header update now */
+        gst_base_parse_set_min_frame_size (GST_BASE_PARSE (flacparse), 0);

Only if we know the header size though. This probably has weird effects if you receive the first BYTES segment?

Comment 23 Vincent Penquerc'h 2016-10-25 10:20:45 UTC

I meant, if a buffer of N bytes is written such as the start of the data is before the end of the header, but the last byte is after it. Shouldn't happen in practice, though.

Comment 24 Vincent Penquerc'h 2016-10-25 10:21:17 UTC

Created attachment 338392 [details] [review]
fix header rewriting

Comment 25 Sebastian Dröge (slomo) 2016-10-25 10:39:13 UTC

Review of attachment 338392 [details] [review]:

Let's go with this (after 1.10.0)

Comment 26 Sebastian Dröge (slomo) 2016-11-01 17:53:07 UTC

Comment on attachment 338392 [details] [review]
fix header rewriting

Please merge, this also needs the baseparse part of the patch

Comment 27 Vincent Penquerc'h 2016-11-10 12:56:31 UTC

commit 4caf66fbca89087baa346680bcd46e6809964c6c
Author: Vincent Penquerc'h <vincent.penquerch@collabora.co.uk>
Date:   Thu Nov 10 12:47:37 2016 +0000

    baseparse: expose gst_base_parse_drain

commit adeee44b07a173b9ab4253216caba8f66dd43abb
Author: Vincent Penquerc'h <vincent.penquerch@collabora.co.uk>
Date:   Fri Oct 21 15:49:36 2016 +0100

    flacparse: fix header rewriting being ignored
    
    https://bugzilla.gnome.org/show_bug.cgi?id=727802

Comment 28 Phillip Wood 2017-09-29 15:28:35 UTC

Unfortunately while this fixes the MD5 metadata it breaks the vorbis comments. There have been some reports that sound-juicer is extracting flac flies with corrupted metadata recently[1,2], reverting the changes to gstflacparse.[ch] in commit adeee44b fixes the problem with the vorbis tags. Here is the output of metaflac --list showing the first and third metadata data blocks with gst-plugins-good 1.12.3 and the same version with the changes to gstflacparse.[ch] in commit adeee44b reverted

v 1.12.3
METADATA block #0
  type: 0 (STREAMINFO)
  is last: false
  length: 34
  minimum blocksize: 4608 samples
  maximum blocksize: 4608 samples
  minimum framesize: 798 bytes
  maximum framesize: 10089 bytes
  sample_rate: 44100 Hz
  channels: 2
  bits-per-sample: 16
  total samples: 3705576
  MD5 signature: 3e664118223eadd47b5830ccedfa16cc
METADATA block #3
  type: 4 (VORBIS_COMMENT)
  is last: true
  length: 321
  vendor string: GStreamer encoded vorbiscomment
  comments: 7
    comment[0]: TITLE=My Lady Hunnsdon's Puffe (P 54)
    comment[1]: ARTIST=Paul O'Dette
    comment[2]: TRACKNUMBER=1
    comment[3]: TRACKTOTAL=23
    comment[4]: ALBUM=My Favorite D
    comment[5]: 
    comment[6]: 

v 1.12.3 with adeee44b reverted
With this change reverted
METADATA block #0
  type: 0 (STREAMINFO)
  is last: false
  length: 34
  minimum blocksize: 4608 samples
  maximum blocksize: 4608 samples
  minimum framesize: 0 bytes
  maximum framesize: 0 bytes
  sample_rate: 44100 Hz
  channels: 2
  bits-per-sample: 16
  total samples: 3705576
  MD5 signature: 00000000000000000000000000000000
METADATA block #3
  type: 4 (VORBIS_COMMENT)
  is last: true
  length: 321
  vendor string: GStreamer encoded vorbiscomment
  comments: 11
    comment[0]: TITLE=My Lady Hunnsdon's Puffe (P 54)
    comment[1]: ARTIST=Paul O'Dette
    comment[2]: TRACKNUMBER=1
    comment[3]: TRACKTOTAL=23
    comment[4]: ALBUM=My Favorite Dowland
    comment[5]: COMPOSER=John Dowland
    comment[6]: ALBUMARTIST=Paul O'Dette
    comment[7]: GENRE=Classical
    comment[8]: DATE=2014
    comment[9]: DISCID=4711a817
    comment[10]: MUSICBRAINZ_DISCID=Oz7pe6Alnaq6h3zQsog7caD5tMM-


[1] https://bugzilla.gnome.org/show_bug.cgi?id=784092
[2] https://bugzilla.gnome.org/show_bug.cgi?id=785558

Comment 29 gkrithi8 2017-11-06 20:00:29 UTC

Can confirm this problem on latest rhythmbox git ( with gstreamer 1.12.3 ). Soundjuicer uses rhythmbox encoding profiles. Hence, equally affected by this issue. 

[user@test:~/bugs/rhythmbox/flac] $ flac -tVF test.flac 
flac 1.3.2
test.flac: ERROR while decoding metadata
           state = FLAC__STREAM_DECODER_END_OF_STREAM

[Rhythmbox bug]: https://bugzilla.gnome.org/show_bug.cgi?id=789946

Comment 30 gkrithi8 2017-11-06 20:13:12 UTC

One thing I couldn't figure out is how banshee ( with same gstreamer 1.12.3 ) works fine on the system.

[user@test:~/bugs/rhythmbox/flac] $ flac -tVF test_banshee.flac
flac 1.3.2
test_banshee.flac: ok

Comment 31 gkrithi8 2017-11-06 22:12:17 UTC

(In reply to gkrithi8 from comment #30)
> One thing I couldn't figure out is how banshee ( with same gstreamer 1.12.3
> ) works fine on the system.

Uses "flacenc" directly rather than "encodebin".

Comment 32 sam tygier 2018-02-04 12:55:58 UTC

Should this be reopened now that the commit has been reverted? https://cgit.freedesktop.org/gstreamer/gst-plugins-good/commit/?id=d4c04cb079661948ea520d55da2cc77e5d7ae295

Comment 33 Tim-Philipp Müller 2018-02-05 00:46:13 UTC

There's another / a new bug for it now: bug #792985

Comment 34 Tim-Philipp Müller 2018-02-05 00:46:35 UTC

I guess we should not leave it marked as fixed then

*** This bug has been marked as a duplicate of bug 792985 ***