GNOME Bugzilla – Bug 727802
md5 checksum is missing from flac files encoded with encodebin
Last modified: 2018-02-05 00:46:35 UTC
Sound-Juicer uses encodebin to encode the files it rips. A bug report was filed in fedora that the flac files it generates don't contain the stream md5 checksum which is used to verify that the file is not corrupt. This is https://bugzilla.redhat.com/show_bug.cgi?id=961881 flac --test produces the error: WARNING, cannot check MD5 signature since it was unset in the STREAMINFO I've investigated this a bit, the MD5 signature is written by libFLAC in update_metadata_() in src/libFLAC/stream_encoder.c using seek() and write() vfuncs, which properly call back into gstflacenc.c. The seeking seems to work, but the data never seems to get to the file sink to get written to disk. I've been able to reproduce this using 'encoding' from gstreamer source: (wav file came from http://www.nch.com.au/acm/11k16bitpcm.wav linked from https://en.wikipedia.org/wiki/WAV but any random file should do the trick). $ gst-plugins-base/tests/examples/encoding -f audio/x-flac -a audio/x-flac -o /home/teuf/output.flac file:///home/teuf/11k16bitpcm.wav After that: (metaflac comes with the 'flac' package) $ metaflac --list --block-number=0 /home/teuf/output.flac METADATA block #0 type: 0 (STREAMINFO) is last: false length: 34 minimum blocksize: 4608 samples maximum blocksize: 4608 samples minimum framesize: 0 bytes maximum framesize: 0 bytes sample_rate: 11025 Hz channels: 1 bits-per-sample: 16 total samples: 0 MD5 signature: 00000000000000000000000000000000 If instead I do things manually with: $ gst-launch-1.0 filesrc location=/home/teuf/11k16bitpcm.wav ! wavparse ! flacenc ! filesink location=/home/teuf/output.flac $ metaflac --list --block-number=0 /home/teuf/output.flac METADATA block #0 type: 0 (STREAMINFO) is last: false length: 34 minimum blocksize: 4608 samples maximum blocksize: 4608 samples minimum framesize: 214 bytes maximum framesize: 7700 bytes sample_rate: 11025 Hz channels: 1 bits-per-sample: 16 total samples: 152267 MD5 signature: 7ad725dd59f9d1f4bf6d50cf80aee75d The minimum/maximum framesize should be written by libFLAC in the same way (calls to seek/write vfuncs when the FLAC stream is finished). libFLAC also tries to write seek tables when the stream is finished, but I haven't checked whether they are impacted by this bug or not. I've run the 'encoding' test with the 1.2 git branch, the gst-launch test has been done with the fedora 20 packages (gstreamer 1.2.3)
This sounds like something inside encodebin causes the seeking to the beginning at the end of the track to fail (or to be not possible at all).
Can be reproduced with > ... ! flacenc ! flacparse ! filesink Problem here is that flacparse claims to do the seeks for rewriting the headers properly... but actually doesn't and also drops the data we send after seeking (as they are incomplete flac frames, it's just the few bytes that need to be rewritten). So multiple problems here: 1) flacparse (and baseparse) should not claim to support seeking in BYTES format if they actually don't completely (flacparse supports it, but only if complete frames arrive 2) encodebin should not plug parsers unless required 3) Also that it plugs flactag here although not required is suspicious (but flactag does not break anything here). encodebin should detect that flacenc can do the tagging already 1) and 3) are more collateral problems, 2) is the real problem. Or one could implement proper seeking there but that seems rather complicated and not too useful.
2) can be fixed probably by having encodebin not plug parsers if no muxer/formatter is used or if they don't care about parsing 3) can probably be fixed by moving the formatters (which are only used if no muxer is used) from after the streamcombiner to between streamsplitter and streamcombiner... and only put them into those branches that don't have a GstTagSetter element
I can work on this :)
Luis?
I just retested this and it is still happening. Will add it to my ToDo list. Sorry I hadn't looked more into it before.
Further reports from Sound Juicer users about this being a blocker for some. Will bump it up in the priority list.
Created attachment 337578 [details] [review] only plug a parser if needed Is this what you had in mind ? I'm not super clear whether success/failure of linking without the parser is quite what's right here, but I don't see another way to tell if a parser is "needed". This makes the encodebin flac output have the MD5, and forcing the "need parser" path has the previous no-MD5 behavior.
Review of attachment 337578 [details] [review]: I don't think this is a good idea. We should always plug a parser if one exists, no matter what. It will add additional information, etc that will be useful for muxing. Instead of this, we should rather fix flacparse to not drop MD5 :)
Hmm, I might have misunderstood your comment 2 then. I'll look at flacparse instead :)
Created attachment 338192 [details] [review] fix header rewriting Turns out to need several changes. One of which is exposing gst_base_parse_drain so I can use it in flacparse (otherwise, we can get the last buffer reordered after the seek to header, which hilarious consequences). Is that acceptable ? I'll post the core patch later if so.
Review of attachment 338192 [details] [review]: Not sure if this is the right direction to go ::: gst/audioparsers/gstflacparse.c @@ +797,3 @@ + if (flacparse->byte_mode) { + *skipsize = 0; + framesize = map.size; This would mean that absolutely no parsing at all would happen if upstream has a BYTE segment, e.g. filesrc ! flacparse. That doesn't seem right. Also even after a segment update we shouldn't stop parsing IMHO @@ +1783,3 @@ + gst_base_parse_set_min_frame_size (GST_BASE_PARSE (flacparse), 0); + /* we must drain any pending data before the seek */ + gst_base_parse_drain (parse); You probably want to do something with the return value here. Also you can get segment events that are only updates, not actual seeks.
How can you tell whether you'll get frames or random unframed stuff ? Maybe detect whether you're going backwards, and only skip parsing when you did ?
best to check flacenc there. ideally we would only get framed data... but who knows
We get a few buffers, 16 bytes, 5 bytes, 6 bytes, falling in the middle of the header. It seems likely other encoders also do this piecemeal type of patching.
We could remember where/how the headers were for this case :)
OK, I'll do that.
Created attachment 338337 [details] [review] fix header rewriting
Review of attachment 338337 [details] [review]: ::: gst/audioparsers/gstflacparse.c @@ +1603,3 @@ res = GST_BASE_PARSE_FLOW_DROPPED; } else { + if (flacparse->rewriting_header) { After rewriting the header it might in theory continue with the normal frames @@ +1788,3 @@ + gst_base_parse_set_min_frame_size (GST_BASE_PARSE (flacparse), 0); + /* we must drain any pending data before the seek */ + gst_base_parse_drain (parse); Maybe baseparse should already do that, but this still is problematic for SEGMENT events that are just updates and no seek
Created attachment 338344 [details] [review] fix header rewriting
I don't test for the data crossing the header/data boundary in that patch. It can arguably mean either.
Review of attachment 338344 [details] [review]: It can mean either what? If non-audio data is received after a seek after the header, that would be an error ::: gst/audioparsers/gstflacparse.c @@ +1788,3 @@ + if (segment->format == GST_FORMAT_BYTES) { + /* we must pass every header update now */ + gst_base_parse_set_min_frame_size (GST_BASE_PARSE (flacparse), 0); Only if we know the header size though. This probably has weird effects if you receive the first BYTES segment?
I meant, if a buffer of N bytes is written such as the start of the data is before the end of the header, but the last byte is after it. Shouldn't happen in practice, though.
Created attachment 338392 [details] [review] fix header rewriting
Review of attachment 338392 [details] [review]: Let's go with this (after 1.10.0)
Comment on attachment 338392 [details] [review] fix header rewriting Please merge, this also needs the baseparse part of the patch
commit 4caf66fbca89087baa346680bcd46e6809964c6c Author: Vincent Penquerc'h <vincent.penquerch@collabora.co.uk> Date: Thu Nov 10 12:47:37 2016 +0000 baseparse: expose gst_base_parse_drain commit adeee44b07a173b9ab4253216caba8f66dd43abb Author: Vincent Penquerc'h <vincent.penquerch@collabora.co.uk> Date: Fri Oct 21 15:49:36 2016 +0100 flacparse: fix header rewriting being ignored https://bugzilla.gnome.org/show_bug.cgi?id=727802
Unfortunately while this fixes the MD5 metadata it breaks the vorbis comments. There have been some reports that sound-juicer is extracting flac flies with corrupted metadata recently[1,2], reverting the changes to gstflacparse.[ch] in commit adeee44b fixes the problem with the vorbis tags. Here is the output of metaflac --list showing the first and third metadata data blocks with gst-plugins-good 1.12.3 and the same version with the changes to gstflacparse.[ch] in commit adeee44b reverted v 1.12.3 METADATA block #0 type: 0 (STREAMINFO) is last: false length: 34 minimum blocksize: 4608 samples maximum blocksize: 4608 samples minimum framesize: 798 bytes maximum framesize: 10089 bytes sample_rate: 44100 Hz channels: 2 bits-per-sample: 16 total samples: 3705576 MD5 signature: 3e664118223eadd47b5830ccedfa16cc METADATA block #3 type: 4 (VORBIS_COMMENT) is last: true length: 321 vendor string: GStreamer encoded vorbiscomment comments: 7 comment[0]: TITLE=My Lady Hunnsdon's Puffe (P 54) comment[1]: ARTIST=Paul O'Dette comment[2]: TRACKNUMBER=1 comment[3]: TRACKTOTAL=23 comment[4]: ALBUM=My Favorite D comment[5]: comment[6]: v 1.12.3 with adeee44b reverted With this change reverted METADATA block #0 type: 0 (STREAMINFO) is last: false length: 34 minimum blocksize: 4608 samples maximum blocksize: 4608 samples minimum framesize: 0 bytes maximum framesize: 0 bytes sample_rate: 44100 Hz channels: 2 bits-per-sample: 16 total samples: 3705576 MD5 signature: 00000000000000000000000000000000 METADATA block #3 type: 4 (VORBIS_COMMENT) is last: true length: 321 vendor string: GStreamer encoded vorbiscomment comments: 11 comment[0]: TITLE=My Lady Hunnsdon's Puffe (P 54) comment[1]: ARTIST=Paul O'Dette comment[2]: TRACKNUMBER=1 comment[3]: TRACKTOTAL=23 comment[4]: ALBUM=My Favorite Dowland comment[5]: COMPOSER=John Dowland comment[6]: ALBUMARTIST=Paul O'Dette comment[7]: GENRE=Classical comment[8]: DATE=2014 comment[9]: DISCID=4711a817 comment[10]: MUSICBRAINZ_DISCID=Oz7pe6Alnaq6h3zQsog7caD5tMM- [1] https://bugzilla.gnome.org/show_bug.cgi?id=784092 [2] https://bugzilla.gnome.org/show_bug.cgi?id=785558
Can confirm this problem on latest rhythmbox git ( with gstreamer 1.12.3 ). Soundjuicer uses rhythmbox encoding profiles. Hence, equally affected by this issue. [user@test:~/bugs/rhythmbox/flac] $ flac -tVF test.flac flac 1.3.2 test.flac: ERROR while decoding metadata state = FLAC__STREAM_DECODER_END_OF_STREAM [Rhythmbox bug]: https://bugzilla.gnome.org/show_bug.cgi?id=789946
One thing I couldn't figure out is how banshee ( with same gstreamer 1.12.3 ) works fine on the system. [user@test:~/bugs/rhythmbox/flac] $ flac -tVF test_banshee.flac flac 1.3.2 test_banshee.flac: ok
(In reply to gkrithi8 from comment #30) > One thing I couldn't figure out is how banshee ( with same gstreamer 1.12.3 > ) works fine on the system. Uses "flacenc" directly rather than "encodebin".
Should this be reopened now that the commit has been reverted? https://cgit.freedesktop.org/gstreamer/gst-plugins-good/commit/?id=d4c04cb079661948ea520d55da2cc77e5d7ae295
There's another / a new bug for it now: bug #792985
I guess we should not leave it marked as fixed then *** This bug has been marked as a duplicate of bug 792985 ***