GNOME Bugzilla – Bug 615813
id3v2mux: Transcoding from flac to mp3, UTF-8 tags get mangled
Last modified: 2018-11-03 14:41:41 UTC
I'm using gstreamer to transcode flac files to mp3 while keeping the metadata, using the following pipeline:

  gst-launch filesrc location="foo.flac" ! decodebin ! audioconvert ! \
    lamemp3enc target=quality quality=4 encoding-engine-quality=2 ! xingmux ! id3v2mux ! \
    filesink location="bar.mp3"

It looks like UTF-8 encoded tags in the flac files are being treated as 8-bit ISO-8859-1 tags, so each byte of any multi-byte UTF-8 character in the original tag is mistakenly re-encoded into UTF-8 on its own, resulting in garbage.

For example, I have foo.flac containing the following metadata (according to metaflac --list):

  comment[1]: producer=Peter Tägtgren

That's in UTF-8, so the "ä" character is encoded as two bytes, C3 A4.

After the pipeline, I get a bar.mp3 containing this (according to id3demux ! fakesink -t):

  producer[xxx]=Peter TÃ¤gtgren

That's also in UTF-8, so the original single "ä" character is now four hideous bytes, C3 83 C2 A4.

Should (or can) id3v2mux identify what encoding is used by incoming metadata? If not, is there a manual workaround?
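The byte values reported above can be reproduced outside GStreamer. The following is only a sketch of the suspected misinterpretation (UTF-8 bytes read as ISO-8859-1 and then encoded to UTF-8 again), not the actual id3v2mux or taglib code; the helper names `mangle` and `unmangle` are made up for illustration.

```python
def mangle(text: str) -> str:
    """Simulate the suspected bug: UTF-8 bytes misread as ISO-8859-1 characters."""
    return text.encode("utf-8").decode("iso-8859-1")


def unmangle(text: str) -> str:
    """Reverse the misreading. May raise UnicodeError if the text was not
    mangled this way; plain ASCII passes through unchanged."""
    return text.encode("iso-8859-1").decode("utf-8")


original = "Peter Tägtgren"
broken = mangle(original)

# Bytes of the original character and of its mangled form, both as UTF-8.
print("ä".encode("utf-8").hex(" "))
print(mangle("ä").encode("utf-8").hex(" "))

# The mangled tag value, and a check that the round trip restores it.
print(broken)
print(unmangle(broken) == original)
```

Applying something like `unmangle` to tag values read back from an already-written mp3 would be one possible after-the-fact repair, under the assumption that every affected string was mangled exactly once.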
Could you attach the beginning of the input file by any chance?

  $ head --bytes=900k foo.flac > foo-head.flac

should do the trick.

Could you also post the output of this command (just to make sure):

  $ hexdump foo.flac | tail -n 10
Created attachment 158820
900k head of flac file exhibiting metadata encoding translation issue
Certainly. Here's the tail hexdump:

  *
  27a31a0 ffff ffff ffff ffff ffff c3ff ffff ffff
  27a31b0 ffff ffff ffff ffff ffff ffff ffff ffff
  *
  27a32a0 ffff ffff ffff ffff ffff fcff d55f f8ff
  27a32b0 18c9 b2e0 2fae 0000 0000 0000 1863 f8ff
  27a32c0 18c9 b2e0 28af 0000 0000 0000 6df4 f8ff
  27a32d0 18c9 b2e0 75b0 0000 0000 0000 f95c f8ff
  27a32e0 1879 b2e0 0ab1 745b 0000 0000 0000 154a
  27a32f0

And I have attached the head as a binary file.
Ah, I see, something is going wrong with the freeform 'extended comment' tags. Thanks for the sample file.
This looks like a bug in taglib at first glance (1.6.2-1 debian sid version here): we clearly set the text encoding type to UTF-8 and pass the string as UTF-8.
For what it's worth, id3mux from gst-plugins-bad (which at some point will replace the taglib-based id3v2mux in -good) seems to get it right.
Thanks for that last hint. I wasn't aware that id3mux was intended to replace id3v2mux; I had assumed the other way around. I'll try it. Actually, I did previously try it at some point, and at the time id3mux failed to carry over the embedded cover art, which id3v2mux did accomplish. But I was using an earlier GST version then, and I have since updated to the latest versions (before reporting this bug), so I'll try again with an up-to-date id3mux.
There was a rather broken id3mux in -ugly for a long time (bundled with mad iirc); id3v2mux was then written to replace that. At some point we then removed the broken id3mux in -ugly and added a new-from-scratch id3mux to -bad.
Well, updating my pipeline to replace id3v2mux in -good with id3mux in -bad certainly seems to solve the problem. I haven't probed it in great detail, but the tag discussed above ("producer" in a user-defined frame) is encoded in UTF-16, while the rest (containing only 7-bit characters) appear to be ISO-8859-1, and that's OK with me. This outcome doesn't exactly close the bug, but it's a successful workaround for me. And if id3v2mux's days are numbered, it's probably not worth putting too much into this. Incidentally, I am now getting cover art correctly via id3mux; I think Bug 598733 (resolved/fixed) explains why I didn't before, with an earlier version.
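To illustrate the per-string encoding choice observed above, here is a rough sketch of how the body of an ID3v2.3 user-defined text frame (TXXX) could be built: ISO-8859-1 for pure 7-bit strings, UTF-16 with a byte-order mark otherwise. This only mirrors what was observed in the output file; it is an assumption, not id3mux's actual implementation. The function name `encode_txxx_body` is hypothetical, and the 10-byte frame header is left out.

```python
def encode_txxx_body(description: str, value: str) -> bytes:
    """Build a TXXX frame body: encoding byte, description, terminator, value."""
    if description.isascii() and value.isascii():
        # Encoding byte 0x00: ISO-8859-1, single-byte terminator.
        return (
            b"\x00"
            + description.encode("iso-8859-1")
            + b"\x00"
            + value.encode("iso-8859-1")
        )
    # Encoding byte 0x01: UTF-16 with BOM, two-byte terminator.
    # The BOM is written explicitly so the byte order is always little-endian.
    bom = "\ufeff"
    return (
        b"\x01"
        + (bom + description).encode("utf-16-le")
        + b"\x00\x00"
        + (bom + value).encode("utf-16-le")
    )


print(encode_txxx_body("title", "abc"))
print(encode_txxx_body("producer", "Peter Tägtgren").hex(" "))
```

Choosing the encoding per string keeps plain ASCII tags compact and readable by older players, while non-ASCII values avoid any ambiguity about how their bytes should be interpreted.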
This is still reproducible, and in 1.14 id3v2mux is still separate from id3mux!
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-plugins-good/issues/26.