Bug 615813 - id3v2mux: Transcoding from flac to mp3, UTF-8 tags get mangled
Status: RESOLVED OBSOLETE
Product: GStreamer
Classification: Platform
Component: gst-plugins-good
Version: git master
Hardware: Other Linux
Importance: Normal normal
Target Milestone: git master
Assigned To: GStreamer Maintainers
QA Contact: GStreamer Maintainers
Depends on: 626069
Blocks:
 
 
Reported: 2010-04-15 06:36 UTC by Clarke Wixon
Modified: 2018-11-03 14:41 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
900k head of flac file exhibiting metadata encoding translation issue (900.00 KB, application/octet-stream)
2010-04-15 15:09 UTC, Clarke Wixon

Description Clarke Wixon 2010-04-15 06:36:44 UTC
I'm using gstreamer to transcode flac files to mp3 while keeping the metadata, using the following pipeline:

gst-launch filesrc location="foo.flac" ! decodebin ! audioconvert ! \
lamemp3enc target=quality quality=4 encoding-engine-quality=2 ! xingmux ! id3v2mux ! \
filesink location="bar.mp3"

It looks like UTF-8 encoded tags in the flac files are being treated as 8-bit ISO-8859-1 tags, so each byte of any multi-byte UTF-8 character in the original tag is mistakenly re-encoded into UTF-8, resulting in garbage.

For example, I have foo.flac containing the following metadata (according to metaflac --list):

comment[1]: producer=Peter Tägtgren

That's in UTF-8, so the "ä" character is encoded as two bytes, C3 A4.

After the pipeline, I get a bar.mp3 containing this (according to id3demux ! fakesink -t):

producer[xxx]=Peter TÃ¤gtgren

That's also in UTF-8, so the original single "ä" character is now four hideous bytes, C3 83 C2 A4.
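
The byte arithmetic above can be checked with a short Python sketch. It only reproduces the symptom; `double_encode` is a hypothetical helper written for illustration and is not taken from GStreamer or TagLib, and it assumes the mangling is exactly one "read UTF-8 bytes as ISO-8859-1, then write UTF-8" pass.

```python
# Hypothetical helper (not GStreamer or TagLib code): reproduces the symptom
# by misreading UTF-8 bytes as ISO-8859-1 and re-encoding the result as UTF-8.
def double_encode(text: str) -> bytes:
    utf8_bytes = text.encode("utf-8")          # correct encoding of the tag
    misread = utf8_bytes.decode("iso-8859-1")  # every byte becomes one character
    return misread.encode("utf-8")             # each of those is encoded again


original = "ä".encode("utf-8")
mangled = double_encode("ä")

print(original.hex(" "))  # c3 a4
print(mangled.hex(" "))   # c3 83 c2 a4
```

The two output lines match the two byte sequences quoted in the report, which is consistent with the tag being interpreted as ISO-8859-1 somewhere between the FLAC metadata and the written ID3v2 frame.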

Should (or can) id3v2mux identify what encoding is used by incoming metadata?  If not, is there a manual workaround?
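
For tags that have already been written in mangled form, one possible after-the-fact repair is to undo the spurious pass. The following is a minimal sketch, assuming the string went through exactly one ISO-8859-1 misinterpretation; `repair_double_encoded` is a hypothetical name, and reading or writing the tag in the MP3 file itself is out of scope here.

```python
# Hypothetical repair helper: assumes exactly one spurious
# "UTF-8 bytes read as ISO-8859-1" pass happened to the tag value.
def repair_double_encoded(text: str) -> str:
    try:
        # Map each character back to the single byte it was misread from,
        # then decode those bytes as the UTF-8 they originally were.
        return text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The value does not look double-encoded; leave it untouched.
        return text


print(repair_double_encoded("Peter TÃ¤gtgren"))
```

A value that was never mangled, such as a correctly encoded "Peter Tägtgren", fails the round trip and is returned unchanged, so the helper is reasonably safe to apply to a mixed set of tags. It cannot tell apart text that legitimately contains a sequence like "Ã¤".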
Comment 1 Tim-Philipp Müller 2010-04-15 08:01:51 UTC
Could you attach the beginning of the input file by any chance?

 $ head --bytes=900k foo.flac > foo-head.flac

should do the trick. Could you also post the output of this command (just to make sure):

 $ hexdump foo.flac | tail -n 10
Comment 2 Clarke Wixon 2010-04-15 15:09:46 UTC
Created attachment 158820 [details]
900k head of flac file exhibiting metadata encoding translation issue
Comment 3 Clarke Wixon 2010-04-15 15:10:59 UTC
Certainly.  Here's the tail hexdump:

*
27a31a0 ffff ffff ffff ffff ffff c3ff ffff ffff
27a31b0 ffff ffff ffff ffff ffff ffff ffff ffff
*
27a32a0 ffff ffff ffff ffff ffff fcff d55f f8ff
27a32b0 18c9 b2e0 2fae 0000 0000 0000 1863 f8ff
27a32c0 18c9 b2e0 28af 0000 0000 0000 6df4 f8ff
27a32d0 18c9 b2e0 75b0 0000 0000 0000 f95c f8ff
27a32e0 1879 b2e0 0ab1 745b 0000 0000 0000 154a
27a32f0

And I have attached the head as a binary file.
Comment 4 Tim-Philipp Müller 2010-04-15 22:21:46 UTC
Ah, I see, something is going wrong with the freeform 'extended comment' tags. Thanks for the sample file.
Comment 5 Tim-Philipp Müller 2010-04-15 22:36:23 UTC
This looks like a bug in taglib at first glance (1.6.2-1 debian sid version here): we clearly set the text encoding type to UTF-8 and pass the string as UTF-8.
Comment 6 Tim-Philipp Müller 2010-04-15 22:40:07 UTC
For what it's worth, id3mux from gst-plugins-bad (which at some point will replace the taglib-based id3v2mux in -good) seems to get it right.
Comment 7 Clarke Wixon 2010-04-15 23:21:01 UTC
Thanks for that last hint.  I wasn't aware that id3mux was intended to replace id3v2mux; I had assumed the other way around.  I'll try it.

Actually, I did previously try it at some point, and at the time id3mux failed to carry over the embedded cover art, which id3v2mux did accomplish.  But I was using an earlier GST version then, and I have since updated to the latest versions (before reporting this bug), so I'll try again with an up-to-date id3mux.
Comment 8 Tim-Philipp Müller 2010-04-15 23:29:20 UTC
There was a rather broken id3mux in -ugly for a long time (bundled with mad iirc); id3v2mux was then written to replace that. At some point we then removed the broken id3mux in -ugly and added a new-from-scratch id3mux to -bad.
Comment 9 Clarke Wixon 2010-04-16 18:41:44 UTC
Well, updating my pipeline to replace id3v2mux in -good with id3mux in -bad certainly seems to solve the problem.  I haven't probed it in great detail, but the tag discussed above ("producer" in a user-defined frame) is encoded in UTF-16, while the rest (containing only 7-bit characters) appear to be ISO-8859-1, and that's OK with me.
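
The mixed encodings observed above would be consistent with a per-frame rule like the one sketched below. This is an assumption drawn only from the observed output, not a description of id3mux's actual code; `encode_id3_text` is a hypothetical function. The encoding byte values (0x00 for ISO-8859-1, 0x01 for UTF-16 with a byte order mark) are the ones defined by the ID3v2 text frame format.

```python
import codecs

# Encoding byte values defined for ID3v2 text frames.
ENC_ISO_8859_1 = 0x00
ENC_UTF16_BOM = 0x01


# Hypothetical sketch of a per-frame encoding choice; the real id3mux
# logic may differ.
def encode_id3_text(text: str) -> tuple[int, bytes]:
    if text.isascii():
        # Only 7-bit characters: the single-byte encoding is sufficient.
        return ENC_ISO_8859_1, text.encode("iso-8859-1")
    # Anything else is written as UTF-16 preceded by a byte order mark.
    return ENC_UTF16_BOM, codecs.BOM_UTF16_LE + text.encode("utf-16-le")


for value in ("Clarke Wixon", "Peter Tägtgren"):
    encoding_byte, payload = encode_id3_text(value)
    print(encoding_byte, payload.hex(" "))
```

Either choice avoids the problem in this report, because the frame's encoding byte and the bytes actually written agree with each other.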

This outcome doesn't exactly close the bug, but it's a successful workaround for me.  And if id3v2mux's days are numbered, it's probably not worth putting too much into this.

Incidentally, I am now getting cover art correctly via id3mux; I think Bug 598733 (resolved/fixed) explains why I didn't before, with an earlier version.
Comment 10 Olivier Crête 2018-05-04 11:16:27 UTC
This is still reproducible, and in 1.14 we still have id3v2mux separate from id3mux!
Comment 11 GStreamer system administrator 2018-11-03 14:41:41 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-plugins-good/issues/26.