GNOME Bugzilla – Bug 451707
[tag] UTF-8 in ID3v1 tag not correctly decoded
Last modified: 2008-01-03 17:44:18 UTC
This bug was originally reported at https://bugs.launchpad.net/bugs/117154:

"Binary package hint: totem

Totem/gstreamer displays ID3 tag info when playing MP3 files... But it does not autodetect the correct character set of those ID3 tags, nor does it offer any way for the user to choose the character set in the Options dialog.

...

http://launchpadlibrarian.net/7871666/divka_s_perlami.mp3
mp3 with cp1250 coded id3 tag (3.8 MiB, audio/mpeg)

I use Feisty with the latest updates. I am attaching an example MP3 (free redistribution and use is allowed). The ID3 tag is encoded in the windows-1250 codepage. Totem expects the ID3 tag to be in UTF-8, so it shows it incorrectly:

Title: DÃvka s perlami ve vlasech
Artist: AleÅ¡ Brichta
Album: Best of (ProstÄ ÃºÅ¾asný)

In Windows (Czech localization, codepage 1250), and also e.g. in Audacious 1.2.2 on Gentoo Linux with the CP1250 codepage set, the info is shown correctly:

Title: Dívka s perlami ve vlasech
Artist: Aleš Brichta
Album: Best of (Prostě úžasný)"
Actually, the encoding used in this tag is not WINDOWS-1250 but UTF-8. However, we fail to detect the UTF-8 because we run g_utf8_validate() on the entire size allocated for the string within the ID3v1 tag. In this case that includes the trailing zero string terminators, which g_utf8_validate() does not accept within the given length, so validation fails and the code falls back to decoding the string using the current locale. In short, we're displaying garbage _because_ we're interpreting this string as WINDOWS-1250 :)
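The failure mode described above can be reproduced outside GStreamer. The following is only a rough Python sketch, not the actual tags.c code: `strict_utf8_validate` is a hypothetical stand-in that mimics g_utf8_validate() rejecting NUL bytes inside an explicit length, the 30-byte field size is the ID3v1 size for title/artist/album, and the `latin-1` fallback is an assumption standing in for "the current locale".

```python
ID3V1_FIELD_SIZE = 30  # ID3v1 title/artist/album fields are fixed 30-byte arrays


def strict_utf8_validate(data: bytes) -> bool:
    """Hypothetical stand-in for g_utf8_validate(str, len, NULL):
    with an explicit length, any NUL byte inside that length is rejected."""
    if b"\x00" in data:
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def pad_field(text: str) -> bytes:
    """Encode text as UTF-8 and zero-pad it to the fixed field size."""
    return text.encode("utf-8").ljust(ID3V1_FIELD_SIZE, b"\x00")


def decode_field(field: bytes, trim: bool, fallback: str = "latin-1") -> str:
    """Decode a fixed-size field; `fallback` is an assumed locale encoding."""
    data = field.rstrip(b"\x00") if trim else field
    if strict_utf8_validate(data):
        return data.decode("utf-8")
    # Validation failed: reinterpret the bytes in the fallback encoding.
    return data.rstrip(b"\x00").decode(fallback)


field = pad_field("Aleš Brichta")
print(decode_field(field, trim=False))  # validates the whole padded field
print(decode_field(field, trim=True))   # validates only the non-zero part
```

Validating the whole padded field fails and the UTF-8 bytes get reinterpreted in the fallback encoding, while stripping the trailing zeroes first lets the same bytes pass validation.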
Should be fixed in CVS:

2007-06-27  Tim-Philipp Müller  <tim at centricular dot net>

    * gst-libs/gst/tag/tags.c: (gst_tag_freeform_string_to_utf8):
      Don't pass trailing zeroes in fixed-size string arrays in ID3v1
      tags to utf8-validate; fixes recognition of ID3v1 tags in UTF-8
      encoding (#451707); also, output some debugging info when
      dealing with freeform strings.

    * tests/check/libs/tag.c: (GST_START_TEST), (tag_suite):
      Add unit test for the above.

$ gst-launch-0.10 -t playbin uri=file:////samples/451707-divka_s_perlami.mp3
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
FOUND TAG : found by element "id3demux0".
       title: Dívka s perlami ve vlasech
      artist: Aleš Brichta
       album: Best of (Prostě úžasný)
        date: 2000-01-01
     comment: --- české písničky ---
Well, it works for the attached divka_s_perlami.mp3, but it does not work for other files. I am attaching a link to another file for which the problem still exists. I can confirm with audacious that the ID3 tag in it is correct:

-- audacious --
Title: Vidíš, vidíš
Artist: Bratři Ebenové
Album: Já na tom dělám

-- totem --
Title: Vidí, vidí
Artist: Bratøi Ebenové
Album: Já na tom dìlám

Sample mp3 where the problem still exists: http://launchpadlibrarian.net/10535961/vidis_vidis.mp3
Screenshot of totem: http://launchpadlibrarian.net/10536027/screenshot-totem.png
Screenshot of audacious: http://launchpadlibrarian.net/10536104/screenshot-audacious.png

Should I file a new bug, or will this one be reopened?
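As a rough check on what the totem output for this second file might indicate, here is a small Python sketch. It rests on an assumption that is not confirmed anywhere in this report: that the tag bytes are windows-1250 and are being decoded as ISO-8859-1. The helper names `mojibake` and `visible` are made up for illustration.

```python
def mojibake(text: str, actual: str = "cp1250", assumed: str = "latin-1") -> str:
    """Encode text in its real encoding, then decode it with the wrong one."""
    return text.encode(actual).decode(assumed)


def visible(text: str) -> str:
    """Drop non-printable characters, roughly as a GUI label would show them."""
    return "".join(ch for ch in text if ch.isprintable())


for original in ("Vidíš, vidíš", "Bratři Ebenové", "Já na tom dělám"):
    print(original, "->", visible(mojibake(original)))
```

Under that assumption the three strings come out the way totem displays them, with the "š" turning into an invisible control character. This is only consistent with the symptom; it is not a diagnosis.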
Filed a new bug about the latter problem: http://bugzilla.gnome.org/show_bug.cgi?id=507074