Bug 451707 - [tag] UTF-8 in ID3v1 tag not correctly decoded
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-plugins-base
Version: git master
Hardware: Other Linux
Importance: Normal normal
Target Milestone: 0.10.14
Assigned To: GStreamer Maintainers
QA Contact: GStreamer Maintainers
Reported: 2007-06-27 19:18 UTC by Sebastien Bacher
Modified: 2008-01-03 17:44 UTC
See Also:
GNOME target: ---
GNOME version: 2.19/2.20



Description Sebastien Bacher 2007-06-27 19:18:28 UTC
This bug was originally reported at https://bugs.launchpad.net/bugs/117154

"Binary package hint: totem

Totem/GStreamer displays ID3 tag info when playing MP3 files...
But it does not autodetect the correct character set of those ID3 tags, nor does
it offer any way to let the user choose a character set in the Options dialog.
...
http://launchpadlibrarian.net/7871666/divka_s_perlami.mp3
mp3 with cp1250 coded id3 tag (3.8 MiB, audio/mpeg)

I use Feisty, latest updates.
I am attaching an MP3 example (free redistribution and use is allowed).
The ID3 tag is encoded in windows-1250 codepage.

Totem expects the ID3 tag to be in UTF-8, so it shows it incorrectly:

Title: Dívka s perlami ve vlasech
Artist: Aleš Brichta
Album: Best of (ProstÄ ÃºÅ¾asný)

In Windows (Czech localization, codepage 1250)
and also e.g. in Audacious 1.2.2 on Gentoo Linux
with the CP1250 codepage set, the info is shown correctly:

Title: Dívka s perlami ve vlasech
Artist: Aleš Brichta
Album: Best of (Prostě úžasný)"
Comment 1 Tim-Philipp Müller 2007-06-27 20:13:05 UTC
Actually, the encoding used in this tag is not WINDOWS-1250, but UTF-8. However, we fail to detect the UTF-8 because we run g_utf8_validate() on the entire size allocated for the string within the ID3v1 tag. In this case that size includes the trailing zero bytes that pad the string, and g_utf8_validate() treats zero bytes inside the given length as invalid UTF-8. Validation therefore fails, and the code falls back to decoding the string using the current locale. In short, we're displaying garbage _because_ we're interpreting this string as WINDOWS-1250 :)
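
For illustration only, here is a minimal Python sketch of the failure described above. It is not the GStreamer code: glib_like_validate() and decode_id3v1_field() are hypothetical helpers, and the cp1250 fallback merely stands in for "the current locale". The only things taken as given are that an ID3v1 album field is 30 bytes long and that g_utf8_validate(), when given an explicit length, rejects zero bytes inside that length.

```python
def glib_like_validate(data: bytes) -> bool:
    """Hypothetical stand-in for g_utf8_validate() called with an explicit
    length: any NUL byte inside the buffer makes validation fail."""
    if b"\x00" in data:
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def decode_id3v1_field(field: bytes, strip_padding: bool,
                       fallback: str = "cp1250") -> str:
    """Decode a fixed-size ID3v1 text field.

    strip_padding=False models the old behaviour (validate the whole
    fixed-size array); strip_padding=True models the idea of the fix
    (drop trailing zeroes before validating).
    The fallback codec is an assumption standing in for the locale.
    """
    if strip_padding:
        field = field.rstrip(b"\x00")
    if glib_like_validate(field):
        return field.decode("utf-8")
    # Validation failed: treat the bytes as the fallback encoding.
    return field.decode(fallback, errors="replace")


# UTF-8 text stored in a 30-byte ID3v1 album field, padded with zero bytes.
expected = "Best of (Prostě úžasný)"
album = expected.encode("utf-8").ljust(30, b"\x00")

old_result = decode_id3v1_field(album, strip_padding=False)
new_result = decode_id3v1_field(album, strip_padding=True)

print(repr(old_result))  # garbled: UTF-8 bytes read as cp1250
print(repr(new_result))  # the original text
```

With the padding left in place, the field never passes validation and is decoded with the fallback codec, which garbles every multi-byte character. Stripping the trailing zeroes first lets the same bytes be recognised as UTF-8.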
Comment 2 Tim-Philipp Müller 2007-06-27 22:32:13 UTC
Should be fixed in CVS:

 2007-06-27  Tim-Philipp Müller  <tim at centricular dot net>

        * gst-libs/gst/tag/tags.c: (gst_tag_freeform_string_to_utf8):
          Don't pass trailing zeroes in fixed-size string arrays in ID3v1 tags
          to utf8-validate; fixes recognition of ID3v1 tags in UTF-8 encoding
          (#451707); also, output some debugging info when dealing with
          freeform strings.

        * tests/check/libs/tag.c: (GST_START_TEST), (tag_suite):
          Add unit test for the above.

$ gst-launch-0.10 -t playbin uri=file:////samples/451707-divka_s_perlami.mp3 
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
FOUND TAG      : found by element "id3demux0".
           title: Dívka s perlami ve vlasech
          artist: Aleš Brichta
           album: Best of (Prostě úžasný)
            date: 2000-01-01
         comment: --- české písničky ---
Comment 3 Roman Polach 2007-12-26 18:06:00 UTC
Well, it works for the attached divka_s_perlami.mp3, but it does not work for other
files. I am attaching a link to another file for which this problem still exists.
I can check with Audacious that the ID3 tag is correct there. See this:

-- audacious --
Title: Vidíš, vidíš
Artist: Bratři Ebenové
Album: Já na tom dělám

-- totem --
Title: Vidí, vidí
Artist: Bratøi Ebenové
Album: Já na tom dìlám

sample mp3 where the problem still exists:
http://launchpadlibrarian.net/10535961/vidis_vidis.mp3

screenshot of totem:
http://launchpadlibrarian.net/10536027/screenshot-totem.png

screenshot of audacious:
http://launchpadlibrarian.net/10536104/screenshot-audacious.png

Should I file a new bug, or will this one be reopened?
Comment 4 Roman Polach 2008-01-03 17:44:18 UTC
Filed a new bug about the latter problem:
http://bugzilla.gnome.org/show_bug.cgi?id=507074