After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 149274 - [PATCH] gst-plug mad gets id3v2 text tags in wrong encoding
[PATCH] gst-plug mad gets id3v2 text tags in wrong encoding
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-plugins
0.8.2
Other All
: Normal normal
: 0.8.8
Assigned To: GStreamer Maintainers
GStreamer Maintainers
: 160816 321383 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2004-08-04 11:28 UTC by Xin Zhen
Modified: 2005-11-15 18:59 UTC
See Also:
GNOME target: ---
GNOME version: 2.9/2.10


Attachments
do "locale-to-utf8" conversion for id3v2 tags in mp3s (1.34 KB, patch)
2004-08-04 11:39 UTC, Xin Zhen
none Details | Review
id3v2 recoding with libenca (2.73 KB, patch)
2005-01-25 09:17 UTC, Vital Khilko
none Details | Review
id3v1 recoding with libenca (2.35 KB, patch)
2005-01-25 09:18 UTC, Vital Khilko
none Details | Review

Description Xin Zhen 2004-08-04 11:28:58 UTC
1.run Rhythmbox with gstreamer backend
2.load mp3s which are tagged in a locale-depending encoding
3.these tags get wrong display (All in ISO-8859-1 symbols)

When a text tags of id3 is marked as "ISO-8859-1", it desn't really mean
ISO-8859-1, but "locale-depending encoding". So a locale-to-utf8 conversion is
needed.

Id3v1 tags gets converted correctly in gst/tags/gstid3tag.c, but id3v2 are not.
They just get a "ISO8859-1-to-utf8" conversion (by libid3tag), so that won't
work correctly in non-ASCII locales.
Comment 1 Xin Zhen 2004-08-04 11:39:57 UTC
Created attachment 30200 [details] [review]
do "locale-to-utf8" conversion for id3v2 tags in mp3s

The little patch will do the "locale-to-utf8" conversion for gst-plugin-mad
when an ISO-8859-1 text tag are read
Comment 2 Penguin 2004-08-05 09:28:07 UTC
Thank you. It works for me. 
I hope it could be included in offical release.
Comment 3 Benjamin Otte (Company) 2004-08-11 14:33:18 UTC
According to the ID3 standard located at
http://www.id3.org/id3v2.4.0-structure.txt a tag marked as "ISO-8859-1" really
means "ISO-8859-1" and not "current locale".

In the case of ID3v1 the locale was never specified, so GStreamer uses the
hackish approach of looking at the locale.
Note that this often fails however, because modern Linux distributions use UTF-8
by default.

Which taggers do produce these broken files?
Comment 4 Christian Fredrik Kalager Schaller 2004-12-04 20:45:58 UTC
Benjamin, is this patch commited? If not should it be commited and this bug
closed? If it is not commited and shouldn't be commited either should this bug
be closed as wontfix?
Comment 5 Ronald Bultje 2004-12-16 12:04:41 UTC
Can any of the topicstarters please provide such a sample file as Benjamin
requested? We'd love to apply this patch or a similar one, but we need to be
sure that A) UTF-8 support doesn't break, B) the final patch fixes your bug and
C) check for general sanity. Please provide a file.
Comment 6 Pavel Ivanov 2005-01-01 23:48:33 UTC
Here is an mp3 for you to test.
id3tag is on russian cp1251 (windows-1251) encoding.

http://people.panama.by/pi/f0000873.mp3
Comment 7 Tim-Philipp Müller 2005-01-21 12:52:02 UTC
There is a spec for ID3v2 and always has been, and files like the above are 
clearly broken IMHO. 
 
But even _if_ one was to accept the approach you are taking, then the logic 
should at least be: 
 
  if (encoding == ID3V2_SPEC_ISO88591_REALLY_MEANING_LOCALE) 
  { 
     if (CURRENT_LOCALE == UTF-8) /* or any unicode really? */ 
     { 
        interpret_string_as_iso_8859_1(); 
     } 
     else 
     { 
        if (interpret_string_as_current_locale() == FAILED) 
          interpret_string_as_iso_8859_1(); 
     } 
  } 
 
 
I am not convinced, however, that this is the right approach. This approach 
has at least two problems: 
 
 * users tend to have files from multiple sources (friends, p2p networks, 
   or files from the days they still used Windows etc.), all of which 
   can be with strings encoded in whatever locale the source used. 
 
 * users might switch locales (e.g. consciously, or when upgrading their 
   distro, or when changing distros/OS etc.) 
 
 
I think the only semi-sane way to handle this problem is to have an 
environment variable like GSTREAMER_ID3_TAG_ENCODING that takes one or 
multiple ':'-separated encodings that are tried for ID3v2 strings in 
'iso-8859-1' encoding. 
 
Or someone should just write a small and easy-to-use app that converts those 
broken tags into unicode tags ;) 
 
Cheers 
 -Tim 
 
 
 
Comment 8 Vital Khilko 2005-01-25 09:17:19 UTC
Created attachment 36496 [details] [review]
id3v2 recoding with libenca
Comment 9 Vital Khilko 2005-01-25 09:18:20 UTC
Created attachment 36497 [details] [review]
id3v1 recoding with libenca
Comment 10 Vital Khilko 2005-01-25 09:19:34 UTC
this problem is very urgent for the languages of those using cyrillic symbols. 
So let us say for the Russian language there exists the entire "bouquet" of
encodings. And not always encoding in the file coincides with locale encoding.
I examined the published here patches and the initial code. 
And I create new patches. In them they are used the specialized library
(libenca) for determining the initial encoding. For this altogether it is only
necessary to know the language of tags (first two letters from locale name).
This approach justified itself, and now I don't have problems with codings of
tags id3v2. 
But problem with the tags id3v1 remained. Despite recoding in gst_tag_extract()
function , tags show erroneously. I don't understand why. If you please, you
will examine my patches, and, if it is possible, help with the solution of this
problem.
P.S. Sorry my bad english.
Comment 11 Benjamin Otte (Company) 2005-01-25 13:51:37 UTC
If anyone is asking me for an opinion (seems so, since I was added to the cc :)):

These files are broken.

Broken files should be hard but possible to support so there exists a way for
people to get them to work, but it involves some time on their part. Just so
they are annoyed enough that they fix their files.

I also don't want a solution that makes stuff much more complicated just because
of broken files. Adding a libenca dependency makes stuff much more complicated,
especially for packagers.


The best solution I can see is what __tim suggested: use an env var
GST_ID3_TAG_ENCODING. If set (to for example "cp1251"), ID3v1 tags and ID3v2 in
8859-1 tags are tried to be interpreted in that encoding first.
This is only a little more code, makes it possible to read tags and involves
work to get the broken files going.
Comment 12 Pavel Ivanov 2005-01-25 20:48:02 UTC
2 Benjamin,

This problem is really important for those users with *broken* files (I think
any user have some files like that). 

Users think: "If my files work in Windows why they don't work in Linux??". And
we can't describe them that it is their falce and they must change id3tags in
mp3 files. Some of them even don't know what id3tag is. 

Such "Look&Feel" problems casts shadow on whole Linux.

Part 2.

I'm as a packanger think that it is OK to use enca. But if you don't think so...
OK. 

Even if we will use env var the methods will be the same (we should add recoding
to the same functions). But our second patch (for id3v1) don't work. And we need
some help with it. You can forget about enca and just look if we have added a
recoding in necessary place?
Comment 13 Tim-Philipp Müller 2005-01-25 21:38:08 UTC
I'll have something based on environment variables ready for you to test by 
tomorrow. 
 
Can you provide one or two files that have ID3v2 tags and exhibit this 
problem? (The file above only has an ID3v1 tag as far as I can see) 
 
Cheers 
 -Tim 
 
Comment 14 Michael Shigorin 2005-01-26 08:21:03 UTC
2 Benjamin: err... "teaching people in a hard way" may work if it isn't really
lot of people and an awful lot of a hard way ahead -- I've exchanged several
emails with grip author and maintainers of freedb.org on a similar topic.

It all started as quite similar patches for grip to handle potentially different
encodings of freedb data, ID3s and filesystem where the filenames get stored. 
The problem was they broke with grip being modified to comply with freedb's
policy.  The latter was fine in theory but in practice there's an awful lot of
"latin1" information there which is really in e.g. windows-1251 charset.  In
fact, any cyrillic sample I could find was exactly that, filed in by some
non-compliant Windows software.

So please postpone your pedagogical talent till the time when it can be fruitful
-- this one isn't.  We have an ugly ignorant legacy problem, and it's not going
to fade away anytime soon by itself.  You're basically saying "hey use LaTeX" to
a secretary which doesn't know what Windows is -- all she knows is "Word". 
Prudent people at Star Division invented another way around this problem. :)

Sorry for the lengthy rant but the issue (and accompanying bugs and
troubleshooting) have annoyed me quite a bit to maintain Packages That Work and
not bother to update them if they break or even look at another e.g. players. 
People tend to follow the advice, too :)

---

Recapping: I can attach current patch in our grip package, and here's link to
another one applied to our xmms package (I happen to maintain both):

http://prdownloads.sourceforge.net/rusxmms/xmms-1.2.10-recode-csa27.4.tar.bz2?download
(project site: http://rusxmms.sourceforge.net)

-- 
WBR, Mike
Comment 15 Christophe Fergeau 2005-01-26 09:34:37 UTC
Really, what is wrong with teaching people about that stuff, ie giving them the
env var, and writing a "smart-tag-fixer" program which will guess the encoding
of the mp3s and fix them accordingly? If we add workarounds everywhere, the spec
will never be enforced, imo there is a point in time where all the legacy cruft
needs to be cleaned up, and now is probably a good time to start with the new
tag libs available (I'm mainly thinking to gstreamer and taglib).
Comment 16 Pavel Ivanov 2005-01-26 10:14:26 UTC
2 Tim-Philipp,

Here is a file with id3v2 in cp1251 encoding:
http://people.panama.by/pi/16-7b_-_poslednii_geroi_128_lame_cbr.mp3
Comment 17 Tim-Philipp Müller 2005-01-26 12:47:56 UTC
Try again with current CVS.

Depending on what your locale encoding is, you might need to set the appropriate
environment variable, e.g. like this (if you use bash):

  % export GST_ID3_TAG_ENCODING=CP1251
  % rhythmbox

or

  % GST_ID3_TAG_ENCODING=CP1251 rhythmbox


With the two test files provided I get (let's see how bugzilla handles unicode):

% GST_ID3_TAG_ENCODING=CP1251 gst-launch-0.8 -t filesrc location=f0000873.mp3 !
decodebin ! audio/x-raw-int ! fakesink
Running pipeline ...
Found tag: found by element »id3demux0«
          Title: Баллада о борьбе
         Artist: Высоцкий Владимир
          Album: http://mp3.tula.ru
        Comment: Средь оплывших свечей и вечерн

 % GST_ID3_TAG_ENCODING=CP1251 gst-launch-0.8 -t filesrc
location=16-7b_-_poslednii_geroi_128_lame_cbr.mp3 ! decodebin ! audio/x-raw-int
! fakesink
Running pipeline ...
Found tag: found by element »id3demux0«
          Title: 7Б - Последний герой
         Artist: Сборники Рок-музыки
          Album: Последний герой - Песни для г
          Genre: Other
           Date: 730851
Cheers 
 -Tim
Comment 18 Benjamin Otte (Company) 2005-01-26 13:20:05 UTC
Let me clarify my above statement a bit.
I'm all for adding workarounds if the workarounds make more files work.
I'm against workarounds if they
- break correct applications
- work in an unexpected way.

Guessing the tag encoding based on locale setting does both of this.
1) If an application correctly encodes "Die Ärzte" as ISO-8859-1 and you play it
back in Russia, you definitely want to get "Die Ärzte", no matter what locale
you chose. None of the patches I've seen so far can ensure this.
2) If I were an ordinary user I would never see the connection between changing
the language of my desktop and Rhythmbox failing to import my files.

If anyone can think of a patch that never fails for files encoded to spec, we'll
use it instantly, but as long as we need to break the spec, don't do it by default.
Comment 19 Tim-Philipp Müller 2005-01-26 14:58:31 UTC
*** Bug 160816 has been marked as a duplicate of this bug. ***
Comment 20 Christian Fredrik Kalager Schaller 2005-01-28 10:40:44 UTC
Fixed by Tim in CVS. Closing.
Comment 21 Tim-Philipp Müller 2005-11-15 18:59:38 UTC
*** Bug 321383 has been marked as a duplicate of this bug. ***