GNOME Bugzilla – Bug 326903
UTF-8 characters in ID3v1 tags
Last modified: 2006-02-07 12:51:48 UTC
There is no way to select the character encoding for tags while ripping. All my MP3s have ID3 tags in CP1251 encoding (as you can guess, almost all Russian MP3s on the net use the same encoding), so I would like to keep my collection homogeneous.
Don't ID3 tags specify an encoding? I'm pretty much against using anything apart from UTF-8. Note that SJ doesn't have any say in what encoding is used; this is a GStreamer issue.
I don't know whether they specify one or not, but I haven't seen any Windows player that can show tags in anything other than CP12** =) And I haven't seen any player that takes the encoding information into account if it is present in the file... Maybe that's a bug in the players :) The only player that correctly displayed both UTF-8 and CP1251 tags at the same time was beep-media-player (and only if I specified a fallback encoding for ID3 tags in the MPEG Layer 3 plugin properties; it uses this encoding if the ID3 tag is not a valid UTF-8 string).
Hmmm, how can I change the encoding that GST uses? GST_TAG_ENCODING=CP1251 didn't help me...
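The fallback behaviour described above can be sketched roughly as follows. This is only an illustration in Python, not beep-media-player's actual code: the function name `decode_tag` and the default fallback of CP1251 are assumptions made for the example.

```python
def decode_tag(raw: bytes, fallback: str = "cp1251") -> str:
    """Decode tag bytes as UTF-8, falling back to a legacy encoding.

    `fallback` stands in for the user-configured fallback encoding
    (hypothetical parameter, chosen for this sketch).
    """
    # Drop the NUL/space padding that fixed-width tag fields often carry.
    raw = raw.rstrip(b"\x00 ")
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume the configured legacy encoding.
        return raw.decode(fallback, errors="replace")
```

Note that this only works because typical CP1251 text happens to be invalid as UTF-8; it cannot tell one 8-bit encoding from another.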
ID3v2 tags contain information about which encoding is used, and for us it would be UTF-8. This means that if applications use ID3v1 tags instead of ID3v2, they need bugs filed against them, because both should be available to applications with the way the GStreamer/lame plugin currently works.
You mean that both ID3v1 and ID3v2 should be present in each file, and that ID3v1 should use the same encoding as ID3v2?
As I mentioned, not all rippers (for Windows) write the encoding information for ID3v2 tags, and that's bad :) Is there a library that provides a function for specifying the character encoding?
I have found (www.id3.org) that ID3v1 should only contain Latin-1 characters, while ID3v2 may contain UTF-8, Latin-1, or UTF-16 (LE/BE). But Sound Juicer (or GST) puts UTF-8 strings in ID3v1.
Damn that LAME encoder, it's so pants! Re-assigning to GStreamer.
Vitaliy, where did you find this information? id3.org only deals with ID3v2. From what I remember, ID3v1 doesn't have any encoding information, so what you should put in there is anyone's guess. Putting UTF-8 means that it's easier for modern applications to use.
Have you tried using id3mux instead of lame for writing ID3 tags? I think sound-juicer should only be writing ID3v2 tags really. ID3v1 creates the above-mentioned internationalisation issues, which are simply unsolvable. While GStreamer can be made to read ID3v1 tags that have a non-Latin-1 encoding (by setting a special environment variable, GST_ID3_TAG_ENCODING), it is very, very unlikely to ever support writing ID3v1 tags in any encoding other than Latin-1 or UTF-8. Rippers that don't specify the correct tag encoding when writing ID3v2 tags are simply completely broken. I've seen (very few) tags like that, and I'm not sure there's much we can do about them. It is close to impossible to guess the right character encoding from a series of bytes in a locale-independent manner (and even with knowledge of the locale it is very, very hard).
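To illustrate why ID3v1 leaves the encoding to guesswork: the tag is a fixed 128-byte block at the end of the file with no field for a character set, so a reader has to be told (or has to assume) one. A minimal sketch in Python follows; `parse_id3v1` and its `encoding` parameter are hypothetical names for this example, not a GStreamer API.

```python
import struct

def parse_id3v1(tag: bytes, encoding: str = "latin-1") -> dict:
    """Parse a 128-byte ID3v1 block; the caller must supply the encoding."""
    if len(tag) != 128 or tag[:3] != b"TAG":
        raise ValueError("not an ID3v1 tag")
    # Layout: "TAG", title[30], artist[30], album[30], year[4],
    # comment[30], genre[1]. There is no encoding field anywhere.
    _, title, artist, album, year, comment, genre = struct.unpack(
        "<3s30s30s30s4s30sB", tag
    )

    def text(field: bytes) -> str:
        # Fields are padded with NULs or spaces to their fixed width.
        return field.rstrip(b"\x00 ").decode(encoding, errors="replace")

    return {
        "title": text(title),
        "artist": text(artist),
        "album": text(album),
        "year": text(year),
        "comment": text(comment),
        "genre": genre,
    }
```

The same bytes give different text depending on the `encoding` argument, which is the whole problem: nothing in the tag says which reading is right.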
SJ currently uses a LAME element when writing MP3s, and just iterates down the pipeline looking for a taggable element. So why can't the LAME encoder write ID3v2 tags? I'll happily change the MP3 pipeline creation documentation to use id3mux if that is a better solution, but that won't help everyone who has already created one. Do you have an example pipeline I can play with?
Maybe it was the TagLib API documentation... But it is reasonable to store ONLY 8-bit encoded strings in ID3v1, so that old applications on other OSes (like Win95/98/Me) can read these strings (Winamp still doesn't support Unicode :) ), and that is why I asked for an option to select the encoding for the ID3v1 tag. It would be better to write ID3v2 tags in UTF-8 with the first byte of the text frame (the encoding byte) set to $03 (from ID3v2.4.0). New players understand this byte (if $00 were written instead, the player would treat the text as Latin-1 and convert it to UTF-8, and instead of readable characters we'd see something unreadable :)
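The role of that encoding byte can be sketched like this. The mapping below follows the ID3v2.4 text-encoding values ($00 Latin-1, $01 UTF-16 with BOM, $02 UTF-16BE, $03 UTF-8); the function name `decode_text_frame` is made up for the example, and it handles only the body of a single text frame, not frame headers or the rest of the tag.

```python
# ID3v2.4 text-encoding byte -> Python codec name.
ENCODINGS = {
    0x00: "latin-1",
    0x01: "utf-16",     # with BOM; Python's codec consumes it
    0x02: "utf-16-be",  # without BOM
    0x03: "utf-8",
}

def decode_text_frame(body: bytes) -> str:
    """Decode the body of an ID3v2.4 text frame (encoding byte + text)."""
    if not body:
        raise ValueError("empty frame body")
    try:
        codec = ENCODINGS[body[0]]
    except KeyError:
        raise ValueError(f"unknown text encoding byte: {body[0]:#04x}") from None
    # Decode the remainder and drop any trailing NUL terminator.
    return body[1:].decode(codec, errors="replace").rstrip("\x00")
```

Feeding it UTF-8 bytes behind a `$00` byte reproduces the unreadable output mentioned above, since each Cyrillic letter's two bytes are read as two separate Latin-1 characters.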
OK, so ID3v2 is the way to go. Unfortunately you wouldn't get to id3mux when iterating down the pipeline if lame also supports tag writing; so really this is a dup of 329184. (The 0.8.x stuff won't be fixed.) *** This bug has been marked as a duplicate of 329184 ***
I believe SJ would in this case, as it iterates over all elements and doesn't stop at the first taggable element it finds.