GNOME Bugzilla – Bug 325428
Encoding
Last modified: 2006-12-15 18:13:36 UTC
Banshee does not display my music collection's tag information correctly, while Quod Libet displays everything perfectly. For example, Banshee displays (大塿) instead of スーパーマン. The tag information is in Japanese, encoded as UTF-8 or EUC-JP.
It seems Banshee assumes the tag information is always in UTF-8. There are three ways to solve this: (1) edit the Banshee database, (2) edit the audio file, or (3) convert strings to UTF-8 when importing a file. (1) is possible now. Is it necessary to implement (2) and/or (3)?
The same problem here: all my ID3 tags are in CP1251 (Cyrillic). Maybe there should be an option for choosing the ID3 encoding.
The previous note has been written in a different bug report because it has nothing to do with this bug, sorry.
I have two files: the first was encoded with sound-juicer (tags in Russian, UTF) and the second with goobox (tags in Russian, UTF). But Banshee displays the tags correctly only for the first file.
People can have many MP3 files, and the ID3v1 tag doesn't carry charset information. Some of them are encoded in CP1251 and others in Shift-JIS. amaroK has a UI in which users can specify the charset for ID3v1 tags. But if Shift-JIS is set as the charset for ID3v1 tags, tags written in other charsets (CP1251, UTF-8, etc.) are displayed as garbage.
That's not the problem; if people keep their tags in different charsets, that's their problem :) If it is known which files are encoded in KOI8-R and which in Shift-JIS, they can import songs in 2 or 3 steps: change the preferences in amaroK, import the first portion of the data, and then the second. But this option only works well if the files contain ID3v1 tags. If the files contain only ID3v2 tags (I don't know for sure, but ID3v2 should specify the encoding that was used), then the problem starts, because the encoders I use under Windows don't specify the character encoding. Maybe there is a point in adding these options:
[x] Override default ID3v2 encoding (like in XMMS)
[x] Convert ID3v1 encoding (like in amaroK)
That would be great.
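As a rough illustration of what an "override default ID3v2 encoding" option could do, here is a minimal Python sketch. It assumes the tag reader has already decoded the field as Latin-1 and that the user has picked the real charset; the function name `reinterpret_latin1` is hypothetical and is not part of Banshee, XMMS or amaroK.

```python
def reinterpret_latin1(text: str, charset: str) -> str:
    """Re-decode a string that was wrongly decoded as Latin-1.

    `charset` is the encoding the user selected in the (hypothetical)
    override option, for example "cp1251" or "shift_jis".
    """
    try:
        # Latin-1 maps every byte to one code point, so this recovers
        # the original bytes exactly.
        raw = text.encode("latin-1")
    except UnicodeEncodeError:
        # The text holds characters outside Latin-1, so it was not
        # produced by a Latin-1 decode; leave it alone.
        return text
    try:
        return raw.decode(charset)
    except UnicodeDecodeError:
        # The bytes are not valid in the chosen charset; keep the input.
        return text
```

Note that genuine Latin-1 text cannot be told apart from mislabeled text this way, which is why such an override would have to be an explicit user choice.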
*** Bug 326902 has been marked as a duplicate of this bug. ***
Answer to the first and second posts: Banshee assumes that ID3v1 information is UTF-8 encoded. As for ID3v2, Banshee follows the standard (id3v2.4.0), which uses the first byte of a string to identify its encoding:
$00 -- latin1
$01 -- UTF16
$02 -- UTF16
$03 -- UTF8
If $00 is set for a UTF-8 encoded string, a latin1->utf8 conversion will be done and you'll see wrong characters. Try setting this byte to $03 (for each field which contains non-Latin symbols) and reimport the files into Banshee. This method helped me with UTF-8 encoded Cyrillic strings.
Sorry, $02 means UTF-16BE (without BOM); if a BOM is specified, $01 should be used.
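The encoding byte described above can be sketched in a few lines of Python. This is only an illustration of the mapping from this comment, not Banshee's actual code; the names `ID3V2_ENCODINGS` and `decode_text_frame` are made up, and the payload is assumed to be the body of a text frame with the frame header already removed.

```python
# Encoding byte -> Python codec, following the list in this comment.
ID3V2_ENCODINGS = {
    0x00: "latin-1",
    0x01: "utf-16",     # BOM present, codec picks the byte order from it
    0x02: "utf-16-be",  # no BOM
    0x03: "utf-8",
}


def decode_text_frame(payload: bytes) -> str:
    """Decode a text frame body: one encoding byte, then the text."""
    if not payload:
        return ""
    codec = ID3V2_ENCODINGS.get(payload[0])
    if codec is None:
        raise ValueError(f"unknown text encoding byte: {payload[0]:#04x}")
    # Drop any trailing NUL terminator after decoding.
    return payload[1:].decode(codec).rstrip("\x00")


utf8_bytes = "Привет".encode("utf-8")
print(decode_text_frame(b"\x03" + utf8_bytes))  # decoded as UTF-8
print(decode_text_frame(b"\x00" + utf8_bytes))  # same bytes read as latin1
```

The second call shows the failure mode from this bug: the bytes are fine, but the $00 marker makes the reader treat them as latin1.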
*** Bug 327206 has been marked as a duplicate of this bug. ***
It would be a good idea to use GST_TAG_ENCODING for ID3v1 tags. BUT before using this variable, check whether the string is already valid UTF-8 (the g_utf8_validate(...) GLib function), and only then do the GST_TAG_ENCODING->utf8 conversion. Maybe "override default ID3v2 encoding" could be possible (ONLY for fields which claim to be latin1 fields). I know that it is not a standard way, but it is really necessary. Some ways of handling such tags: (1) suggest converting all tags to UTF-8, (2) don't allow editing these tags, (3) allow reading and writing these tags. Something should also be done about the iPod and internationalized tags. I don't know which encoding the iPod uses for reading tags, but it can handle Cyrillic names (maybe iTunes does character conversions; I haven't tried libgpod, as I don't own an iPod). P.S. iRiver uses CP1251 for Russian tags (that's for sure, because I stored all my songs with the USB Mass Storage firmware installed), which is where the idea about the iPod came from.
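The "validate first, convert only if needed" order could look roughly like this in Python. It is a sketch under assumptions: the `fallback` parameter stands in for whatever GST_TAG_ENCODING (or a preference) would supply, a strict UTF-8 decode plays the role of g_utf8_validate, and the function name `decode_id3v1_field` is hypothetical.

```python
def decode_id3v1_field(raw: bytes, fallback: str = "cp1251") -> str:
    """Decode a fixed-width ID3v1 field.

    Try strict UTF-8 first; only if the bytes are not valid UTF-8,
    convert from the user-configured fallback charset.
    """
    # ID3v1 fields are padded with NUL bytes (or spaces) to a fixed width.
    raw = raw.rstrip(b"\x00 ")
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Undecodable bytes become U+FFFD instead of raising.
        return raw.decode(fallback, errors="replace")


title = "Кино".encode("cp1251").ljust(30, b"\x00")
print(decode_id3v1_field(title))
```

The check is a heuristic: a short string in a legacy charset can happen to be valid UTF-8 as well, so it would not catch every case.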
Moving to the Metadata component.
Rhythmbox seems to like my ID3 tags; I don't know why Banshee doesn't. Banshee should do the same, especially since this worked 2 months ago.
Is this any better with 0.11.2+? We switched to taglib-sharp for tag reading, and a bunch of the encoding issues went away.
Yes, it's all OK now. I've tested it with ID3v2.3 tags and UTF-16 encoded fields.
Excellent, closing.