Bug 325428 - Encoding
Status: RESOLVED FIXED
Product: banshee
Classification: Other
Component: Metadata
Version: unspecified
Hardware: Other Linux
Importance: Normal normal
Target Milestone: 2.x
Assigned To: Banshee Maintainers
Banshee Maintainers
Duplicates: 326902 327206
Depends on:
Blocks:
 
 
Reported: 2006-01-01 04:22 UTC by Nadeem Bitar
Modified: 2006-12-15 18:13 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Nadeem Bitar 2006-01-01 04:22:27 UTC
Banshee does not display the tag information for my music collection correctly, while Quod Libet displays everything perfectly. For example, Banshee displays (大塿) instead of スーパーマン.
The tag information is in Japanese, encoded as UTF-8 or EUC-JP.
Comment 1 Koike Kazuhiko 2006-01-04 01:29:44 UTC
It seems Banshee assumes that tag information is always in UTF-8.

There are three ways to solve this.

(1) Edit the banshee database
(2) Edit the audio file
(3) Convert strings to UTF-8 when importing files

(1) is possible now. Is it necessary to implement (2) and/or (3)?
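
Option (3) could look roughly like the sketch below. It is a Python illustration only, not Banshee's code: the helper name `convert_tag_to_utf8` and its `source_charset` parameter are hypothetical, and the caller is assumed to already know which charset the tag was written in.

```python
def convert_tag_to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Re-encode a raw tag value from source_charset to UTF-8.

    Hypothetical helper: the caller must already know source_charset.
    """
    return raw.decode(source_charset).encode("utf-8")


# The title from the report, stored as EUC-JP bytes.
raw_title = "スーパーマン".encode("euc_jp")

# Treating the bytes as UTF-8 fails, because they are not valid UTF-8.
try:
    raw_title.decode("utf-8")
    assumed_utf8_ok = True
except UnicodeDecodeError:
    assumed_utf8_ok = False

# Converting with the right source charset recovers the title.
converted = convert_tag_to_utf8(raw_title, "euc_jp")
print(assumed_utf8_ok, converted.decode("utf-8"))
```

The hard part, which this sketch leaves out, is finding out the source charset in the first place.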
Comment 2 Vitaliy Ischenko 2006-01-13 16:47:26 UTC
The same problem here: I have all my ID3 tags in CP1251 (Cyrillic).
Maybe there should be an option for choosing the ID3 encoding.
Comment 3 Vitaliy Ischenko 2006-01-13 21:03:11 UTC
The previous note has been written in a different bug report because it has nothing to do with this bug, sorry.
Comment 4 Vitaliy Ischenko 2006-01-13 21:05:27 UTC
I have two files:
1st encoded with sound-juicer (tags in Russian (UTF))
2nd encoded with goobox (tags in Russian (UTF))

But Banshee displays the tags correctly only for the first file.
Comment 5 Koike Kazuhiko 2006-01-14 02:32:13 UTC
People can have many MP3 files, and the ID3v1 tag doesn't carry charset information.
Some of them are encoded in CP1251 and others are encoded in Shift-JIS.

amaroK has a UI in which users can specify the charset for ID3v1 tags.
But if Shift-JIS is set as the charset for ID3v1 tags, tags written in other charsets
(CP1251, UTF-8, etc.) are displayed as garbage.

Comment 6 Vitaliy Ischenko 2006-01-14 12:03:56 UTC
That's not the problem; if people keep their tags in different charsets, that's their problem :) They can import songs in 2 or 3 steps if it is known which files are encoded in KOI8-R and which in Shift-JIS: you change the charset in Amarok's preferences, import the first portion of the files, then change it again and import the second...

But this option only works well if the files contain ID3v1 tags.
If the files contain only ID3v2 tags (I don't know for sure, but ID3v2 should specify the encoding in which a field was written), then the problem starts, because the encoders I use under Windows don't specify the character encoding...

Maybe there is a point in adding options:
[x] Override default ID3v2 encoding (like in XMMS)
[x] Convert ID3v1 encoding (like in Amarok)

That would be great.
Comment 7 Aaron Bockover 2006-01-14 19:46:14 UTC
*** Bug 326902 has been marked as a duplicate of this bug. ***
Comment 8 Vitaliy Ischenko 2006-01-16 13:44:40 UTC
Answer to the first and second posts:

Banshee assumes that ID3v1 information is UTF-8 encoded.

As for ID3v2, Banshee follows the standard (id3v2.4.0):
if the first byte of a string is $00 -- latin1
$01 -- UTF16
$02 -- UTF16
$03 -- UTF8
If $00 is set for a UTF-8 encoded string, a latin1->utf8 conversion will be done and you'll see wrong characters.
Try setting this byte (for each field which contains non-Latin symbols) to $03 and reimport the files into Banshee.

This method helped me with UTF-8 encoded Cyrillic strings.
Comment 9 Vitaliy Ischenko 2006-01-16 13:51:49 UTC
Sorry, $02 means UTF-16BE (without BOM); if a BOM is specified, $01 should be used.
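
The encoding-byte rule from the two comments above can be illustrated with a short Python sketch. It is not Banshee's parser: `decode_text_field` and `ID3V2_TEXT_ENCODINGS` are hypothetical names, and the input is assumed to be just the text field (encoding byte followed by the string), not a whole frame with its header.

```python
# Text-encoding byte at the start of an ID3v2.4 text field, as listed above
# (with the $02 correction), mapped to Python codec names.
ID3V2_TEXT_ENCODINGS = {
    0x00: "latin-1",    # ISO-8859-1
    0x01: "utf-16",     # UTF-16 with BOM
    0x02: "utf-16-be",  # UTF-16BE without BOM
    0x03: "utf-8",
}


def decode_text_field(field: bytes) -> str:
    """Decode a text field whose first byte names its encoding."""
    codec = ID3V2_TEXT_ENCODINGS[field[0]]
    # Drop any trailing NUL terminator after decoding.
    return field[1:].decode(codec).rstrip("\x00")


title = "Привет"
correct = decode_text_field(b"\x03" + title.encode("utf-8"))
# Same UTF-8 bytes, but wrongly labelled as latin1.
mislabelled = decode_text_field(b"\x00" + title.encode("utf-8"))
print(correct == title, mislabelled == title)
```

The mislabelled case never raises an error, because every byte is a valid latin1 character; the only symptom is the wrong characters on screen.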
Comment 10 Aaron Bockover 2006-01-17 02:33:26 UTC
*** Bug 327206 has been marked as a duplicate of this bug. ***
Comment 11 Vitaliy Ischenko 2006-01-17 12:23:43 UTC
It would be a good idea to use GST_TAG_ENCODING for ID3v1 tags.
BUT before using this variable, check whether the string is already valid UTF-8
(the g_utf8_validate(...) GLib function), and only if it is not, do the
GST_TAG_ENCODING->utf8 conversion.
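
A minimal Python sketch of this validate-first idea follows. The name `tag_to_text` and the `fallback_charset` parameter are hypothetical; strict UTF-8 decoding stands in for g_utf8_validate(), and how the fallback charset would be obtained from GST_TAG_ENCODING is not shown.

```python
def tag_to_text(raw: bytes, fallback_charset: str) -> str:
    """Decode raw tag bytes, preferring UTF-8 when the bytes are valid UTF-8."""
    try:
        # Plays the role of g_utf8_validate(): strict decoding rejects invalid UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Only now apply the configured legacy charset.
        return raw.decode(fallback_charset)


already_utf8 = tag_to_text("Привет".encode("utf-8"), "cp1251")
legacy = tag_to_text("Привет".encode("cp1251"), "cp1251")
print(already_utf8, legacy)
```

Checking UTF-8 first keeps tags that are already correct from being converted a second time. A legacy string can still happen to be valid UTF-8 by coincidence, so this is a heuristic rather than a guarantee.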

Maybe "override default id3v2 encoding" could be made possible (ONLY for fields which
claim to be latin1 fields). I know that it is not the standard way, but it is
really necessary.
Some ways of handling such tags:
   (1) suggest converting all tags to utf8
   (2) don't allow editing these tags
   (3) allow reading and writing these tags.

Something should be done about the iPod and internationalized tags.
I don't know which encoding the iPod uses for reading tags, but it can handle
Cyrillic names (maybe iTunes does character conversions... I haven't tried
libgpod (I don't own an iPod)).
P.S. iRiver uses CP1251 for Russian tags (that's for sure, because I stored all
my songs with the USB Mass Storage firmware installed), which is where the idea
about the iPod came from.
Comment 12 Ruben Vermeersch 2006-04-10 17:48:09 UTC
Moving to the Metadata component.
Comment 13 Diego Escalante Urrelo (not reading bugmail) 2006-10-24 02:39:46 UTC
Rhythmbox seems to like my ID3 tags; I don't know why Banshee doesn't.
Banshee should do the same, especially since this worked 2 months ago.
Comment 14 Aaron Bockover 2006-12-07 17:06:39 UTC
Is this any better with 0.11.2+? We switched to taglib-sharp for tag reading, and a bunch of the encoding issues went away.
Comment 15 Vitaliy Ischenko 2006-12-15 17:35:11 UTC
Yes, it's all OK now.
I've tested it with id3v2.3 tags and UTF-16 encoded fields.
Comment 16 Aaron Bockover 2006-12-15 18:13:36 UTC
Excellent, closing.