Bug 342364 - [id3demux] Chinese id3 tags sometimes read incorrectly
Status: RESOLVED NOTABUG
Product: GStreamer
Classification: Platform
Component: gst-plugins-good
Version: 0.10.3
Hardware: Other
OS: All
Importance: Normal normal
Target Milestone: git master
Assigned To: GStreamer Maintainers
QA Contact: GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2006-05-19 20:38 UTC by Alissa
Modified: 2006-05-24 10:46 UTC
See Also:
GNOME target: ---
GNOME version: 2.13/2.14


Attachments
First 512kb of the problematic file given in the example above (512.00 KB, application/octet-stream)
2006-05-20 04:14 UTC, Alissa
Last 512kb of the problematic file given in the example above (512.00 KB, application/octet-stream)
2006-05-20 04:16 UTC, Alissa

Description Alissa 2006-05-19 20:38:44 UTC
Please describe the problem:
Some Unicode id3 tags with Chinese characters can be read by other id3 programs
but not by GStreamer apps.

For example, a file by the artist Faye Wong (王菲) shows up as the gibberish "çè²"
in GStreamer. See the output below for more info:

$ id3v2 -l "王菲 - Live - 01.mp3"
id3v1 tag info for 王菲 - Live - 01.mp3:
Title  :                                 Artist: 王菲
Album  : Live                            Year:     , Genre: Pop (13)
Comment: 00000C79 00000B32 0000F780 0    Track: 1
id3v2 tag info for 王菲 - Live - 01.mp3:
TALB (Album/Movie/Show title): Live
TRCK (Track number/Position in set): 01
TCON (Content type): Pop (13)
COMM (Comments): ()[]: 00000C79 00000B32 0000F780 0
TLEN (Length): 316000
TPE1 (Lead performer(s)/Soloist(s)): 王菲


$ gst-launch-0.10 filesrc location="王菲 - Live - 01.mp3" ! id3demux ! fakesink -t
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
FOUND TAG      : found by element "id3demux0".
         artist: çè²
          album: Live
        comment: 00000C79 00000B32 0000F780 0
   track number: 1
          genre: Pop
       duration: 316000000000
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 182947000 ns.
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
FREEING pipeline ...

Steps to reproduce:
The problem does not seem to depend simply on Chinese characters (other Faye Wong
tags in Chinese work fine), so I'm not sure what triggers it. But with the
problematic tags, the problem shows up every time in a GStreamer app.

Actual results:


Expected results:


Does this happen every time?
Yes

Other information:
If you want the actual mp3 file, email me at monochromatic_rainbow@yahoo.com
Comment 1 Alissa 2006-05-20 04:14:43 UTC
Created attachment 65875
First 512kb of the problematic file given in the example above

Includes the id3v2 tag
Comment 2 Alissa 2006-05-20 04:16:34 UTC
Created attachment 65876
Last 512kb of the problematic file given in the example above

Includes the id3v1 tag
Comment 3 Tim-Philipp Müller 2006-05-20 10:25:41 UTC
Which other applications read this ID3v2 tag at the start correctly?

The artist frame in the tag is simply broken/incorrect. It claims to be of encoding type 0 (=ISO-8859-1) while it really is encoding 3 (UTF-8).
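A minimal Python sketch of what happens to this artist frame. It assumes the standard ID3v2 text-frame layout, in which the first byte of a text frame's body declares the encoding (0 = ISO-8859-1, 1 = UTF-16 with BOM, and, in ID3v2.4 only, 2 = UTF-16BE and 3 = UTF-8). The helper name `decode_text_frame` is hypothetical and this is not GStreamer's code; it only shows that decoding the UTF-8 bytes of 王菲 as ISO-8859-1 reproduces the gibberish from the report.

```python
# Map of ID3v2 text-encoding bytes to Python codec names.
# 0 and 1 exist in ID3v2.3; 2 and 3 were added in ID3v2.4.
ID3_ENCODINGS = {0: "latin-1", 1: "utf-16", 2: "utf-16-be", 3: "utf-8"}


def decode_text_frame(body: bytes) -> str:
    """Decode a text-frame body, trusting its declared encoding byte (hypothetical helper)."""
    codec = ID3_ENCODINGS[body[0]]
    # Decode everything after the encoding byte, then drop trailing NUL terminators.
    return body[1:].decode(codec).rstrip("\x00")


# The artist bytes really are UTF-8 ...
artist_utf8 = "王菲".encode("utf-8")

# ... but the broken frame declares encoding 0 (ISO-8859-1).
broken = bytes([0]) + artist_utf8
correct = bytes([3]) + artist_utf8

print(repr(decode_text_frame(broken)))  # → 'ç\x8e\x8bè\x8f²'
print(decode_text_frame(correct))       # → 王菲
```

The bytes 0x8E, 0x8B and 0x8F decode to invisible C1 control characters, which is why the terminal only shows "çè²".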

As far as I can tell, there is no way to know that this is incorrect: ISO-8859-1 covers the entire byte range 0x00-0xff, so we can't even say "ooh, it doesn't look like valid ISO-8859-1, let's check whether it's UTF-8".
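To illustrate the point in Python: decoding as ISO-8859-1 never fails, so no validity check on the declared encoding can flag the frame. The only option would be a guess in the other direction, such as the hypothetical `looks_like_mislabelled_utf8` below, which treats an encoding-0 frame as UTF-8 whenever its bytes happen to be valid, non-ASCII UTF-8. This is an assumed heuristic for illustration, not something GStreamer does.

```python
def looks_like_mislabelled_utf8(body: bytes) -> bool:
    """Hypothetical heuristic: an encoding-0 frame whose payload is valid
    UTF-8 and contains non-ASCII bytes *might* really be UTF-8."""
    if body[0] != 0:
        return False
    payload = body[1:].rstrip(b"\x00")
    try:
        payload.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # Pure ASCII is identical in both encodings, so there is nothing to guess.
    return any(b >= 0x80 for b in payload)


# Every possible byte value decodes as ISO-8859-1, so this check can never fail.
all_bytes = bytes(range(256))
assert len(all_bytes.decode("latin-1")) == 256

print(looks_like_mislabelled_utf8(bytes([0]) + "王菲".encode("utf-8")))   # True
print(looks_like_mislabelled_utf8(bytes([0]) + "Café".encode("latin-1")))  # False
```

The guess can misfire: genuine ISO-8859-1 text such as "Ã©" is stored as the bytes C3 A9, which are also valid UTF-8 for "é", so a reader applying this rule would silently change correctly tagged files.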
Comment 4 Tim-Philipp Müller 2006-05-20 10:28:31 UTC
Sorry, I missed the reference to the 'id3v2' tool above. Indeed, that tool displays the tag fine here as well (so it isn't a matter of locale settings); I'll need to see how it does that.
Comment 5 Alissa 2006-05-20 12:32:25 UTC
Yeah, several command-line id3 tools read the tag as Unicode (id3v2 being one). Playing around with this has also made me think that GStreamer apps (like Rhythmbox and Nautilus) are using the id3v1 tag for display. And according to some notes I was reading in Easytag (the app I use to edit tags), id3v1 tags are always saved as single-byte. So doesn't that mean id3v1 tags will basically never fully support Chinese (which requires double-byte)? Thus, the problem is not so much that GStreamer reads it wrong (I guess GStreamer reads id3v1 as single-byte, as it technically "should"), but rather that it would be better to use the id3v2 tag for display? At least, it seems it should first look for id3v2 and only fall back to id3v1 if that doesn't exist. Feel free to correct me if I have muddled everything up here... I'm just a user, not a developer.
Comment 6 Tim-Philipp Müller 2006-05-20 14:29:19 UTC
> And according to some notes I was reading in Easytag (the app I use to
> edit tags), it says that id3v1 tags are always saved as single-byte.
> So doesn't that mean basically the id3v1 tags will never fully
> support Chinese (which requires double-byte)? 

That is correct. ID3v1 was only ever really meant to hold Western European strings (ISO-8859-1/ASCII) and nothing else. How apps, readers and writers deal with that problem differs; it's pretty much a mess and pretty much unsolvable. (For ID3v1 tags that aren't valid UTF-8, GStreamer falls back on the encoding specified in the GST_ID3_TAG_ENCODING environment variable. This is a hack, though, needed only because there are so many ID3v1 tags with other charsets out there.)

In short: just don't use ID3v1 tags.
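A rough Python sketch of that ID3v1 fallback, using the fixed 128-byte ID3v1 layout ("TAG", then 30-byte title, artist and album, 4-byte year, 30-byte comment and a 1-byte genre). The order of attempts (valid UTF-8 first, then the charsets named in GST_ID3_TAG_ENCODING, then ISO-8859-1) follows the description above; the colon-separated format of the variable and the helper names are assumptions, and GStreamer's real implementation is C code that may differ in detail.

```python
import os
from typing import Mapping, Optional


def decode_v1_field(raw: bytes, env: Mapping[str, str]) -> str:
    """Decode one NUL-padded ID3v1 field, falling back through charsets (sketch)."""
    raw = raw.split(b"\x00", 1)[0].rstrip(b" ")
    try:
        return raw.decode("utf-8")  # already valid UTF-8: use it as-is
    except UnicodeDecodeError:
        pass
    # Assumption: GST_ID3_TAG_ENCODING holds a colon-separated list of charsets.
    for charset in env.get("GST_ID3_TAG_ENCODING", "").split(":"):
        if not charset:
            continue
        try:
            return raw.decode(charset)
        except (LookupError, UnicodeDecodeError):
            continue  # unknown charset name, or bytes invalid in that charset
    return raw.decode("latin-1")  # ISO-8859-1 accepts every byte, so this never fails


def parse_id3v1(data: bytes, env: Mapping[str, str] = os.environ) -> Optional[dict]:
    """Parse the fixed 128-byte ID3v1 tag at the end of `data`, if present.
    The ID3v1.1 track byte inside the comment field is not extracted here."""
    tag = data[-128:]
    if len(tag) < 128 or not tag.startswith(b"TAG"):
        return None
    return {
        "title": decode_v1_field(tag[3:33], env),
        "artist": decode_v1_field(tag[33:63], env),
        "album": decode_v1_field(tag[63:93], env),
        "year": decode_v1_field(tag[93:97], env),
        "comment": decode_v1_field(tag[97:127], env),
        "genre": tag[127],  # numeric index into the ID3v1 genre list
    }


def pad(value: bytes, size: int) -> bytes:
    return value.ljust(size, b"\x00")


# A synthetic tag whose artist field holds GBK bytes rather than ISO-8859-1.
tag = (b"TAG" + pad(b"", 30) + pad("王菲".encode("gbk"), 30) + pad(b"Live", 30)
       + pad(b"", 4) + pad(b"", 30) + bytes([13]))

print(parse_id3v1(tag, {"GST_ID3_TAG_ENCODING": "GBK"})["artist"])  # → 王菲
print(parse_id3v1(tag, {})["artist"])  # no hint: ISO-8859-1 gibberish
```

Whichever charset the variable names is applied to every non-UTF-8 ID3v1 tag, which is why this remains a hack: a collection that mixes GBK, Big5 and Windows-1252 tags cannot be read correctly with one setting.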



> Yeah, several command line id3 tools read the tag as Unicode (id3v2 being 
> one). Playing around with this has also made me think that GStreamer apps 
> (like Rhythmbox and Nautilus) are using the id3v1 tag for display.

GStreamer shouldn't prefer the ID3v1 tag, at least not by default. By default the 'id3demux' element should use the tags from the ID3v2 tag if it finds both an ID3v2 tag and an ID3v1 tag. If it doesn't do that, that's a bug :)
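The precedence described here (ID3v2 values win, ID3v1 only fills in fields the ID3v2 tag lacks) can be sketched in a few lines of Python. The `merge_tags` helper and the plain-dict tag representation are hypothetical stand-ins, not id3demux's actual code.

```python
from typing import Dict, Optional


def merge_tags(v2: Optional[Dict[str, object]],
               v1: Optional[Dict[str, object]]) -> Dict[str, object]:
    """Combine tags so that ID3v2 values win and ID3v1 only fills gaps (sketch)."""
    merged: Dict[str, object] = dict(v1 or {})
    merged.update(v2 or {})  # ID3v2 entries overwrite ID3v1 entries with the same key
    return merged


v2_tags = {"artist": "王菲", "album": "Live"}
v1_tags = {"artist": "Íõ·Æ", "album": "Live", "genre": 13}
print(merge_tags(v2_tags, v1_tags))  # artist comes from ID3v2, genre from ID3v1
```

With this ordering a mangled ID3v1 artist never reaches the application as long as the ID3v2 tag has its own artist frame, which is why a wrong artist here points at the ID3v2 frame itself.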

However, GStreamer does in fact read the ID3v2 tag "wrongly" as well (where "wrongly" means "correctly according to the spec"); you can see that from the debug log if you run gst-launch like this:

 $ GST_DEBUG=id3demux:5 ....

(also, I'm only working with the beginning of the file, which doesn't have the ID3v1 tag).
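Which tag a chunk of the file contains follows from where each version lives: an ID3v2 tag starts at byte 0 with the magic "ID3" and a 10-byte header whose size field is a 28-bit "syncsafe" integer (the top bit of each of its four bytes is zero), while an ID3v1 tag occupies the last 128 bytes and starts with "TAG". A small Python sketch; the helper names are hypothetical.

```python
from typing import Optional


def id3v2_tag_size(head: bytes) -> Optional[int]:
    """Return the total ID3v2 tag length (header + body + footer), or None."""
    if len(head) < 10 or head[:3] != b"ID3":
        return None
    # Bytes 6-9 hold a syncsafe integer: 4 x 7 significant bits.
    size = 0
    for b in head[6:10]:
        size = (size << 7) | (b & 0x7F)
    # Flag 0x10 (ID3v2.4) means a 10-byte footer follows the tag body.
    footer = 10 if head[5] & 0x10 else 0
    return 10 + size + footer


def has_id3v1(tail: bytes) -> bool:
    """An ID3v1 tag is the last 128 bytes of the file and starts with 'TAG'."""
    return len(tail) >= 128 and tail[-128:-125] == b"TAG"


first_chunk = b"ID3\x03\x00\x00\x00\x00\x02\x01" + b"\x00" * 500
print(id3v2_tag_size(first_chunk))  # 10-byte header + 257-byte body = 267
print(has_id3v1(first_chunk))       # False: the start of a file has no ID3v1 tag
```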

Needs more looking into ...

Comment 7 Jan Schmidt 2006-05-24 10:07:02 UTC
No, it's definitely taking the ID3v2 tag as it should, and that tag is improperly put together.

As far as I can tell, the only reason the id3v2 program seems to get this tag right is that it ignores the indicated text encoding in the field entirely.

Also, if this tag was written by Easytag, it was written by a broken version, because version 1.99.11 can't read the field correctly either.

In short, I don't think there's much we can do to 'handle' this file - it's just broken.
Comment 8 Alissa 2006-05-24 10:46:20 UTC
Yeah, I have started to realize that the source of my problems is Easytag.  It seems that it doesn't set the encoding type properly.  But I would note that I used v1.99.11 to make the tag.  Anyways, I guess that just means 1.99.11 has a bug.

In the meantime, I figured out I can save Unicode tags in Easytag with ISO-8859-1, then use id3iconv (downloaded from the web) to convert them properly to Unicode. Rhythmbox and Nautilus could read those converted tags fine. So, clearly, it's not a GStreamer bug.
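The repair boils down to re-reading a frame's bytes in the charset they were really written in and rewriting the frame with an honest encoding byte. A hypothetical Python sketch (this is not how id3iconv is implemented) that emits encoding 1, UTF-16 with BOM, since both ID3v2.3 and ID3v2.4 readers understand it:

```python
def reencode_text_frame(body: bytes, real_charset: str) -> bytes:
    """Rewrite an encoding-0 text-frame body whose bytes are really in
    `real_charset`, emitting encoding 1 (UTF-16 with BOM). Hypothetical helper."""
    if body[0] != 0:
        return body  # already declares a Unicode encoding; leave it alone
    text = body[1:].rstrip(b"\x00").decode(real_charset)
    # Python's "utf-16" codec prepends a BOM, as encoding 1 requires.
    return bytes([1]) + text.encode("utf-16")


broken = bytes([0]) + "王菲".encode("utf-8")    # like the Easytag 1.99.11 frame
fixed = reencode_text_frame(broken, "utf-8")
print(fixed[0], fixed[1:].decode("utf-16"))    # 1 王菲
```

The new body is longer than the old one, so a real tool also has to update the frame's size field and the tag's total size in the ID3v2 header; this sketch only handles the frame body.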