Bug 341774 - Fails to read tags in file
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-plugins-good
Version: 0.10.3
Hardware: Other  OS: All
Importance: Normal normal
Target Milestone: 0.10.4
Assigned To: GStreamer Maintainers
Depends on:
Blocks:
Reported: 2006-05-14 20:58 UTC by Sven Arvidsson
Modified: 2006-05-15 14:31 UTC
See Also:
GNOME target: ---
GNOME version: 2.13/2.14


Attachments
MP3 file with ID3 version 2.3.0 tag (400.00 KB, audio/x-mpeg)
2006-05-14 20:59 UTC, Sven Arvidsson
Fix for broken UTF-16 with multiple BOM markers (5.71 KB, patch)
2006-05-15 09:51 UTC, Jan Schmidt
Status: committed

Description Sven Arvidsson 2006-05-14 20:58:18 UTC
Please describe the problem:
This bug was reported to the Debian BTS.
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=361310

The original report was:
"Rhythmbox crashes when importing certain mp3 files.

I have identified which mp3 files cause Rhythmbox to crash. They are playable in VLC. When I launch VLC from gnome-terminal to play these files I see the following error message:

(.:12489): Pango-WARNING **: Invalid UTF-8 string passed to pango_layout_set_text()

Viewing their meta-info in VLC shows unrecognized characters. A screenshot is
available at:
http://i25.photobucket.com/albums/c79/stoicjed/screenshots/vlc_meta.png

I am able to import these files using Rhythmbox 0.9.1 on Ubuntu Breezy without any problems, with id3 tag information displaying normally. I am able to import these files in Rhythmbox 0.9.3.1-1 on Debian Sid only after deleting the id3 tags, otherwise Rhythmbox will crash."

Rhythmbox no longer crashes with this file, but it fails to read any metadata except the genre.

When I try these files with GStreamer 0.8 I get warnings about invalid UTF-8, suggesting the tags might be faulty, but the submitter claims earlier versions of Rhythmbox could read the tags in these files.

This might be a duplicate or at least similar to bug 320188.


Steps to reproduce:


Actual results:


Expected results:


Does this happen every time?


Other information:
Comment 1 Sven Arvidsson 2006-05-14 20:59:18 UTC
Created attachment 65460 [details]
MP3 file with ID3 version 2.3.0 tag
Comment 2 Jan Schmidt 2006-05-15 09:50:38 UTC
This file does contain bad UTF-16. Its strings start with a UTF-16LE BOM marker followed by a UTF-16BE BOM marker, and the data after them is actually UTF-16BE.

I have no idea which tag writer wrote this, but it's busted.

Anyway, the patch I'm about to attach and commit reworks the string parsing a little and adds a workaround: it strips all BOM markers, uses the endianness indicated by the innermost (last) one, and then tries interpreting UTF-16 strings in the other endianness if the indicated one doesn't convert correctly.

With this patch I get this from the file:

Metadata for v2/bug-341774.mp3:
            album: Such Blinding Stars For Starving Eyes
           artist: Cursive
     track number: 4
            title: The Dirt of the Vineard
             date: 1997-01-01
            genre: Indie
Comment 3 Jan Schmidt 2006-05-15 09:51:34 UTC
Created attachment 65482 [details] [review]
Fix for broken UTF-16 with multiple BOM markers
Comment 4 Jan Schmidt 2006-05-15 09:58:50 UTC
As an aside, I can't find any other ID3 reader that manages to extract useful strings from this tag. I was tempted to just call it broken and forget it, except that I've seen one or two other files with similar brokenness.
Comment 5 Jan Schmidt 2006-05-15 14:31:19 UTC
Committed to CVS:

        * gst/id3demux/id3v2frames.c: (find_utf16_bom),
        (parse_insert_string_field), (parse_split_strings):
        Rework string parsing to always walk over BOM markers in UTF16
        strings, using the endianness indicated by the innermost one,
        then trying the opposite endianness if that fails to convert
        to valid UTF-8. Fixes #341774