GNOME Bugzilla – Bug 341774
Fails to read tags in file
Last modified: 2006-05-15 14:31:19 UTC
Please describe the problem:

This bug was reported to the Debian BTS: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=361310

The original report was:

"Rhythmbox crashes when importing certain mp3 files. I have identified which mp3 files cause Rhythmbox to crash. They are playable in VLC. When I launch VLC from gnome-terminal to play these files I see the following error message:

(.:12489): Pango-WARNING **: Invalid UTF-8 string passed to pango_layout_set_text()

Viewing their meta-info in VLC shows unrecognized characters. A screenshot is available at: http://i25.photobucket.com/albums/c79/stoicjed/screenshots/vlc_meta.png

I am able to import these files using Rhythmbox 0.9.1 on Ubuntu Breezy without any problems, with id3 tag information displaying normally. I am able to import these files in Rhythmbox 0.9.3.1-1 on Debian Sid only after deleting the id3 tags, otherwise Rhythmbox will crash."

Rhythmbox no longer crashes with this file, but it fails to read any metadata except the genre. When I try these files with GStreamer 0.8 I get warnings about invalid UTF-8, suggesting the tags might be faulty, but the submitter claims earlier versions of Rhythmbox could read the tags in these files.

This might be a duplicate of, or at least similar to, bug 320188.
Created attachment 65460 [details] MP3 file with ID3 version 2.3.0 tag
This file does contain bad UTF-16. It has strings that start with a UTF-16LE BOM marker followed by a UTF-16BE BOM marker, and the data that follows is actually UTF-16BE. I have no idea which tag writer wrote this, but it's busted.

Anyway, the patch I'm about to attach and commit reworks the string parsing a little and adds a workaround: it strips all leading BOM markers, uses the endianness indicated by the innermost (last) one, and then tries interpreting UTF-16 strings in the opposite endianness if the indicated one isn't correct.

With this patch I get this from the file:

Metadata for v2/bug-341774.mp3:
  album: Such Blinding Stars For Starving Eyes
  artist: Cursive
  track number: 4
  title: The Dirt of the Vineard
  date: 1997-01-01
  genre: Indie
Created attachment 65482 [details] [review] Fix for broken UTF-16 with multiple BOM markers
As an aside, I can't find any other ID3 reader that manages to extract useful strings from this tag. I was tempted to just call it broken and forget it, except that I've seen one or two other files with similar brokenness.
Committed to CVS:

* gst/id3demux/id3v2frames.c: (find_utf16_bom), (parse_insert_string_field), (parse_split_strings):
  Rework string parsing to always walk over BOM markers in UTF16 strings, using the endianness indicated by the innermost one, then trying the opposite endianness if that fails to convert to valid UTF-8.
  Fixes #341774
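
For readers who want to see the idea without reading the C patch, here is a minimal Python sketch of the approach described in the commit message: skip every leading BOM, take the byte order from the innermost one, and fall back to the opposite byte order if decoding fails. It is an illustration only, not the id3demux code; the function name decode_utf16_multi_bom and the little-endian default for strings without any BOM are assumptions made for this example.

```python
import codecs

BOM_LE = codecs.BOM_UTF16_LE  # b"\xff\xfe"
BOM_BE = codecs.BOM_UTF16_BE  # b"\xfe\xff"


def decode_utf16_multi_bom(data: bytes, default: str = "utf-16-le") -> str:
    """Decode UTF-16 bytes that may start with several stacked BOMs.

    Hypothetical helper for illustration; `default` is an assumed
    byte order for input that carries no BOM at all.
    """
    encoding = default
    pos = 0
    # Walk over every leading BOM; the last (innermost) one wins.
    while data[pos:pos + 2] in (BOM_LE, BOM_BE):
        encoding = "utf-16-le" if data[pos:pos + 2] == BOM_LE else "utf-16-be"
        pos += 2
    payload = data[pos:]

    # Try the indicated byte order first, then the opposite one.
    other = "utf-16-be" if encoding == "utf-16-le" else "utf-16-le"
    for candidate in (encoding, other):
        try:
            return payload.decode(candidate)
        except UnicodeDecodeError:
            continue
    raise ValueError("data is not valid UTF-16 in either byte order")


# A string shaped like the ones in the attached file:
# LE BOM, then BE BOM, then big-endian data.
broken = BOM_LE + BOM_BE + "Cursive".encode("utf-16-be")
print(decode_utf16_multi_bom(broken))
```

Note that the fallback only triggers when the first attempt fails to decode. A string read with the wrong byte order can still decode without error and produce garbage, so this is a best-effort workaround rather than a reliable detector.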