GNOME Bugzilla – Bug 508291
[id3demux] must check if language code in id3v2 COMM frames is UTF-8
Last modified: 2008-01-09 15:59:09 UTC
The following warning is displayed with attached clip: $ gst-launch playbin uri=file:///tmp/kpa.mp3 Setting pipeline to PAUSED ... (gst-launch-0.10:1341): GStreamer-WARNING **: Trying to set string field 'extended-comment' on structure, but string is not valid UTF-8. Please file a bug. Pipeline is PREROLLING ... Pipeline is PREROLLED ... Setting pipeline to PLAYING ... New clock: GstAudioSinkClock The warning is caused by strange language code field in id3v2 comment tag of the file, so maybe id3demux should check this field too for non-utf8 strings. Also, the warning comes from gst_structure_id_set_value() where non-utf8 strings are skipped, but only if gst-debug is enabled. To avoid confusion I think enabling gst-debug should only add the debug prints and not change the behaviour of the code.
Created attachment 102459 [details] kpa.mp3
Created attachment 102460 [details] [review] id3v2frames.c.lang_utf8_patch
Created attachment 102461 [details] [review] gststructure.c.utf8_patch
Nice catch! Indeed, we should check that. I've solved it a bit differently though: 2008-01-09 Tim-Philipp Müller <tim at centricular dot net> Based on patch by: Tommi Myöhänen <ext-tommi.myohanen nokia com> * gst/id3demux/id3v2frames.c: (parse_comment_frame): Make sure the ISO 639-X language code in ID3v2 COMM frames is actually valid UTF-8 (or rather: ASCII), so we don't end up with non-UTF8 strings in tags if there's garbage in the language field. Also make sure the language code is always lower case. Fixes: #508291. > Also, the warning comes from gst_structure_id_set_value() where non-utf8 > strings are skipped, but only if gst-debug is enabled. To avoid confusion I > think enabling gst-debug should only add the debug prints and not change the > behaviour of the code. I agree it shouldn't change behaviour of the code, but allowing non-UTF8 strings to end up in structures/taglists is just a very bad idea, since it can easily lead to random crashes or invalid memory accesses when apps pass those strings to g_markup_escape() or g_markup_printf_escaped(), like they would if they wanted to construct fancy pango markup. Therefore, I've changed gststructure.c to always check strings for UTF-8 conformance and to always return if the string is non-UTF-8. The performance impact should be negligible, and it might reveal bugs in proprietary plugins that are only tested in environments where gst-debug is disabled: 2008-01-09 Tim-Philipp Müller <tim at centricular dot net> * gst/gststructure.c: (gst_structure_id_set_value): Always check UTF-8 conformance of structure strings and not only if the debugging system is enabled; reasoning: the behaviour of the actual code shouldn't really change depending on whether the debugging system is enabled or not (#508291).
Thanks, your modifications look good.