GNOME Bugzilla – Bug 741144
id3demux: support UTF-16 -> UTF-8 conversion on systems with crippled iconv
Last modified: 2015-05-25 21:42:25 UTC
Created attachment 292169 [details] [review]
Patch to fix the issue that UTF-16 characters in ID3 tags cannot be extracted
In id3demux, when I tried to get ID3v2 tags such as TIT2, TALB, etc., the extraction failed.
I checked id3v2frame.c: when parsing UTF-16 strings, it uses g_convert() to convert the buffer from UTF-16 to UTF-8. However, g_convert() returns an error saying that this conversion is not supported, which causes the extraction to fail for these UTF-16 strings.
The patch uses g_utf16_to_utf8() instead of g_convert(), which converts the text successfully.
I found this issue in my gst-plugins-base-1.2.3 environment, but I see the code is the same in the latest version.
Created attachment 292172 [details]
MP3 stream with UTF-16 tags to reproduce this issue
This component should be gst-plugins-good
The actual tag parsing is done in libgsttag in -base, so it's most likely in -base.
(In reply to comment #3)
> The actual tag parsing is done in libgsttag in -base, so it's most likely in
Yes, the conversion is implemented in -base, so this belongs in plugins-base.
Could you explain what you're doing, how and where things go wrong and what the expected result is?
This works perfectly fine for me, in 1.2, 1.4 and git master:
$ gst-launch-1.0 uridecodebin uri=file:///home/tpm/samples/misc/741144-id3-utf16-tag-extraction.mp3 ! fakesink -t
$ gst-discoverer-1.0 ~/samples/misc/741144-id3-utf16-tag-extraction.mp3
Created attachment 292868 [details]
Log captured when fetching the tags
Please see the attached log file.
I modified the call
field = g_convert (data, data_size, "UTF-8", in_encode, NULL, NULL, NULL);
to capture and print the error message:
field = g_convert (data, data_size, "UTF-8", in_encode, NULL, NULL, &error);
if (error != NULL)
  g_print ("=== error message: %s ===\n", error->message);
Yes, the original code works fine on x86 platforms.
However, I'm debugging on my board, which is an ARM-based platform.
According to the debug log, g_convert() always returns "Conversion from character set 'UTF-16LE' to 'UTF-8' is not supported", both when converting from UTF-16 to UTF-8 and when converting from UTF-8 to UTF-16.
With the previous patch, which changes g_convert() to g_utf16_to_utf8(), converting UTF-16 to UTF-8 works.
The problem is probably that g_convert() uses iconv, and it's not guaranteed which charsets are supported by iconv. If it's using the system version, it depends on the configuration of libc and which libc variant it is... if it's using a separately built iconv, it depends on whether that one is stripped down or not.
g_utf16_to_utf8() is not using iconv but does the conversion directly.
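To illustrate what "does the conversion directly" means, here is a minimal sketch of a table-free UTF-16 to UTF-8 conversion in plain C. It is not GLib's implementation; the function name utf16_to_utf8_sketch and its return convention are made up for this example, and the input is assumed to already be in host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a GLib function): converts n UTF-16 code units
 * in host byte order to NUL-terminated UTF-8 in out (capacity cap bytes).
 * Returns the number of bytes written, excluding the NUL, or -1 on
 * invalid surrogates or insufficient space. No iconv involved. */
long utf16_to_utf8_sketch(const uint16_t *in, size_t n, char *out, size_t cap)
{
    size_t o = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t cp = in[i];

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate. */
            if (i + 1 >= n || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                return -1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            i++;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return -1; /* Unpaired low surrogate. */
        }

        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (o + need + 1 > cap)
            return -1; /* Keep room for the terminating NUL. */

        if (need == 1) {
            out[o++] = (char) cp;
        } else if (need == 2) {
            out[o++] = (char) (0xC0 | (cp >> 6));
            out[o++] = (char) (0x80 | (cp & 0x3F));
        } else if (need == 3) {
            out[o++] = (char) (0xE0 | (cp >> 12));
            out[o++] = (char) (0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char) (0x80 | (cp & 0x3F));
        } else {
            out[o++] = (char) (0xF0 | (cp >> 18));
            out[o++] = (char) (0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (char) (0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char) (0x80 | (cp & 0x3F));
        }
    }

    if (o + 1 > cap)
        return -1;
    out[o] = '\0';
    return (long) o;
}
```

Because the mapping is pure arithmetic on code points, it cannot fail with "conversion not supported" the way an iconv-backed call can when the charset tables are missing.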
Thanks for the explanation.
I also suspected a dependency problem.
So would it be better to just use g_utf16_to_utf8() directly instead of g_convert()?
BTW, I see that in the ISO8859 branch the g_convert() call is also commented out and string_utf8_dup() is used instead, which may look up the charset or use g_locale_to_utf8().
Ah ok. I think this would be an acceptable change in principle, but not like the patch proposed.
- please supply a patch in git format-patch format
- please run gst-indent on the file
- I don't think we can modify the data in place, even if it's missing a const
- the patch assumes host endianness = little endian, we need a patch that works on all systems
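The last two points could be addressed along these lines. This is only a sketch under assumptions, not the patch that was eventually committed: the helper name utf16_bytes_to_host_sketch and the default_big_endian parameter are hypothetical. It copies the raw tag bytes into a freshly allocated buffer of host-order code units, so the input is never modified and the result does not depend on host endianness.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: copies UTF-16 bytes into a newly allocated,
 * zero-terminated array of host-order code units. A leading BOM
 * (FF FE = little endian, FE FF = big endian) selects the byte order and
 * is skipped; without a BOM, default_big_endian decides. The input buffer
 * is left untouched. Returns NULL on allocation failure; the caller frees
 * the result with free(). */
uint16_t *utf16_bytes_to_host_sketch(const uint8_t *data, size_t size,
                                     int default_big_endian, size_t *n_units)
{
    int big = default_big_endian;

    if (size >= 2) {
        if (data[0] == 0xFF && data[1] == 0xFE) {
            big = 0;
            data += 2;
            size -= 2;
        } else if (data[0] == 0xFE && data[1] == 0xFF) {
            big = 1;
            data += 2;
            size -= 2;
        }
    }

    size_t n = size / 2; /* A trailing odd byte is ignored. */
    uint16_t *units = malloc((n + 1) * sizeof *units);
    if (units == NULL)
        return NULL;

    /* Assemble each unit from individual bytes, so the result is the same
     * on little-endian and big-endian hosts. */
    for (size_t i = 0; i < n; i++) {
        uint8_t b0 = data[2 * i], b1 = data[2 * i + 1];
        units[i] = big ? (uint16_t) ((b0 << 8) | b1)
                       : (uint16_t) ((b1 << 8) | b0);
    }
    units[n] = 0;
    *n_units = n;
    return units;
}
```

The resulting array is in the form g_utf16_to_utf8() expects (host-order gunichar2 units plus a length), so it could be passed to that function directly.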
Thanks, I will propose a patch that addresses your comments.
Author: Tim-Philipp Müller <email@example.com>
Date: Mon May 25 22:37:56 2015 +0100
tag: id3v2: fix parsing of UTF-16 text on systems with crippled iconv
Use g_utf16_to_utf8() instead of the more generic g_convert(), so
that we can extract text in UTF-16 format even on embedded systems
with crippled iconv support.
This code path is exercised by the id3demux test_unsync_v23
check in gst-plugins-good.