After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 741144 - id3demux: support UTF-16 -> UTF-8 conversion on systems with crippled iconv
id3demux: support UTF-16 -> UTF-8 conversion on systems with crippled iconv
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-plugins-base
git master
Other Linux
: Normal enhancement
: 1.5.1
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2014-12-05 07:52 UTC by Lyon
Modified: 2015-05-25 21:42 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
patch to fix the issue that id3 tags utf16 charaters cannot be extreacted (1.26 KB, patch)
2014-12-05 07:52 UTC, Lyon
needs-work Details | Review
The mp3 stream with UTF-16 TAG to reproduce this issue (476.31 KB, audio/mpeg)
2014-12-05 09:16 UTC, Lyon
  Details
the log when fetch the tag (6.24 KB, application/octet-stream)
2014-12-17 06:16 UTC, Lyon
  Details

Description Lyon 2014-12-05 07:52:43 UTC
Created attachment 292169 [details] [review]
patch to fix the issue that id3 tags utf16 charaters cannot be extreacted 

in id3demux when I tried to get the id3v2 tag such as TIT2, TALB etc. it will return extract failed.

Checked in id3v2frame.c,  When parse the UTF-16 streams, it used g_convert()to convert the buffer from UTF-16 to UTF-8, however it will return err that this conversion is not supported which cause the extraction failed with these UTF-16 characters.

In the patch, use g_utf16_to_utf8() instead of g_convert, which can convert the character format successfully.

This issue was found on my gst-plugins-base-1.2.3 environment, however i see it is the same at the latest code.
Comment 1 Lyon 2014-12-05 09:16:55 UTC
Created attachment 292172 [details]
The mp3 stream with UTF-16 TAG to reproduce this issue
Comment 2 Lyon 2014-12-15 09:33:21 UTC
This component should be gst-plugins-good
Comment 3 Tim-Philipp Müller 2014-12-15 11:39:47 UTC
The actual tag parsing is done in libgsttag in -base, so it's most likely in -base.
Comment 4 Lyon 2014-12-16 02:41:58 UTC
(In reply to comment #3)
> The actual tag parsing is done in libgsttag in -base, so it's most likely in
> -base.

yes, the conversion is realized in base class. it's in plugins-base.
Comment 5 Tim-Philipp Müller 2014-12-17 01:15:50 UTC
Could you explain what you're doing, how and where things go wrong and what the expected result is?

This works perfectly fine for me, in 1.2, 1.4 and git master:

$ gst-launch-1.0 uridecodebin uri=file:///home/tpm/samples/misc/741144-id3-utf16-tag-extraction.mp3 ! fakesink -t

$ gst-discoverer-1.0 ~/samples/misc/741144-id3-utf16-tag-extraction.mp3
Comment 6 Lyon 2014-12-17 06:16:39 UTC
Created attachment 292868 [details]
the log when fetch the tag

Please see the attached log file.
export GST_DEBUG=id3demux:6,id3v2:6

modified field = g_convert (data, data_size, "UTF-8", in_encode, NULL, NULL, NULL);
add error return message.
       field = g_convert (data, data_size, "UTF-8", in_encode, NULL, NULL, &error);
       g_print("=== error message: %s === \n", error->message);
Comment 7 Lyon 2014-12-17 06:22:17 UTC
Yes, the original code works fine on x86 platform
However, I'm debugging on my board which is arm based platform.

As from the debugging log message, the g_convert() always return: Conversion from character set 'UTF-16LE' to 'UTF-8' is not supported  when converting from UTF-16 to UTF-8 or UTF-8 TO UTF-16.
But with previous patch, by change the g_convert to g_utf16_to_utf8(), it works when converting UTF-16 to UTF-8
Comment 8 Sebastian Dröge (slomo) 2014-12-17 08:56:52 UTC
The problem is probably that g_convert() uses iconv, and it's not guaranteed which charsets are supported by iconv. If it's using the system version, it depends on the configuration of libc and which libc variant it is... if it's using a separately built iconv, it depends on whether that one is stripped down or not.

g_utf16_to_utf8() is not using iconv but does the conversion directly.
Comment 9 Lyon 2014-12-17 09:10:30 UTC
Thanks for the explanation.
I was suspecting the dependency problem also.

So would it be better if we just use g_utf16_to_utf8() directly instead of g_conv()?

BTW, I see that in the ISO8859 branch, the g_convert() is commented also and use string_utf8_dup(), in which it may get charset or use g_locale_to_utf8().
Comment 10 Tim-Philipp Müller 2014-12-17 11:39:28 UTC
Ah ok. I think this would be an acceptable change in principle, but not like the patch proposed.

 - please supply a patch in git format-patch format

 - please run gst-indent on the file

 - I don't think we can modify the data in place, even if it's missing a const

 - the patch assumes host endianness = little endian, we need a patch that works on all systems
Comment 11 Lyon 2014-12-18 02:22:43 UTC
Thanks, I will propose a patch as your comments.
Comment 12 Tim-Philipp Müller 2015-05-25 21:42:25 UTC
commit ea5d912b9fed5a596d11dbd5774623d1d839437c
Author: Tim-Philipp Müller <tim@centricular.com>
Date:   Mon May 25 22:37:56 2015 +0100

    tag: id3v2: fix parsing of UTF-16 text on systems with crippled iconv
    
    Use g_utf16_to_utf8() instead of the more generic g_convert(), so
    that we can extract text in UTF-16 format even on embedded systems
    with crippled iconv support.
    
    This code path is exercised by the id3demux test_unsync_v23
    check in gst-plugins-good.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=741144