Bug 350034 - file{src,sink} don't convert to filesystem encoding
Status: RESOLVED OBSOLETE
Product: GStreamer
Classification: Platform
Component: gstreamer (core)
Version: git master
Hardware: Other Linux
Importance: Normal normal
Target Milestone: git master
Assigned To: GStreamer Maintainers
QA Contact: GStreamer Maintainers
Duplicates: 521663
Depends on:
Blocks:
Reported: 2006-08-05 09:23 UTC by Alex Lancaster
Modified: 2018-05-04 08:56 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
patch (2.40 KB, patch), 2006-08-10 15:56 UTC, James "Doc" Livingston, status: reviewed
possible patch (3.14 KB, patch), 2008-03-11 14:28 UTC, Wim Taymans, status: none
improved patch (3.25 KB, patch), 2008-03-11 14:41 UTC, Wim Taymans, status: none
updated patch (2.39 KB, patch), 2008-03-11 14:43 UTC, Wim Taymans, status: needs-work

Description Alex Lancaster 2006-08-05 09:23:12 UTC
+++ This bug was initially created as a clone of Bug #322268 +++

Moving open issue reported in bug #322268 and closing that bug. 

#79 from Ernst Persson

> When using Swedish the songs on a CD are called Okänd ... (Unknown)
> That is a unicode 'ä' I guess and it got messed up when transferred to the
> library.

> Also track 4 was named differently than all other tracks when I transferred them.
> Unfortunately I don't have internet at home right now so I can't give you all
> the details.

#80 from James "Doc" Livingston 

> Ernst: was the 'ä' that got messed up the one in the file/folder names, or in
> the tags?

> With respect to track 4, what was it called and what should it have been 
> called?

#81 from Ernst Persson 

> This is in the tags, but it's reflected in the filenames I think.
> Since I still don't have net at home this is only improvised.

> It was something like this:
> Okänd Sång 1
> Okänd Sång 2
> Okänd Sång 3
> Okänd Sång 4
> Okänd Sång 5

> DnD all songs to library,

> And became:
> Ok$€nd S@£ng 1
> Ok$€nd S@£ng 2
> Ok$€nd S@£ng 3
> 04 - Ok$€nd S@£ng.mp3
> Ok$€nd S@£ng 5
Comment 1 Alex Lancaster 2006-08-05 09:24:37 UTC
Ernst, can you confirm if this bug is still present?
Comment 2 Alex Lancaster 2006-08-05 09:31:55 UTC
Oops, NEEDINFO doesn't work since I'm the reporter ;)
Comment 3 James "Doc" Livingston 2006-08-10 07:52:03 UTC
Ernst: Do you happen to know what filesystem you were extracting it onto (e.g. FAT32, Ext3) and what encoding the filesystem was using?

If the filesystem wasn't using UTF-8, then I think this might have been caused by the filesink element not doing utf8->fs encoding translation before calling fopen().
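The effect suspected here can be shown with a minimal Python sketch. It is an illustration only, not GStreamer code, and ISO-8859-1 is an assumed stand-in for the filesystem encoding, which is not known from this report. The exact garbled characters depend on the encoding in use, so they need not match the ones quoted in the description.

```python
# Illustration only: ISO-8859-1 stands in for an unknown non-UTF-8
# filesystem encoding; the real encoding on the reporter's system is not known.
title = "Okänd Sång 1"

# The bytes a sink would hand to fopen() if it skipped the utf8->fs conversion.
utf8_bytes = title.encode("utf-8")

# How a tool that assumes ISO-8859-1 filenames would display those bytes.
garbled = utf8_bytes.decode("iso-8859-1")

# The bytes the sink would need to write for an ISO-8859-1 filesystem.
converted = title.encode("iso-8859-1")

print(garbled)
print(converted.decode("iso-8859-1"))
```

Each non-ASCII character becomes two bytes in UTF-8, and a reader that assumes a single-byte encoding shows each of those bytes as a separate character.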
Comment 4 James "Doc" Livingston 2006-08-10 15:53:00 UTC
After discussion on IRC with the GStreamer guys, this is a filesink (and src) issue so moving to GStreamer.
Comment 5 James "Doc" Livingston 2006-08-10 15:56:44 UTC
Created attachment 70657: patch

This patch does two things:
* checks that the URI/location being passed is valid UTF-8, and converts it from the locale encoding if it isn't
* converts the filename from utf8 to the filesystem encoding before passing it to fopen()


I don't have any non-utf8 filesystems, so I'm not sure that it would actually work. But I think it should.
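The two steps can be sketched in Python as follows. This is not the attached patch (which is C code against GLib); the helper names `to_utf8` and `to_filesystem_bytes` are hypothetical, and ISO-8859-1 is an assumed locale and filesystem encoding chosen for the example.

```python
def to_utf8(location: bytes, locale_encoding: str) -> str:
    """Step 1: accept valid UTF-8 as-is, otherwise decode from the locale encoding."""
    try:
        return location.decode("utf-8")
    except UnicodeDecodeError:
        return location.decode(locale_encoding)


def to_filesystem_bytes(name: str, fs_encoding: str) -> bytes:
    """Step 2: convert the name to the filesystem encoding before opening the file."""
    return name.encode(fs_encoding)


# Hypothetical inputs: the same name supplied once in UTF-8, once in ISO-8859-1.
from_utf8 = to_utf8("Okänd.mp3".encode("utf-8"), "iso-8859-1")
from_locale = to_utf8("Okänd.mp3".encode("iso-8859-1"), "iso-8859-1")

# Bytes that would be passed to the open call on an ISO-8859-1 filesystem.
on_disk = to_filesystem_bytes(from_utf8, "iso-8859-1")
```

One caveat with step 1: a byte string in another encoding can occasionally also be valid UTF-8, in which case the fallback never triggers, so the check is a heuristic rather than a guarantee.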
Comment 6 Wim Taymans 2006-08-29 16:17:42 UTC
Two possibilities:

 1) always require UTF-8 for filenames, use GLib g_fopen to make this work on
    Windows (which then does a conversion to UTF-16). Do conversion to the file
    system encoding before calling g_fopen on POSIX. 

 2) Pass the pathname in the GLib file name encoding. On POSIX this is the
    actual on-disk encoding which might correspond to the locale settings of the
    process (or the G_FILENAME_ENCODING environment variable), or not.
    On Windows this is UTF-8.

GLib recommends 2) which requires the Application to figure out the file name encoding. 1) requires the sink to figure out the encoding, which is probably even harder.

This patch does 1) and requires the locale of the app to be set to the correct encoding of the target file system.
Comment 7 James "Doc" Livingston 2006-09-24 05:33:10 UTC
(2) would be fine, except that I think it would make different elements handle file:// URIs differently. AIUI gnomevfssrc expects them to be in UTF-8.
Comment 8 Alex Lancaster 2007-02-01 02:44:04 UTC
Ping: any update on including this patch in gstreamer?
Comment 9 James "Doc" Livingston 2007-02-01 09:21:49 UTC
Well, the issue still hasn't really been sorted out.

I think the point I made in my last comment, that applications shouldn't have to know the internals of the sink element to set the location correctly, is fairly important. Especially if an app is using gst_element_make_from_uri() to create it.


If we want to do the above, mandating that the filenames/URIs be in UTF-8 seems the logical thing to do. However this means that GstFileSink needs to (on POSIX systems) figure out what filesystem the filename is on, and find out what encoding it uses for file names. I imagine that gnomevfs has some code we could steal to do this.
Comment 10 Kwang Yul Seo 2007-12-03 04:44:09 UTC
I applied the patch above and it did not work on Windows.

On Windows GLib uses UTF-8 for filenames, so g_filename_from_utf8() returns a UTF-8 string. Passing UTF-8 strings to fopen and open won't work. To fix this problem, g_locale_from_utf8 must be used instead of g_filename_from_utf8.

By the way, why don't you include the patch temporarily until you find the real solution?
 
Comment 11 Wim Taymans 2008-03-11 13:58:20 UTC
*** Bug 521663 has been marked as a duplicate of this bug. ***
Comment 12 Wim Taymans 2008-03-11 14:00:31 UTC
Bug #521663 suggested using g_open(). 
Comment 13 Sebastian Dröge (slomo) 2008-03-11 14:14:52 UTC
Well, AFAIK g_open() exists exactly for this reason so we should probably use it...
Comment 14 Wim Taymans 2008-03-11 14:18:32 UTC
I'm cooking up a patch now.
Comment 15 Wim Taymans 2008-03-11 14:28:13 UTC
Created attachment 107059: possible patch

This patch turns out to be the same patch as in Comment #5 from James "Doc" Livingston, except for using g_open() and g_fopen().

The reasoning is now:
 - the application passes a filename in UTF8 or the locale encoding
 - the filename is internally stored in UTF8 (getting the filename will return a UTF8 string)
 - when opening the file, we use g_[f]open to make the open work on windows.
Comment 16 Wim Taymans 2008-03-11 14:41:28 UTC
Created attachment 107061: improved patch

Previous patch was not very smart, we prefer to not mess with encodings too much. Instead we use the standard glib encodings, which is what is returned from g_dir_*. In combination with g_[f]open() this should result in consistent behaviour on Windows too (which uses UTF8 as the filename encoding).
Comment 17 Wim Taymans 2008-03-11 14:43:38 UTC
Created attachment 107062: updated patch

previous patch contained some unwanted things.
Comment 18 Sebastian Dröge (slomo) 2008-03-11 15:41:24 UTC
Looks good except the "Windoes" typo IMHO :)
Comment 19 Edward Hervey 2008-03-11 22:40:52 UTC
A word of caution here regarding using open/close/read/write.

 You can't pass the fd around across methods which are located in different DLLs under Windows.

That is :
* if you use open in those plugins, you need to use write/read/close
* if you use g_open (the open is done within glib dll), you need to use methods in glib (that don't exist) for write/read/close
Comment 20 Tim-Philipp Müller 2008-03-12 11:48:46 UTC
Here's a log of a chat about this, which expands a bit on Edward's comments, and which might clarify things a bit for those of us not so win32-savvy:

<tim> anyone win32-savvy around who could comment on whether http://bugzilla.gnome.org/show_bug.cgi?id=350034#c19 is accurate? (hard to believe, since it would defeat the purpose of g_open, or so it seems to me at least)
<tm1> it is true
<tm1> the purpose of g_open() is to enable using Unicode filenames without having to resort to UTF-16 (wchar_t), i.e. UTF-8
<tim> so one could only really use the fd returned by g_open() in connection with iochannels then? Or how does one read/write/close them then?
<tm1> if one wanted to provide a GLib way around the different C runtime issue, one would also need to wrap close(), read(), write(), fread(), fclose(), popen() etc etc, i.e. all C functions that use file descriptors, including those that use FILE *
<tm1> if the code that calls g_open() uses the same C runtime as GLib, then it can use the returned file descriptor normally
<tm1> if it uses another C runtime, then it can't
<tim> ah
<tm1> that's why gcc and MSVC6 (!) are the preferred compilers for GLib-using code on Windows
<tim> is there a way to check that somehow (at runtime)?
<tm1> umm, probably, but the person who builds something really should know what he/she is doing ;)
<tim> hmm, yes, but I'd feel more comfortable if it was possible to just error out with a big fat "your build is broken, go fix it" :)
<tm1> well, not all apps necessarily use those APIs that cause problems, those work fine even if they use a different runtime
<tm1> but yeah, this is a mess..
<tm1> feel free to post my comments to the bug report
<tim> thanks
<tm1> I guess there is also always room for improvements to the documentation, things like this maybe need to be pointed out more explicitly
<tim> got a suggestion on what the best thing to do is if we want to allow people to build things with newer MSVCs too?
<tm1> well, one possibility would be to build glib and gtk+ with that MSVC too, then
<tm1> but when doing that, please then make sure to use different names for the DLLs to avoid potential mixup 
<tim> well, random people will build it and ship stuff, I don't think that's something that's going to be easy to enforce
<tm1> unfortunately it isn't really easy to build glib or gtk+ with MSVC. there are some "makefile.msc" files for nmake, but they require manual editing. they are maintained by Hans Breuer
<tim> what's the alternative? Do the utf8=>utf16 conversion ourselves and use the utf16 functions directly with #ifdefs? would that work?
<tm1> glib has some VS8 project files in build/win32/vs8
<tm1> that is quite possible, sure
<tm1> but then those fds from your app can't be passed to glib APIs that take file descriptors, like g_io_channel_win32_new_fd()
Comment 21 Sebastian Dröge (slomo) 2011-05-18 20:16:27 UTC
Ok, so why don't we just use a) GIOChannel in combination with g_fopen() or b) use gio?
Comment 22 Stefan Sauer (gstreamer, gtkdoc dev) 2011-07-04 17:18:21 UTC
We already have a giosrc. Maybe we just rank giosrc higher on win32?
Comment 23 Tim-Philipp Müller 2011-07-04 17:27:50 UTC
> We already have a giosrc. Maybe we just rank giosrc higher on win32?

That makes the bug go away with playbin2/uridecodebin, but doesn't actually fix it.

People will still create pipelines with filesrc and expect it to work.
Comment 24 Edward Hervey 2013-08-14 06:25:53 UTC
Is this bug still valid? There seem to have been quite a few changes in filesrc/filesink since the latest comment, especially with regard to UTF-8/UTF-16 handling on Windows.
Comment 25 Sebastian Dröge (slomo) 2018-05-04 08:56:34 UTC
I don't think anything is needed here anymore at this point; nobody seems to have run into any problems related to this in the last few years either.

On Windows we convert UTF8 (from the application) to UTF16 (to pass to _wopen()), elsewhere we just pass the string as passed from the application to open().
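The behaviour described in this closing comment can be sketched in Python as follows. This is not the filesrc/filesink code; `open_argument` is a hypothetical helper, and the `windows` flag stands in for whatever platform check the real elements use.

```python
def open_argument(location: bytes, windows: bool) -> bytes:
    """Return the bytes that would be handed to the platform's open call."""
    if windows:
        # The application supplies UTF-8; the wide-character open call
        # takes UTF-16 (little-endian code units on Windows).
        return location.decode("utf-8").encode("utf-16-le")
    # Elsewhere the string is passed through exactly as the application gave it.
    return location


wide = open_argument("Okänd".encode("utf-8"), windows=True)
raw = open_argument(b"Ok\xe4nd", windows=False)
```

On the non-Windows path nothing is validated or converted, so it is up to the application to supply the name in whatever encoding the filesystem actually uses.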