GNOME Bugzilla – Bug 350034
file{src,sink} don't convert to filesystem encoding
Last modified: 2018-05-04 08:56:34 UTC
+++ This bug was initially created as a clone of Bug #322268 +++

Moving open issue reported in bug #322268 and closing that bug.

#79 from Ernst Persson
> When using Swedish the songs on a CD are called Okänd ... (Unknown)
> That is a unicode 'ä' I guess and it got messed up when transferred to the
> library.
> Also track 4 was named differently than all other tracks when I transferred them.
> Unfortunately I don't have internet at home right now so I can't give you all the details.

#80 from James "Doc" Livingston
> Ernst: was the 'ä' that got messed up the one in the file/folder names, or in
> the tags?
> With respect to track 4, what was it called and what should it have been
> called?

#81 from Ernst Persson
> This is in the tags, but it's reflected in the filenames I think.
> Since I still don't have net at home this is only improvised.
> It was something like this:
> Okänd Sång 1
> Okänd Sång 2
> Okänd Sång 3
> Okänd Sång 4
> Okänd Sång 5
> DnD all songs to library,
> And became:
> Ok$€nd S@£ng 1
> Ok$€nd S@£ng 2
> Ok$€nd S@£ng 3
> 04 - Ok$€nd S@£ng.mp3
> Ok$€nd S@£ng 5
Ernst, can you confirm if this bug is still present?
Oops, NEEDINFO doesn't work since I'm the reporter ;)
Ernst: Do you happen to know what filesystem you were extracting it onto (e.g. FAT32, Ext3) and what encoding the filesystem was using? If the filesystem wasn't using UTF-8, then I think this might have been caused by the filesink element not doing utf8->fs encoding translation before calling fopen().
After discussion on IRC with the GStreamer guys, this is a filesink (and src) issue so moving to GStreamer.
Created attachment 70657
patch

This patch does two things:
* checks that the URI/location being passed is valid UTF-8, and converts it from the locale encoding if it isn't
* converts the filename from UTF-8 to the filesystem encoding before passing it to fopen()

I don't have any non-UTF-8 filesystems, so I'm not sure that it would actually work. But I think it should.
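For readers following along, here is a rough sketch of the first check in plain C. It is not the code from the attachment: is_valid_utf8() is a hypothetical stand-in for what GLib's g_utf8_validate() provides, written without GLib so that it compiles on its own. The fallback conversion from the locale encoding is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the patch): returns true if the len bytes
 * at s are well-formed UTF-8. Rejects overlong forms, UTF-16 surrogates
 * and code points above U+10FFFF. */
bool is_valid_utf8(const char *s, size_t len)
{
    assert(s != NULL);
    const unsigned char *p = (const unsigned char *)s;
    size_t i = 0;

    while (i < len) {
        unsigned char c = p[i];
        size_t extra;          /* number of continuation bytes expected */
        unsigned long cp, min; /* decoded code point and its minimum value */

        if (c < 0x80) {        /* plain ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            extra = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return false;      /* stray continuation byte or invalid lead byte */
        }

        if (len - i <= extra)
            return false;      /* sequence is truncated */

        for (size_t k = 1; k <= extra; k++) {
            unsigned char cc = p[i + k];
            if ((cc & 0xC0) != 0x80)
                return false;  /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;

        i += extra + 1;
    }
    return true;
}
```

A location such as "Okänd" encoded as ISO-8859-1 (a lone 0xE4 byte for the 'ä') fails this check, which is the case where the patch falls back to converting from the locale encoding.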
Two possibilities:

1) Always require UTF-8 filenames and use GLib's g_fopen() to make this work on Windows (which then does a conversion to UTF-16). On POSIX, convert to the filesystem encoding before calling g_fopen().

2) Pass the pathname in the GLib file name encoding. On POSIX this is the actual on-disk encoding, which may or may not correspond to the locale settings of the process (or the G_FILENAME_ENCODING environment variable). On Windows this is UTF-8.

GLib recommends 2), which requires the application to figure out the file name encoding. 1) requires the sink to figure out the encoding, which is probably even harder. This patch does 1) and requires the locale of the app to be set to the correct encoding of the target file system.
(2) would be fine, except that I think it would make different elements handle file:// URIs differently. AIUI gnomevfssrc expects them to be in UTF-8.
Ping: any update on including this patch in gstreamer?
Well, the issue still hasn't really been sorted out. I think the point I made in my last comment, that applications shouldn't have to know the internals of the sink element to set the location correctly, is fairly important. Especially if an app is using gst_element_make_from_uri() to create it. If we want to do the above, mandating that the filenames/URIs be in UTF-8 seems the logical thing to do. However, this means that GstFileSink needs to (on POSIX systems) figure out what filesystem the filename is on, and find out what encoding it uses for file names. I imagine that gnomevfs has some code we could steal to do this.
I applied the patch above and it did not work on Windows. On Windows GLib uses UTF-8 for filenames, so g_filename_from_utf8() returns a UTF-8 string. Passing UTF-8 strings to fopen() and open() won't work. To fix this problem, g_locale_from_utf8() must be used instead of g_filename_from_utf8(). By the way, why don't you include the patch temporarily until you find the real solution?
*** Bug 521663 has been marked as a duplicate of this bug. ***
Bug #521663 suggested using g_open().
Well, AFAIK g_open() exists exactly for this reason so we should probably use it...
I'm cooking up a patch now.
Created attachment 107059
possible patch

This patch turns out to be the same patch as in Comment #5 from James "Doc" Livingston, except for using g_open() and g_fopen(). The reasoning is now:
- the application passes a filename in UTF-8 or the locale encoding
- the filename is internally stored in UTF-8 (getting the filename will return a UTF-8 string)
- when opening the file, we use g_[f]open() to make the open work on Windows.
Created attachment 107061
improved patch

Previous patch was not very smart, we prefer to not mess with encodings too much. Instead we use the standard GLib encodings, which is what is returned from g_dir_*. In combination with g_[f]open() this should result in consistent behaviour on Windows too (which uses UTF-8 as the filename encoding).
Created attachment 107062
updated patch

Previous patch contained some unwanted things.
Looks good except the "Windoes" typo IMHO :)
A word of caution here regarding using open/close/read/write. You can't pass the fd around across methods which are located in different DLLs under Windows. That is:
* if you use open() in those plugins, you need to use write/read/close
* if you use g_open() (the open is done within the GLib DLL), you need to use methods in GLib (that don't exist) for write/read/close
Here's a log of a chat about this, which expands a bit on Edward's comments, and which might clarify things a bit for those of us not so win32-savvy:

<tim> anyone win32-savvy around who could comment on whether http://bugzilla.gnome.org/show_bug.cgi?id=350034#c19 is accurate? (hard to believe, since it would defeat the purpose of g_open, or so it seems to me at least)
<tm1> it is true
<tm1> the purpose of g_open() is to enable using Unicode filenames without having to resort to UTF-16 (wchar_t), i.e. UTF-8
<tim> so one could only really use the fd returned by g_open() in connection with iochannels then? Or how does one read/write/close them then?
<tm1> if one wanted to provide a GLib way around the different C runtime issue, one would also need to wrap close(), read(), write(), fread(), fclose(), popen() etc etc, i.e. all C functions that use file descriptors, including those that use FILE *
<tm1> if the code that calls g_open() uses the same C runtime as GLib, then it can use the returned file descriptor normally
<tm1> if it uses another C runtime, then it can't
<tim> ah
<tm1> that's why gcc and MSVC6 (!) are the preferred compilers for GLib-using code on Windows
<tim> is there a way to check that somehow (at runtime)?
<tm1> umm, probably, but the person who builds something really should know what he/she is doing ;)
<tim> hmm, yes, but I'd feel more comfortable if it was possible to just error out with a big fat "your build is broken, go fix it" :)
<tm1> well, not all apps necessarily use those APIs that cause problems, those work fine even if they use a different runtime
<tm1> but yeah, this is a mess..
<tm1> feel free to post my comments to the bug report
<tim> thanks
<tm1> I guess there is also always room for improvements to the documentation, things like this maybe need to be pointed out more explicitly
<tim> got a suggestion on what the best thing to do is if we want to allow people to build things with newer MSVCs too?
<tm1> well, one possibility would be to build glib and gtk+ with that MSVC too, then
<tm1> but when doing that, please then make sure to use different names for the DLLs to avoid potential mixup
<tim> well, random people will build it and ship stuff, I don't think that's something that's going to be easy to enforce
<tm1> unfortunately it isn't really easy to build glib or gtk+ with MSVC. there are some "makefile.msc" files for nmake, but they require manual editing. they are maintained by Hans Breuer
<tim> what's the alternative? Do the utf8=>utf16 conversion ourselves and use the utf16 functions directly with #ifdefs? would that work?
<tm1> glib has some VS8 project files in build/win32/vs8
<tm1> that is quite possible, sure
<tm1> but then those fds from your app can't be passed to glib APIs that take file descriptors, like g_io_channel_win32_new_fd()
Ok, so why don't we just use a) GIOChannel in combination with g_fopen() or b) use gio?
We already have a giosrc. Maybe we just rank giosrc higher on win32?
> We already have a giosrc. Maybe we just rank giosrc higher on win32?

That makes the bug go away with playbin2/uridecodebin, but doesn't actually fix it. People will still create pipelines with filesrc and expect it to work.
Is this bug still valid? There seem to have been quite a few changes in filesrc/filesink since the latest comment, especially with regard to UTF-8/UTF-16 handling on Windows.
I don't think anything is needed here anymore at this point; nobody seems to have run into any problems related to this in recent years either. On Windows we convert UTF-8 (from the application) to UTF-16 (to pass to _wopen()), elsewhere we just pass the string as passed from the application to open().
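For reference, the behaviour described above can be sketched roughly as follows. This is an illustration only, not the actual filesrc/filesink code: open_app_filename() is a hypothetical name, and the Windows branch assumes the Win32 MultiByteToWideChar() call for the UTF-8 to UTF-16 step.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

#ifdef _WIN32
#include <windows.h>
#include <io.h>
#include <stdlib.h>
#else
#include <unistd.h>
#endif

/* Hypothetical helper (not the real GStreamer function): opens a file whose
 * name came from the application. On Windows the name is taken to be UTF-8
 * and is converted to UTF-16 for _wopen(); elsewhere the bytes are passed
 * to open() untouched. Returns a file descriptor, or -1 on failure. */
int open_app_filename(const char *name, int flags, int mode)
{
    assert(name != NULL);
#ifdef _WIN32
    /* First call: ask for the required length, including the terminator. */
    int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, name, -1, NULL, 0);
    if (n <= 0)
        return -1;                      /* not valid UTF-8 */

    wchar_t *wname = malloc((size_t)n * sizeof *wname);
    if (wname == NULL)
        return -1;

    if (MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, name, -1, wname, n) <= 0) {
        free(wname);
        return -1;
    }

    /* Only the read/write permission bits are meaningful to the Windows CRT. */
    int fd = _wopen(wname, flags | _O_BINARY, mode & (_S_IREAD | _S_IWRITE));
    free(wname);
    return fd;
#else
    /* No conversion: the application is trusted to pass the on-disk bytes. */
    return open(name, flags, (mode_t)mode);
#endif
}
```

Note that the descriptor is opened with the caller's own C runtime, so the earlier caveat about passing fds between DLLs built against different runtimes does not arise as long as the same module also does the read/write/close.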