GNOME Bugzilla – Bug 795006
g_win32_locale_filename_from_utf8() failed to convert a local path
Last modified: 2018-05-24 20:20:18 UTC
In GIMP, bug 794949, we had a case on Windows with an image whose path was "F:\都.png". We needed to load metadata with GExiv2, which unfortunately doesn't support GFile or GInputStream/GOutputStream yet (cf. bug 732748). So we passed the result of g_file_get_path() to g_win32_locale_filename_from_utf8(). Unfortunately it failed and returned NULL (for the record, g_file_get_path() properly returned a valid UTF-8 value as far as we could see). So I read the function docs, which say it may fail when the string contains Unicode characters not representable in the system codepage. Yet since this is the path of an actual file currently existing in the filesystem, I assume it should be convertible to the system codepage, so that would be a bug. Or am I misunderstanding something?
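For illustration, the "not representable in the system codepage" condition from the docs can be reproduced outside GLib. This is a minimal Python sketch, not GLib's implementation: the helper name representable_in_codepage is hypothetical, and cp1252 (Western European) and cp936 (Simplified Chinese) are assumed example codepages, since the report does not say which system codepage was active.

```python
def representable_in_codepage(path: str, codepage: str) -> bool:
    """Return True if every character of `path` can be encoded in `codepage`.

    Hypothetical helper for illustration only; it mimics the kind of
    failure a Unicode-to-codepage conversion runs into, nothing more.
    """
    try:
        path.encode(codepage)
    except UnicodeEncodeError:
        return False
    return True


path = "F:\\都.png"  # the path from the report

# Assumed example codepages; the real system codepage is not stated.
for cp in ("cp1252", "cp936"):
    print(cp, representable_in_codepage(path, cp))
```

The file can exist on disk regardless of the result, because the filesystem stores the name independently of whichever codepage happens to be active.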
NTFS uses UTF-16 internally, and the Windows API and kernel seem to generally use UTF-16 as well. These days the local/system codepage encodings are a sham kept for compatibility (highly imperfect, as you now see) with older, non-Unicode APIs. Why MS never implemented proper UTF-8 support in the form of a full-fledged UTF-8 multi-byte codepage is a mystery for future historians. Note that Exiv2 does support wchar strings (see EXV_UNICODE_PATH in the FileIo constructor). Maybe GExiv2 should make use of these? I'd chastise Exiv2 for not getting on the UTF-8 API bandwagon along with the GNOME stack, but that would probably be a major (and ABI-wrecking) undertaking for them, all things considered. Meanwhile they do what they can to minimize the damage, and exposing a wchar-string version of the API (especially in C++, where overloading is a thing) is a viable, quick fix on their part. So... how do we close this? RESOLVED-NOTGNOME? RESOLVED-WONTFIX? RESOLVED-NOTABUG?
> NTFS uses UTF-16 internally, and Windows API and kernel seem to generally use UTF-16 as well.
So g_win32_locale_filename_from_utf8() should have converted this string from UTF-8 to UTF-16, shouldn't it? If it failed to do so, that's a bug, right? I don't understand why you want to close. Now I'm not saying that other projects could not have done better (whether the Microsoft API, Exiv2 or GExiv2), and actually as soon as GExiv2 implements bug 732748, i.e. a GIO-based API to load files, we'll happily drop g_win32_locale_filename_from_utf8() in favor of this new API. But in the meantime, it still feels like there is a bug in this g_win32_*() function. Or I misunderstood something (not impossible!).
(In reply to Jehan from comment #2)
> > NTFS uses UTF-16 internally, and Windows API and kernel seem to generally use UTF-16 as well.
>
> So g_win32_locale_filename_from_utf8() should have converted this string
> from UTF-8 to UTF-16, shouldn't it?
It converts this string from UTF-8 to the locale codepage, as the name implies. This function is for getting a locale-codepage-encoded filename that can be fed to APIs that do not support Unicode. The conversion is lossy by definition. If an API does not take any kind of Unicode string, it's limited that way, and there's nothing anyone [other than MS] can do, as long as this API is being used. What you *can* do *today* is:
1) Ensure that Exiv2 is built with Windows UTF-16 support. I've seen this supported in Exiv2 trunk source code; no idea whether it's available in the latest release.
2) Add some code to GExiv2 to convert filenames from UTF-8 to UTF-16 (trivial with glib), and feed these UTF-16 strings to the appropriate Exiv2 FileIo constructor. Appropriately ifdef'ed, of course. As I said, grep the Exiv2 headers for EXV_UNICODE_PATH to find the APIs that support this. In fact, it might be a good exercise to temporarily modify the Exiv2 headers to *remove* all methods and constructors that take 'char *' strings (as opposed to 'wchar_t *' strings), and then keep fixing GExiv2 (with appropriate ifdefs) until it compiles again. That will ensure that all filename-using GExiv2 codepaths can use UTF-16 (as far as Exiv2 interaction goes).
Or, indeed, use a GIO-based API to load files (or some other way of loading files without passing their names; for example, there might be some API that takes file descriptors).
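The filename conversion suggested for GExiv2 is a plain UTF-8 to UTF-16 transcoding, which GLib provides as g_utf8_to_utf16(). As a rough, language-neutral illustration, here is a minimal Python sketch of the same transcoding. It does not call GLib or Exiv2; the helper name utf8_path_to_utf16le is hypothetical, and appending a two-byte NUL terminator is an assumption about what a 'wchar_t *' consumer on Windows would expect.

```python
def utf8_path_to_utf16le(path_utf8: bytes) -> bytes:
    """Transcode a UTF-8 encoded path to NUL-terminated UTF-16LE bytes.

    Hypothetical helper for illustration; unlike a codepage conversion,
    this cannot fail for any valid UTF-8 input.
    """
    text = path_utf8.decode("utf-8")  # raises UnicodeDecodeError on invalid UTF-8
    # Assumption: the consumer wants little-endian code units plus a terminator.
    return text.encode("utf-16-le") + b"\x00\x00"


utf8_path = "F:\\都.png".encode("utf-8")  # the path from the report
wide = utf8_path_to_utf16le(utf8_path)

# Decoding the result (minus the terminator) gives back the original path.
print(wide[:-2].decode("utf-16-le"))
```

Every Unicode character has a UTF-16 encoding, so this step never loses information, in contrast to a conversion to the locale codepage.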
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/1359.