Bug 795006 - g_win32_locale_filename_from_utf8() failed to convert a local path
Status: RESOLVED OBSOLETE
Product: glib
Classification: Platform
Component: win32
Version: 2.55.x
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: gtk-win32 maintainers
QA Contact: gtk-win32 maintainers
Depends on:
Blocks:
Reported: 2018-04-05 15:16 UTC by Jehan
Modified: 2018-05-24 20:20 UTC
See Also:
GNOME target: ---
GNOME version: ---

Description Jehan 2018-04-05 15:16:22 UTC
In GIMP (bug 794949), we had a case on Windows with an image at a path such as "F:\都.png". We needed to load its metadata with GExiv2, which unfortunately doesn't support GFile or GInputStream/GOutputStream yet (cf. bug 732748).

So we passed the result of g_file_get_path() to g_win32_locale_filename_from_utf8(). Unfortunately, it failed and returned NULL (for the record, g_file_get_path() returned a valid UTF-8 string as far as we could see).

So I read the function's documentation, which says it may fail when the string contains Unicode characters not representable in the system codepage. Yet since this is the path of a file that currently exists on the filesystem, I assumed it should be convertible to the system codepage, so this would be a bug. Or am I misunderstanding something?
Comment 1 LRN 2018-04-05 19:57:42 UTC
NTFS uses UTF-16 internally, and the Windows API and kernel generally seem to use UTF-16 as well. These days the local/system codepage encodings are a sham kept for compatibility (highly imperfect, as you now see) with older, non-Unicode APIs. Why MS never implemented proper UTF-8 support in the form of a full-fledged UTF-8 multi-byte codepage is a mystery for future historians.

Note that Exiv2 does support wchar strings (see EXV_UNICODE_PATH in the FileIo constructor). Maybe GExiv2 should make use of them?

I'd chastise Exiv2 for not getting on the UTF-8 API bandwagon along with the GNOME stack, but that would probably be a major (and ABI-wrecking) undertaking for them, all things considered. Meanwhile, they do what they can to minimize the damage, and exposing a wchar-string version of the API (especially in C++, where overloading is a thing) is a viable, quick fix on their part.

So...how do we close this? RESOLVED-NOTGNOME? RESOLVED-WONTFIX? RESOLVED-NOTABUG?
Comment 2 Jehan 2018-04-05 20:26:47 UTC
> NTFS uses UTF-16 internally, and Windows API and kernel seem to generally use UTF-16 as well.

So g_win32_locale_filename_from_utf8() should have converted this string from UTF-8 to UTF-16, shouldn't it? If it failed to do so, that's a bug, right? I don't understand why you want to close this.

Now, I'm not saying other projects couldn't have done better (whether the Microsoft API, Exiv2 or GExiv2), and actually, as soon as GExiv2 implements bug 732748, i.e. a GIO-based API to load files, we'll happily drop g_win32_locale_filename_from_utf8() in favor of this new API. But in the meantime, it still feels like there is a bug in this g_win32_*() function. Or I misunderstood something (not impossible!).
Comment 3 LRN 2018-04-05 21:08:25 UTC
(In reply to Jehan from comment #2)
> > NTFS uses UTF-16 internally, and Windows API and kernel seem to generally use UTF-16 as well.
> 
> So g_win32_locale_filename_from_utf8() should have converted this string
> from UTF-8 to UTF-16, shouldn't it?

It converts this string from UTF-8 to the locale codepage, as the name implies. This function is for getting a locale-codepage-encoded filename that can be fed to APIs that do not support Unicode. The conversion is lossy by definition: if an API does not take any kind of Unicode string, it's limited that way, and there's nothing anyone [other than MS] can do as long as that API is being used.
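To make "lossy by definition" concrete, here is a minimal C sketch of a UTF-8 to single-byte-codepage conversion that, like g_win32_locale_filename_from_utf8() in this report, returns NULL instead of guessing when a character has no representation. It assumes ISO-8859-1 (Latin-1) as a stand-in for the system codepage because its table is trivial (code points U+0000 to U+00FF map to the byte of the same value); real Windows codepages such as 1252 or 936 use other tables, and this is not GLib's implementation. The name utf8_to_latin1() is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert NUL-terminated UTF-8 to Latin-1.
   Returns a malloc'd string (caller frees), or NULL if the input is
   malformed or contains a character Latin-1 cannot represent. */
char *utf8_to_latin1(const char *utf8)
{
    const unsigned char *s = (const unsigned char *)utf8;
    /* Output is never longer than the input, so this size suffices. */
    char *out = malloc(strlen(utf8) + 1);
    size_t o = 0;

    if (!out)
        return NULL;

    while (*s) {
        if (*s < 0x80) {
            /* ASCII: same byte value in Latin-1. */
            out[o++] = (char)*s++;
        } else if ((*s == 0xC2 || *s == 0xC3) && (s[1] & 0xC0) == 0x80) {
            /* U+0080..U+00FF are exactly the two-byte sequences C2/C3 xx. */
            out[o++] = (char)(((s[0] & 0x1F) << 6) | (s[1] & 0x3F));
            s += 2;
        } else {
            /* Malformed input, or a code point above U+00FF that this
               codepage cannot hold: give up rather than substitute. */
            free(out);
            return NULL;
        }
    }
    out[o] = '\0';
    return out;
}
```

With this stand-in, "F:\café.png" converts, while "F:\都.png" (都 is U+90FD) returns NULL. Under a Simplified Chinese codepage such as 936 the same path would be representable, which is the point: success depends on the active codepage, not on whether the file exists.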

What you *can* do *today* is:
1) Ensure that Exiv2 is built with Windows UTF-16 support. I've seen this supported in the Exiv2 trunk source code; no idea whether it's available in the latest release.
2) Add some code to GExiv2 to convert filenames from UTF-8 to UTF-16 (trivial with GLib), and feed these UTF-16 strings to the appropriate Exiv2 FileIo constructor. Appropriately #ifdef'd, of course.
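Step 2 amounts to a UTF-8 to UTF-16 conversion, which, unlike the codepage conversion, is lossless for any valid UTF-8 input. In GExiv2 itself one would presumably call GLib's g_utf8_to_utf16() and pass the resulting gunichar2 buffer where Exiv2 expects a wchar_t string (wchar_t is 16 bits on Windows). The self-contained sketch below only spells out what that conversion does, including surrogate pairs for characters outside the Basic Multilingual Plane; decode_utf8() and utf8_to_utf16() are hypothetical helper names, not GLib or Exiv2 API.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Decode one UTF-8 sequence at s into *cp. Returns the number of bytes
   consumed, or 0 if the sequence is malformed, overlong, a surrogate,
   or beyond U+10FFFF. Never reads past a NUL terminator. */
static size_t decode_utf8(const unsigned char *s, uint32_t *cp)
{
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0) {
        if ((s[1] & 0xC0) != 0x80) return 0;
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return *cp >= 0x80 ? 2 : 0;
    }
    if ((s[0] & 0xF0) == 0xE0) {
        if ((s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80) return 0;
        *cp = ((uint32_t)(s[0] & 0x0F) << 12) |
              ((uint32_t)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        return (*cp >= 0x800 && (*cp < 0xD800 || *cp > 0xDFFF)) ? 3 : 0;
    }
    if ((s[0] & 0xF8) == 0xF0) {
        if ((s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80 ||
            (s[3] & 0xC0) != 0x80) return 0;
        *cp = ((uint32_t)(s[0] & 0x07) << 18) |
              ((uint32_t)(s[1] & 0x3F) << 12) |
              ((uint32_t)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
        return (*cp >= 0x10000 && *cp <= 0x10FFFF) ? 4 : 0;
    }
    return 0;
}

/* Convert NUL-terminated UTF-8 to NUL-terminated UTF-16. Returns a
   malloc'd buffer (caller frees) and stores the number of 16-bit units,
   excluding the terminator, in *out_len; returns NULL on bad input. */
uint16_t *utf8_to_utf16(const char *utf8, size_t *out_len)
{
    const unsigned char *s = (const unsigned char *)utf8;
    /* Every UTF-8 byte yields at most one UTF-16 unit (4-byte sequences
       become a 2-unit surrogate pair), so strlen + 1 units suffice. */
    uint16_t *out = malloc((strlen(utf8) + 1) * sizeof *out);
    size_t o = 0;

    if (!out)
        return NULL;

    while (*s) {
        uint32_t cp;
        size_t n = decode_utf8(s, &cp);
        if (n == 0) { free(out); return NULL; }
        if (cp >= 0x10000) {
            cp -= 0x10000;                       /* split into a surrogate pair */
            out[o++] = (uint16_t)(0xD800 | (cp >> 10));
            out[o++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            out[o++] = (uint16_t)cp;             /* BMP: one unit */
        }
        s += n;
    }
    out[o] = 0;
    if (out_len)
        *out_len = o;
    return out;
}
```

For "F:\都.png" this yields eight 16-bit units, with 都 stored as the single unit 0x90FD; a character outside the BMP such as U+1F600 becomes the surrogate pair 0xD83D 0xDE00.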

As I have said, grep the Exiv2 headers for EXV_UNICODE_PATH to find the APIs that support this. In fact, it might be a good exercise to temporarily modify the Exiv2 headers to *remove* all methods and constructors that take 'char *' strings (as opposed to 'wchar_t *' strings), and then keep fixing GExiv2 (with appropriate #ifdefs) until it compiles again. That will ensure that all filename-using GExiv2 code paths can use UTF-16 (as far as Exiv2 interaction goes).

Or, indeed, use a GIO-based API to load files (or some other way of loading files without passing their names; for example, there might be an API that takes file descriptors).
Comment 4 GNOME Infrastructure Team 2018-05-24 20:20:18 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/1359.