GNOME Bugzilla – Bug 404832
g_date_strftime should use wcsftime if available
Last modified: 2007-02-17 08:57:42 UTC
See attached test case.
Compiled with mingw and without any special parameters. Set "Standards and Formats" to "French" and "Language for non-Unicode applications" (or so) to "Russian", or the other way around. Other combinations may fail as well.
On Windows, strftime with %B fills the buffer with the full month name in the encoding specified for non-Unicode applications, which may differ from the one used by g_locale_to_utf8. This can be avoided if wcsftime is used when defined(G_WIN32_HAVE_WIDECHAR_API).
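For illustration, the wide-character call proposed here could look roughly like the following. This is only a sketch in standard C, not code from GLib: the helper name month_name_wide is made up for this example, and whether the result really bypasses the codepage problem depends on how the C runtime implements wcsftime().

```c
#include <stddef.h>
#include <time.h>
#include <wchar.h>

/* Hypothetical helper (not part of GLib): format the full month name for
 * `month` (1..12) as a wide string using the current C locale.
 * Returns the number of wide characters written, or 0 on failure. */
size_t month_name_wide(int month, wchar_t *buf, size_t buflen)
{
    struct tm tm = {0};

    if (month < 1 || month > 12 || buflen == 0)
        return 0;

    tm.tm_mon = month - 1;   /* struct tm months are 0-based */
    tm.tm_mday = 1;
    tm.tm_year = 107;        /* years since 1900, i.e. 2007 */

    /* %B is the full month name, as in the strftime() call it replaces. */
    return wcsftime(buf, buflen, L"%B", &tm);
}
```

On Windows the wide result is UTF-16, so it could then be converted with g_utf16_to_utf8() instead of going through g_locale_to_utf8().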
Created attachment 81970
> with the full month in the encoding specified for non-Unicode applications
> which may differ from the one used by g_locale_to_utf8
The change sounds like a possibly useful one, but I'm surprised by the
above. What are these two encodings? g_locale_to_utf8() really is supposed
to be picking up the "encoding specified for non-Unicode applications".
That is right, g_get_charset picks the encoding specified for non-Unicode applications. But that may differ from the locale encoding... Maybe another bug?
See the attachment for an encoding test case. E.g. it might print
Created attachment 82041
encodings test case
How does this problem occur without an explicit call to setlocale() with a
non-"" second argument? If I look at the MSDN docs for setlocale() I see:
setlocale( LC_ALL, "" );
Sets the locale to the default, which is the user-default ANSI code page obtained from the operating system
Internally, g_get_charset() uses GetACP(), which gives the "user-default
ANSI code page". Your first example doesn't seem to have a non-default
setlocale() call. (gtk_init() calls setlocale() internally, with the
above arguments, IIRC.)
I agree that wcsftime() should be used on NT-based Windows.
Whether the situation described here is particularly common I don't know, though. What is the use case where this situation occurs? Is such a setup common among Russian speakers in France, for instance? Any idea what the "Language for non-Unicode applications" setting actually does, API-wise? The "user friendly" wording in the Control Panel applet is less than useful. Presumably this setting sets the system codepage (what GetACP() returns), but not the default locale (what GetThreadLocale() returns), which is affected by the "Standards and Formats" setting?
Just two links:
Hmm, looking at the source for wcsftime() (for instance, as included with MSVC6, or with the Platform SDK) reveals an interesting and somewhat disappointing thing: wcsftime() converts the format to a multibyte string with wcstombs(), calls strftime(), and converts the result to UTF-16 with mbstowcs(). So using wcsftime() will not get rid of strftime(); it will still be used...
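To make the three steps concrete, here is a rough portable sketch of a wcsftime() that works the way described above. It is not the actual CRT source; the function name wcsftime_via_strftime and the fixed 256-byte buffers are assumptions made for this illustration.

```c
#include <stdlib.h>
#include <time.h>
#include <wchar.h>

/* Hypothetical re-creation of the described behaviour, NOT the real CRT code.
 * Returns the number of wide characters written, or 0 on failure. */
size_t wcsftime_via_strftime(wchar_t *out, size_t outlen,
                             const wchar_t *wfmt, const struct tm *tm)
{
    char fmt[256];     /* assumed buffer size, for illustration only */
    char narrow[256];
    size_t n;

    if (outlen == 0)
        return 0;

    /* Step 1: wide format -> multibyte format in the locale's encoding. */
    n = wcstombs(fmt, wfmt, sizeof fmt);
    if (n == (size_t)-1 || n >= sizeof fmt)
        return 0;

    /* Step 2: the narrow strftime() does the real work, so month names
     * still pass through the multibyte encoding at this point. */
    n = strftime(narrow, sizeof narrow, fmt, tm);
    if (n == 0)
        return 0;

    /* Step 3: multibyte result -> wide characters. */
    n = mbstowcs(out, narrow, outlen);
    if (n == (size_t)-1 || n >= outlen)
        return 0;

    return n;
}
```

Because step 2 still produces a multibyte string, any character the codepage cannot represent is already lost before step 3 widens the result.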
As the C library's handling of locales seems to be somewhat confusing anyway, what should be done is probably to forget both strftime() and wcsftime(). Instead, parse the format string ourselves and call the necessary wide character Win32 APIs, like GetLocaleInfo() and GetDateFormat(). But which LCID should be used? The one returned by GetThreadLocale(), presumably. And as we use wide character APIs, codepages fortunately do not get involved at all.
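A minimal sketch of the "parse the format string ourselves" idea, kept portable so it can be tried anywhere. Everything here is hypothetical: the names format_date_wide, month_name_fn and demo_french_month_name are invented for this example, only %B, %Y, %d and %% are handled, and the hard-coded French table merely stands in for a real locale query such as GetLocaleInfoW() with the LCID from GetThreadLocale().

```c
#include <stddef.h>
#include <time.h>
#include <wchar.h>

/* Stand-in for a wide-character locale query; month is 1..12. */
typedef const wchar_t *(*month_name_fn)(int month);

/* Demo lookup with a hard-coded table; a real version would ask the OS. */
const wchar_t *demo_french_month_name(int month)
{
    static const wchar_t *const names[12] = {
        L"janvier", L"février", L"mars", L"avril", L"mai", L"juin",
        L"juillet", L"août", L"septembre", L"octobre", L"novembre",
        L"décembre"
    };
    return (month >= 1 && month <= 12) ? names[month - 1] : NULL;
}

/* Write a non-negative number in decimal, zero-padded to min_digits. */
static void put_number(wchar_t *buf, unsigned value, int min_digits)
{
    wchar_t rev[16];
    int n = 0;
    do {
        rev[n++] = (wchar_t)(L'0' + value % 10);
        value /= 10;
    } while (value != 0);
    while (n < min_digits)
        rev[n++] = L'0';
    while (n > 0)
        *buf++ = rev[--n];
    *buf = L'\0';
}

/* Expand fmt into out. Returns the length written, or 0 on failure
 * (unsupported conversion, missing month name, or buffer too small). */
size_t format_date_wide(wchar_t *out, size_t outlen, const wchar_t *fmt,
                        const struct tm *tm, month_name_fn lookup)
{
    size_t pos = 0;
    wchar_t tmp[16];

    if (outlen == 0)
        return 0;
    out[0] = L'\0';

    for (; *fmt != L'\0'; fmt++) {
        const wchar_t *piece = tmp;
        size_t len;

        if (*fmt != L'%') {
            tmp[0] = *fmt;              /* literal character */
            tmp[1] = L'\0';
        } else {
            fmt++;
            switch (*fmt) {
            case L'B': piece = lookup(tm->tm_mon + 1); break;
            /* Assumes a non-negative year, which holds for this sketch. */
            case L'Y': put_number(tmp, (unsigned)(tm->tm_year + 1900), 1); break;
            case L'd': put_number(tmp, (unsigned)tm->tm_mday, 2); break;
            case L'%': tmp[0] = L'%'; tmp[1] = L'\0'; break;
            default:   return 0;        /* unsupported or trailing '%' */
            }
        }
        if (piece == NULL)
            return 0;

        len = wcslen(piece);
        if (pos + len >= outlen)        /* keep room for the terminator */
            return 0;
        wmemcpy(out + pos, piece, len);
        pos += len;
        out[pos] = L'\0';
    }
    return pos;
}
```

Called with the format L"%d %B %Y" and the demo lookup, this yields a wide string without the month name ever passing through a multibyte codepage, which is the point of the approach.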
Created attachment 82059
Suggested patch to trunk GLib. Unfortunately we can't use this as such in the stable branch, as we (at least pretend to) still support Win9x there. I'll cook up another patch for the stable branch, where on Win9x the old code is used.
Created attachment 82063
Corresponding patch for glib-2-12
On a related issue, I just filed bug 405469.
So, do you think the approach in these patches will work for you? That is, using the thread locale (what the "Standards and Formats" setting affects) for g_date_strftime(), and using just wide character Windows APIs, not strftime() or wcsftime() at all?
When somebody chooses a language whose codepage, used as the system codepage, differs from that of the default user locale, what is usually the rationale for this? As in the French locale but Russian codepage example, would this be somebody who wants to manipulate files with Russian filenames (with programs that use non-Unicode APIs)?
Hello? Unless somebody tells me why it's a bad idea, I'll commit the patches to the trunk and stable branch.
I am sorry, I am a bit busy right now and I wanted to test before I respond. I guess the approach will work for me, but in a few days I could tell you more definitely.
If you want to commit, go on :-)
Fixed in trunk and glib-2-12.
2007-02-17 Tor Lillqvist <firstname.lastname@example.org>
* glib/gdate.c (win32_strftime_helper): New Win32-only
function. Use the wide character Win32 API to do the work of
strftime(): GetThreadLocale(), GetLocaleInfoW(), GetDateFormatW()
(g_date_strftime): On NT-based Windows use win32_strftime_helper()
instead of strftime() to avoid codepage issues with strftime().
Unfortunately using wcsftime() would not help either. (#404832)