Bug 408637 - g_date_strftime failure
Status: RESOLVED OBSOLETE
Product: glib
Classification: Platform
Component: datetime
Version: 2.12.x
Hardware: Other All
Importance: Normal normal
Target Milestone: ---
Assigned To: gtkdev
QA Contact: gtkdev
Depends on:
Blocks:
 
 
Reported: 2007-02-16 16:42 UTC by Morten Welinder
Modified: 2018-05-24 10:58 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Morten Welinder 2007-02-16 16:42:55 UTC
Add the following snippet to testgdate.c

  setlocale(LC_TIME,"fi_FI");
  g_date_set_dmy(d, 12, 1, 2006);
  g_date_strftime(buf,100,"Today is a %b\n", d);
  g_print ("[%s]\n", buf);

and observe:

(process:19948): GLib-WARNING **: gdate.c:1493Error converting results of strftime to UTF-8: Invalid byte sequence in conversion input

[which is an ugly error message, btw.]

The problem is that strftime looks at LC_TIME whereas g_locale_to_utf8
looks at something else.  My LANG is en_US.UTF-8
Comment 1 Tor Lillqvist 2007-02-16 17:46:41 UTC
Don't do that then?
Comment 2 Morten Welinder 2007-02-16 18:26:56 UTC
Why would you say that?

I need to get translated month names in a data-dependent locale without
interfering with number formatting, so LC_ALL is out of the question.

strftime is documented to use LC_TIME so the code shouldn't assume it
uses anything else.  Or are you saying that people should not set different
LC_* variables to different values?  Doing so is quite common, although
typically it is LC_NUMERIC that is set.
Comment 3 Tor Lillqvist 2007-02-16 18:46:37 UTC
I mean, doesn't the warning mean that LC_TIME=fi_FI uses a different charset than the one g_get_charset() returns (presumably UTF-8 in your case, as you have LANG set to use UTF-8)? Isn't such a combination broken by design? What happens if you set LC_TIME to fi_FI.UTF-8 instead?
Comment 4 Morten Welinder 2007-02-16 20:08:37 UTC
There is nothing broken about setting LC_whatever to something that uses
a different character set than other LC settings.  I set it, the C
library uses it and returns the right value.   And that's it.

(Well, in this case the C library doesn't return the right value -- the
character that trips up glib is 0xa0 which shouldn't have been there to
begin with.  But June and July would cause actual problems since they
contain ä.)

The problem arises inside glib when it assumes that all strings from the
C library come back in the same encoding.  Well, strftime is the documented
exception to that rule.

> What happens if you set LC_TIME to fi_FI.UTF-8 instead?

Then, of course, I don't get an error message.  That's beside the point,
though.

But locale values are not something I get to pick and choose.  There is a
fixed set of valid values.  Worse, the codeset part is not even standardized,
see http://www.debian.org/doc/manuals/intro-i18n/ch-locale.en.html:

    [...] There are no standard for codeset and modifier. [...]

My language strings are data dependent so I can't punt it out to the
user.  And I cannot simply tag on the result of g_get_charset because
that doesn't work due to aliases, notably hyphens-vs-dashes-vs-nothing
in ISO-8859-1.  I cannot parse the result of "setlocale (LC_MESSAGES, NULL)"
because that is (in theory and practice) an opaque string.
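The alias problem mentioned above can be illustrated with a loose comparison of charset names. This is only a sketch: `charset_names_match` is a hypothetical helper, not a GLib or libc function, and it only papers over differences in case and punctuation. It does not know about true aliases such as "latin1", and it does not help with building a locale name that setlocale() will accept.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: compare two charset names, ignoring case and
 * any character that is not a letter or digit, so that "ISO-8859-1",
 * "iso_8859_1" and "ISO88591" all compare equal. */
static bool
charset_names_match (const char *a, const char *b)
{
  for (;;)
    {
      /* Skip separators such as '-' and '_' on both sides. */
      while (*a && !isalnum ((unsigned char) *a)) a++;
      while (*b && !isalnum ((unsigned char) *b)) b++;

      /* If either name is exhausted, they match only if both are. */
      if (*a == '\0' || *b == '\0')
        return *a == *b;

      if (tolower ((unsigned char) *a) != tolower ((unsigned char) *b))
        return false;
      a++;
      b++;
    }
}
```

With this helper, "ISO-8859-1" and "iso88591" compare equal, while "ISO-8859-1" and "ISO-8859-15" do not.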
Comment 5 Matthias Clasen 2007-03-16 04:09:17 UTC
> There is nothing broken about setting LC_whatever to something that uses
> a different character set than other LC settings.  I set it, the C
> library uses it and returns the right value.   And that's it.

Well, one thing that's broken is that there is 

  nl_langinfo (CODESET)

which returns "the character encoding used in the selected locale". There
is no similar function to get "the character encoding used for the parts
of localedata which happens to depend on LC_TIME". So at least nl_langinfo
seems to promote the idea that there should be a single charset for all 
aspects of localedata.
Comment 6 Colomban Wendling 2013-03-17 23:29:36 UTC
I know this is a very old report, but I ran into the issue today.  I started to use g_date_set_parse() and saw it works fine if I don't touch any locale settings, but fails miserably if I set LANG=C, which is a very common thing to do to get an untranslated program.

Maybe, as you suggest, the libc is broken in some respect for not allowing the per-category charset to be fetched, but this is a really annoying issue.

A workaround for some (most?) situations would perhaps be to get g_get_charset() to guess UTF-8 as a fallback instead of US-ASCII, although it's not a real solution since it wouldn't fix the problem on systems using a non-UTF-8 locale.  However, it's most likely a harmless change because UTF-8 is compatible with US-ASCII and has a very strict and unambiguous representation, so it's really unlikely that text in a non-UTF-8 charset could be successfully parsed as UTF-8.  What I mean is that if LC_TIME actually uses ISO-8859-1, it's most likely that g_locale_to_utf8() will just fail like it currently does, while if it is UTF-8 it'd work just fine.
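To illustrate why non-ASCII text in ISO-8859-1 rarely passes as UTF-8, here is a small standalone validity check. It is a sketch written for this example (`is_valid_utf8` is a hypothetical name); in GLib code one would normally call g_utf8_validate() instead.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if the len bytes at s form well-formed UTF-8
 * (no overlong forms, no surrogates, nothing above U+10FFFF). */
static bool
is_valid_utf8 (const unsigned char *s, size_t len)
{
  size_t i = 0;

  while (i < len)
    {
      unsigned char c = s[i];
      size_t n;                 /* number of continuation bytes */
      unsigned long cp, min;    /* code point, smallest legal value */

      if (c < 0x80) { i++; continue; }   /* plain ASCII */
      else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; min = 0x80; }
      else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; min = 0x800; }
      else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; min = 0x10000; }
      else return false;        /* stray continuation or bad lead byte */

      if (len - i <= n)
        return false;           /* sequence truncated at end of input */

      for (size_t k = 1; k <= n; k++)
        {
          if ((s[i + k] & 0xC0) != 0x80)
            return false;       /* expected a 10xxxxxx byte */
          cp = (cp << 6) | (s[i + k] & 0x3F);
        }

      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;
      i += n + 1;
    }
  return true;
}
```

The Finnish month name "kesäkuu" encoded in ISO-8859-1 contains the byte 0xE4 followed by ASCII letters, which this check rejects, as it does a lone 0xA0. The UTF-8 encoding of the same word (0xC3 0xA4 for "ä") is accepted.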

A more proper fix might be to parse LC_TIME to get the charset and, if none is found, fall back on g_get_charset().  I don't know the complete rules for locale settings, but extracting the encoding from something like fr_FR.UTF-8 is mostly a matter of:

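  const char *lc_time = g_getenv("LC_TIME");   /* may be NULL if unset */
  const char *p = lc_time ? strchr(lc_time, '.') : NULL;

  if (p && p[1]) {
    lc_time_charset = &p[1];   /* text after the '.', e.g. "UTF-8" */
  } else {
    g_get_charset(&lc_time_charset);
  }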
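The same idea can be sketched as a standalone function that takes the locale name as a parameter and also strips a trailing modifier such as "@euro". `locale_name_codeset` is a hypothetical helper, and it assumes the common language[_territory][.codeset][@modifier] layout. As noted in comment 4, that layout is not guaranteed, and the LC_TIME environment variable can be overridden by LC_ALL or left unset in favour of LANG, so treat this as illustrative only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the codeset part of a locale name of the
 * form language[_territory][.codeset][@modifier] into buf.
 * Returns 1 if a non-empty codeset was found and fit, 0 otherwise. */
static int
locale_name_codeset (const char *locale, char *buf, size_t buflen)
{
  const char *dot, *end;
  size_t n;

  if (locale == NULL || buf == NULL || buflen == 0)
    return 0;

  dot = strchr (locale, '.');
  if (dot == NULL)
    return 0;                   /* no codeset part, e.g. "fi_FI" */
  dot++;

  /* The codeset ends at the modifier, if any, else at end of string. */
  end = strchr (dot, '@');
  n = end ? (size_t) (end - dot) : strlen (dot);
  if (n == 0 || n >= buflen)
    return 0;

  memcpy (buf, dot, n);
  buf[n] = '\0';
  return 1;
}
```

A caller would fall back on g_get_charset() whenever this returns 0, for example for "fi_FI" or an unset variable.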
Comment 7 GNOME Infrastructure Team 2018-05-24 10:58:15 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/81.