Bug 658107 - GDateTime alternate number formats are a bit broken
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: general
Version: unspecified
Hardware: Other All
Importance: Normal normal
Target Milestone: ---
Assigned To: gtkdev
Depends on:
Blocks:
Reported: 2011-09-03 02:56 UTC by Allison Karlitskaya (desrt)
Modified: 2012-02-24 17:22 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
g_date_time_format: improve support for alt digits (15.09 KB, patch)
2011-09-04 00:15 UTC, Allison Karlitskaya (desrt)
committed
GtkCalendar Widget (11.69 KB, image/png)
2012-02-24 17:22 UTC, Arash Mousavi

Description Allison Karlitskaya (desrt) 2011-09-03 02:56:37 UTC
GDateTime uses the printf() "I" flag to get the alternate number format requested by the 'O' modifier of strftime().

man 3 printf says this:

       glibc 2.2 adds one further flag character.

       I      For decimal integer conversion (i, d, u) the output uses
              the locale's alternative  output  digits,  if  any.  For
              example, since glibc 2.2.3 this will give Arabic-Indic
              digits in the Persian ("fa_IR") locale.

This flag is a GNU extension, so this will badly break on non-GNU systems.
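One way to keep the GNU-only flag from breaking other platforms is to confine it to glibc builds at compile time. The sketch below assumes that testing the `__GLIBC__` macro is an adequate guard and falls back to plain ASCII digits elsewhere; the helper name `format_int_alt()` is hypothetical and not GLib API.

```c
#include <stdio.h>

/* Hypothetical helper, not GLib API: formats `value` with the locale's
 * alternative output digits when the C library is glibc, and with plain
 * ASCII digits everywhere else.  Returns snprintf()'s result. */
int
format_int_alt (char *buf, size_t size, int value)
{
#ifdef __GLIBC__
  /* GNU extension: the 'I' flag maps the digits through the locale's
   * output-digit table (Arabic-Indic digits in fa_IR, for example).
   * On the systems tested in this report it does not combine correctly
   * with the '0' modifier. */
  return snprintf (buf, size, "%Id", value);
#else
  /* 'I' is not portable, so fall back to ASCII digits. */
  return snprintf (buf, size, "%d", value);
#endif
}
```

In a program that never calls setlocale() the "C" locale is active, so both branches produce "5" for the value 5; only after switching to a locale such as fa_IR would the glibc branch emit alternative digits.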

It is also broken on GNU systems (at least Fedora 15 and 16 are affected) when combined with the '0' modifier.  You would expect:

  printf ("%I04d", 5);

to print out:

  ۰۰۰۵

but actually, you get only the '۵' (and the '0' modifier is ignored).

This means that a time might print out like "۱۹:۰:۰" instead of "۱۹:۰۰:۰۰".



After some research and general API shopping, it seems like the only way that we can possibly expect to get Arabic digits is to do one of these:

 - 'I' modifier (only works on GNU)

 - abuse strftime("%OS") to format two digits at a time (not universally
   supported and may not be UTF-8)

 - do one of the above to build a table of digits for the current locale
   and build a number formatting function that consults this table as
   needed

 - do it from scratch (and then we have to do it for all affected locales)
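The third option (a per-locale digit table plus a formatting function that consults it) can be sketched as follows. This assumes the caller already has a ten-entry table of UTF-8 digit strings for the current locale; how that table is obtained is left open here. The name `format_with_digits()` and its padding convention are hypothetical. Because padding is counted in digits rather than bytes, a two-byte Persian digit still occupies one position.

```c
#include <stddef.h>
#include <string.h>

#define MAX_DIGIT_PARTS 24  /* room for a 64-bit value's 20 digits plus padding */

/* Hypothetical helper, not GLib API.  Writes `value` into `buf` using
 * `digits`, a table of ten UTF-8 strings for 0..9, left-padded to at
 * least `width` digits (capped at MAX_DIGIT_PARTS).  pad == '0' pads with
 * digits[0], the locale's own zero; any other pad value pads with ASCII
 * spaces; width == 0 disables padding.
 * Returns the number of bytes written, or 0 if `buf` is too small. */
size_t
format_with_digits (char *buf, size_t size, unsigned int value,
                    int width, char pad, const char *const digits[10])
{
  const char *parts[MAX_DIGIT_PARTS];
  int n = 0;
  size_t used = 0;

  if (width > MAX_DIGIT_PARTS)
    width = MAX_DIGIT_PARTS;

  /* Collect digit strings, least significant first. */
  do
    {
      parts[n++] = digits[value % 10];
      value /= 10;
    }
  while (value != 0);

  /* Pad by digit count, not by byte count. */
  while (n < width)
    parts[n++] = (pad == '0') ? digits[0] : " ";

  /* Emit most significant first, keeping room for the terminator. */
  while (n > 0)
    {
      const char *s = parts[--n];
      size_t len = strlen (s);

      if (used + len + 1 > size)
        return 0;
      memcpy (buf + used, s, len);
      used += len;
    }

  buf[used] = '\0';
  return used;
}
```

With a table of Persian digits (U+06F0 to U+06F9), formatting 5 with width 4 and pad '0' yields "۰۰۰۵", the output the report expected from "%I04d". Formatting 19, 0 and 0 with width 2 would give the "۱۹:۰۰:۰۰" pieces.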


None of the options seem particularly good, but one thing to note: we probably only care to format 'O' modified strftime() strings if the current locale specifies them for the preferred time/date formats -- in which case we know that strftime() on their system will work.  Therefore, that seems like the best bet.

There is only one complicating factor here: we translate the preferred date/time formats using gettext for systems that lack nl_langinfo(), so we should probably make sure that no translations introduce the use of %O characters.
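The "only care if the locale asks for it" check could be as simple as scanning the preferred format string for an 'O'-modified conversion. This is a minimal sketch; `format_uses_alt_digits()` is a hypothetical name. It assumes glibc-style flags and field widths may sit between '%' and 'O' (as in "%_OH") and skips them.

```c
#include <stdbool.h>

/* Hypothetical helper, not GLib API.  Returns true if the strftime()-style
 * format `fmt` contains an 'O'-modified conversion such as "%OS" or "%Oe".
 * "%%" is a literal percent sign and is skipped.  glibc-style flags and a
 * field width between '%' and 'O' (e.g. "%_OH") are skipped as well. */
bool
format_uses_alt_digits (const char *fmt)
{
  for (const char *p = fmt; *p != '\0'; p++)
    {
      if (*p != '%')
        continue;

      p++;  /* look at what follows the '%' */

      /* Skip flags (_ - ^ # 0) and width digits. */
      while (*p == '_' || *p == '-' || *p == '^' || *p == '#'
             || (*p >= '0' && *p <= '9'))
        p++;

      if (*p == '\0')
        break;
      if (*p == 'O')
        return true;
      /* Any other conversion character, including a second '%', is
       * consumed by the loop's p++. */
    }
  return false;
}
```

On a system with nl_langinfo() it might be called as `format_uses_alt_digits (nl_langinfo (D_T_FMT))`; the same check could be run over the gettext-translated fallback formats to catch translations that introduce %O.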
Comment 1 Allison Karlitskaya (desrt) 2011-09-03 16:32:46 UTC
Here are the locales that show non-ASCII numerals for strftime("%OS") or printf %I02d:

bn_BD    strftime %OS: 05          %I02d: ৫
bn_IN    strftime %OS: 05          %I02d: ৫
fa_IR    strftime %OS: ۰۵          %I02d: ۵
ja_JP    strftime %OS: ��          %I02d: 05
my_MM    strftime %OS: ၀၅          %I02d: ၅
ta_IN    strftime %OS: 05          %I02d: ௫
uk_UA    strftime %OS: ������      %I02d: 05


I strongly suspect uk_UA of being some kind of a bug and I'm suspicious about ja_JP.

That practically gives us between 4 and 6 scripts to worry about: 'bn', 'fa', 'my' and 'ta'.  That's finite and small.
Comment 2 Allison Karlitskaya (desrt) 2011-09-03 17:54:58 UTC
I was deliciously naïve:

as_IN        strftime %OS: 05      %I02d: ৫
bn_BD        strftime %OS: 05      %I02d: ৫
bn_IN        strftime %OS: 05      %I02d: ৫
fa_IR        strftime %OS: ۰۵      %I02d: ۵
gu_IN        strftime %OS: 05      %I02d: ૫
hi_IN        strftime %OS: 05      %I02d: ५
hne_IN       strftime %OS: 05      %I02d: ५
ja_JP.utf8   strftime %OS: 五       %I02d: 05
kn_IN        strftime %OS: 05      %I02d: ೫
kok_IN       strftime %OS: 05      %I02d: ५
mai_IN       strftime %OS: 05      %I02d: ५
ml_IN        strftime %OS: 05      %I02d: ൫
mr_IN        strftime %OS: 05      %I02d: ५
my_MM        strftime %OS: ၀၅      %I02d: ၅
or_IN        strftime %OS: ୫       %I02d: ୫
pa_IN        strftime %OS: 05      %I02d: ੫
ps_AF        strftime %OS: 05      %I02d: ٥
ta_IN        strftime %OS: 05      %I02d: ௫
te_IN        strftime %OS: 05      %I02d: ౫
uk_UA.utf8   strftime %OS: травня  %I02d: 05


Some notes:

Support for strftime %OS and printf %I is mixed and matched pretty randomly across locales.

"%0I2d" is broken for all locales.

For 10, ja_JP strftime outputs "十", a single character meaning "ten".  That pretty much prevents us from doing anything clever in terms of producing the output or dealing with padding ourselves.

uk_UA is broken: the conversion is %OS (which is for seconds) and the output is "травня" which is Ukrainian for "May".  Other numbers map to month names predictably.
Comment 3 Allison Karlitskaya (desrt) 2011-09-03 18:35:08 UTC
Okay.  Figured it out.

"LANG=fa_IR locale LC_TIME -k" on the command line shares this (among other things):

alt_digits="۰۰";"۰۱";"۰۲";"۰۳";"۰۴";"۰۵";"۰۶";"۰۷";"۰۸";"۰۹";"۱۰";"۱۱";"۱۲";"۱۳";"۱۴";"۱۵";"۱۶";"۱۷";"۱۸";"۱۹";"۲۰";"۲۱";"۲۲";"۲۳";"۲۴";"۲۵";"۲۶";"۲۷";"۲۸";"۲۹";"۳۰";"۳۱";"۳۲";"۳۳";"۳۴";"۳۵";"۳۶";"۳۷";"۳۸";"۳۹";"۴۰";"۴۱";"۴۲";"۴۳";"۴۴";"۴۵";"۴۶";"۴۷";"۴۸";"۴۹";"۵۰";"۵۱";"۵۲";"۵۳";"۵۴";"۵۵";"۵۶";"۵۷";"۵۸";"۵۹";"۶۰";"۶۱";"۶۲";"۶۳";"۶۴";"۶۵";"۶۶";"۶۷";"۶۸";"۶۹";"۷۰";"۷۱";"۷۲";"۷۳";"۷۴";"۷۵";"۷۶";"۷۷";"۷۸";"۷۹";"۸۰";"۸۱";"۸۲";"۸۳";"۸۴";"۸۵";"۸۶";"۸۷";"۸۸";"۸۹";"۹۰";"۹۱";"۹۲";"۹۳";"۹۴";"۹۵";"۹۶";"۹۷";"۹۸";"۹۹"

This explains why C99 only specifies %O conversions for two-digit number values (and omits %OY, for example, which you might expect to give you the 4 digit year in alternate digits).

That string is also accessible via nl_langinfo(ALT_DIGITS), which is clearly what we should use.  There are a couple of problems still, however:

The GNU implementation of nl_langinfo appears to violate the specification by using '\0' to separate the digits rather than ';' as indicated by the Single UNIX Specification[1]: 

    ALT_DIGITS

    The alternative symbols for digits, corresponding to the %O conversion
    specification modifier. The value consists of semicolon-separated
    symbols. The first is the alternative symbol corresponding to zero,
    the second is the symbol corresponding to one, and so on. Up to 100
    alternative symbols may be specified.


[1] http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html#tag_07_03_05_02

That's easy enough to work around, I guess.
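A minimal sketch of splitting an ALT_DIGITS value in either form follows. The type `AltDigit` and the function `split_alt_digits()` are hypothetical. For the glibc NUL-separated form the report does not say how the end of the list is marked, so this sketch assumes the list ends at an empty entry (two NULs in a row); treat that as an assumption to verify against glibc.

```c
#include <stddef.h>

/* One entry of an ALT_DIGITS list.  Entries in the ';'-separated form are
 * not NUL-terminated, so a length is carried alongside the pointer. */
typedef struct
{
  const char *str;
  size_t      len;
} AltDigit;

/* Hypothetical helper.  Splits an ALT_DIGITS value into at most max_out
 * entries.
 *   sep == ';'  : the Single UNIX Specification form "d0;d1;...".
 *   sep == '\0' : the glibc form "d0\0d1\0...", assumed here to be
 *                 terminated by an empty entry.
 * Returns the number of entries found. */
size_t
split_alt_digits (const char *value, char sep, AltDigit out[], size_t max_out)
{
  size_t n = 0;
  const char *p = value;

  if (value == NULL)
    return 0;

  while (n < max_out && *p != '\0')
    {
      const char *end = p;

      /* Find the end of this entry: the separator or a NUL. */
      while (*end != '\0' && *end != sep)
        end++;

      out[n].str = p;
      out[n].len = (size_t) (end - p);
      n++;

      /* In the ';' form a NUL means the whole list is done. */
      if (sep != '\0' && *end == '\0')
        break;

      p = end + 1;  /* step over the ';' or over this entry's NUL */
    }

  return n;
}
```

On glibc one might call `split_alt_digits (nl_langinfo (ALT_DIGITS), '\0', entries, 100)`. For fa_IR, entry 5 would then be "۰۵", which is why a two-digit value can be looked up directly while a four-digit year cannot.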

Another more substantial problem is that this ALT_DIGITS scheme doesn't seem sufficient to implement all of the format specifiers demanded by C99.  Specifically:

%Oe   is replaced by the day of the month, using the locale’s alternative
      numeric symbols (filled as needed with leading spaces).
%Ou   is replaced by the ISO 8601 weekday as a number in the locale’s
      alternative representation, where Monday is 1.
%Ow   is replaced by the weekday as a number, using the locale’s
      alternative numeric symbols.

Essentially, these three formats demand non-zero-padded or non-2-digit numbers.  glibc (and the 'date' command line tool) are non-conformant here and always output two-digit zero-padded forms.  That's slightly annoying since, for the fa_IR locale, %Oe is included in the D_T_FMT string.
Comment 4 Allison Karlitskaya (desrt) 2011-09-04 00:15:58 UTC
Created attachment 195606
g_date_time_format: improve support for alt digits

Improve a few situations where g_date_time_format() was getting the
padding wrong when displaying alt digits (e.g. Arabic-Indic numerals)
when formatting times.

We now depend on nl_langinfo (_NL_CTYPE_OUTDIGITn_WC) to do the
conversion, which is very likely glibc-specific, but our previous method
relied on a glibc-specific printf() feature, so no harm done there.

Add a configure check for nl_langinfo (_NL_CTYPE_OUTDIGITn_WC).

Uncomment a few testcases that were failing previously.
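Reading a per-locale digit table on glibc might look like the sketch below. The commit message names the `_NL_CTYPE_OUTDIGITn_WC` items; this sketch instead assumes their multibyte siblings `_NL_CTYPE_OUTDIGITn_MB`, which yield strings. Both are glibc-specific and largely undocumented. `HAVE_LANGINFO_OUTDIGIT` stands in for the result of a configure check like the one the patch adds, and `load_locale_digits()` is a hypothetical name.

```c
#ifdef HAVE_LANGINFO_OUTDIGIT
#define _GNU_SOURCE
#include <langinfo.h>
#endif

/* Hypothetical helper, not GLib API.  Fills `digits` with the current
 * LC_CTYPE locale's output digits 0..9 as multibyte strings.
 * HAVE_LANGINFO_OUTDIGIT is an assumed configure-check result; when it is
 * not defined, plain ASCII digits are used. */
void
load_locale_digits (const char *digits[10])
{
#ifdef HAVE_LANGINFO_OUTDIGIT
  /* glibc-specific items, listed explicitly rather than relying on
   * their enum values being consecutive. */
  static const nl_item items[10] = {
    _NL_CTYPE_OUTDIGIT0_MB, _NL_CTYPE_OUTDIGIT1_MB, _NL_CTYPE_OUTDIGIT2_MB,
    _NL_CTYPE_OUTDIGIT3_MB, _NL_CTYPE_OUTDIGIT4_MB, _NL_CTYPE_OUTDIGIT5_MB,
    _NL_CTYPE_OUTDIGIT6_MB, _NL_CTYPE_OUTDIGIT7_MB, _NL_CTYPE_OUTDIGIT8_MB,
    _NL_CTYPE_OUTDIGIT9_MB
  };

  for (int i = 0; i < 10; i++)
    digits[i] = nl_langinfo (items[i]);
#else
  static const char *const ascii[10] = {
    "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"
  };

  for (int i = 0; i < 10; i++)
    digits[i] = ascii[i];
#endif
}
```

The resulting table can then be handed to a digit-by-digit formatter that applies its own padding, so the result no longer depends on printf's 'I' flag.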
Comment 5 Matthias Clasen 2011-09-04 02:12:26 UTC
From my earlier investigation into the I modifier's behaviour, it seemed to me that glibc gets the character counting wrong there, which is why padding doesn't work right: it seems to count bytes.
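The byte-versus-character mismatch is easy to see. The Persian digit ۵ (U+06F5) is encoded in UTF-8 as the two bytes 0xDB 0xB5. If a field width were counted in bytes, a single Persian digit would already satisfy a width of 2, which would account for the lone '۵' in the %I02d columns above. A padding routine needs a character count instead. The sketch below assumes valid UTF-8 input; `utf8_char_count()` is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical helper.  Counts UTF-8 code points by skipping continuation
 * bytes (those of the form 10xxxxxx).  Assumes `s` is valid UTF-8. */
size_t
utf8_char_count (const char *s)
{
  size_t n = 0;

  for (; *s != '\0'; s++)
    if (((unsigned char) *s & 0xC0) != 0x80)
      n++;

  return n;
}
```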
Comment 6 Matthias Clasen 2011-09-04 02:15:44 UTC
Review of attachment 195606:

The idea to use nl_langinfo() for alternative digit conversions had actually occurred to me recently, but I was too lazy to do this work myself. Thanks!
It looks good to me.
Comment 7 Allison Karlitskaya (desrt) 2011-09-04 03:06:52 UTC
Attachment 195606 pushed as 2d7051e - g_date_time_format: improve support for alt digits
Comment 8 Arash Mousavi 2012-02-24 17:22:03 UTC
Created attachment 208366
GtkCalendar Widget

I just wrote a simple Python program that shows the GtkCalendar widget, to test whether it can show a four-digit year in Persian. It didn't work.
I'm using glib 2.30.2 and gtk+ 3.2.3. Am I doing something wrong, or is this problem related to something else?
I attached a picture of GtkCalendar.