GNOME Bugzilla – Bug 658107
GDateTime alternate number formats are a bit broken
Last modified: 2012-02-24 17:22:03 UTC
GDateTime uses the "I" modifier to printf() to get the alternate number format, as per the 'O' modifier for strftime(). man 3 printf says this:

    glibc 2.2 adds one further flag character.

    I    For decimal integer conversion (i, d, u) the output uses the locale's alternative output digits, if any. For example, since glibc 2.2.3 this will give Arabic-Indic digits in the Persian ("fa_IR") locale.

This flag is a GNU extension, so this will badly break on non-GNU systems.

It is also broken on GNU systems (at least F15 and F16 are affected) when mixed with the '0' modifier. You would expect:

    printf ("%I04d", 5);

to print out:

    ۰۰۰۵

but you actually get only the '۵' (the '0' modifier is ignored). This means that a time might print out like "۱۹:۰:۰" instead of "۱۹:۰۰:۰۰".

After some research and general API shopping, it seems like the only ways that we can possibly expect to get Arabic digits are:

 - use the 'I' modifier (only works on GNU)
 - abuse strftime("%OS") to format two digits at a time (not universally supported and may not be UTF-8)
 - do one of the above to build a table of digits for the current locale, and build a number formatting function that consults this table as needed
 - do it from scratch (and then we have to do it for all affected locales)

None of the options seem particularly good, but one thing to note: we probably only care to format 'O'-modified strftime() strings if the current locale specifies them for the preferred time/date formats -- in which case we know that strftime() on their system will work. Therefore, relying on strftime() seems like the best bet.

There is only one complicating factor here: we translate the preferred date/time formats using gettext for systems that lack nl_langinfo(), so we should probably make sure that no translations introduce the use of %O conversions.
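For reference, the third option above (a digit table plus our own formatting function) could look roughly like the sketch below. The helper name format_alt_number() and the hard-coded Persian table fa_digits are made up for illustration and are not GLib API; a real implementation would fill the table from the locale instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative table only: Extended Arabic-Indic digits U+06F0..U+06F9
 * as UTF-8. Real code would obtain these from the current locale. */
static const char *const fa_digits[10] = {
  "\xDB\xB0", "\xDB\xB1", "\xDB\xB2", "\xDB\xB3", "\xDB\xB4",
  "\xDB\xB5", "\xDB\xB6", "\xDB\xB7", "\xDB\xB8", "\xDB\xB9"
};

/* Hypothetical helper: writes `value` into `out` using the UTF-8 digit
 * strings in `digits`, left-padded with digits[0] to at least `width`
 * digits (capped at 20 in this sketch).
 * Returns 0 on success, -1 if `out` is too small. */
static int
format_alt_number (const char *const digits[10], unsigned value,
                   unsigned width, char *out, size_t out_size)
{
  unsigned place[20];
  size_t n = 0, used = 0;

  assert (digits != NULL && out != NULL);

  /* Collect decimal places, least significant first. */
  do
    {
      place[n++] = value % 10;
      value /= 10;
    }
  while (value != 0);

  /* Pad in units of digits, not bytes. */
  while (n < width && n < sizeof place / sizeof place[0])
    place[n++] = 0;

  /* Emit most significant digit first. */
  while (n > 0)
    {
      const char *d = digits[place[--n]];
      size_t len = strlen (d);

      if (used + len + 1 > out_size)
        return -1;
      memcpy (out + used, d, len);
      used += len;
    }
  out[used] = '\0';
  return 0;
}
```

Calling format_alt_number (fa_digits, 5, 4, buf, sizeof buf) fills buf with ۰۰۰۵, which is the output the "%I04d" example was expected to give. Because padding is counted in digits, multi-byte digit strings do not throw off the width.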
Here are the locales that show non-ASCII numerals for strftime("%OS") or printf %I02d:

    bn_BD  strftime %OS: 05      %I02d: ৫
    bn_IN  strftime %OS: 05      %I02d: ৫
    fa_IR  strftime %OS: ۰۵      %I02d: ۵
    ja_JP  strftime %OS: ��      %I02d: 05
    my_MM  strftime %OS: ၀၅      %I02d: ၅
    ta_IN  strftime %OS: 05      %I02d: ௫
    uk_UA  strftime %OS: ������  %I02d: 05

I strongly suspect uk_UA of being some kind of a bug and I'm suspicious about ja_JP.

That practically gives us between 4 and 6 scripts to worry about: 'bn', 'fa', 'my' and 'ta'. That's finite and small.
I was deliciously naïve:

    as_IN       strftime %OS: 05      %I02d: ৫
    bn_BD       strftime %OS: 05      %I02d: ৫
    bn_IN       strftime %OS: 05      %I02d: ৫
    fa_IR       strftime %OS: ۰۵      %I02d: ۵
    gu_IN       strftime %OS: 05      %I02d: ૫
    hi_IN       strftime %OS: 05      %I02d: ५
    hne_IN      strftime %OS: 05      %I02d: ५
    ja_JP.utf8  strftime %OS: 五      %I02d: 05
    kn_IN       strftime %OS: 05      %I02d: ೫
    kok_IN      strftime %OS: 05      %I02d: ५
    mai_IN      strftime %OS: 05      %I02d: ५
    ml_IN       strftime %OS: 05      %I02d: ൫
    mr_IN       strftime %OS: 05      %I02d: ५
    my_MM       strftime %OS: ၀၅      %I02d: ၅
    or_IN       strftime %OS: ୫       %I02d: ୫
    pa_IN       strftime %OS: 05      %I02d: ੫
    ps_AF       strftime %OS: 05      %I02d: ٥
    ta_IN       strftime %OS: 05      %I02d: ௫
    te_IN       strftime %OS: 05      %I02d: ౫
    uk_UA.utf8  strftime %OS: травня  %I02d: 05

Some notes:

The mix of support for strftime %OS and printf %I is pretty random.

"%0I2d" is broken for all locales.

For 10, ja_JP strftime outputs "十", which is a single character meaning "ten". That pretty much prevents us from doing anything clever in terms of producing the output or dealing with padding ourselves.

uk_UA is broken: the conversion is %OS (which is for seconds) and the output is "травня", which is Ukrainian for "May". Other numbers map to month names predictably.
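A survey like the one above can be approximated with a small probe along these lines. This is only a sketch: probe_alt_seconds() and is_ascii_digits() are hypothetical names, and the caller is assumed to have already selected the locale under test with setlocale(). Only the strftime() half is shown, since the 'I' flag is a GNU extension.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical probe: formats `seconds` with "%OS" in the current
 * LC_TIME locale. Returns the number of bytes written, 0 on failure. */
static size_t
probe_alt_seconds (int seconds, char *out, size_t out_size)
{
  struct tm tm;

  assert (seconds >= 0 && seconds <= 60);
  memset (&tm, 0, sizeof tm);
  tm.tm_sec = seconds;
  tm.tm_mday = 1; /* keep the struct plausible; %OS only reads tm_sec */
  return strftime (out, out_size, "%OS", &tm);
}

/* Returns 1 if `s` is non-empty and consists only of ASCII digits,
 * i.e. the locale did not substitute alternative symbols. */
static int
is_ascii_digits (const char *s)
{
  if (*s == '\0')
    return 0;
  for (; *s != '\0'; s++)
    if (*s < '0' || *s > '9')
      return 0;
  return 1;
}
```

In the default "C" locale, probe_alt_seconds (5, ...) yields "05". A locale in the list above would be flagged when is_ascii_digits() returns 0 for the probed string.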
Okay. Figured it out.

"LANG=fa_IR locale LC_TIME -k" on the command line shares this (among other things):

    alt_digits="۰۰";"۰۱";"۰۲";"۰۳";"۰۴";"۰۵";"۰۶";"۰۷";"۰۸";"۰۹";"۱۰";"۱۱";"۱۲";"۱۳";"۱۴";"۱۵";"۱۶";"۱۷";"۱۸";"۱۹";"۲۰";"۲۱";"۲۲";"۲۳";"۲۴";"۲۵";"۲۶";"۲۷";"۲۸";"۲۹";"۳۰";"۳۱";"۳۲";"۳۳";"۳۴";"۳۵";"۳۶";"۳۷";"۳۸";"۳۹";"۴۰";"۴۱";"۴۲";"۴۳";"۴۴";"۴۵";"۴۶";"۴۷";"۴۸";"۴۹";"۵۰";"۵۱";"۵۲";"۵۳";"۵۴";"۵۵";"۵۶";"۵۷";"۵۸";"۵۹";"۶۰";"۶۱";"۶۲";"۶۳";"۶۴";"۶۵";"۶۶";"۶۷";"۶۸";"۶۹";"۷۰";"۷۱";"۷۲";"۷۳";"۷۴";"۷۵";"۷۶";"۷۷";"۷۸";"۷۹";"۸۰";"۸۱";"۸۲";"۸۳";"۸۴";"۸۵";"۸۶";"۸۷";"۸۸";"۸۹";"۹۰";"۹۱";"۹۲";"۹۳";"۹۴";"۹۵";"۹۶";"۹۷";"۹۸";"۹۹"

This explains why C99 only specifies %O conversions for two-digit number values (and omits %OY, for example, which you might expect to give you the 4-digit year in alternate digits).

That string is also accessible via nl_langinfo(ALT_DIGITS), which is clearly what we should use. There are a couple of problems still, however:

The GNU implementation of nl_langinfo appears to violate the specification by using '\0' to separate the digits rather than ';' as indicated by the Single UNIX Specification[1]:

    ALT_DIGITS
    The alternative symbols for digits, corresponding to the %O conversion specification modifier. The value consists of semicolon-separated symbols. The first is the alternative symbol corresponding to zero, the second is the symbol corresponding to one, and so on. Up to 100 alternative symbols may be specified.

[1] http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html#tag_07_03_05_02

That's easy enough to work around, I guess.

Another, more substantial problem is that this ALT_DIGITS scheme doesn't seem sufficient to implement all of the format specifiers demanded by C99. Specifically:

    %Oe is replaced by the day of the month, using the locale’s alternative numeric symbols (filled as needed with leading spaces).

    %Ou is replaced by the ISO 8601 weekday as a number in the locale’s alternative representation, where Monday is 1.

    %Ow is replaced by the weekday as a number, using the locale’s alternative numeric symbols.

Essentially, these three formats demand non-zero-padded or non-2-digit numbers. glibc (and the 'date' command line tool) are non-conformant here and always output two-digit zero-padded forms. That's slightly annoying since, for the fa_IR locale, %Oe is included in the D_T_FMT string.
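The separator workaround mentioned above could be sketched as follows. alt_digit_at() is a hypothetical helper, not an existing API, and it assumes the caller knows the total length of the ALT_DIGITS data (excluding the final terminator). With ';' separators that is just strlen(); for the '\0'-separated form the length has to come from somewhere else, which this sketch does not solve.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: finds the `index`-th symbol in `buf`, where
 * symbols are separated by either ';' (Single UNIX Specification) or
 * '\0' (the behaviour observed with glibc). `len` is the number of
 * bytes in `buf`, not counting the final terminator.
 * Returns a pointer to the symbol and stores its byte length in
 * *sym_len, or returns NULL if there are not enough symbols. */
static const char *
alt_digit_at (const char *buf, size_t len, unsigned index, size_t *sym_len)
{
  size_t start = 0, i;

  assert (buf != NULL && sym_len != NULL);

  for (i = 0; i <= len; i++)
    {
      /* Reaching the end of the buffer closes the last symbol. */
      if (i == len || buf[i] == ';' || buf[i] == '\0')
        {
          if (index == 0)
            {
              *sym_len = i - start;
              return buf + start;
            }
          index--;
          start = i + 1;
        }
    }
  return NULL;
}
```

The returned symbol is not NUL-terminated in the ';' case, so callers would copy *sym_len bytes rather than treat it as a C string. A zero-length result is best treated as "no alternative symbol for this value".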
Created attachment 195606
g_date_time_format: improve support for alt digits

Improve a few situations where g_date_time_format() was getting the padding wrong when displaying alt digits (e.g. Arabic numerals) for formatting time.

We now depend on nl_langinfo (_NL_CTYPE_OUTDIGITn_WC) to do the conversion, which is very likely glibc-specific, but our previous method relied on a glibc-specific printf() feature, so no harm done there.

Add a configure check for nl_langinfo (_NL_CTYPE_OUTDIGITn_WC).

Uncomment a few testcases that were failing previously.
From my earlier investigation into the behaviour of the 'I' modifier, it seemed to me that glibc gets the character counting wrong there, which is why padding doesn't work right: it seems to count bytes.
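To illustrate the bytes-versus-characters distinction behind that hypothesis (this does not confirm what glibc does internally), here is a small sketch. utf8_code_points() is a hypothetical helper and assumes valid UTF-8 input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts code points in a valid UTF-8 string by
 * skipping continuation bytes, which all have the form 10xxxxxx. */
static size_t
utf8_code_points (const char *s)
{
  size_t count = 0;

  assert (s != NULL);
  for (; *s != '\0'; s++)
    if (((unsigned char) *s & 0xC0) != 0x80)
      count++;
  return count;
}
```

The Persian digit ۵ (U+06F5) is "\xDB\xB5" in UTF-8, so strlen() reports 2 while utf8_code_points() reports 1. Any width calculation based on the byte count will therefore disagree with the number of digits actually displayed.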
Review of attachment 195606:

The idea of using nl_langinfo() for alternative digit conversions had actually occurred to me recently, but I was too lazy to do the work myself. Thanks! It looks good to me.
Attachment 195606 pushed as 2d7051e - g_date_time_format: improve support for alt digits
Created attachment 208366
GtkCalendar Widget

I just wrote a simple Python program that shows a GtkCalendar widget, to test whether it can display a four-digit year in Persian digits. It didn't work. I'm using glib 2.30.2 and gtk+ 3.2.3. Am I doing something wrong, or is the problem I'm seeing related to something else? I attached a picture of the GtkCalendar.