GNOME Bugzilla – Bug 323249
Additional Numberings in gnome-doc-utils
Last modified: 2012-01-19 14:29:27 UTC
From http://mail.gnome.org/archives/gnome-doc-devel-list/2005-December/msg00000.html I wrote:
I'm going to translate gnome-doc-utils into Thai and found that two required Thai numberings are missing: Thai alphabetic numbering and Thai decimal digits.
Thai alphabetic numbering runs through the Thai consonants in the range U+0E01 (THAI CHARACTER KO KAI) to U+0E2E (THAI CHARACTER HO NOKHUK), skipping three characters:
- U+0E03 (THAI CHARACTER KHO KHUAT)
- U+0E05 (THAI CHARACTER KHO KHON)
- U+0E06 (THAI CHARACTER KHO RAKHANG)
(i.e. the sequence is U+0E01, U+0E02, U+0E04, U+0E07 .. U+0E2E). This is mainly used for numbering appendices in Thai documents, and occasionally for ordered lists.
Numbering with Thai decimal digits is less common in general, but it appears in most official or military documents. It simply uses the Thai digits in the range (U+0E50..U+0E50) for 0..9 respectively.
I'm not sure about the digit behavior described by the W3C's XSLT specification, nor what has been done in gnome-doc-utils, but let me mention a common mistake in some implementations: assuming that digits are translated automatically. We need an explicit way to specify whether to use Thai digits in numbering, rather than having them translated automatically.
Thank you for your attention. Any comment would be appreciated.
Hossein Noorikhah also added: What about Persian (Arabic-script) numbering? ۰ ۱ ۲ ۳ ۴ ۵ ۶ ۷ ۸ ۹ Any comment would be appreciated!
Ok, the Thai requirements are apparently harder than Persian/Arabic. The case of simple alternative digit sets is easier to fix.
Quoth Theppitak: "Thai digits in the range (U+0E50..U+0E50) for 0..9 respectively." I assume you mean U+0E50 to U+0E59? So I need to understand how all these numbering systems work. For Thai decimal digits (U+0E50) and Persian numbering (U+06F0), I'm assuming the numbering works just like Western decimal numbering: 0, 1, ..., 9, 10, 11, ..., 19, 20, 21, .... That is, you have a 0, you begin counting with 1, and when you run out of digits in a position, you set the digit in the position to 0 and increment the digit to the left. Correct? In contrast, alphabetic numbering has no "0" digit. Rather, when you run out of digits, you increment the digit to the left and set the current digit to the one you started counting with. So we have a, b, c, ..., x, y, z, aa, ab, .... See the difference? So does Thai alphabetic numbering follow this scheme?
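The two counting schemes contrasted above can be sketched in a few lines of Python. This is an illustrative sketch, not gnome-doc-utils or libxslt code; the helper names `positional` and `bijective` are hypothetical, and the Thai digit range is taken as U+0E50..U+0E59 per the correction above. Positional numbering has a zero digit and resets a position to it on carry; alphabetic numbering has no zero, so after "z" comes "aa".

```python
def positional(n: int, digits: str) -> str:
    """Decimal-style numbering: digits[0] is the zero digit; a carry resets a position to it."""
    base = len(digits)
    if n == 0:
        return digits[0]
    out = []
    while n > 0:
        n, r = divmod(n, base)
        out.append(digits[r])
    return "".join(reversed(out))


def bijective(n: int, letters: str) -> str:
    """Alphabetic numbering: no zero; after the last letter, carry left and restart at the first."""
    if n < 1:
        raise ValueError("alphabetic numbering starts at 1")
    base = len(letters)
    out = []
    while n > 0:
        n, r = divmod(n - 1, base)  # shifting by one is what removes the zero digit
        out.append(letters[r])
    return "".join(reversed(out))


THAI_DIGITS = "".join(chr(0x0E50 + i) for i in range(10))  # assumed U+0E50..U+0E59
LATIN = "abcdefghijklmnopqrstuvwxyz"

print(positional(20, THAI_DIGITS))  # Thai digit two followed by Thai digit zero
print([bijective(n, LATIN) for n in (1, 26, 27, 28, 52, 53, 702, 703)])
# -> ['a', 'z', 'aa', 'ab', 'az', 'ba', 'zz', 'aaa']
```

The only difference between the two loops is the `n - 1` shift, which is why the same helper can serve any alphabet once its letter sequence is known.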
Does the Thai alphabet have upper- and lowercase letters? For English alphabetic numbering, we have both a,b,c,... and A,B,C,....
I don't expect this to matter for anything we do, but how do the Thai and Persian decimal systems indicate a negative value?
Shaun, leave alphabetic case aside for now. For decimal, AFAIK, all digit sets encoded in Unicode behave the same way. But that's not even relevant here: glibc supports the 'I' (i18n) flag in printf format strings for decimal and floating-point numbers, which makes it use the locale-specific digit set instead of the ASCII one. So you write printf ("%Id", x) and get the output in Persian numerals. As a portability layer, we have worked with the GNU gettext maintainers, and gettext >= 0.14 takes care of the i18n flag: if the runtime doesn't support it, gettext removes the flag from the translated message. So a fairly safe way to use locale-specific digits is printf (_("%d"), x), where the translation can then be "%Id". That also takes care of negative numbers, etc. Hope that helps. I don't have many ideas about alphabetic numbering at this point; I just know that CSS3 has some features for enumerating like that.
This is all done in XSLT, so I don't have direct access to libc functionality. Numbering systems can be implemented either in libxslt, as additional xsl:number formats, or in high-level XSLT code. Doing things in libxslt means we can use libc; however, having libc determine the numbering system automatically from the environment isn't an acceptable solution. The XSLT will explicitly tell libxslt which number formatter to use, and libxslt would need to respect that absolutely. We also can't shunt the work off onto CSS, because most of these numbers are used inside block text rather than as list-item prefixes.
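The requirement that the stylesheet names the formatter and the formatter obeys it regardless of the environment can be illustrated with a small dispatch table. This is a hypothetical Python sketch, not libxslt's interface: the format names (`"thai-decimal"`, `"persian-decimal"`, and so on) and the helpers `make_decimal`, `make_alphabetic`, and `format_number` are invented for illustration. The point is only that nothing reads the locale, and an unknown name fails loudly instead of falling back.

```python
import string


def make_decimal(digits: str):
    """Positional decimal numbering with a caller-supplied set of ten digits."""
    table = str.maketrans("0123456789", digits)
    return lambda n: str(n).translate(table)  # the minus sign passes through unchanged


def make_alphabetic(letters: str):
    """Alphabetic (no-zero) numbering over the given letters, starting at 1."""
    def fmt(n: int) -> str:
        if n < 1:
            raise ValueError("alphabetic numbering starts at 1")
        out = []
        while n > 0:
            n, r = divmod(n - 1, len(letters))
            out.append(letters[r])
        return "".join(reversed(out))
    return fmt


# Hypothetical format names, chosen for this sketch only.
FORMATTERS = {
    "decimal": str,
    "lower-alpha": make_alphabetic(string.ascii_lowercase),
    "thai-decimal": make_decimal("".join(chr(0x0E50 + i) for i in range(10))),
    "persian-decimal": make_decimal("".join(chr(0x06F0 + i) for i in range(10))),
}


def format_number(n: int, name: str) -> str:
    # The caller names the formatter explicitly; the locale and environment are never
    # consulted, and an unknown name is an error rather than a silent fallback.
    formatter = FORMATTERS.get(name)
    if formatter is None:
        raise ValueError(f"unknown number format: {name!r}")
    return formatter(n)


print(format_number(28, "lower-alpha"))  # ab
print(format_number(-15, "persian-decimal"))
```

Keeping the choice in an explicit argument mirrors the way the XSLT would pass a format to xsl:number, rather than relying on whatever locale the build machine happens to run in.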
OK, then I guess reassigning this to libxslt is the best way to go. I discussed locale-dependent sorting order with Daniel Veillard this summer, and that's quite possible. Digits should not be any harder.
It's also not too difficult to implement these directly in XSLT. That has the advantage of making the stylesheets portable across different XSLT implementations.
Shaun, also check http://mail.gnome.org/archives/gnome-doc-devel-list/2005-September/msg00000.html I've implemented something similar for Serbian in XSLT, and it seems it applies directly to this as well. Easy enough to make it more general, I'd say. Provided you find it useful, of course :)
Shaun, more information on Thai numbering:
- Thai decimal numbering behaves just like Western decimal numbering, including negative numbers, as Behdad explained in #5. Just replace 0..9 with the corresponding Thai digits. For floating-point numbers, we use a comma as the thousands separator and a period as the decimal point.
- The Thai alphabet has no case. And yes, its numbering behaves just as you described at the end of #2.
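The Thai decimal conventions described here (Western layout and minus sign, comma as the thousands separator, period as the decimal point, Thai digits in place of 0..9) can be sketched by formatting with ASCII digits first and then mapping each digit. A minimal Python sketch; `thai_decimal` is a hypothetical helper, and it assumes the minus sign stays the ASCII hyphen-minus, as in the Western form.

```python
THAI_DIGITS = "".join(chr(0x0E50 + i) for i in range(10))  # U+0E50..U+0E59
_TO_THAI = str.maketrans("0123456789", THAI_DIGITS)


def thai_decimal(x, places=None):
    """Format x with Thai digits.

    Integers use places=None; floats must pass the number of decimal places.
    Python's "," format option always emits "," for thousands and "." for the
    decimal point, independent of the current locale.
    """
    spec = ",d" if places is None else f",.{places}f"
    return format(x, spec).translate(_TO_THAI)  # only 0-9 are replaced; "-", ",", "." stay


print(thai_decimal(-2548))
print(thai_decimal(-1234567.5, 2))
```

Formatting in ASCII first and substituting afterwards keeps sign and separator handling identical to the Western form, so only the digit glyphs differ.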
Oops! Testing it with i18n/test-numbers in recent CVS, I found I missed something when describing Thai alphabetic numbering. There are actually two more skipped characters:
- U+0E24 THAI CHARACTER RU
- U+0E26 THAI CHARACTER LU
So the sequence should be: U+0E01, U+0E02, U+0E04, U+0E07 .. U+0E23, U+0E25, U+0E27 .. U+0E2E
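Combining this corrected sequence with the no-zero alphabetic counting rule described earlier in the thread gives a 41-letter alphabet (46 code points from U+0E01 to U+0E2E, minus the five skipped ones). A minimal Python sketch, not the committed alpha-thai patch; `SKIPPED`, `THAI_ALPHA`, and `thai_alpha` are hypothetical names.

```python
# Code points excluded from Thai alphabetic numbering, per the corrected list in this thread.
SKIPPED = {0x0E03, 0x0E05, 0x0E06, 0x0E24, 0x0E26}
THAI_ALPHA = "".join(chr(cp) for cp in range(0x0E01, 0x0E2F) if cp not in SKIPPED)


def thai_alpha(n: int) -> str:
    """Thai alphabetic numbering: no zero; after HO NOKHUK comes KO KAI KO KAI."""
    if n < 1:
        raise ValueError("Thai alphabetic numbering starts at 1")
    out = []
    while n > 0:
        n, r = divmod(n - 1, len(THAI_ALPHA))
        out.append(THAI_ALPHA[r])
    return "".join(reversed(out))


print(len(THAI_ALPHA))  # 41
print(thai_alpha(1), thai_alpha(41), thai_alpha(42))
```

Deriving the alphabet from the range and a skip set keeps the list easy to audit against the code points quoted above.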
Created attachment 56395: alpha-thai patch
*ping* Can I commit the last patch?
Yes, please commit to HEAD.
Patch committed to HEAD.