GNOME Bugzilla – Bug 150394
Sorting with Glib is bad in Norwegian
Last modified: 2004-12-22 21:47:04 UTC
Since Gaim uses GLib's g_utf8_* string manipulation functions, you can use Gaim to reproduce this bug. Open Gaim's buddy list and rename some contacts to 'Ælvis', 'Øyvind', 'Åge', 'Aage', 'Asbjørn' and 'Cato'. Then go into the preferences and change the buddy list sorting to «alphabetical». The list will be sorted like this:

1. Ælvis
2. Øyvind
3. Åge
4. Asbjørn
5. Cato
6. Aage

This is, however, almost as wrong as it can get. The correct order is:

1. Asbjørn
2. Cato
3. Ælvis
4. Øyvind
5. Aage
6. Åge

I'm not sure whether this is a bug in GLib or whether Gaim uses GLib the wrong way, but the Gaim developers seem to think that this is a bug in GLib (https://sourceforge.net/tracker/?func=detail&atid=350235&aid=1010023&group_id=235):

"For alphabetical sorting, we use the g_utf8_casefold() and then g_utf8_collate() method described at the above URL. As they say, it's not perfect, but it's pretty good. File a bug with the Glib folk to try and make it better, or to get them to implement a better case-insensitive sorting function that is locale-aware for utf8 strings."
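To make the expected order concrete, here is a deliberately simplified, hypothetical comparator in C. It is not GLib's or the C library's algorithm, only an encoding of the order described above. It assumes valid UTF-8 limited to 1-3 byte sequences, folds case only for A-Z and Æ/Ø/Å, ranks letters A-Z, then Æ, Ø, Å, and treats "aa" as Å (the traditional Norwegian rule that puts Aage next to Åge). The names letter_weight, no_weights, no_compare and sort_norwegian are made up for illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WEIGHTS 64  /* longer names are compared on their first 64 letters only */

/* Decode one UTF-8 code point and advance *s.
   Assumes valid UTF-8 using 1-3 byte sequences (the BMP). */
static unsigned long next_cp(const unsigned char **s)
{
    const unsigned char *p = *s;
    if (p[0] < 0x80) {
        *s += 1;
        return p[0];
    }
    if ((p[0] & 0xE0) == 0xC0) {
        *s += 2;
        return ((unsigned long)(p[0] & 0x1F) << 6) | (p[1] & 0x3F);
    }
    *s += 3;
    return ((unsigned long)(p[0] & 0x0F) << 12) |
           ((unsigned long)(p[1] & 0x3F) << 6) | (p[2] & 0x3F);
}

/* Hypothetical primary weight: A-Z (any case) = 1..26, then Æ, Ø, Å. */
static int letter_weight(unsigned long cp)
{
    if (cp >= 'A' && cp <= 'Z')
        cp += 'a' - 'A';
    if (cp >= 'a' && cp <= 'z')
        return (int)(cp - 'a') + 1;
    switch (cp) {
    case 0xC6: case 0xE6: return 27;             /* Æ æ */
    case 0xD8: case 0xF8: return 28;             /* Ø ø */
    case 0xC5: case 0xE5: return 29;             /* Å å */
    default:              return 1000 + (int)cp; /* everything else sorts last */
    }
}

/* Fill out[] with primary weights for s; "aa" in any case counts as one Å. */
static size_t no_weights(const char *s, int *out, size_t max)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;
    while (*p && n < max) {
        if ((p[0] == 'a' || p[0] == 'A') && (p[1] == 'a' || p[1] == 'A')) {
            out[n++] = 29;
            p += 2;
            continue;
        }
        out[n++] = letter_weight(next_cp(&p));
    }
    return n;
}

/* qsort comparator over an array of const char * (UTF-8 names). */
static int no_compare(const void *a, const void *b)
{
    const char *sa = *(const char *const *)a;
    const char *sb = *(const char *const *)b;
    int wa[MAX_WEIGHTS], wb[MAX_WEIGHTS];
    size_t na = no_weights(sa, wa, MAX_WEIGHTS);
    size_t nb = no_weights(sb, wb, MAX_WEIGHTS);
    for (size_t i = 0; i < na && i < nb; i++)
        if (wa[i] != wb[i])
            return wa[i] < wb[i] ? -1 : 1;
    if (na != nb)
        return na < nb ? -1 : 1;
    return strcmp(sa, sb); /* tie-break so equal-weight names get a fixed order */
}

static void sort_norwegian(const char **names, size_t n)
{
    qsort(names, n, sizeof *names, no_compare);
}
```

Sorting the six names from this report with sort_norwegian() gives Asbjørn, Cato, Ælvis, Øyvind, Aage, Åge. Aage and Åge have identical primary weights here, so only the final strcmp() tie-break places Aage first.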
GLib simply uses the system's implementation of strcoll(). There is probably a better way to use the Win32 collation facilities than what it does now; the code is optimized for Unix systems with high-quality C libraries.
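For reference, delegating collation to strcoll() in standard C looks roughly like the sketch below. It is illustrative only, not GLib's actual code, and coll_compare and sort_with_locale are hypothetical names. The resulting order is whatever the C library's LC_COLLATE tables define, which is why the quality of sorting depends on the platform's C library. Note that setlocale() changes process-wide state.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator that defers entirely to the C library's strcoll(),
   so the order depends on the current LC_COLLATE locale. */
static int coll_compare(const void *a, const void *b)
{
    return strcoll(*(const char *const *)a, *(const char *const *)b);
}

/* Select the collation locale, then sort. Returns 0 (and leaves the array
   untouched) if the C library does not know locale_name, 1 otherwise. */
static int sort_with_locale(const char **items, size_t n, const char *locale_name)
{
    if (setlocale(LC_COLLATE, locale_name) == NULL)
        return 0;
    qsort(items, n, sizeof *items, coll_compare);
    return 1;
}
```

In the "C" locale strcoll() behaves like strcmp(), so {"b", "a", "C"} sorts to "C", "a", "b". With a Norwegian locale name that the platform's C library recognizes (the exact spelling is platform-specific), the same code would apply that library's Norwegian rules, good or bad.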
I think some test cases should be created for Windows, to see both how collation should be done and how it is done at the moment. I've provided a good test case for Norwegian in this bug report. The collation definitely needs improvement.
See bug #141124. There is a bug in g_utf8_collate_key() and g_utf8_collate(), in the ifdef branches used on Windows (and many other non-Linux platforms, presumably).
Fixed in CVS:

2004-08-19  Tor Lillqvist  <tml@iki.fi>

	* glib/gunicollate.c (g_utf8_collate, g_utf8_collate_key): Correct source and destination charset parameter order in g_convert() call. (#150394, possibly also #141124)

Note that (after this fix), for sorting to follow (for instance) Norwegian rules, you need to set the locale appropriately in Regional Settings. Setting the LANG or LC_* environment variables won't affect the Microsoft C library, which is where the strxfrm() and strcoll() that GLib ultimately uses are implemented. Those variables will affect which UI language GTK and GLib use, but they won't affect the second parameter to the setlocale() call in GTK. (There is room for improvement here. Maybe GTK should convert any LANG or LC_* environment variables it finds from the "sv_FI" style to the Microsoft-style "Swedish_Finland" if necessary, and pass them to setlocale()?)
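The fix is about argument order: g_convert() takes the destination charset before the source charset (str, len, to_codeset, from_codeset, ...), the same to/from order as POSIX iconv_open(tocode, fromcode), so swapping the two makes the conversion run in the wrong direction. A minimal sketch of the correct order with plain iconv, assuming a POSIX system whose iconv knows "UTF-8" and "ISO-8859-1" (utf8_to_latin1 is a hypothetical helper, not part of GLib):

```c
#include <iconv.h>
#include <string.h>

/* Convert a NUL-terminated UTF-8 string to ISO-8859-1 in out[outsize].
   iconv_open() takes the DESTINATION charset first, then the SOURCE,
   exactly like g_convert(). Returns 0 on success, -1 on any error. */
static int utf8_to_latin1(const char *in, char *out, size_t outsize)
{
    if (outsize == 0)
        return -1;
    iconv_t cd = iconv_open("ISO-8859-1", "UTF-8"); /* to, from */
    if (cd == (iconv_t)-1)
        return -1;
    char *inbuf = (char *)in;          /* iconv()'s prototype is not const */
    size_t inleft = strlen(in);
    char *outbuf = out;
    size_t outleft = outsize - 1;      /* reserve room for the terminator */
    size_t rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)              /* unconvertible input or buffer too small */
        return -1;
    *outbuf = '\0';
    return 0;
}
```

For example, "Åge" (bytes C3 85 67 65 in UTF-8) comes out as the three Latin-1 bytes C5 67 65.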
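The suggested GTK improvement could look roughly like the sketch below: strip the codeset and modifier from a POSIX-style value such as "sv_FI.UTF-8" and look up the remainder in a table of Microsoft C library locale names. Only "sv_FI" to "Swedish_Finland" comes from this report; the other table entries and the function name posix_to_msvcrt_locale are assumptions for illustration and should be checked against Microsoft's setlocale() documentation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping table. Only the sv_FI entry comes from the bug
   report; the others are ASSUMED names, to be verified. */
struct locale_alias {
    const char *posix;   /* e.g. "sv_FI" */
    const char *msvcrt;  /* e.g. "Swedish_Finland" */
};

static const struct locale_alias aliases[] = {
    { "sv_FI", "Swedish_Finland" },
    { "sv_SE", "Swedish_Sweden" },   /* assumed */
    { "nb_NO", "Norwegian_Norway" }, /* assumed */
};

/* Map "sv_FI", "sv_FI.UTF-8" or "sv_FI@euro" to a Microsoft-style locale
   name, or return NULL if the language/territory pair is not in the table. */
static const char *posix_to_msvcrt_locale(const char *posix)
{
    size_t len = strcspn(posix, ".@"); /* ignore codeset and modifier */
    for (size_t i = 0; i < sizeof aliases / sizeof aliases[0]; i++) {
        if (strlen(aliases[i].posix) == len &&
            strncmp(posix, aliases[i].posix, len) == 0)
            return aliases[i].msvcrt;
    }
    return NULL;
}
```

On Windows, a non-NULL result could then be passed to setlocale(LC_ALL, ...); a NULL result would mean keeping the Regional Settings default.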
Great