GNOME Bugzilla – Bug 112798
Provide a native implementation of collation
Last modified: 2013-02-03 15:38:21 UTC
I am writing dictionary software (StarDict), and I need a good Unicode sorting function for creating its index. I read "Unicode Technical Standard #10: Unicode Collation Algorithm" (http://www.unicode.org/reports/tr10/) and found that I can't use g_utf8_collate() in StarDict because it is locale-dependent. Could GLib provide a locale-independent g_utf8_collate() in a future version? Thank you :) -- huzheng <huzheng_001@163.com>
Collation is *by definition* locale sensitive. Different languages have different rules. Perhaps I'm not understanding what you are asking for? The Unicode collation algorithm describes "tailoring" for different locales, and such tailoring is necessary for correct operation.
Here is some text that I wrote: ============= g_utf8_collate()? This is a locale-dependent function. That means that if you look up a Chinese character while in the Chinese locale, it works fine, but if you are in some other locale, the lookup will fail because the order is not the same as in the Chinese locale (the one that was used when the dictionary was created). ============ When g_utf8_collate() is used with the same source code and input data but in a different running environment (a different locale), the result will be different. This is because g_utf8_collate() converts the UTF-8 string into a locale string first, and that conversion gives different results in different locales (in most it will fail). I hope you understand it now. Thanks :)
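For an index that must sort the same way when it is built and when it is searched, one possible workaround is to skip collation entirely and order entries by Unicode code point. The sketch below is not the Unicode Collation Algorithm and not a GLib API; the names codepoint_order_cmp(), index_entry_cmp() and sort_index() are hypothetical. It assumes the strings are valid UTF-8 and relies on two facts: strcmp() compares bytes as unsigned char without consulting the locale, and UTF-8 byte order matches code point order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: compare two valid UTF-8 strings by code point.
 * strcmp() never looks at LC_COLLATE, so the result is the same in
 * every locale. This is NOT linguistically correct collation. */
int codepoint_order_cmp(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b);
}

/* Hypothetical qsort()/bsearch() adapter for an array of C strings. */
int index_entry_cmp(const void *pa, const void *pb)
{
    const char *a = *(const char *const *)pa;
    const char *b = *(const char *const *)pb;
    return codepoint_order_cmp(a, b);
}

/* Hypothetical helper: sort index headwords in place, in an order
 * that does not depend on the locale of the machine building it. */
void sort_index(const char **words, size_t n)
{
    qsort(words, n, sizeof *words, index_entry_cmp);
}
```

The same index_entry_cmp() can be passed to bsearch() at lookup time, so creation and lookup agree regardless of the user's locale. The trade-off is that the order is only stable, not natural for any language: for example, "z" sorts before "é".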
On decent systems (e.g. GNU libc), g_utf8_collate() does not convert to a locale string, but rather to a Unicode wide character string. Collation is *still* locale-dependent because collation depends on locale, by definition.

Advantages of having a native collation algorithm in GLib would be:
- Would get good collation everywhere
- Could turn off tailoring
- Could provide comparison ignoring accents or case (see the UCA and the doc comments on g_utf8_casefold())

Disadvantages:
- A separate copy of collation data in GLib has extra memory overhead
- Maintaining collation data as part of the GLib distribution is not appealing
- Massive job to implement
a massive job that hasn't happened since 2003 - time to give up waiting