Bug 112798 – Provide a native implementation of collation


Summary:	Provide a native implementation of collation


Status:	RESOLVED WONTFIX

Product:	glib
Classification:	Platform
Component:	general
Version:	2.2.x
Hardware:	Other Linux

Importance:	Normal enhancement
Target Milestone:	---
Assigned To:	gtkdev
QA Contact:	gtkdev

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2003-05-12 00:21 UTC by huzheng
Modified:	2013-02-03 15:38 UTC

See Also:
GNOME target:	---
GNOME version:	---

Description huzheng 2003-05-12 00:21:31 UTC

I am writing dictionary software (StarDict), and I need a good
Unicode sorting function when creating the index.
I read "Unicode Technical Standard #10: Unicode Collation Algorithm",
http://www.unicode.org/reports/tr10/

I found that I can't use g_utf8_collate() in StarDict because it is
locale-dependent.

Can GLib provide a locale-independent g_utf8_collate() in a future
version?

thank you :)

-- 
huzheng <huzheng_001@163.com>

Comment 1 Owen Taylor 2003-05-12 01:02:34 UTC

Collation is *by definition* locale sensitive. Different
languages have different rules. Perhaps I'm not understanding
what you are asking for?

The Unicode collation algorithm describes "tailoring" for
different locales, and such tailoring is necessary for
correct operation.

Comment 2 huzheng 2003-05-12 05:39:50 UTC

Here is some text that I wrote:
=============
g_utf8_collate()? This is a locale-dependent function. It means that
if you look up a Chinese character while in the Chinese locale, it
works fine, but if you are in some other locale, the lookup will fail
because the order is not the same as in the Chinese locale (which was
used when creating this dictionary).
============

When g_utf8_collate() is used with the same source code and input data
but the running environment is different (the locale is different), the
result will be different. This is because g_utf8_collate() first converts
the UTF-8 string into a locale string, and that conversion gives a
different result in different locales (in most it will fail).
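
For the index use case described here, what is needed is an ordering that is identical on every machine, not necessarily a linguistically correct one. One way to get that, sketched below under the assumption that all index entries are valid UTF-8, is to compare strings byte by byte: for valid UTF-8 this yields Unicode code point order regardless of the locale. This is not collation in the UCA sense, and the helper names (cmp_codepoint, sort_index, lookup) are hypothetical, not GLib or StarDict API.

```c
#include <stdlib.h>
#include <string.h>

/* Compare two valid UTF-8 strings by code point.
 * strcmp() compares bytes as unsigned char, and for valid UTF-8 the
 * byte order matches the code point order, so no locale is involved. */
static int cmp_codepoint(const char *a, const char *b)
{
    return strcmp(a, b);
}

/* Adapter with the signature expected by qsort() and bsearch(). */
static int cmp_codepoint_qsort(const void *pa, const void *pb)
{
    const char *a = *(const char *const *)pa;
    const char *b = *(const char *const *)pb;
    return cmp_codepoint(a, b);
}

/* Sort the index once, when the dictionary is built. */
static void sort_index(const char **words, size_t n)
{
    qsort(words, n, sizeof *words, cmp_codepoint_qsort);
}

/* Look a word up later, possibly under a different locale.
 * Returns the stored entry, or NULL if the word is not in the index. */
static const char *lookup(const char *const *words, size_t n, const char *key)
{
    const char *const *hit =
        bsearch(&key, words, n, sizeof *words, cmp_codepoint_qsort);
    return hit ? *hit : NULL;
}
```

Because the same comparison is used for building and for searching, the lookup stays consistent across locales. The cost is that the visible order of entries will not match what users of any particular language expect.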

I hope you understand it now.

thanks :)

Comment 3 Owen Taylor 2003-05-23 21:50:11 UTC

On decent systems (e.g. GNU libc), g_utf8_collate() does not
convert to a locale string, but rather to a Unicode
wide character string. Collation is *still* locale-dependent
because collation depends on locale, by definition.
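
The locale dependence can be observed with the standard C strcoll() function, independently of GLib. The sketch below is only an illustration: collate_in_locale is a hypothetical helper, and any locale name other than "C" (for example "en_US.UTF-8") is an assumption about what happens to be installed on the system.

```c
#include <locale.h>
#include <string.h>

/* Compare a and b with strcoll() under the named LC_COLLATE locale,
 * then restore the previous locale. Sets *ok to 1 on success, or to 0
 * if the locale could not be selected (the return value is then 0). */
static int collate_in_locale(const char *locale_name,
                             const char *a, const char *b, int *ok)
{
    char saved[256];
    const char *current = setlocale(LC_COLLATE, NULL);

    *ok = 0;
    if (current == NULL || strlen(current) >= sizeof saved)
        return 0;
    strcpy(saved, current);  /* copy: later setlocale() calls may overwrite it */

    if (setlocale(LC_COLLATE, locale_name) == NULL)
        return 0;            /* locale not available on this system */

    int result = strcoll(a, b);
    setlocale(LC_COLLATE, saved);
    *ok = 1;
    return result;
}
```

In the "C" locale strcoll() behaves like strcmp(), so collate_in_locale("C", "a", "B", &ok) is positive because 'B' has the smaller byte value. Under a language locale the same pair will typically compare the other way round, which is exactly the effect that breaks an index built in one locale and searched in another.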

Advantages of having native collation algorithm in GLib
would be:

 - Would get good collation everywhere
 - Could turn off tailoring
 - Could provide comparison ignoring accents or case
   (See the UCA and doc comments on g_utf8_casefold())
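
To make "comparison ignoring accents or case" concrete: the UCA compares strings on several levels, with base letters first and case differences only as a later tie-breaker. The following is a toy, ASCII-only sketch of that idea, not the UCA itself and not anything GLib provides. cmp_levels and ascii_lower are hypothetical names, and putting lowercase before uppercase in the tie-break is a choice made for the sketch.

```c
#include <stddef.h>

/* Locale-independent lowercase mapping for ASCII letters only. */
static unsigned char ascii_lower(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* Two-level comparison of ASCII strings.
 * Level 1: compare letters ignoring case.
 * Level 2: only if level 1 is a tie and ignore_case is 0, let the first
 *          case difference decide (lowercase sorts first). */
static int cmp_levels(const char *a, const char *b, int ignore_case)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;
    size_t i;

    for (i = 0; p[i] != 0 && q[i] != 0; i++) {
        unsigned char x = ascii_lower(p[i]);
        unsigned char y = ascii_lower(q[i]);
        if (x != y)
            return x < y ? -1 : 1;
    }
    if (p[i] != 0 || q[i] != 0)
        return p[i] != 0 ? 1 : -1;   /* the shorter string sorts first */

    if (ignore_case)
        return 0;

    /* Same letters and same length: any differing byte differs in case. */
    for (i = 0; p[i] != 0; i++) {
        if (p[i] != q[i])
            return (p[i] >= 'a' && p[i] <= 'z') ? -1 : 1;
    }
    return 0;
}
```

With ignore_case set, "Apple" and "apple" compare equal. Without it they are still adjacent in the sort order, but "apple" comes first. A real implementation would need per-character weight tables covering all of Unicode, which is the collation data the disadvantages refer to.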

Disadvantages:

 - A separate copy of collation data in GLib has extra
   memory overhead
 - Maintaining collation data as part of the GLib distribution
   is not appealing.
 - Massive job to implement

Comment 4 Matthias Clasen 2013-02-03 15:38:21 UTC

A massive job that hasn't happened since 2003. Time to give up waiting.