After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 531403 - g_utf8_collate broken on Mac
g_utf8_collate broken on Mac
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: general
unspecified
Other Mac OS
: Normal normal
: ---
Assigned To: gtkdev
gtkdev
: 538088 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2008-05-04 16:29 UTC by Yevgen Muntyan
Modified: 2008-06-17 18:10 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Glib patch (8.63 KB, patch)
2008-05-04 16:31 UTC, Yevgen Muntyan
none Details | Review
Gtk patch (5.66 KB, patch)
2008-05-04 16:32 UTC, Yevgen Muntyan
none Details | Review
patch (6.44 KB, patch)
2008-06-02 18:30 UTC, Yevgen Muntyan
none Details | Review
patch (6.46 KB, patch)
2008-06-02 19:02 UTC, Yevgen Muntyan
accepted-commit_now Details | Review

Description Yevgen Muntyan 2008-05-04 16:29:01 UTC
g_utf8_collate is broken on Mac, apparently because wcscoll() and friends are broken. Attached patch makes glib use carbon API for collation and adds a new structure, GCollationKey, because I have no idea how to convert carbon collation key to a string (its format is not as described in UTS 10). Also attached is a patch for Gtk to use new API in filechooser. As result, collation test in glib doesn't fail, and GtkFileChooser sorts files like Finder does.
Comment 1 Yevgen Muntyan 2008-05-04 16:31:53 UTC
Created attachment 110357 [details] [review]
Glib patch

This adds GCollationKey structure and new API to deal with it; but implementation simply casts char* to that and back on platforms other than Mac, and uses g_utf8_collate_key*. On Mac, it uses Carbon API, and the structure actually contains Carbon collation key. g_utf8_collate_key and g_utf8_collate_key_for_filename are unchanged and broken on Mac.
Comment 2 Yevgen Muntyan 2008-05-04 16:32:21 UTC
Created attachment 110358 [details] [review]
Gtk patch
Comment 3 Behdad Esfahbod 2008-05-04 22:08:26 UTC
How is wcscoll() and friends broken?  Needless to say, I'm not a huge fan of this...
Comment 4 Yevgen Muntyan 2008-05-04 22:23:01 UTC
Sorry, it seems I forgot about the main thing. So, wcscoll() on Mac (10.4.11 on PPC) does not do what it should, it simply does what wcscmp() does. I tried it with en_US.UTF-8, en_US.ISO8859-1, en_US.ISO8859-15 (all listed in 'locale -a' output), it does same thing as strcmp(). And it's not only wcscoll, strcoll() does same thing, i.e. collation is broken, not specifically wide char API. GtkFileChooser sorts files as in "C" locale - ChangeLog before autogen.sh.

Here's the test program I tried (I tried two, this and the same one using wide char API):

#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>

static int
cmp_strings (const void *p1, const void *p2)
{
    const char * const *ps1 = p1;
    const char * const *ps2 = p2;
    return strcoll (*ps1, *ps2);
}

int main (void)
{
    unsigned i;
    char *array[] = {"a", "ab", "B", "BA", "b"};
    setlocale (LC_ALL, "");
    qsort (array, 5, sizeof (char*), cmp_strings);
    for (i = 0; i < 5; ++i)
        puts (array[i]);
    return 0;
}

Result is: B, BA, a, ab, b.
Comment 5 Behdad Esfahbod 2008-05-05 00:40:13 UTC
So, basically that their libc collation rules are bogus.
Comment 6 Daniel Macks 2008-05-05 03:35:26 UTC
Bug #333977 is the genesis here. Would be good to put this issue to bed once and for all. Either OS X has a different sort-order in its libSystem and gnome should obey it (so everything on Mac is self-consistent) which means changing the test expected result, or else glib should sort the same on all platforms and therefore need to implement something itself (or rely on some lib) instead of using some system function here.
Comment 7 Yevgen Muntyan 2008-06-02 18:30:54 UTC
Created attachment 111986 [details] [review]
patch

New patch, no new structures, instead convert Carbon collation key to a string as Behdad suggested. The result is not 100% equivalent to sorting with Carbon API, but it seems to be good enough: I sorted text from planet.gnome.org and Chinese and Russian wikipedias, and the difference was that Carbon compared some strings as equal while this new glib routine didn't; that happened only in cases like the following (first number is Carbon comparison, second is Glib's):

'‘PackageKit’.' and 'PackageKit': 0 vs -1
'LinuxTag' and 'LinuxTag!': 0 vs -1
''/org/bansheeproject/Banshee/PlayerEngine')' and '"org.bansheeproject.Banshee.PlayerEngine")': 0 vs -1
'"/org/bansheeproject/Banshee/PlayerEngine")' and ''/org/bansheeproject/Banshee/PlayerEngine')': 0 vs 1
'¾' and 'ответственности': 0 vs -1
'¾' and 'Порт-Вила': 0 vs -1
'Малвату-Маури,' and 'Малвату-Маури': 0 vs 1
'¾' and '¾': 0 vs 1

Note the last one. I guess that's because one of them is decomposed or something. Behdad, is it good enough? (It passes glib test, if that's worth anything)
Comment 8 Behdad Esfahbod 2008-06-02 18:35:34 UTC
The output is definitely good enough.  Reviewing the patch now.

To make it match Carbon I guess you just need to stop upon seeing a '\001' byte.  At least that's what the docs said.
Comment 9 Behdad Esfahbod 2008-06-02 18:44:05 UTC
So, I think you don't even need the utf8 encoding.  Just exchange 0 and 1 bytes and stop at the first (new!) nul byte!
Comment 10 Yevgen Muntyan 2008-06-02 18:57:41 UTC
(The patch is broken, it fails to zero-terminate the result)
So, I tested string comparison instead of comparing keys, and the results are identical: namely, g_utf8_collate (which uses UCCompareTextDefault) produces same results as strcmp(g_utf8_collate_key()).   Either UCCompareTextDefault doesn't use their collation key comparison routine or something else is wrong. 
Anyway, g_utf8_collate and g_utf8_collate_key from the patch agree, and filechooser seems to show files in right order, regardless why it happens. Also, sorting breaks if you stop at \001. Perhaps it doesn't break if you stop at the int16 or int32 (whatever is used there) with value of 1, no idea.
Comment 11 Yevgen Muntyan 2008-06-02 19:02:00 UTC
Created attachment 111990 [details] [review]
patch

Working patch (identical to previous one, except it terminates the result with zero byte).
Comment 12 Behdad Esfahbod 2008-06-02 19:16:19 UTC
The Apple docs say a 1 separate significant and insignificant fields.  So most probably UCCompareTextDefault() is stopping upon seeing a 1, but for sorting you really need to compare all fields.  So yeah, either your patch, or just replace 0 with 1.  Your's safer, so lets go with it.
Comment 13 Yevgen Muntyan 2008-06-02 19:38:14 UTC
2008-06-02  Yevgen Muntyan  <muntyan@tamu.edu>

	Bug 531403 – g_utf8_collate broken on Mac.

	* glib/gunicollate.c:  (g_utf8_collate): use UCCompareTextDefault;
        (collate_key_to_string), (carbon_collate_key_with_collator),
        (carbon_collate_key), (carbon_collate_key_for_filename): new
        functions using Carbon API to get collate key for g_utf8_collate_key()
        and g_utf8_collate_key_for_filename();
        (g_utf8_collate_key), (g_utf8_collate_key_for_filename): use those.

Comment 14 margali 2008-06-17 18:10:20 UTC
*** Bug 538088 has been marked as a duplicate of this bug. ***