Bug 699340 – sort filenames case-insensitively

Summary:	sort filenames case-insensitively


Status:	RESOLVED WONTFIX

Product:	meld
Classification:	Other
Component:	general
Version:	git master
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	meld-maint
QA Contact:	meld-maint

Depends on:	699254

Reported:	2013-04-30 16:22 UTC by Adam Dingle
Modified:	2015-01-12 23:35 UTC

GNOME target:	---
GNOME version:	---

Description Adam Dingle 2013-04-30 16:22:09 UTC

In the directory comparison and source control views, Meld lists filenames in alphabetical case-sensitive order: [A-Z] all come before [a-z].  Instead, it should order them case-insensitively for consistency with Nautilus.
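The difference between the two orderings can be illustrated with a minimal Python sketch. This is not Meld's actual sorting code; the filenames are borrowed from comment 1, and the tie-break on the original name is an assumption made only to keep the output deterministic. It also does not reproduce the Nautilus order.

```python
# Filenames taken from comment 1 of this bug.
names = ["meld1", "Meld.2", "Makefile", "meld.2", "Meld1", "meld.1"]

# Default string comparison is by code point, so [A-Z] sort before [a-z].
case_sensitive = sorted(names)

# Case-insensitive: compare case-folded names first, then fall back to the
# original name so that names differing only in case get a fixed order.
case_insensitive = sorted(names, key=lambda n: (n.casefold(), n))

print(case_sensitive)
# ['Makefile', 'Meld.2', 'Meld1', 'meld.1', 'meld.2', 'meld1']
print(case_insensitive)
# ['Makefile', 'meld.1', 'Meld.2', 'meld.2', 'Meld1', 'meld1']
```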

Comment 1 Kai Willadsen 2013-05-01 20:40:57 UTC

I went looking for what Nautilus does, and found the following order for these files:

Makefile
meld.1
meld.2
meld1
Meld.2
Meld1

...which isn't any sane order that I can recognise. Maybe GTK has some utility functions for crazy GTK-specific sort.

Comment 2 Adam Dingle 2013-05-01 20:42:14 UTC

Hm.  I'm curious enough that I'm willing to go hunt down the code in Nautilus that determines this order.  Stay tuned.

Comment 3 Kai Willadsen 2013-05-01 20:52:09 UTC

This was tested on a GNOME 2 box, by the way; I'll have a look at GNOME 3 later, just in case the sort order has changed.

Comment 4 Adam Dingle 2013-05-01 20:53:06 UTC

OK, it appears that Nautilus actually asks GLib for the sort order. Specifically, it calls g_file_info_get_sort_order(); this happens in update_info_internal() in nautilus-file.c. I think it then breaks ties by doing some extra checks in compare_by_display_name(), also in nautilus-file.c.

So presumably we should do the same if we want the same sort order as Nautilus.  Not sure whether we need to do the tie-breaking.  If so, perhaps that code itself should really move into GLib.

Comment 5 Kai Willadsen 2013-06-03 19:53:16 UTC

I just had a look at this. The get_sort_order stuff defined in GIO is just property access, and doesn't appear to do any actual work. The real work happens in nautilus_file_set_display_name(), which relies on g_utf8_collate_key_for_filename() to do some magic in order to get better file orderings.

Unfortunately, the collate_key stuff isn't bound in PyGTK 2. I'm assuming that we can get at this through GI, so assigning this as a GTK 3 thing.

Comment 6 andré 2014-07-04 19:24:14 UTC

A while back I saw a "case-insensitive" sort algorithm that roughly mapped each character to a lower-case byte and appended one extra tag byte per character at the end of the key. This could up to double the length of the string being sorted (exactly double if there are no non-ASCII characters like é, assuming UTF-8 encoding).

The tags at the end ensured an order something like eéèêëEÉÈÊË for words that were otherwise identical.
This encoding was only designed to work for character sets based on the Latin alphabet.

Then there are Arabic and related alphabets, which are often combined with Latin-based text.

For many other non-Latin alphabets, sorting by UTF-16 would probably be a good alternative, since they generally don't use diacritical marks like accents.

Don't know how much this comment helps ...
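
The tagged-key idea from comment 6 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the algorithm the commenter saw: the function name tagged_key is hypothetical, the key is a Python tuple rather than a byte string, and accents are ranked by the code point of their combining mark, so è sorts before é here, unlike the order quoted above.

```python
import unicodedata


def tagged_key(name):
    """Build a two-level sort key: base letters first, case/accent tags second."""
    bases = []
    tags = []
    for ch in name:
        # NFD splits an accented letter into its base letter plus combining marks.
        decomposed = unicodedata.normalize("NFD", ch)
        base, marks = decomposed[0], decomposed[1:]
        bases.append(base.lower())
        # Tag: lower case before upper case, then unaccented before accented.
        tags.append((base.isupper(), marks))
    # Tags are only consulted when the lowered base letters are identical.
    return ("".join(bases), tags)


print(sorted(["E", "é", "e", "É"], key=tagged_key))
# ['e', 'é', 'E', 'É']
```

As in the scheme described, this only makes sense for Latin-based text where accented letters decompose into a base letter plus marks.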

Comment 7 Kai Willadsen 2015-01-11 00:49:53 UTC

I've just spent a while playing around with this, and I'm going to just close it as WONTFIX.

I think the Nautilus ordering is fine, but in the context of a programmer's tool like Meld, I'm going to say that the traditional case-based ordering is going to be strongly preferred by most Meld users. The only thing I really feel like we're missing from the GLib-based sort is correct numeric ordering for foo.2 vs foo.10, but I can live with that.
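
For reference, the foo.2 vs foo.10 numeric ordering mentioned above could be had without giving up case-based ordering. The following is a minimal sketch, not something Meld implements; natural_key is a hypothetical helper name.

```python
import re


def natural_key(name):
    """Split a name into text and digit runs so digit runs compare as numbers."""
    # With a capturing group, re.split() keeps the digit runs at odd indices.
    parts = re.split(r"(\d+)", name)
    # Text parts are left untouched, so the ordering stays case-sensitive.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


names = ["foo.10", "foo.2", "foo.1"]
print(sorted(names))                   # ['foo.1', 'foo.10', 'foo.2']
print(sorted(names, key=natural_key))  # ['foo.1', 'foo.2', 'foo.10']
```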

It doesn't help that the GLib-defined ordering is extra weird in that only the initial sort is case insensitive; the secondary sort (i.e., e vs é vs E) puts capitalised letters *last*.

This isn't necessarily my final word on the matter, but I just don't feel like the alternative sort ordering is a good fit right now. In particular, if the current sort gets unicode-y ordering wrong then I'll definitely reconsider, but from my quick tests with Latin alphabets it seemed fine to me.

Comment 8 Adam Dingle 2015-01-12 23:35:31 UTC

OK - I'm the one who filed this bug in the first place and I can live with Kai's decision.   Thanks for thinking about this at least!  :)