GNOME Bugzilla – Bug 355152
Improve sorting by name for digits and case-sensitiveness
Last modified: 2021-06-18 15:55:48 UTC
Please describe the problem:
Hello all, I found a really strange behaviour in the latest Nautilus file browser. The sorting of the files in the list view is not working! It's hard to explain, but I captured the bug with Istanbul. I love this tool!! ;-) Kind regards, Marcus
Steps to reproduce:
1. Edit some filenames (at the end of the filename).
2. Watch how they are re-sorted!
3.
Actual results:
Expected results:
Does this happen every time? Yes!
Other information: I created a Bugzilla report in Mandriva's system, too (#25447). However, this is severe enough to get to the GNOME developers ASAP!
Created attachment 72457 [details] Istanbul captured the nautilus sorting bug!
Replicated. For those not wanting to watch the video: name 4 files e.g. like this:
bug25551, bug25552, bug25553, bug26554
Any sort by name should always sort the last item last because of the *26* bit. However, changing the closing 4 to a, i.e.
mv bug26554 bug2655a
causes it to be sorted _before_ the files starting with bug25*. Additional info: works on completely empty (text) files. Keeps working no matter how many letters or numbers are appended to the end of the filename.
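The reordering in this reproduction is what one would expect if digit runs are compared by numeric value rather than character by character: after the rename, the digit run is 2655, which is smaller than 25551. The following is a minimal sketch of that idea, not Nautilus's actual code; `natural_key` is a hypothetical helper written only for illustration.

```python
import re

def natural_key(name):
    """Split a name into text runs and digit runs; digit runs compare as integers.

    Hypothetical illustration of "natural" sorting, not the real implementation.
    """
    key = []
    for part in re.split(r"(\d+)", name):
        if part == "":
            continue
        if part.isdigit():
            key.append((1, int(part), ""))        # digit run: compare by value
        else:
            key.append((0, 0, part.casefold()))   # text run: compare ignoring case
    return key

before = ["bug25551", "bug25552", "bug25553", "bug26554"]
after = ["bug25551", "bug25552", "bug25553", "bug2655a"]

print(sorted(before, key=natural_key))  # bug26554 stays last: 26554 > 25553
print(sorted(after, key=natural_key))   # bug2655a moves first: 2655 < 25551
print(sorted(after))                    # plain character order keeps bug2655a last
```

Under this model the renamed file jumps to the top because only the leading digit run 2655 is compared against 25551, 25552 and 25553.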
*** Bug 398729 has been marked as a duplicate of this bug. ***
*** Bug 435505 has been marked as a duplicate of this bug. ***
Created attachment 142663 [details] The bug This bug is still present in nautilus 2.26.3. Here, I have three folders named 2009.0, 2009.05, and 2009.1, and 2009.05 is ordered after 2009.1.
Bug 605598 also seems to be a duplicate of this bug.
*** Bug 609991 has been marked as a duplicate of this bug. ***
*** Bug 605598 has been marked as a duplicate of this bug. ***
Hi, I'm seeing this bug on a fully updated (2010-04-27) Ubuntu 9.10 system, Nautilus 2.28.1. I see this bug is still "unconfirmed", but IME it should be confirmed instead.
*** Bug 328383 has been marked as a duplicate of this bug. ***
*** Bug 619612 has been marked as a duplicate of this bug. ***
*** Bug 316960 has been marked as a duplicate of this bug. ***
Oh for crying out loud, it has been around since 2006, we are putting all these duplicates against it, and it is _unconfirmed_? Give me a fricking break. 3aa 4aa 20a Sort... is just... SO WRONG... This is killing me, Lucid Lynx forced me from Gqview to Geeqie, which uses whatever Gnomefrastructure that has the bug, and I have zillions of files with md5 sums as their names. Guess what this bug does to MD5 sums as filenames? Hash! Guess I'm going to have to fix it myself... Or switch to freaking Lubuntu (pcman-fm) or Kubuntu (just shoot me if it comes to that... I guess something called "Konqueror"...) As mentioned, this bug breaks Geeqie, also...
Confirming.
*** Bug 621245 has been marked as a duplicate of this bug. ***
As I wrote in Bug 458707, please do consider the following. I have now read tons and tons of pros and cons on this. Just my 2 cents:
Cheap, general solution: make an option available in GConf to switch between "ascii", "natural" (= the current behaviour with respect to numbers that are NOT prefixed with "0") and "dos/win-style" (names prefixed with "." in front of the rest, folders separated).
Better solution: offer a per-directory config option in addition.
Reason: on my "linux" files, I can live perfectly happily with "natural" sort order, though it sometimes "feels" "weird" and "buggy". But I do have to mount a lot of stuff created under windumb, so "win-style" order should be the default for NTFS partitions and be selectable for all those dirs from my co-workers that use "!", "__", "_" etc. as prefixes to put important stuff, such as "_main.cpp", on top of a source dir.
It would be so nice to find a solution that offers, though hidden in GConf, a choice over the flame war going on here... Please change status back to "Enhancement"! (Actually, if "natural" is so buggy and broken, why not make "ascii" or "win-style" the default, and "natural" optional as the "innovation"?)
Actually, on thinking about it, a good filesystem should give a per-user and per-directory configurability how things should be sorted...
I.e. all this sorting that an OS (or tool) does is just overcoming the lack of this information on a lower level. More: when tarring, zipping, 7zipping or whatever, the whole tree's sort order should be retained as intended by the archive creator as well, so that when mailing archives and unpacking them elsewhere the order remains... Same as with the timestamps: on win, they refer to "last save time" and are retained across filesystems; on linux they get the date when they were unpacked. Sad. Sad.
As mentioned above I'd also really like to see the current behaviour of skipping non-alphanumeric characters addressed.
Current gnome scenario:
foldera
_importantfolder_
somefolder
=specialfolder=
yetanotherfolder
Proposed example:
_importantfolder_
=specialfolder=
foldera
somefolder
yetanotherfolder
I can understand that the alphanumeric sort order for files and folders may in certain cases need to be different from their mac and windows counterparts but contend that for non-alphanumeric instances it primarily results in:
a) An unnecessary adjustment to muscle memory when moving between file managers
b) A most irritating inability for people to simply and effectively (albeit somewhat crudely) order by importance
For those striving to familiarise themselves with desktop linux I believe this is a significant usability hurdle many struggle to overcome. Lastly I propose that any alterations to the ordering of non-alphanumeric characters as described would have only a marginal effect on present gnome users. These users are unlikely to place any emphasis on non-alphanumerical ordering since it is currently a tactic that they are unable to employ.
Possible solutions for *non-alphanumeric* ordering:
a) Research current mac, windows and even kde ordering, and if considered an appropriate fit incorporate into default alphanumeric ordering.
b) Research windows/mac specifications. Implement ability to sort according to those standards either as an option under 'arrange icons' or via a windows/mac style sorting preference.
Should any developer that wishes to assign themselves to this bug want assistance I would be happy to conduct a thorough analysis of available options to present to the community for review. I would be interested in helping map and resolve both alphanumeric and non-alphanumeric issues. If required, I would also be happy to approach the KDE developers regarding consistency should such action be helpful.
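As a rough illustration of the two orderings in this comment, here is a small sketch. It assumes the current behaviour can be modelled by dropping non-alphanumeric characters before comparing, and that the proposal amounts to listing symbol-prefixed names first; both `stripped` and `symbols_first_key` are hypothetical helpers, and the tie-breaking among symbol-prefixed names is my own assumption, since the comment does not specify it.

```python
def stripped(name):
    """Model of the current behaviour: ignore non-alphanumeric characters and case."""
    return "".join(c for c in name if c.isalnum()).casefold()

def symbols_first_key(name):
    """Model of the proposal: names starting with a symbol sort before all others.

    Within each group, names are ordered by their stripped form (an assumption).
    """
    starts_with_symbol = not name[:1].isalnum()
    return (0 if starts_with_symbol else 1, stripped(name))

folders = ["somefolder", "_importantfolder_", "yetanotherfolder",
           "=specialfolder=", "foldera"]

current = sorted(folders, key=stripped)
proposed = sorted(folders, key=symbols_first_key)

print(current)   # symbol-prefixed folders are interleaved with the rest
print(proposed)  # symbol-prefixed folders are grouped at the top
```

A real implementation would also have to decide how symbols rank against each other (for example "!" versus "_"), which is exactly the research the comment offers to do.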
Hopefully with such information to hand any developer effort required would be minimal.
Could we increase the priority of this bug? I'm dealing with directories with lots of files named with base 16 numbers. The trivial reproducing test where it sorts like this: 3aa.txt 4aa.txt 20a.txt is not really my problem. I'm dealing with trying to use directories with hundreds of md5 sum named files. It would almost be an improvement if it just used rand() to sort, then I wouldn't have false expectations of useful ordering...
Apparently there are several problems related to the sorting order on linux. Please see my detailed tests at http://ubuntuforums.org/showthread.php?t=1588316 .
*** Bug 635017 has been marked as a duplicate of this bug. ***
HOW CAN A BUG LIKE THIS REMAIN UNFIXED AFTER FOUR FRIGGIN YEARS ????????? BY THE WAY, IT IS NOT JUST SORTING WITH DIGITS !!!! Two files, aa.ext and bb.ext, when renamed to dd.ext and cc.ext, also do not show up in the rearranged order. REPEAT, FOUR YEARS AND COUNTING. WTF ?????? GK
There was another occurrence of this bug yesterday, and it was really annoying. For my mp3 collection, I'm using a folder per album. The naming scheme is: "<artist> - <album title>". Yesterday, I was looking at the hard disk and saw this:
.. ..
Madonna - Frozen
Madonna - Greatest Hits
Madonna Hiphop Massaker - Heavy Rotation
Madonna - Like A Prayer
...
This is just wrong. Any user intent is completely nuked by a strange sorting algorithm. Please, get this fixed. One of the main preachings in the linux world has always been: full flexibility and power to the users. Here, Nautilus doesn't leave any chance. Maybe you could make this configurable for the users. If so, you might instrument this switch to get user feedback. I'm not convinced that a lot of users love this behaviour at all. Cheers, Marcus
Guys, i can still reproduce this bug from Comment 24 (from 2010-11-19!!!) on current Nautilus 2.32.2.1 compiled in May 2011:
Madonna - Greatest Hits
Madonna Hiphop Massaker - Heavy Rotation
Madonna - Like A Prayer
...newly created files still stay ordered like this! PLEASE IMPLEMENT A SIMPLE ORDERING-ALGORITHM AS DESCRIBED IN THE "Proposed example:" by the guy in Comment 19. I just want a simple naming scheme to keep files on top of a directory (like on Windows since 95 with "_" or other special symbols at the beginning of a filename...) Why is it not possible to implement something as default if many linux-users got frickled up locales, or at least give us a place where we can define ordering in nautilus itself? PLEASE: If it is a matter of money, just write it here, and i will contact you (probably one of the devs) directly: I would pay 50 EUR if a change goes live which lists files and directories topmost in the current directory if they start with an underscore ("_"), with all the other dirs/files sorted in the order " ",0,1,...,9,a,ä..o,ö,...,s,ß,...,u,ü,...,z. It can't be that hard to apply that rule across the whole length of even the longest file or directory name, and to sort all directories before files and before links! At least this solution is less awkward than the current situation.
(In reply to comment #25)
> i can still reproduce this bug
That is very likely as this bug report is not in RESOLVED FIXED state, but in NEW state.
> PLEASE IMPLEMENT
No need for capital letters, plus please avoid forum-style comments if possible.
> If it is a matter of money, just write it here, and i will contact you
> (probably one of the devs) directly:
There are several platforms for open-source bounties available. Plus feel free to contact the nautilus mailing list to reach a bigger audience. Thanks!
*** Bug 654721 has been marked as a duplicate of this bug. ***
*** Bug 338154 has been marked as a duplicate of this bug. ***
*** Bug 681176 has been marked as a duplicate of this bug. ***
*** Bug 631376 has been marked as a duplicate of this bug. ***
*** Bug 638839 has been marked as a duplicate of this bug. ***
This appears to be due to the behavior of g_utf8_collate_key_for_filename. In the test case in comment #20, what seems to be happening is that it compares the leading digit runs as numbers, so it determines 3 < 4 < 20. Possible dup of bug 352237.
*** Bug 633406 has been marked as a duplicate of this bug. ***
I'm the submitter of bug 633406. I admit there is some similarity to bug 355152, but it is in my opinion a different bug. From what I can see in the comments, 355152 is about problems caused by sorting using numeric ordering vs lexicographic, i.e. 3, 4, 20 vs 20, 3, 4, and also about dropping punctuation symbols from sort keys. I'm not a fan of either of those things, but they do seem to be deliberate choices and some users would find the behaviors desirable. My bug, I believe, is not something the developers wanted or which any user would consider desirable. It is a definite flaw due to the implementation of the natural numeric sorting. So the nautilus developers might want to consider addressing this problem even if it is determined to accept the other parts of 355152 as desired behavior. On the other hand maybe it is preferred to keep all the sort problems together on one bug. A configuration setting allowing users to have case-insensitive, but otherwise lexicographic sorting would for me solve this problem since I would prefer that, but users who continued to use the natural numeric sorting would still experience this problem.
The problem results from a bad interaction between the implementation of case-insensitive sorting and the natural numeric order sorting, which causes case-insensitivity to fail when file names have letters differing only by case followed by differing numbers. For example, files sort like this: y1, y3, Y2, instead of: y1, Y2, y3.
Some time ago I looked into the source code and I believe the behaviour also was in g_utf8_collate_key_for_filename. I'm going to describe this from memory so there may be some details not exactly right, but I'm pretty sure I recall the basic problem. Nautilus is using that function to generate collation keys which provide case-insensitivity, ignore punctuation symbols, and also provide natural numeric ordering.
It did this by splitting file names into separate chunks depending on whether they were numbers or letters. Punctuation characters were just dropped. A collation key was generated for each chunk, then those collation keys were concatenated together to create the final overall collation key returned by the function.
The collation key parts for chunks containing letters were created by passing the chunk to a collation key generator from a lower level library. I don't recall exactly the function name. With LC_COLLATE=en_US.UTF-8 what I was seeing was that the lower level collation key function would return a string containing three fields. The first field was the passed string, all converted to a consistent case. Following this was a field representing the casing, like a 1 for upper or a 0 for lower, then a field representing diacriticals I think. So AbC would come back as something like ABC101000, but aBc would come back ABC010000. The extra fields for case and diacriticals would give you some consistency when sorting strings which only differ by case or diacriticals.
For chunks of numbers the collation key segment was created using some kind of coding to make them sort in natural numeric order. (I don't think it was just simple zero padding, but I can't remember it exactly and for my example it doesn't really matter, so let's say it was zero padding to 3 digits for the sake of the example.)
So what happens in my example is Y2 is split into two chunks, Y and 2. Chunk #1 generates collation key Y10, then Chunk #2 is padded to 002. They get appended together, to become Y10002. For y3, the string is split to y and 3, then to Y00 and 003, concatenated to Y00003. The sorting using these collation keys goes like this:
y3 (Y00003)
Y2 (Y10002)
So y3 and Y2 end up in an unexpected order, which I don't think anyone wants.
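The mechanism described from memory above can be reproduced with a toy model. This sketch uses the comment's own simplifications (letters upper-cased, a 1/0 case flag per letter, a 0 diacritic flag per letter, digit runs zero-padded to 3 digits); `letter_fields` and `concatenated_key` are hypothetical names, and none of this is the real GLib or libc key format.

```python
import re

PAD = 3  # simplification taken from the comment: zero-pad digit runs to 3 digits

def letter_fields(chunk):
    """Toy stand-in for the lower-level collation function (hypothetical).

    Returns (primary, case flags, diacritic flags) for a run of ASCII letters.
    """
    primary = chunk.upper()
    case = "".join("1" if c.isupper() else "0" for c in chunk)
    diacritics = "0" * len(chunk)  # this toy model assumes no diacritics
    return primary, case, diacritics

def concatenated_key(name):
    """Build one key per chunk and concatenate them in order; punctuation is dropped."""
    parts = []
    for chunk in re.findall(r"[A-Za-z]+|[0-9]+", name):
        if chunk.isdigit():
            parts.append(chunk.zfill(PAD))
        else:
            parts.append("".join(letter_fields(chunk)))
    return "".join(parts)

names = ["y1", "Y2", "y3"]
for n in names:
    print(n, concatenated_key(n))
print(sorted(names, key=concatenated_key))  # Y2 ends up after y3
```

In this model the case flag of the letter chunk is compared before the number that follows it, which is why Y2 lands after y3.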
Basically it seems like g_utf8_collate_key_for_filename is not expecting the lower level function's returned collation key to include those extra fields for case and diacriticals. I doubt the authors of that lower level function returning collation keys expected that their output would be concatenated with other stuff like this. Ideally you would want to split the returned collation key segments for letter chunks into the 3 separate fields and aggregate the case and diacritical fields at the end of the overall collation key. So Y2 goes to Y and 2, then to Y10 and 002, then break Y10 into Y, 1, and 0, concatenate the Y, then the 002, then the 1 and the 0, resulting in Y00210. Then the sort order would look like this:
Y2 (Y00210)
y3 (Y00300)
With multiple letter chunks you would place all the case fields, followed by all the diacriticals fields, at the end, so Y1A would be Y001A1100. I'm not sure however if that lower level function always returns three fields for all possible values of LC_COLLATE so it might be complicated trying to deal with that.
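The regrouping proposed here can be sketched with the same toy model the comment uses (upper-cased letters, one case flag and one diacritic flag per letter, digit runs zero-padded to 3 digits). `letter_fields` and `regrouped_key` are hypothetical names for illustration only; a real fix would have to work with whatever key format the lower-level function actually returns for each LC_COLLATE value.

```python
import re

PAD = 3  # simplification taken from the comment: zero-pad digit runs to 3 digits

def letter_fields(chunk):
    """Toy stand-in for the lower-level collation function (hypothetical)."""
    primary = chunk.upper()
    case = "".join("1" if c.isupper() else "0" for c in chunk)
    diacritics = "0" * len(chunk)  # this toy model assumes no diacritics
    return primary, case, diacritics

def regrouped_key(name):
    """Primary fields and numbers first, then all case fields, then all diacritic fields."""
    primaries, cases, diacritics = [], [], []
    for chunk in re.findall(r"[A-Za-z]+|[0-9]+", name):
        if chunk.isdigit():
            primaries.append(chunk.zfill(PAD))
        else:
            primary, case, dia = letter_fields(chunk)
            primaries.append(primary)
            cases.append(case)
            diacritics.append(dia)
    return "".join(primaries) + "".join(cases) + "".join(diacritics)

names = ["y3", "Y2", "y1"]
for n in names:
    print(n, regrouped_key(n))
print(sorted(names, key=regrouped_key))  # case now only breaks ties
```

Because the case and diacritic flags come last, they only matter when the letters and numbers are otherwise identical, which is the behaviour the comment argues for. Plain string concatenation without separators is itself a simplification and could misorder names of different shapes.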
*** Bug 700950 has been marked as a duplicate of this bug. ***
This bug (or design decision) renders the file viewer useless for me too and I have to use "ls" instead. I have a directory of photos of stock which have varying length part numbers. I want to find a part no. starting say 4677 but I don't know how long it is, so where do I find it? PLEASE give us back the option to sort as we've been able to since the 1980s.
I'm just gonna throw my two cents in here. As someone who has used computers for over 20 years, this is the first time I've ever had trouble with sorting. Denounce Windows all you want, but the name sorting in Explorer is simply how everyone on earth expects it to be: symbols before numbers before letters. Windows added a neat feature back in Vista, I believe, where numbers that were delimited by spaces were treated as whole numbers, which solved the zero-padding issue of days gone by.
_00
_AA
_aa
_BB
_bb
000
0AA
0aa
0BB
0bb
111
AAA
aaa
BBB
bbb
ccc 0
ccc 00
ccc 000
ccc 1
ccc 10
ccc 20
ccc 100
Ref comment 37. That might be nice for small numbers like that. However, I have directories of photos of stock part numbers. These numbers vary in length. With this new "windowsy" sorting, it is next to impossible to find anything as I have to work out the _value_ of the part code instead of looking for it a digit at a time. e.g. 71772198.jpg would come before 123456789.jpg
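The part-number complaint can be made concrete with a small sketch. In character-by-character order, all names sharing a prefix such as 4677 sit next to each other and can be located with a binary search; in numeric-value order they are scattered by length. The file names other than the two quoted above are made up for illustration, and `with_prefix` is a hypothetical helper.

```python
from bisect import bisect_left

def with_prefix(sorted_names, prefix):
    """Return the contiguous run of names starting with prefix.

    Only valid when sorted_names is in plain character order.
    """
    lo = bisect_left(sorted_names, prefix)
    hi = lo
    while hi < len(sorted_names) and sorted_names[hi].startswith(prefix):
        hi += 1
    return sorted_names[lo:hi]

# Hypothetical part-number photos of varying length.
photos = ["71772198.jpg", "123456789.jpg", "4677.jpg", "5000.jpg",
          "46771.jpg", "468.jpg", "467720.jpg"]

by_digit = sorted(photos)                                      # a digit at a time
by_value = sorted(photos, key=lambda n: int(n.split(".")[0]))  # by numeric value

print(by_digit)
print(with_prefix(by_digit, "4677"))  # the three matches are adjacent
print(by_value)                       # the same three matches are split up by 5000.jpg
```

This is only a model of the two orderings being argued about, not of any particular file manager.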
Yes, I can see why that could be annoying in some cases. If you are working with a whole bunch of photos, might it be worthwhile to look into using a photo application where you can tag photos with metadata so you can organize them away from the filesystem?
That seems like a sledgehammer to crack a nut. Leave file sorting alone and there's no need. I access the files by "Add photos" in Ebay etc. which uses a regular file browser window, no idea how any metadata would help there?
I suppose international organizations have specified standards for sorting (including sorting for specific locales). The best thing would be for the (major) software in the linux ecosystem to implement these standards. This does not forbid some developer of some specific software to (try to) be more “creative”, but in any case standard sorting should be available as an option (and should even be the default, IMHO). Just my two cents.
3aa
4aa
20a
Come on. At a _minimum_ on a *nix system I should be able to make it match "ls". ls, of course, gives
20a
3aa
4aa
It continues to be a problem to be stuck with
3aa
4aa
20a
@42 - agreed.
Created attachment 351637 [details] Observing sorting behavior under Linux Tests showing under two locales how three softwares sort a set of files according to their name.
Created attachment 351639 [details] Observing sorting behavior under Linux Tests showing under two locales how three softwares sort a set of files according to their name.
I attached my tests, as detailed in #21. (These are old tests, that I did not try to reproduce recently, but I suspect nothing has changed.) Note that even ls does not seem to exhibit correct sorting behavior (for French locale at least).
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version of Files (nautilus), then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new ticket at https://gitlab.gnome.org/GNOME/nautilus/-/issues/ Thank you for your understanding and your help.