GNOME Bugzilla – Bug 768074
Japanese Kana are ignored
Last modified: 2021-06-18 15:50:38 UTC
When sorting files/folders, the algorithm appears to disregard Japanese characters and sorts kana-only entries arbitrarily.

e.g. this is the sorted result in Nautilus:

きた
にし
きち
にさ
にし 1
にさ 2
きた 3
きち 5
みなみ
ひがし
みなみ 4
ひがし 6

So there is a mixture of sorting by length and then by the digits; everything else is rather arbitrary. A more correct order would be either of these schemes:

Apache:

きた 3
きた
きち 5
きち
にさ 2
にさ
にし 1
にし
ひがし 6
ひがし
みなみ 4
みなみ

JavaScript, C#, Python array/list sort:

きた
きた 3
きち
きち 5
にさ
にさ 2
にし
にし 1
ひがし
ひがし 6
みなみ
みなみ 4

(I saw another report where setting locales was mentioned, but that does not make this any less of a bug; even sorting by Unicode code point would be more reasonable than this. Most sort algorithms also seem to do just fine by default. It cannot be expected that users switch their locale and restart their browsers/system just to work with a different set of files. And if the files are mixed, there is no way to reasonably deal with the situation at all.)

While setting up a minimal sorting example I ran into another issue: the create new folder dialog falsely claims there is a conflict between names that consist only of kana; maybe that has the same root cause.

I tried to create the folders:

みなみ
きた
ひがし

The last one was blocked.
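For reference, the second ordering in the report (the one attributed to JavaScript, C# and Python) is what a plain code point comparison produces. A minimal Python sketch, assuming the names use ordinary ASCII spaces before the digits; the list literal is reconstructed from this report and is not taken from Nautilus:

```python
# Names from the report, in the order Nautilus displayed them.
names = [
    "きた", "にし", "きち", "にさ",
    "にし 1", "にさ 2", "きた 3", "きち 5",
    "みなみ", "ひがし", "みなみ 4", "ひがし 6",
]

# Python compares str objects code point by code point, with no locale
# involved, so the result does not depend on the system settings.
by_code_point = sorted(names)

for name in by_code_point:
    print(name)
```

Each base name is immediately followed by its numbered variant because a string always sorts before any longer string that it is a prefix of.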
(In reply to Harald Brunner from comment #0)
> While setting up a minimal sorting example I ran into another issue: the
> create new folder dialog falsely claims there is a conflict between names
> that consist only of kana; maybe that has the same root cause.
>
> I tried to create the folders:
>
> みなみ
> きた
> ひがし
>
> The last one was blocked.

This one I figured out. When creating directories or renaming directories or files, the code calls nautilus_directory_get_file_by_name() to look for duplicates. This, of course, does not work, as g_utf8_collate() is the deciding function (which is also what causes the sorting issue, due to it being locale-dependent).

I have fixed it (read: used HAX) locally by adding an additional check in case an existing file has been found. Will attach a demo patch in a bit (so whoever is concerned in the morning can take a look).

Funnily enough, I fixed a different bug in the same code not too long ago.
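The false conflict can be illustrated with a toy model. The sketch below is Python, not Nautilus code. `toy_collate` is a hypothetical stand-in for a locale collation that gives every hiragana the same weight; that is an assumption chosen only because it is consistent with the symptoms in this report, not a description of what g_utf8_collate() or the C library actually does. `find_by_name` is likewise a hypothetical lookup that treats a comparison result of 0 as "same name".

```python
def toy_collate(a: str, b: str) -> int:
    """Hypothetical collation in which all hiragana share one weight."""
    def weights(s: str) -> str:
        # Map every hiragana code point (U+3041..U+3096) to one placeholder.
        return "".join("\u3042" if "\u3041" <= ch <= "\u3096" else ch for ch in s)
    wa, wb = weights(a), weights(b)
    return (wa > wb) - (wa < wb)


def exact_compare(a: str, b: str) -> int:
    """Plain code point comparison; returns 0 only for identical strings."""
    return (a > b) - (a < b)


def find_by_name(existing, name, compare):
    """Return the first existing entry that compares equal to name, else None."""
    for entry in existing:
        if compare(entry, name) == 0:
            return entry
    return None


existing = ["みなみ", "きた"]
print(find_by_name(existing, "ひがし", toy_collate))    # false conflict
print(find_by_name(existing, "ひがし", exact_compare))  # no match
```

Under this toy model, two kana-only names collide whenever they have the same length, which would explain why the third folder was rejected while the second was not. An exact comparison cannot produce such a match.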
Nope, my quick hack does not fix it fully.

Message to Carlos: would it not be better for nautilus_file_compare_display_name() to use g_strcmp0() (or friends) instead of g_utf8_collate() (/and/ friends)? I see it’s only used by nautilus_directory_get_file_by_name().
(In reply to Ernestas Kulik from comment #2)
> Nope, my quick hack does not fix it fully.
>
> Message to Carlos: would it not be better for
> nautilus_file_compare_display_name() to use g_strcmp0() (or friends) instead
> of g_utf8_collate() (/and/ friends)? I see it’s only used by
> nautilus_directory_get_file_by_name().

No, collate actually does some smart sorting. For instance, "file1" "file10" "file5" are ordered correctly. Also, the dots in extensions are treated specially, so the order is not strictly alphabetical.

Also performance.

What's needed here is for GLib to better support Asian languages, and there are a few reports about it already.
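The numeric ordering mentioned here can be sketched independently of GLib. In GLib itself this behaviour is documented for g_utf8_collate_key_for_filename() rather than for g_utf8_collate() directly. The following Python sketch uses a hypothetical `natural_key` helper that only handles digit runs; it ignores the dot handling and any locale rules:

```python
import re


def natural_key(name: str):
    """Split a name into text and number chunks so digit runs compare numerically."""
    parts = re.split(r"(\d+)", name)
    # With a capturing group, re.split puts the digit runs at odd indices,
    # so the same position holds the same type (str or int) in every key.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


files = ["file1", "file10", "file5"]
print(sorted(files))                   # plain string order
print(sorted(files, key=natural_key))  # digit runs compared as numbers
```

A plain string comparison places "file10" before "file5" because it compares character by character; the key above compares 5 and 10 as integers instead.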
(In reply to Carlos Soriano from comment #3)
> (In reply to Ernestas Kulik from comment #2)
> > Nope, my quick hack does not fix it fully.
> >
> > Message to Carlos: would it not be better for
> > nautilus_file_compare_display_name() to use g_strcmp0() (or friends) instead
> > of g_utf8_collate() (/and/ friends)? I see it’s only used by
> > nautilus_directory_get_file_by_name().
>
> No, collate actually does some smart sorting. For instance, "file1" "file10"
> "file5" are ordered correctly. Also, the dots in extensions are treated
> specially, so the order is not strictly alphabetical.
>
> Also performance.
>
> What's needed here is for GLib to better support Asian languages, and there
> are a few reports about it already.

But there is no need to do any kind of sorting if we’re only looking for an exact match (talking about the renaming issue specifically).
Created attachment 330435
file: repurpose compare_display_name()

nautilus_file_compare_display_name() is only used by nautilus_directory_get_file_by_name() nowadays, and it was written with sorting in mind. As g_utf8_collate() and its locale dependence do not work well for finding matching files by name, it makes sense to replace the call with g_strcmp0(). That, however, makes the function less suitable for sorting. This commit changes its purpose as described.
Unless I am wrong and there are extensions depending on it. I realized that a bit too late.
(In reply to Ernestas Kulik from comment #6)
> Unless I am wrong and there are extensions depending on it. I realized
> that a bit too late.

Never mind, the API doesn’t call it.
Review of attachment 330435:

I think this is fine, thanks!
Comment on attachment 330435
file: repurpose compare_display_name()

Attachment 330435 pushed as fc25f7e - file: repurpose compare_display_name()
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version of Files (nautilus), then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new ticket at https://gitlab.gnome.org/GNOME/nautilus/-/issues/ Thank you for your understanding and your help.