GNOME Bugzilla – Bug 542055
Sort order is not localized
Last modified: 2010-01-24 12:47:13 UTC
[ From http://bugs.debian.org/489207 by Ulrik Sverdrup ]
"The sort order in browser or normal sorted list mode is not localized, or at least does not respect my locale.
What happens: Artist Östblocket sorts under O
What is expected: Östblocket should sort after Z: (XYZÅÄÖ)
If my memory serves correctly this bug was introduced in a later version of rhythmbox, and even 0.10 might behave correctly."
The locale used by the bug reporter is sv_SE.UTF-8. I have confirmed that this is a regression from earlier versions, at least from 0.9.6.
Is it known why this happens? It is a regression. I suspect it is not accurate to say that the sort order is not localized, since V and W are collated as they should be in my locale (they sort Wa, Ve, Wi, ..., i.e. V and W are treated as equivalent). However, Ö is sorted in the wrong place, which is strange. Nautilus (for example) sorts correctly in my locale.
I'm researching this. Here are some hints I've found while reading the source around the artist sortname:
rhythmbox-client --print-playing-format "%aa"
Östblocket
rhythmbox-client --print-playing-format "%aA"
ostblocket
rhythmbox-client --print-playing-format "%as"
rhythmbox-client --print-playing-format "%aS"
The correct lowercase is "östblocket".
I followed the code around and I highly suspect I found the culprit in lib/rb-util.c:
gchar* rb_search_fold (const char *original)
This function is used somewhere in the sort path, and it removes too much for the result to be sorted correctly by the locale. Hence V and W sort correctly (see above), but Ö does not (and probably Å and Ä do not either).
So what is the right way: change rb_search_fold, or take it out of the locale sort path?
As a consequence of this, I notice that a search for "ostblocket" finds the songs by "Östblocket". That might sound convenient to you, but it is very surprising in my locale (O is not Ö, and Ö is not a decorated O, it's an Ö).
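For anyone reading along, here is a rough Python sketch of the effect described above. It is not the Rhythmbox code: strip_marks_fold is a hypothetical stand-in that only approximates what rb_search_fold is reported to do (decompose, drop combining marks, fold case), and plain_fold stands in for a case fold that keeps the marks.

```python
import unicodedata


def strip_marks_fold(text: str) -> str:
    """Hypothetical approximation of a fold that removes combining marks."""
    # NFD splits "Ö" into "O" followed by U+0308 (combining diaeresis).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every character with a non-zero canonical combining class.
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.casefold()


def plain_fold(text: str) -> str:
    """Case fold only; the composed letters (Å, Ä, Ö) are kept."""
    return unicodedata.normalize("NFC", text).casefold()


if __name__ == "__main__":
    for name in ("Östblocket", "Ostblocket"):
        print(name, "->", strip_marks_fold(name), "|", plain_fold(name))
```

With the mark-stripping fold, "Östblocket" and "Ostblocket" become the same string, which explains both the search match and why no locale-aware comparison applied afterwards can tell Ö from O.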
Can anyone with more rhythmbox code experience point out where, and through which code path, rhythmbox gets the strings it sorts? We clearly need to take rb_search_fold out of that path, since it removes all combining marks and thereby loses a lot of information. (I can't imagine it sorting Ä/Ö right in German that way either.) Thank you for any help.
Might be related to the fixes that were done in bug #421253. I'm not sure how the sorting of rhythmbox treeviews works :-/
Created attachment 141893
Simplest possible patch
This is the simplest patch that might solve this problem. I can't manage to build rhythmbox tonight, so it is neither compile-tested nor tested. I appreciate working with rhythmbox git, but the usual big stumbling block when trying to contribute to a project is managing to build it; it is also a comparatively big package for my puny computer.
----------------
Subject: [PATCH] rhythmdb/rb-refstring.c: Use g_utf8_casefold in rb_refstring_get_sort_key
To extract a source key we want to use the proper g_utf8_casefold instead of rb_refstring_get_folded which uses rb_search_fold which strips many bytes from the string, including combining marks.
We need to maintain the original string in a shape where the localized sort algorithm can still sort the names faithfully according to the locale, which was not possible when stripping out combining marks (and other code points).
(In reply to comment #4)
> Might be related to the fixes that were done in bug #421253. I'm not sure how
> the sorting of rhythmbox treeviews work :-/
It was probably broken by the reimplementation of rb_search_fold. It has a comment saying that the old behavior was equivalent to g_utf8_casefold, which most probably did *not* strip out vital bytes from the string.
Created attachment 152130
better patch
I don't think we can revert to the old case folding behavior, since that affects searching as well as sorting. Instead, this patch generates the sort key using g_utf8_casefold rather than the custom-folded version of the string. I don't have sufficient test data to check that this works properly, and I wouldn't really know what I was looking for if I did. Hopefully someone affected by the problem can test it.
Thank you! I compiled rhythmbox from the debian archive, version 0.12.6-2, with this patch applied. I'm very happy with the result, it works fine. (FYI the only debian patch in that package is called 01_dlna_vorbis.patch and modifies some plugin.)
Thanks for testing, and sorry for taking so long to look into this. I've committed the patch, and I guess we'll see what breaks later on..
commit fecd44feb8b228509f1ece1e442eb42e47817d22
Author: Jonathan Matthew <jonathan@d14n.org>
Date: Sun Jan 24 22:15:37 2010 +1000
    rhythmdb: fix sort order for composed characters (bug #542055)
    Previously, we created the sort key based on the folded version of the string. Our custom folding function removes all combining characters, which in some locales are important for sorting, so the resulting strings didn't sort correctly. Now we create the sort key using g_utf8_casefold rather than the custom folding function.
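To illustrate what the committed change amounts to, here is a small Python sketch under stated assumptions. It is not the rhythmdb code: toy_collate_key is a hypothetical stand-in for locale collation that ranks letters by a hard-coded Swedish alphabet (it does not model the V/W equivalence mentioned earlier), old_sort_key mimics building the key from a mark-stripped fold, and new_sort_key mimics building it from a plain case fold.

```python
import unicodedata

# Assumption: a toy Swedish alphabet in which Å, Ä, Ö come after Z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}


def toy_collate_key(text: str) -> list:
    """Hypothetical collation key: rank of each letter in the toy alphabet."""
    # Characters outside the alphabet sort after all letters, by code point.
    return [RANK.get(ch, len(RANK) + ord(ch)) for ch in text]


def strip_marks_fold(text: str) -> str:
    """Approximation of a fold that removes combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return kept.casefold()


def old_sort_key(name: str) -> list:
    # Key built from the mark-stripped string: Ö is already O here.
    return toy_collate_key(strip_marks_fold(name))


def new_sort_key(name: str) -> list:
    # Key built from a plain case fold: Ö survives and can be ranked after Z.
    return toy_collate_key(unicodedata.normalize("NFC", name).casefold())


if __name__ == "__main__":
    artists = ["Zebra", "Östblocket", "Orup", "Abba"]
    print(sorted(artists, key=old_sort_key))
    print(sorted(artists, key=new_sort_key))
```

The collation step is identical in both keys; only the folding applied beforehand differs, which matches the approach in the commit of leaving the search folding alone and changing just the input to the sort key.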