Bug 542055 - Sort order is not localized
Status: RESOLVED FIXED
Product: rhythmbox
Classification: Other
Component: general
Version: 0.11.x
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: RhythmBox Maintainers
QA Contact: RhythmBox Maintainers
Depends on:
Blocks:
Reported: 2008-07-08 15:21 UTC by Sven Arvidsson
Modified: 2010-01-24 12:47 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Simplest possible possible patch (1.27 KB, patch)
2009-08-27 21:35 UTC, ulrik sverdrup
Patch status: none
better patch (596 bytes, patch)
2010-01-24 10:25 UTC, Jonathan Matthew
Patch status: committed

Description Sven Arvidsson 2008-07-08 15:21:15 UTC
[ From http://bugs.debian.org/489207 by Ulrik Sverdrup ]

"The sort order in browser or normal sorted list mode is not localized,
or at least does not respect my locale.

What happens:
Artist Östblocket sorts under O
What is expected:
Östblocket should sort after Z: (XYZÅÄÖ)

If my memory serves correctly this bug was introduced in a later version
of rhythmbox, and even 0.10 might behave correctly."


The locale used by the bug reporter is sv_SE.UTF-8. I have confirmed that this is a regression from earlier versions, at least from 0.9.6.
Comment 1 ulrik sverdrup 2009-08-12 00:12:01 UTC
Is it known why this happens? It is a regression. I suspect it is not accurate to say that the sort order is not localized, since V and W are collated as they should be (they sort Wa, Ve, Wi, ..., i.e. V and W are treated as equivalent). However, Ö is sorted in the wrong place, which is strange. Nautilus (for example) sorts correctly in my locale.
Comment 2 ulrik sverdrup 2009-08-18 15:39:11 UTC
I'm researching this. Here are some hints I've found while reading the source around the artist sort name:

rhythmbox-client --print-playing-format "%aa"
Östblocket
rhythmbox-client --print-playing-format "%aA"
ostblocket
rhythmbox-client --print-playing-format "%as"

rhythmbox-client --print-playing-format "%aS"


The correct lowercase is "östblocket"

I followed the code around and I highly suspect I found the culprit in

lib/rb-util.c

gchar*
rb_search_fold (const char *original)


This function is used somewhere in the sort path, and it removes too much for the result to be sorted correctly by the locale (hence V and W sort correctly, see above, but Ö, and probably Å and Ä too, does not).

So what is the right way: change rb_search_fold, or take it out of the locale sort path?

I notice as a consequence of this, a search for "ostblocket" finds the songs by "Östblocket". Might sound convenient to you, but very surprising in my locale (O is not Ö, and Ö is not a decorated O, it's an Ö)
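
The effect described in this comment can be illustrated with a minimal sketch. The two functions below are hypothetical toy stand-ins, not the actual rb_search_fold or g_utf8_casefold code, and they only understand ASCII plus the UTF-8 encodings of Å, Ä and Ö. The first lowercases while keeping those letters distinct; the second additionally replaces them with their base letters, which is roughly what stripping combining marks amounts to.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for illustration only. They are NOT the Rhythmbox or
 * GLib implementations and only handle ASCII and Å Ä Ö / å ä ö. */

/* Lowercase a UTF-8 string while keeping Å/Ä/Ö as distinct letters.
 * Output is truncated if it does not fit in cap bytes. */
void toy_casefold(const char *in, char *out, size_t cap)
{
    size_t o = 0;
    assert(in != NULL && out != NULL && cap > 0);
    for (size_t i = 0; in[i] != '\0' && o + 2 < cap; i++) {
        unsigned char c = (unsigned char)in[i];
        unsigned char n = (unsigned char)in[i + 1];
        if (c == 0xC3 && (n == 0x85 || n == 0x84 || n == 0x96)) {
            /* Å (C3 85), Ä (C3 84), Ö (C3 96) -> å, ä, ö */
            out[o++] = (char)c;
            out[o++] = (char)(n + 0x20);
            i++;
        } else if (c >= 'A' && c <= 'Z') {
            out[o++] = (char)(c - 'A' + 'a');
        } else {
            out[o++] = (char)c;
        }
    }
    out[o] = '\0';
}

/* Lowercase, then replace å/ä with 'a' and ö with 'o'.
 * This throws away the information a Swedish collation needs. */
void toy_strip_fold(const char *in, char *out, size_t cap)
{
    size_t o = 0;
    toy_casefold(in, out, cap);
    /* Rewrite in place; the write index never passes the read index. */
    for (size_t i = 0; out[i] != '\0'; i++) {
        unsigned char c = (unsigned char)out[i];
        unsigned char n = (unsigned char)out[i + 1];
        if (c == 0xC3 && (n == 0xA5 || n == 0xA4)) {
            out[o++] = 'a';
            i++;
        } else if (c == 0xC3 && n == 0xB6) {
            out[o++] = 'o';
            i++;
        } else {
            out[o++] = (char)c;
        }
    }
    out[o] = '\0';
}
```

With these toy functions, toy_casefold turns "Östblocket" into "östblocket", while toy_strip_fold turns both "Östblocket" and "Ostblocket" into "ostblocket". Once the two names share a key, no locale-aware comparison can tell them apart again.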
Comment 3 ulrik sverdrup 2009-08-27 20:42:09 UTC
Can anyone with more rhythmbox code experience point out where, and through which code path, rhythmbox gets the strings it sorts? We clearly need to take rb_search_fold out of that path, since it removes all combining marks and thereby loses a lot of information. (I can't imagine it sorting Ä/Ö correctly in German that way either.)

Thank you for any help.
Comment 4 Christophe Fergeau 2009-08-27 20:58:50 UTC
Might be related to the fixes that were done in bug #421253. I'm not sure how the sorting of rhythmbox treeviews works :-/
Comment 5 ulrik sverdrup 2009-08-27 21:35:40 UTC
Created attachment 141893
Simplest possible possible patch

This is the simplest patch that might solve this problem. I can't manage to build rhythmbox tonight, so it has been neither compile-tested nor run.

I appreciate working with rhythmbox git but the normal big stumbling block when trying to contribute to a project is always to manage to build it; it is also a comparatively big package for my puny computer.

----------------

Subject: [PATCH] rhythmdb/rb-refstring.c: Use g_utf8_casefold in rb_refstring_get_sort_key

To extract a sort key we want to use the proper g_utf8_casefold
instead of rb_refstring_get_folded, which uses rb_search_fold, which
strips many bytes from the string, including combining marks.

We need to maintain the original string in a shape where the localized
sort algorithm can still sort the names faithfully according to the
locale, which was not possible when stripping out combining marks (and
other code points).
Comment 6 ulrik sverdrup 2009-08-27 21:36:46 UTC
(In reply to comment #4)
> Might be related to the fixes that were done in bug #421253. I'm not sure how
> the sorting of rhythmbox treeviews works :-/

It was probably broken by the reimplementation of rb_search_fold; it has a comment saying that the old behavior was equivalent to g_utf8_casefold, which most probably did *not* strip out vital bytes from the string.
Comment 7 Jonathan Matthew 2010-01-24 10:25:02 UTC
Created attachment 152130
better patch

I don't think we can revert to the old case folding behavior, since that affects searching as well as sorting.  Instead, this patch generates the sort key using g_utf8_casefold rather than the custom-folded version of the string.

I don't have sufficient test data to check that this works properly, and I wouldn't really know what I was looking for if I did.  Hopefully someone affected by the problem can test it.
Comment 8 ulrik sverdrup 2010-01-24 11:51:10 UTC
Thank you!

I compiled rhythmbox from the debian archive, version 0.12.6-2, with this patch applied. I'm very happy with the result, it works fine.

(FYI the only debian patch in that package is called 01_dlna_vorbis.patch and modifies some plugin.)
Comment 9 Jonathan Matthew 2010-01-24 12:46:49 UTC
Thanks for testing, and sorry for taking so long to look into this.

I've committed the patch, and I guess we'll see what breaks later on..

commit fecd44feb8b228509f1ece1e442eb42e47817d22
Author: Jonathan Matthew <jonathan@d14n.org>
Date:   Sun Jan 24 22:15:37 2010 +1000

    rhythmdb: fix sort order for composed characters (bug #542055)
    
    Previously, we created the sort key based on the folded version of the
    string.   Our custom folding function removes all combining characters,
    which in some locales are important for sorting, so the resulting
    strings didn't sort correctly.  Now we create the sort key using
    g_utf8_casefold rather than the custom folding function.
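
Why the choice of sort key matters can be sketched with a toy comparison. Everything below is hypothetical and for illustration only: the function names are invented, the keys are assumed to be already case-folded UTF-8, and the Swedish rule that å, ä and ö come after z is hard-coded. Real code would rely on locale-aware collation from a library (GLib offers g_utf8_collate_key, for example) rather than a hand-written table.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read one letter from a case-folded UTF-8 key,
 * advance *s past it, and return its rank. Single bytes keep their
 * byte value; å, ä, ö rank above every single byte, hence after z. */
static int toy_sv_rank(const unsigned char **s)
{
    const unsigned char *p = *s;
    if (p[0] == 0xC3 && p[1] == 0xA5) { *s += 2; return 0x101; } /* å */
    if (p[0] == 0xC3 && p[1] == 0xA4) { *s += 2; return 0x102; } /* ä */
    if (p[0] == 0xC3 && p[1] == 0xB6) { *s += 2; return 0x103; } /* ö */
    *s += 1;
    return p[0];
}

/* Compare two keys letter by letter using the toy Swedish ranks.
 * Returns a negative, zero, or positive value like strcmp. */
int toy_sv_compare(const char *a, const char *b)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;
    assert(a != NULL && b != NULL);
    while (*pa != '\0' && *pb != '\0') {
        int ra = toy_sv_rank(&pa);
        int rb = toy_sv_rank(&pb);
        if (ra != rb)
            return ra < rb ? -1 : 1;
    }
    /* Equal so far: the shorter key sorts first. */
    return (*pa != '\0') - (*pb != '\0');
}

/* qsort adapter for an array of const char * keys. */
int toy_sv_qsort_cmp(const void *x, const void *y)
{
    return toy_sv_compare(*(const char *const *)x,
                          *(const char *const *)y);
}

/* Sort an array of keys in place with the toy Swedish order. */
void toy_sv_sort(const char **keys, size_t n)
{
    qsort(keys, n, sizeof keys[0], toy_sv_qsort_cmp);
}
```

Sorting the keys "östblocket", "zebra" and "abba" with toy_sv_sort places "östblocket" last, as the reporter expects. If the key has already been reduced to "ostblocket" by a mark-stripping fold, the same sort places it between "abba" and "zebra", which is the behavior this bug describes.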