After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 499650 - Add artist/album/title sort options for "the", firstname, and lastname order
Add artist/album/title sort options for "the", firstname, and lastname order
Status: RESOLVED FIXED
Product: banshee
Classification: Other
Component: Metadata
git master
Other All
: Normal enhancement
: 1.x
Assigned To: Banshee Maintainers
Banshee Maintainers
: 527662 532899 552014 555988 561380 567437 583381 593592 (view as bug list)
Depends on: 567657
Blocks: 528498
 
 
Reported: 2007-11-26 01:52 UTC by Will Farrington
Modified: 2011-04-16 20:31 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Proposed patch (25.61 KB, patch)
2008-05-16 21:44 UTC, Andrés G. Aragoneses (IRC: knocte)
none Details | Review
improved patch (25.64 KB, text/plain)
2008-05-17 12:24 UTC, Andrés G. Aragoneses (IRC: knocte)
  Details
Patch that applies to current trunk (25.60 KB, patch)
2008-05-18 13:50 UTC, Andrés G. Aragoneses (IRC: knocte)
rejected Details | Review
When sorting by artist or album artist, use sort names if available (15.88 KB, patch)
2008-11-15 05:28 UTC, John Millikin
needs-work Details | Review
Add sort information to TagLib.Tag (4.20 KB, patch)
2008-11-15 05:28 UTC, John Millikin
none Details | Review
When sorting by artist or album artist, use sort names if available (v2) (20.58 KB, patch)
2009-01-13 22:55 UTC, John Millikin
needs-work Details | Review
When sorting by artist or album artist, use sort names if available (v3) (31.14 KB, patch)
2009-01-16 06:23 UTC, John Millikin
needs-work Details | Review
When sorting by artist or album artist, use sort names if available (v4) (30.95 KB, patch)
2009-01-16 22:56 UTC, John Millikin
needs-work Details | Review
When sorting by artist or album artist, use sort names if available (v5) (30.62 KB, patch)
2009-01-31 21:32 UTC, John Millikin
none Details | Review
When sorting by artist or album artist, use sort names if available (v6) (380.58 KB, patch)
2009-02-04 01:32 UTC, John Millikin
none Details | Review
When sorting by artist or album artist, use sort names if available (v7) (32.26 KB, patch)
2009-02-04 04:40 UTC, John Millikin
needs-work Details | Review
Only extract sortname metadata, no modified database sorting (4.82 KB, patch)
2009-02-04 20:05 UTC, John Millikin
none Details | Review
Only extract sortname metadata, no modified database sorting (24.39 KB, patch)
2009-02-04 20:05 UTC, John Millikin
committed Details | Review
When sorting by artist or album artist, use sort names if available (v8) (18.40 KB, patch)
2009-02-11 02:50 UTC, John Millikin
none Details | Review
When sorting by artist or album artist, use sort names if available (v9) (18.37 KB, patch)
2009-02-11 03:08 UTC, John Millikin
needs-work Details | Review
When sorting by artist or album artist, use sort names if available (v10) (18.15 KB, patch)
2009-02-11 20:10 UTC, John Millikin
needs-work Details | Review
When sorting by artist or album artist, use sort names if available (v11) (39.56 KB, patch)
2009-02-12 04:41 UTC, John Millikin
none Details | Review
Updated patch (41.92 KB, patch)
2009-02-13 04:48 UTC, Gabriel Burt
none Details | Review
When sorting by artist or album artist, use sort names if available (v12) (48.42 KB, patch)
2009-02-25 22:19 UTC, John Millikin
none Details | Review
When sorting by artist or album artist, use sort names if available (v13) (48.49 KB, patch)
2009-02-26 19:34 UTC, John Millikin
committed Details | Review

Description Will Farrington 2007-11-26 01:52:25 UTC
Please describe the problem:
Currently, Banshee and a slew of other applications that transfer music to iPods do not ignore "The"s in the Artist and Album fields; however, this is contrary to the behavior used in iTunes. However, simply making the iTunes-like behavior default isn't feasible either, as many users prefer not to ignore the "The"s. Banshee should offer an option that would allow users to simply tick a box in iPod properties that would write to the database ignoring "The"s (disabled by default).

NOTE: Some applications, like gtkpod, attempt the above behavior, however, they aren't handling everything correctly, as quickscroll treats any artist with an opening "The" as a 'T', even though it is listed in the correct spot.

Steps to reproduce:
Transfer some songs to your iPod.

Actual results:


Expected results:


Does this happen every time?


Other information:
This may be something that needs fixed in ipod-sharp (actually, the support would definitely need added there).
Comment 1 Michael Martin-Smucker 2008-03-05 18:32:22 UTC
Would this option have to be only for iPod use?  When browsing my Banshee library, I would appreciate having the option to place "The Decemberists" with the letter "D" rather than "T".
Comment 2 Andrew Conkling 2008-03-05 23:17:35 UTC
(In reply to comment #1)
> Would this option have to be only for iPod use?  When browsing my Banshee
> library, I would appreciate having the option to place "The Decemberists" with
> the letter "D" rather than "T".

This bug is requesting to sort the tracks when being transferred to an iPod. A natural sort order across the interface would certainly be a good idea, but a separate one.

Will, have you been able to test this in a specific version? Namely the stable branch of SVN?
Comment 3 Will Farrington 2008-03-05 23:45:09 UTC
I've spoken to snorp about this, and he plans to make it work on the ipod-sharp side of things, but it is not currently done, no.
Comment 4 Andrew Conkling 2008-03-06 00:09:27 UTC
Ok, I'll take a guess at 0.13.x. Let me know if you've tested against a different version.
Comment 5 Gabriel Burt 2008-03-29 00:03:07 UTC
This could be implemented very efficiently using sqlite, one query per field to change, so three total: title, artist, album.

Should probably not just do this for English's "the" though, but also for common translations and the translation in the user's current language.

Setting to version: trunk b/c this isn't going to happen for stable.
Comment 6 Frank Murphy 2008-04-15 09:09:10 UTC
Instead of hard-coding "The" and "Les" and "Die" in the application, Banshee should respect the "Album Sort Order", "Artist Sort Order", and "Title Sort Order" fields. So if a track has and ID3v2.4 tag and has a TSOP field, sort on that, but show TPE1. ID3v2.3 would use XSOP, Vorbis and APE would use ARTISTSORT, AAC would use 'soar' and windows media 'WM/ArtistSortOrder'.

Then files that are tagged with this would sort without the "The".

(See http://musicbrainz.org/doc/PicardQt/TagMapping for details.)
Comment 7 Gabriel Burt 2008-05-15 06:12:50 UTC
*** Bug 532899 has been marked as a duplicate of this bug. ***
Comment 8 Brad Jensen 2008-05-15 06:18:42 UTC
My bug report is considered a duplicate of this ticket, but my bug is about the Banshee applications sorting and has nothing to do with the iPod.

I hope this bug gets treated not only about the iPod problem but the whole Banshee sorting problem.

Thank you.
Comment 9 Andrés G. Aragoneses (IRC: knocte) 2008-05-15 16:44:33 UTC
Maybe the fix to this is related to the fix to bug 528493 (go there for details). I'll see if I have time to cook something.
Comment 10 Gabriel Burt 2008-05-15 16:47:11 UTC
*** Bug 527662 has been marked as a duplicate of this bug. ***
Comment 11 Gabriel Burt 2008-05-15 16:49:23 UTC
From bug #527662:

Currently, Bob Dylan is filed under 'B' instead of 'D'. Would be nice to add a
sorting hint in either the meta data or the artist folder. For instance Bob
_Dylan in the meta tag sorts Dylan, or the naming convention of the artist
folder ("Dylan, Bob") can give Banshee an idea on how to sort it.

Comment 12 Andrés G. Aragoneses (IRC: knocte) 2008-05-16 21:44:56 UTC
Created attachment 111018 [details] [review]
Proposed patch

This patch fixes bug#528493 and the part of this one referred to the article "the". I'm not sure about some things so maybe this is unfinished. Don't hesitate to look at the FIXME's I added and propose modifications or discuss them.
Thanks!
Comment 13 Andrés G. Aragoneses (IRC: knocte) 2008-05-17 12:24:49 UTC
Created attachment 111041 [details]
improved patch

I have removed the reference to Banshee.Collections from Banshee.Query, it was not needed indeed.
Comment 14 Andrés G. Aragoneses (IRC: knocte) 2008-05-18 13:50:37 UTC
Created attachment 111096 [details] [review]
Patch that applies to current trunk

There were recently some commits that conflicted with the patch, so I updated the patch. Can somebody review please?
Comment 15 Shaun McCance 2008-06-16 15:32:06 UTC
This bug claims to only be about iPod stuff, but bug #532899 was marked a duplicate, so I have to assume it's the right place to discuss sorting in Banshee in general.

It would be really nice to be able to set sort names manually.  This could be an additional field in the properties dialog.  Banshee could be smart and automatically fill in sort names for names starting with "the" or "a", but it should still allow the user to override the sort name.

Use cases:

1) Some people like to sort by last name.  That's going to be pretty much impossible to do automatically.  Heck, even human beings can't agree on whether "Jethro Tull" should be sorted as "Tull, Jethro".  (It's not the name of any actual human being.)

2) I'm not one of those people who likes to sort by last name.  So I really want "Les Paul" to sort as "Les Paul".  But I want "Les Nubians" to sort as "Nubians, Les", because in this case, "les" is the French word for "the".  People with very eclectic music collections are likely to encounter a lot of foreign words for "the".
Comment 16 Michael Monreal 2008-06-16 15:37:40 UTC
The only problem I see regarding Shaun's proposal is that you would have to set the override name more or less manually for all songs of that artist. Perhaps a "rename" or "set display name" dialog, invoked from the artist list, would help and do it for all songs automatically?
Comment 17 Bertrand Lorentz 2008-09-12 21:00:43 UTC
*** Bug 552014 has been marked as a duplicate of this bug. ***
Comment 18 Bertrand Lorentz 2008-10-12 10:12:59 UTC
*** Bug 555988 has been marked as a duplicate of this bug. ***
Comment 19 Bertrand Lorentz 2008-10-12 10:17:23 UTC
The patch doesn't apply anymore, and seems to have some overlap with the one in bug #528493.
Comment 20 Shaun McCance 2008-11-14 16:39:12 UTC
Michael, it's really not terribly difficult to set the same piece of metadata for all songs by an artist, given the nice "sync" buttons in Banshee's track metadata dialogs.

You could imagine doing it from the artist list, where right-clicking on an artist would give you the option to set a sort name.  And that does make a certain amount of sense.  But right-clicking the artist currently does nothing in Banshee, so I worry that wouldn't be very discoverable.
Comment 21 John Millikin 2008-11-15 05:28:01 UTC
Created attachment 122712 [details] [review]
When sorting by artist or album artist, use sort names if available

Here's my attempt at this. It uses "sort name" metadata embedded in media files when ordering by artist or album artist. I don't understand why ordering is calculated in the database as opposed to the user interface, but this'll just obey the same rules (using the LoweredName field, and similar.
Comment 22 John Millikin 2008-11-15 05:28:08 UTC
Created attachment 122713 [details] [review]
Add sort information to TagLib.Tag

...and here's a small, ugly patch to TagLib. I don't really know where to put this, since taglib appears not to have a bug tracker, but I noticed that a custom taglib is maintained in the Banshee PPA so perhaps somebody will know what to do with this. It just adds tag properties for artist and album artist sort information to TagLib.Tag.

Required for compiling the above patch to Banshee.
Comment 23 Gabriel Burt 2009-01-13 21:09:59 UTC
John, we just started tracking TagLib# bugs in BGO - please file a new bug for it.  Would love to see id3v2 support in that patch, too.
Comment 24 Gabriel Burt 2009-01-13 21:19:51 UTC
John, sorting is calculated in the db b/c 1) sqlite is super good and fast at that and 2) our entire new (1.x) architecture is built around leveraging the database - so we sort once and cache the result of that into the CoreCache table, and then just pull 100 or so items at a time out as you scroll.

John, can you svn up and attach an updated patch?  Pretty sure I committed some changes to DatabaseAlbumInfo.cs (that make sure we call Save as few times as possible) that conflict.

I'm also not sure what I think about you changing NameLowered.  That column is used not just for sorting, but for searching too.  Sqlite's LOWER method only works for ASCII, which is why that column exists (and to save time by caching the lowered result instead of calculating it each time).

With your change, if the ArtistName is "The Beatles", the ArtistSortName is "Beatles, The", and so LoweredName is "beatles, the" then a search for "the beatles" would no longer match.  Options include: 1) that being ok 2) adding a new column LoweredSortName 3) something I haven't thought of?
Comment 25 Gabriel Burt 2009-01-13 21:24:00 UTC
Anders, I'm rejecting your patch - it makes incorrect assumptions (eg that Sqlite's LOWER command is a viable option), and makes too many changes that aren't (IMO) directly relevant to solving this bug (e.g. your 'fictional' metadata stuff).
Comment 26 John Millikin 2009-01-13 21:35:05 UTC
Split TagLib# bug to bug #567657 , and re-attached patch. Will work on a new patch that supports ID3 when I can find some MP3 files.

If LoweredName is used for both sorting and searching, it should probably be split into two columns to provide proper support for sortnames. Maybe the new column could be called something like "SortKey" or "NormalizedSortName", since lowercasing might not be the only processing performed on the sortname.
Comment 27 John Millikin 2009-01-13 22:55:05 UTC
Created attachment 126385 [details] [review]
When sorting by artist or album artist, use sort names if available (v2)

Updated version of my patch. Adds a separate column for storing the lowercase sort name. Updates various queries to use the lowered sortnames for ordering.
Comment 28 Andrés G. Aragoneses (IRC: knocte) 2009-01-13 23:12:59 UTC
(In reply to comment #25)
> Anders, I'm rejecting your patch - it makes incorrect assumptions (eg that
> Sqlite's LOWER command is a viable option), and makes too many changes that
> aren't (IMO) directly relevant to solving this bug (e.g. your 'fictional'
> metadata stuff).

Gabriel, thanks for the follow-up. Indeed, that patch was obsolete. I have some changes on my working copy of Banshee but didn't publish the patch because it depends on the patch I posted on bug 528493, so I figured I should wait until that one lands. Any chance to review it please? Thanks.

Comment 29 Gabriel Burt 2009-01-16 03:12:11 UTC
John, you've got some method calls w/o a space before the arg list (see HACKING).

My /usr/include/gstreamer-0.10/gst/gsttaglist.h file lists these as valid gst tags (for use in CommonTags.cs in Banshee):

title-sortname
musicbrainz-sortname (but also keep artist-sortname for this)
album-sortname

But I guess I can't see much point in the TrackSort - until enough people complain, at least. :)

"CoreAlbums.ArtisSortNameLowered ASC" looks like a typo.

Looking pretty good, please update and test thoroughly.  Thanks!
Comment 30 John Millikin 2009-01-16 03:28:27 UTC
> John, you've got some method calls w/o a space before the arg list (see
> HACKING).

Just trying to match surrounding style. Is it preferred to update code style while patching?

> title-sortname
> album-sortname

I could add these if you think they will be useful, but it's additional code for a feature that appears useless to me. Are there any tagging applications that write these tags?

> musicbrainz-sortname (but also keep artist-sortname for this)

This might be a bug/legacy in gstreamer, since Musicbrainz writes to ARTISTSORT/ALBUMARTISTSORT. If we're to read it, should it be used for artist or albumartist sorting?

> "CoreAlbums.ArtisSortNameLowered ASC" looks like a typo.

Whoops
Comment 31 Gabriel Burt 2009-01-16 03:31:56 UTC
(In reply to comment #30)
> > John, you've got some method calls w/o a space before the arg list (see
> > HACKING).
> 
> Just trying to match surrounding style. Is it preferred to update code style
> while patching?

Oh right, sorry.  Matching it is fine then, better to at least be consistent within the file.

> > title-sortname
> > album-sortname
> 
> I could add these if you think they will be useful, but it's additional code
> for a feature that appears useless to me. Are there any tagging applications
> that write these tags?

Why wouldn't you read album-sortname?  You have the db column and trackInfo properties all setup, no?

> > musicbrainz-sortname (but also keep artist-sortname for this)
> 
> This might be a bug/legacy in gstreamer, since Musicbrainz writes to
> ARTISTSORT/ALBUMARTISTSORT. If we're to read it, should it be used for artist
> or albumartist sorting?

Yeah, the musicbrainz-sortname is deprecated, but no reason we can't handle both in our big switch statement.  I'm not sure about artist vs albumartist, you'll have to do some research.
Comment 32 John Millikin 2009-01-16 06:23:38 UTC
Created attachment 126557 [details] [review]
When sorting by artist or album artist, use sort names if available (v3)

New patch. Adds tag-based sorting of track and album titles. Adds support for musicbrainz-sortname, as an alias for artist-sortname, when reading tags from a stream.

Requires latest patch in bug #567657 for compilation, due to new TagLib.Tag properties.
Comment 33 Shaun McCance 2009-01-16 17:46:59 UTC
I mentiond this to Gabriel on IRC: Sorting in Banshee is frequently suboptimal because sqlite uses a very basic collation algorithm.  At a glance, it looks like it's just sorting lexicographically on the code points.  If there were a sort name column, Banshee could potentially mangle its contents to coerce a better sort from sqlite.  It's certainly not the cleanest way to handle the problem, but I figured I'd mention it here anyway.
Comment 34 Gabriel Burt 2009-01-16 18:46:26 UTC
Hey John,

You can combine the two cases in the switch like this:

case CommonTags.ArtistSortName:
case CommonTags.MusicBrainzSortName:
   ...;
   break;

Let's used 'Normalized' instead of 'Lowered' for the new columns/properties, to be better named for when we start doing smarter collation like Shaun mentioned.

Let's also but the 'Sort' at the end of any columns/names (but before 'Normalized', if any), so:
TitleSort, TitleSortNormalized etc

Also, we can't just set the value of the *Lowered columns to lower(..) using SQL because Sqlite's lower function only works for ASCII.  Instead, you have to increase the MetadataVersion in the BansheeDbFormatMigrator.cs which will trigger a job that will iterate over all tracks and save them (at which time, for each, the .ToLower () will be called appropriately).

Thanks, this is shaping up really nicely!
Comment 35 John Millikin 2009-01-16 22:56:14 UTC
Created attachment 126620 [details] [review]
When sorting by artist or album artist, use sort names if available (v4)

Updated properties and database field names, incremented metadata version.
Comment 36 Gabriel Burt 2009-01-17 20:40:38 UTC
Thanks John, this looks really good.  Since we're going to bump the MetadataVersion which triggers the rescan, and we want to do that to our users as infrequently as possible - we should probably get the collation fix in now too.

First step to do that is to change all the foo.ToLower () calls into a Normalize (foo) call.  Not sure where the Normalize method should live - maybe even in Hyena.Data.Sqlite.HyenaSqliteConnection ?

Shaun (and/or anybody w/ expertise here) - which if any of the optiosn listed here http://msdn.microsoft.com/en-us/library/system.text.normalizationform.aspx sound best?

I think the Normalize function should be more or less this:

public static string Normalize (string input)
{
    if (String.IsNullOrEmpty (input))
      return input;

    string ret = input.Normalize (System.Text.NormalizationForm.FormKC);
    return ret.ToLowerInvariant ();
}

I think we need the ToLower*Invariant* so even if the user switches locales between Banshee runs the data in the db will be consistent.

Important note: we also need to modify the query code to 1) use this Normalize method instead of ToLower and 2) look for "*Normalized" in addition to "Lowered" in the column names -- see src/Libraries/Hyena/Hyena.Query/QueryField.cs

Any other per-track managed-code processing we while we're at it?
Comment 37 John Millikin 2009-01-17 22:39:15 UTC
I wonder if it's even possible to calculate sensible sorting keys in advance. Most of the pages I've found regarding Unicode collation assume the user's locale is constant for all keys.

> I think we need the ToLower*Invariant* so even if the user switches locales
> between Banshee runs the data in the db will be consistent.

According to MSDN[1]:

> However, an application should use the invariant culture only for processes
> that require culture-independent results, such as formatting and parsing data
> that is persisted to a file. In other cases, it produces results that might be
> linguistically incorrect or culturally inappropriate.

The unicode.org information on collation[2] indicates that the current locale is a very important part of sorting:

> Languages vary not only regarding which types of sorts to use (and in which
> order they are to be applied), but also in what constitutes a fundamental
> element for sorting. For example, Swedish treats ä as an individual letter,
> sorting it after z in the alphabet; German, however, sorts it either like ae
> or like other accented forms of a, thus following a. In Slovak, the digraph
> ch sorts as if it were a separate letter after h.

The unicode collation functions in Glib are locale-dependent[3]. Different collation keys will be generated, depending on the user's settings.

The only way I can see to provide proper sorting while still using SQLite is to populate the sorting key attribute on startup, and on every locale change. That sounds complex and fragile.

[1] http://msdn.microsoft.com/en-us/library/4c5zdc6a.aspx
[2] http://unicode.org/reports/tr10/
[3] http://library.gnome.org/devel/glib/2.18/glib-Unicode-Manipulation.html#g-utf8-collate
Comment 38 Wouter Bolsterlee (uws) 2009-01-17 23:14:46 UTC
(In reply to comment #36)
> I think the Normalize function should be more or less this:
> 
> public static string Normalize (string input)
> {
>     if (String.IsNullOrEmpty (input))
>       return input;

I'd say "return null". Storing empty strings in a database where "no value" is actually intended is usually a sign of bad design. In this case the result would be that both empty strings and NULL values may end up in the database, which is not only bad design, but may also cause sorting and comparison issues later on.

>     string ret = input.Normalize (System.Text.NormalizationForm.FormKC);
>     return ret.ToLowerInvariant ();
> }
Comment 39 John Millikin 2009-01-19 04:29:25 UTC
Found a possible alternative to normalizing sortnames. It might also eliminate the need for pre-calculating lowercase sortnames. According to the home page for SQLite's .NET bindings[1], it is possible to define custom functions in managed code. If a custom function was defined to lowercase a string and then calculate the sorting key, it could be used in BansheeQuery.cs.

I don't know what the appropriate Mono/.NET functions would be, but glib defines functions that would be useful for this:
g_utf8_casefold()
g_utf8_collate_key()

Does this sound like a good idea? Based on the sample code I can find, it would be relatively easy to implement.

[1] http://sqlite.phxsoftware.com/
Comment 40 Gabriel Burt 2009-01-22 23:51:40 UTC
John, I spent considerable effort around a year ago to switch us from Mono.Data.SqliteClient to Mono.Data.Sqlite, so we could benefit from prepared statements and custom functions, but at that time at least, M.D.S had _serious_ problems with threading - you couldn't read results from a query from any thread except the one where the db connection had been opened.

If you can verify that this is no longer the case, I'd love to move to it and have a more powerful sqlite.
Comment 41 Gabriel Burt 2009-01-23 00:54:22 UTC
*** Bug 561380 has been marked as a duplicate of this bug. ***
Comment 42 John Millikin 2009-01-23 02:16:41 UTC
> M.D.S had _serious_ problems with threading - you couldn't read results from
a query from any thread except the one where the db connection had been opened.

It looks like this issue has been fixed, at least, but there are more. Attempting to port from m.d.sc to m.d.s causes intermittent errors when importing tracks, which look related to multithreading:

* cannot start a transaction within a transaction
* not an error
* cannot commit transaction - SQL statements in progress

The SQLite wiki page on multithreading[1] states that using the same database connection simultaneously between threads is unsafe. I assume m.d.sc has additional locking to prevent the above errors, or perhaps the current configuration only works by chance. I'm poking around in Banshee's database layer to figure out why multiple threads are trying to share a connection.

http://www.sqlite.org/cvstrac/wiki?p=MultiThreading
Comment 43 John Millikin 2009-01-31 21:32:22 UTC
Created attachment 127662 [details] [review]
When sorting by artist or album artist, use sort names if available (v5)

Uses a custom SQLite function for calculating the sorting keys for sortnames. Currently just calls `String.ToLower()`, but the function could be easily modified to support fancier sorting without requiring any changes to the database.

Requires a patch to mono: https://bugzilla.novell.com/show_bug.cgi?id=470042
Comment 44 Gabriel Burt 2009-02-03 22:46:19 UTC
Ok, we need to get a patched copy of Mono.Data.Sqlite into our codebase, and bundle it until the fixed M.D.S is prevalent enough to depend on.
Comment 45 John Millikin 2009-02-04 01:32:19 UTC
Created attachment 127893 [details] [review]
When sorting by artist or album artist, use sort names if available (v6)

> Ok, we need to get a patched copy of Mono.Data.Sqlite into our codebase, and
> bundle it until the fixed M.D.S is prevalent enough to depend on.
Comment 46 John Millikin 2009-02-04 04:40:19 UTC
Created attachment 127903 [details] [review]
When sorting by artist or album artist, use sort names if available (v7)

Move the custom function definitions out of Hyena.Data.Sqlite and into Banshee.Database.
Comment 47 John Millikin 2009-02-04 20:05:11 UTC
Created attachment 127943 [details] [review]
Only extract sortname metadata, no modified database sorting
Comment 48 John Millikin 2009-02-04 20:05:44 UTC
Created attachment 127944 [details] [review]
Only extract sortname metadata, no modified database sorting
Comment 49 Gabriel Burt 2009-02-11 01:27:28 UTC
I committed the read/write *sort metadata patch, with a slight change to not set the AlbumArtistSort to "Various Artists" if null.

For the other one, to actually sort by the sort columns, let's get rid of the BANSHEE_* custom method and add a HYENA_COLLATE method, then we can sort like this in SQL:

ORDER BY HYENA_COLLATE(ifnull(TitleSort, Title))
Comment 50 Andrés G. Aragoneses (IRC: knocte) 2009-02-11 02:27:37 UTC
This is interesting. However, don't forget to raise the version of TagLib# in the configure check, because it's not building for me with my version.
Comment 51 John Millikin 2009-02-11 02:50:52 UTC
Created attachment 128438 [details] [review]
When sorting by artist or album artist, use sort names if available (v8)

Use .NET's sortkey methods for generating sortkeys (in byte[] form). Adds new columns to the database, *SortKey, for storing these keys. Keys are generated on startup if the user's locale has changed from the last time.
Comment 52 John Millikin 2009-02-11 03:08:35 UTC
Created attachment 128439 [details] [review]
When sorting by artist or album artist, use sort names if available (v9)

sorry...

While playing with Banshee after applying that last patch, I discovered a small error in `BansheeQuery.GetSort()`. This is the corrected patch.
Comment 53 Gabriel Burt 2009-02-11 19:04:57 UTC
Hey John, looks really good.  A few things:

1. Can we just call ToString on the SortKey objects instead of dealing w/ the byte[]? Do we potentially lose the proper collation if we do that?

2. If not, why is Replace ("-","") needed?  Please put a space after the comma(s) in args.

3. In the custom function(s), we could do the System.DBNull.Value.Equals check before the cast to avoid casting if not needed.

4. In Hyena's GetSortKey let's cache CultureInfo.CurrentCulture.CompareInfo in a static variable (to avoid 2 extra getters each time).

5. Let's rename HYENA_COLLATE to HYENA_UNICODE_NOCASE_COLLATE_KEY - what do you think?
Comment 54 John Millikin 2009-02-11 19:26:32 UTC
> Can we just call ToString on the SortKey objects instead of dealing w/ the
> byte[]? Do we potentially lose the proper collation if we do that?

`SortKey.ToString()` prints the original inputs to the sortkey, not the key itself[1]. Retrieving the key requires retrieving a byte[].

> If not, why is Replace ("-","") needed?  Please put a space after the
> comma(s) in args.

`BitConverter.ToString()` converts the array {0x10, 0x20, 0x30} into "10-20-30". The dashes need to be stripped, otherwise SQLite will not be able to parse the value.

> Let's rename HYENA_COLLATE to HYENA_UNICODE_NOCASE_COLLATE_KEY - what do you
> think?

Awfully long. "unicode" can be assumed, because .NET strings are unicode. I doubt "nocase" is needed, since all searching and sorting in Banshee/Hyena is performed without case. I doubt there will be so many collation-related functions in Hyena that such disambiguation is required.

"collate key" sounds like it's operating *on* a key, rather than generating one; if you want the function to be a noun, it would be "collation key".

[1] http://msdn.microsoft.com/en-us/library/system.globalization.sortkey.tostring.aspx
Comment 55 Gabriel Burt 2009-02-11 19:37:00 UTC
Ok, byte[] it is, and HYENA_COLLATION_KEY it is.
Comment 56 John Millikin 2009-02-11 20:10:17 UTC
Created attachment 128491 [details] [review]
When sorting by artist or album artist, use sort names if available (v10)

* Renamed function to HYENA_COLLATION_KEY
* Cache current culture's `CompareInfo` in a static variable
* Use `DatabaseConfigurationClient` for caching the user's locale between sessions
* Move test for DBNull above the cast
Comment 57 Gabriel Burt 2009-02-12 00:37:32 UTC
OK sorry, I was trying to fix this up but...would you do another round?

1. Specify column name in TrackTitleSortKey and TrackTitleSort properties

2. Make all the SortKey and Lowered properties internal

3. Replace lazy load with "private static CompareInfo culture_compare_info = CultureInfo.CurrentCulture.CompareInfo;"

4. Write unit tests for
    A. StringUtils.GetSortKey
    B. Hyena.Data.Sqlite.HyenaSqliteCommand.SqlifyObject (change to internal)
    C. DatabaseTrack/Artist/AlbumInfo - testing the Sort[Key]/Lowered properties, specifically

Thanks!
Comment 58 John Millikin 2009-02-12 04:41:13 UTC
Created attachment 128520 [details] [review]
When sorting by artist or album artist, use sort names if available (v11)

> OK sorry, I was trying to fix this up but...would you do another round?
Always

> 1. Specify column name in TrackTitleSortKey and TrackTitleSort properties
Done

> 2. Make all the SortKey and Lowered properties internal
See below

> 3. Replace lazy load with "private static CompareInfo culture_compare_info =
> CultureInfo.CurrentCulture.CompareInfo;"
Done (but see below)

>4. Write unit tests for
>    A. StringUtils.GetSortKey
>    B. Hyena.Data.Sqlite.HyenaSqliteCommand.SqlifyObject (change to internal)
Done and done

>    C. DatabaseTrack/Artist/AlbumInfo - testing the Sort[Key]/Lowered
> properties, specifically

Attempting to construct instances of the Database* classes in the unit tests raised exceptions. To test these properties, I had to move them into the non-Database classes in Banshee.Core/Banshee.Collection/*Info.cs. The resulting properties could, therefore, not be marked internal.

Since `SortKey()` depends on the user's current culture, I suspect the unit tests may break depending on the locale. For reference, my tests were run in en-US.UTF8. The cached `CompareInfo` means that even if `Thread.CurrentThread.CurrentCulture` is set to a known, neutral culture, it will not take effect in `SortKey()`.
Comment 59 Gabriel Burt 2009-02-12 15:55:55 UTC
What exception was raised when testing Database*Info classes?

All the unit tests used to be run in Italian - sure they're not still?
Comment 60 John Millikin 2009-02-12 18:14:17 UTC
> What exception was raised when testing Database*Info classes?

1) Banshee.Collection.Database.Tests.DBAlbumInfoTests.RunTest : System.TypeInitializationException : An exception was thrown by the type initializer for Banshee.Collection.Database.DatabaseAlbumInfo
  ----> System.NullReferenceException : Object reference not set to an instance of an object
at Banshee.Collection.Database.Tests.DBAlbumInfoTests.RunTest () [0x00000]
at (wrapper managed-to-native) System.Reflection.MonoMethod:InternalInvoke (object,object[])
at System.Reflection.MonoMethod.Invoke (System.Object obj, BindingFlags invokeAttr, System.Reflection.Binder binder, System.Object[] parameters, System.Globalization.CultureInfo culture) [0x00000]
--NullReferenceException
at Banshee.Configuration.DatabaseConfigurationClient.Get (System.String namespce, System.String key) [0x00000]
at Banshee.Configuration.DatabaseConfigurationClient.Get[Int32] (System.String namespce, System.String key, Int32 fallback) [0x00000]
at Banshee.Database.BansheeModelProvider`1[Banshee.Collection.Database.DatabaseAlbumInfo].CheckVersion (System.String namespce, System.String key, Int32 new_version, Banshee.Database.MigrateDel func) [0x00000]
at Banshee.Database.BansheeModelProvider`1[Banshee.Collection.Database.DatabaseAlbumInfo].CheckVersion () [0x00000]
at Hyena.Data.Sqlite.SqliteModelProvider`1[Banshee.Collection.Database.DatabaseAlbumInfo].Init () [0x00000]
at Hyena.Data.Sqlite.SqliteModelProvider`1[Banshee.Collection.Database.DatabaseAlbumInfo]..ctor (Hyena.Data.Sqlite.HyenaSqliteConnection connection, System.String table_name) [0x00000]
at Banshee.Database.BansheeModelProvider`1[Banshee.Collection.Database.DatabaseAlbumInfo]..ctor (Banshee.Database.BansheeDbConnection connection, System.String table_name) [0x00000]
at Banshee.Collection.Database.DatabaseAlbumInfo..cctor () [0x00000]

> All the unit tests used to be run in Italian - sure they're not still?
Ah, you're correct, they are. Disregard that part, then.
Comment 61 Gabriel Burt 2009-02-13 04:48:47 UTC
Created attachment 128620 [details] [review]
Updated patch

I'm still seeing total sort failure when I take a version 22 db w/ 5000 songs and run this patch on it.  It does a Metadata Refresh, and is sorted by track desc as normal, but I cannot discern any pattern to the actual sort order.

All the unit tests pass.  I moved the *Key and *Lowered properties back out of *Info (eg now only in Database*Info again) and added a bit of hackery to get the unit tests to run against Database*Info.

Can you confirm this patch works for you when you just run it on an existing db?
Comment 62 Gabriel Burt 2009-02-13 04:50:10 UTC
Apply with patch -p1, sorry
Comment 63 John Millikin 2009-02-13 07:21:46 UTC
> Can you confirm this patch works for you when you just run it on an existing
> db?
Confirmed. Run on a fresh version-22 db created with 1.4.2 , metadata refresh worked and all columns are sorted properly. Resorting on various columns worked perfectly.

> I'm still seeing total sort failure when I take a version 22 db w/ 5000 songs
> and run this patch on it.  It does a Metadata Refresh, and is sorted by track
> desc as normal, but I cannot discern any pattern to the actual sort order.
Based on your description of the issue in IRC, Banshee is acting as if sortnames are non-null empty strings. I can't figure out any reason why that would be.

A couple diagnostic steps that might help...

-----------------------

1. Change the *SortKey properties to use the *SortKey infrastructure, without actual sort keys. If the problem persists, the error is somewhere in sortkey generation / extraction. If sorting works correctly after trying this, the problem is metadata-related:

- get { return Hyena.StringUtil.SortKey (NameSort ?? Name); }
+ get { return Hyena.StringUtil.SortKey (Name); }

-----------------------

2. If test #1 indicates a problem with sortkey generation, replace the implementation of `StringUtil.SortKey()` with one that returns the UTF8-encoded version of its input. This will let metadata be applied, but without proper collation.

public static byte[] SortKey (string orig)
{
    if (orig == null) { return null; }
    return System.Text.Encoding.UTF8.GetBytes (orig);
}

If this code works, it indicates a configuration problem / bug in Mono. I very much hope this isn't what's happening, because it could be a huge pain to fix.

-----------------------

3. If test #1 indicates a metadata problem, try running these queries against the migrated (version 25) database. They should return titles for objects without sortnames. If resultset is empty, empty strings are being inserted instead:

SELECT Title FROM CoreTracks WHERE TitleSort IS NULL;
SELECT Title FROM CoreAlbums WHERE TitleSort IS NULL;
SELECT Name FROM CoreArtists WHERE NameSort IS NULL;

If no results are returned for objects known to not contain sortnames, it's probably a taglib problem.

Something that occurs to me, related to this, is that your tag editor is inserting metadata like sortname="", which could cause a problem like this. If so, a patch to TagLib to check the sortname for the empty string could help.
Comment 64 John Millikin 2009-02-25 22:19:43 UTC
Created attachment 129524 [details] [review]
When sorting by artist or album artist, use sort names if available (v12)

New version of the patch. This adds a pane for editing sorting-related metadata to the track editor.
Comment 65 John Millikin 2009-02-26 19:34:42 UTC
Created attachment 129595 [details] [review]
When sorting by artist or album artist, use sort names if available (v13)

Updates unit tests to new "" -> null behavior.
Comment 66 Gabriel Burt 2009-02-26 20:06:39 UTC
Good grief, you did an amazing job on this John.  Thanks for your perseverance!  Committed, at long last.  Working perfectly for me.
Comment 67 John Millikin 2009-03-05 23:46:19 UTC
*** Bug 567437 has been marked as a duplicate of this bug. ***
Comment 68 Alexander Kojevnikov 2009-05-21 03:09:42 UTC
*** Bug 583381 has been marked as a duplicate of this bug. ***
Comment 69 Bertrand Lorentz 2009-08-31 11:12:56 UTC
*** Bug 593592 has been marked as a duplicate of this bug. ***