GNOME Bugzilla – Bug 333893
searches should be fuzzy
Last modified: 2006-06-27 20:13:19 UTC
Currently, searching for e.g. "me drag" does not find "Are a Drag" by "Me First and the Gimme Gimmes". This kind of search works well in iTunes (it's how I locate albums, since it's quicker than finding them with the mouse), and it would be nice to see it here. I suggest matching tracks where every whitespace-separated search word matches some metadata field of the track.
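Below is a minimal C# sketch of the rule suggested above. It assumes a track's metadata can be passed in as a plain array of strings. `FuzzySearchSketch` and `Matches` are hypothetical names used for illustration, not Banshee's API. Treating "matches" as a case-insensitive substring test is also an assumption.

```csharp
using System;

// Hypothetical illustration of the proposed rule; not Banshee code.
public static class FuzzySearchSketch
{
    // True when every whitespace-separated word in 'query' occurs
    // (case-insensitively, as a substring) in at least one metadata field.
    public static bool Matches(string query, string[] fields)
    {
        // A null separator array tells String.Split to split on whitespace.
        string[] words = query.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

        foreach (string word in words)
        {
            bool found = false;
            foreach (string field in fields)
            {
                if (field != null &&
                    field.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0)
                {
                    found = true;
                    break;
                }
            }

            // One word that matches no field is enough to reject the track.
            if (!found)
                return false;
        }

        // An empty query matches every track.
        return true;
    }
}
```

Word order does not matter here. Both "dylan bob" and "bob dylan" match a track whose artist field is "Bob Dylan", and "hendrix purple" can match across the artist and title fields.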
This is the same issue that Rhythmbox had in bug:139196: searches should be fuzzy instead of matching a particular phrase. Searching for "bob dylan" or "dylan bob" should both find tracks by Bob Dylan. You also can't search across different fields. For example, searching for "hendrix purple" should find the track Purple Haze by Jimi Hendrix. Typing "hendrix" shows all the tracks by Hendrix, but once you add "purple", no tracks are shown at all. I would add that this bug is my biggest complaint about Banshee. Hopefully the bug reference will make its resolution easier.
Also, a (slightly) more powerful search syntax would be cool. The ability to exclude certain keywords would probably suffice, e.g. "muse -hullabaloo" to get everything by Muse except a particular album (Hullabaloo in this example).
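The sketch below extends the same idea with the proposed '-' prefix. The names are hypothetical. Skipping a bare "-" is a choice made here so that a stray dash does not filter out every track.

```csharp
using System;

// Hypothetical sketch of a '-' exclusion prefix; names are illustrative only.
public static class ExcludeSearchSketch
{
    static bool AnyFieldContains(string[] fields, string word)
    {
        foreach (string field in fields)
        {
            if (field != null &&
                field.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0)
                return true;
        }
        return false;
    }

    // Plain words must appear in some field; words prefixed with '-'
    // must appear in no field. A bare "-" is ignored.
    public static bool Matches(string query, string[] fields)
    {
        foreach (string token in query.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
        {
            if (token == "-")
                continue;

            // Tokens are never empty here thanks to RemoveEmptyEntries.
            bool exclude = token[0] == '-';
            string word = exclude ? token.Substring(1) : token;

            // Reject when an included word is missing or an excluded word is present.
            if (AnyFieldContains(fields, word) == exclude)
                return false;
        }
        return true;
    }
}
```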
Moving to the User Interface component.
Created attachment 63738: A patch to add a bit of fuzzy search capability. This patch adds fuzzy searching to the default simple search. Unlike the original bug request, the terms in the search are AND'd together to produce the final results. For example, if I want to find the songs by Huey Lewis and the News from the Billboard Top 100 of 1984, I can type "1984 huey", "huey 1984", or anything of that nature. A search for "1984 huey" will not return anything from Van Halen's album 1984 because none of those tracks contain 'huey'. (Unless there's a hilarious bootleg I'm missing.) Things I'm not sure about: I just used a single space ' ' to split the query string into tokens/words. Some standard constant for whitespace would probably be better, but I'm really new to C# and I don't know what that would be. I also don't know whether this has any internationalization implications, because I've never worked with string literals in a program that supports internationalization. But the original source code has a 'the ' constant in it, so I bet this is fine.
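On the question of a whitespace constant: in .NET, passing a null separator array to `String.Split` splits on every character for which `Char.IsWhiteSpace` returns true. That covers tabs and Unicode space characters as well as ' '. The hypothetical helpers below contrast that approach with splitting on a single space.

```csharp
using System;

// Hypothetical helpers contrasting two ways to tokenize a search query.
public static class TokenizeSketch
{
    // Splitting on ' ' only: runs of spaces yield empty tokens, and tabs or
    // other whitespace characters are not treated as separators.
    public static string[] SplitOnSpace(string query)
    {
        return query.Split(' ');
    }

    // A null separator array makes String.Split break on every character for
    // which Char.IsWhiteSpace returns true; RemoveEmptyEntries drops the empty
    // tokens produced by leading, trailing, or repeated whitespace.
    public static string[] SplitOnWhitespace(string query)
    {
        return query.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

With `SplitOnSpace`, "1984  huey" (two spaces) yields an empty middle token, and a tab between words is not treated as a separator at all.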
I reread the first comment. My patch behaves exactly as Dave Rodgman described.
Created attachment 63744: Patch to add fuzzy search and NOT prefix operator, '-'. I updated my previous patch to allow words to be prefixed with '-' to mean NOT. This should now work exactly as Dave suggested. I also found and read the HACKING text file and made a deliberate effort to follow the coding guidelines there.
I think this is pretty much fixed now. If there are improvements to my patch or suggestions of any kind, I'd love a critique. Thanks.
Hi Travis... some things I noticed when testing the patch: a) I had two albums by "Armor for Sleep" in my library. With the "All" search filter selected, I typed "armor" and the second album disappeared from the view. I expected both albums by "ARMOR for Sleep" to match. This seems to happen for any artist with multiple albums: only the first album is shown when searching for that artist. b) Try running a full import (I was doing about 4k songs) with a search active. I ended up getting a NullReferenceException: Unhandled Exception: System.NullReferenceException: Object reference not set to an instance of an object in <0x002f9> Banshee.PlayerUI:DoesTrackMatchSearch (Banshee.Base.TrackInfo) in <0x0004a> Banshee.PlayerUI:OnSourceTrackAdded (object,Banshee.Sources.TrackEventArgs) ... c) Apart from that, everything behaves pretty nicely. If you can reproduce and iron out those issues, I have no objections to committing this to HEAD very soon.
Aaron, thanks for checking out my patch. I just tried to recreate (a) and I can't seem to make it happen. My testing library has about 16k songs in it, and I've searched quite hard for an artist/album combination that recreates this first-album-only behavior. The closest example I have to your "Armor for Sleep" case is the albums "Sublime" and "40 oz. to Freedom", both by "Sublime". If I type "sublime", both are shown in the view. Here are some other combinations I tried, with the search terms that successfully narrowed them down (search : artist : albums):
- coldplay : coldplay : a rush..., parachutes, the string quartet...
- rage machine : rage against the machine : rage..., renegades, unreleased, live at..., evil..., the battle of...
- system : system of a down : mesmerize, steal this album!, system of a down, toxicity, toxicity sessions, unrelease
I've tried as many single-, double-, and triple-keyword searches as I could, on bands for which I have one, two, three, four or more albums. Each time I get all of the matching tracks, including those from compilations or secondary albums. Could you pick an example where (a) failed for you and send me the ID3 data from those files? I could at least try to walk through it on the whiteboard and see whether I can find out what I've done wrong. (I'll look at (b) when I get home tonight.) Thanks again for looking at my patch. I hope I can get this whipped into shape quickly.
Investigating (b): I flushed my music DB and started from scratch. My first import was 4492 items (21+ days), all Billboard Top 100 hits or similar. No crash. I'm going to do a larger import (9k or so) to see if I can recreate the crash by kicking it a little harder.
Ah, stink. I just imported another 12k songs in one batch without hitting the crash described in (b) above. Maybe I'll ask in the channel or on the mailing list whether someone can reproduce the bugs you describe. I'm kind of at a loss for how to take my investigation further. (On the plus side, this makes Banshee the very first media player on Linux that has *ever* imported all 16k songs of my music collection without freezing or crashing during the process.)
After looking at the patch, I think I might have an idea about issue (b). The patch results in the following code: foreach(string m in matches) { string ml = m.ToLower(); if(m == null || m == String.Empty) { continue; } If m happens to be null, m.ToLower() throws a NullReferenceException before the null check is ever reached. Moving the m.ToLower line after the if clause should fix this. I hope I made myself clear; I just didn't want to create a patch to the patch... This is just a wild guess, as I haven't duplicated the bug, but I think it's worth trying.
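The reordering suggested above can be sketched as follows. Only the null/empty test and the `ToLower` call come from the quoted patch. The `NullGuardSketch` class, the `LowerCaseMatches` method, and its return value are hypothetical scaffolding. `String.IsNullOrEmpty` stands in for the original `m == null || m == String.Empty` check.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical reconstruction of the quoted loop with the guard moved before
// the first member access on m; the surrounding method is illustrative only.
public static class NullGuardSketch
{
    public static List<string> LowerCaseMatches(IEnumerable<string> matches)
    {
        List<string> lowered = new List<string>();
        foreach (string m in matches)
        {
            // Check first: calling m.ToLower() on a null m would throw
            // NullReferenceException before this test could run.
            if (String.IsNullOrEmpty(m))
                continue;

            string ml = m.ToLower(); // safe: m is non-null and non-empty here
            lowered.Add(ml);
        }
        return lowered;
    }
}
```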
Regarding (a): I couldn't confirm it. All albums show up in searches, whether there are two albums by the same artist or 15. However, Banshee crashes when I type something in the search field while it is playing. It works fine when no music is playing.
Correction: Banshee did leave out one album in searches. The fix in comment 12 fixes this. It also fixes the crash when searching while Banshee is playing.
Thanks for the help! I had just recreated the bug last night when I stopped being able to build (a dumb CVS mistake on my part) and gave up for the night. Bertrand: thanks for catching my amateurish mistake! Looks like I didn't need to be able to build; I just needed to read over the code and pay more attention, heh heh. I really appreciate it. I'm testing the updated patch now to ensure that it fixes the problem for me as it did in comment 14 for P. van de Geer, and then I'll upload it.
Created attachment 63893: fuzzy search and '-' NOT operator. I've moved the line that dereferenced a possibly-null string below the null check, as Bertrand identified for me. My initial testing shows that it works great now. (My earlier import testing for (b) failed to reproduce the crash because I didn't read Aaron's entire report and didn't have a search active. :( Bonehead move on my part.)
Don't be so hard on yourself :) The latest revision works much better and solves the two issues I was seeing. However, I noticed one more issue. When the "All" filter is selected, everything works great, but when I choose to filter on "Song Name", for instance, the behavior is not what I would expect. I entered "coheed" as the search with the "Song Name" filter. Instead of getting zero results ("coheed" only matches the artist), I got the full track listing.
Regarding my last comment, it seems this only happens when an import is active. The results are correct (nothing should be shown for "coheed" filtered by Song Name) until the next track is imported, at which point the view resets. This is not the case with the "All" filter, which does the right thing even when importing.
I think I have this almost fixed, but ran into some "weirdness". It seems that when importing, the "first" song is always added to the view, even if I return false from DoesTrackMatchSearch(). Just to test this, I changed DTMS to: private bool DoesTrackMatchSearch(TrackInfo ti) { return false; } Then I typed "massive" into the search and imported an album (into an empty database). Sure enough, the first song (in this case some random song from 'Maroon5') popped into view while the rest of the album I imported stayed hidden. A second import made no change in the view (the random 'Maroon5' song was still there). (There are now 21 tracks, two distinct artists and albums.) I cleared the search (everything came back), then typed 'maroon' into the search, and (as expected with the return-false function) the view blanked. Then I did a third import of a third distinct artist/album. Wackiest of all, all 21 tracks that were already in the library appeared, along with the very first song of the newly imported album (some random Madonna track). This leads me to believe that the wonky behavior of searches during import isn't a result of my changes (though this investigation did help me improve searching on one "field" instead of "All"). Instead, something else is using DoesTrackMatchSearch() incorrectly. I'm now going to widen my search to find out where the shenanigans are.
Created attachment 63906: fuzzy search and '-' NOT operator. Improved version of fuzzy search and NOT operator. (Previously the NOT operator didn't work when searching just one field instead of "All".) This doesn't resolve the remaining issue of wonky results when importing tunes, though it may turn out that the problem isn't in this portion of the code.
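The sketch below shows how a single-field filter and the NOT operator can share one code path. With this arrangement, '-' behaves the same under "All" as under a single field such as "Song Name". `SearchField`, `TrackSketch`, and `FieldSearchSketch` are hypothetical types. They are not Banshee's `TrackInfo` or its actual filter implementation.

```csharp
using System;

// Hypothetical filter values mirroring the UI options mentioned in this thread.
public enum SearchField { All, Artist, Album, SongName }

// Hypothetical track type; Banshee's real TrackInfo is not reproduced here.
public sealed class TrackSketch
{
    public readonly string Artist, Album, Title;

    public TrackSketch(string artist, string album, string title)
    {
        Artist = artist; Album = album; Title = title;
    }
}

public static class FieldSearchSketch
{
    // Picks the fields that the selected filter allows the query to look at.
    static string[] FieldsFor(TrackSketch t, SearchField filter)
    {
        switch (filter)
        {
            case SearchField.Artist:   return new string[] { t.Artist };
            case SearchField.Album:    return new string[] { t.Album };
            case SearchField.SongName: return new string[] { t.Title };
            default:                   return new string[] { t.Artist, t.Album, t.Title };
        }
    }

    static bool AnyContains(string[] fields, string word)
    {
        foreach (string f in fields)
            if (f != null && f.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0)
                return true;
        return false;
    }

    // Plain and '-'-prefixed words are tested against the same filtered
    // field set, so NOT behaves consistently under every filter.
    public static bool Matches(TrackSketch t, SearchField filter, string query)
    {
        string[] fields = FieldsFor(t, filter);
        foreach (string token in query.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
        {
            if (token == "-")
                continue; // a bare dash carries no keyword
            bool exclude = token[0] == '-';
            string word = exclude ? token.Substring(1) : token;
            if (AnyContains(fields, word) == exclude)
                return false;
        }
        return true;
    }
}
```

With the "Song Name" filter, "coheed" matches nothing unless a title contains it. Under "All", it matches through the artist field.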
Hey guys, I'm bowing out of Banshee development (I know I just got started) because I've moved to another project. I wanted to post so that if someone wants to take this patch and make it work (or just take the bug and scrap my attempt), they know I'm not working on it anymore. Thanks for the help while I was working on the patch. -- Travis
Created attachment 67584: A patch for this bug and bug #327671. I was working on bug #327671 and decided to combine it with this bug because both involve the same code. I encountered the same problem with searching during importing. After some extensive debugging, I think I've figured out what's going on. When you search for something, the import process pauses, the playlist gets cleared, and all the songs that have been added so far get filtered; those that match are added back to the playlist. Once all of the tracks have been filtered, the import continues. The problem is what happens when none of the current tracks match. The playlist should stay empty, but as soon as the import continues, an empty playlist is reset to show all the tracks in the library (if there's at least one match, there's no problem). I added a fix for that too, but I'm not familiar enough with Banshee to be sure it won't cause problems elsewhere. Also, the original search code always tried to match both the query and "the " + query. For example, if you searched for "beatles", it would try to match that or "the beatles". I don't think this is necessary. If "beatles" doesn't match, "the beatles" won't match either, and if it does match, there's no need for an additional check with "the beatles". So I removed that.
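The argument for dropping the "the " + query check holds for plain substring matching. If a field contains "the beatles", it necessarily contains "beatles" four characters later. The hypothetical side-by-side sketch below assumes ordinal, case-insensitive `IndexOf`; the original Banshee code is not reproduced.

```csharp
using System;

// Hypothetical comparison of the two checks described above.
public static class ThePrefixSketch
{
    // Tries the query, then "the " + query, as the old search reportedly did.
    public static bool MatchesWithThePrefix(string field, string query)
    {
        return field.IndexOf(query, StringComparison.OrdinalIgnoreCase) >= 0
            || field.IndexOf("the " + query, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // Single substring test. Any field containing "the " + query also
    // contains query, so the second test above can only succeed when the
    // first one already has.
    public static bool Matches(string field, string query)
    {
        return field.IndexOf(query, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```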
Travis: thanks for your work on this, and good luck with your new project. Marin: outstanding work on your version! Excellent use of generics and the culture-based string comparison. I moved RelaxedStringIndexOf to src/Banshee.Base/Utilities.cs as StringUtil.RelaxedIndexOf so other areas can take advantage of it. I also formatted your code to better match the HACKING spec. I added a small check in the Array.ForEach<string> callback so that a standalone '-' is neither excluded nor included as a search term (otherwise it caused zero results to be returned). Thanks guys! I love it!
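Below is a sketch of a culture-aware "relaxed" substring search in the spirit of `StringUtil.RelaxedIndexOf`. The actual Banshee implementation is not shown in this thread. The `RelaxedSearchSketch` class and the particular `CompareOptions` combination below are assumptions.

```csharp
using System;
using System.Globalization;

// Hypothetical sketch; the real StringUtil.RelaxedIndexOf and its exact
// CompareOptions are not reproduced here.
public static class RelaxedSearchSketch
{
    // Assumed options: ignore case and non-spacing marks (diacritics).
    const CompareOptions Relaxed = CompareOptions.IgnoreCase | CompareOptions.IgnoreNonSpace;

    // Returns the index of needle in haystack, or -1 if it is not found,
    // using a linguistic search under the user's current culture.
    public static int RelaxedIndexOf(string haystack, string needle)
    {
        return CultureInfo.CurrentCulture.CompareInfo.IndexOf(haystack, needle, Relaxed);
    }
}
```

Searching with the user's current culture means case folding follows their locale (for example, Turkish dotted and dotless i). With culture data available, "beyonce" should also find "Beyoncé" because `IgnoreNonSpace` ignores diacritics. Under .NET's invariant globalization mode, comparisons fall back to ordinal behavior, so that may not hold.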