GNOME Bugzilla – Bug 139196
Make searches ignore minor spelling differences
Last modified: 2018-05-24 10:30:43 UTC
The search function as implemented in Rhythmbox searches for exact phrases, so a search for "Lazy Days" is different from a search for "Lazy Days" with extra whitespace between the words. I think a better method is to search for each word separately and treat the spacing between words as immaterial. Then it wouldn't matter whether someone searched for "Lazy Days", "Days Lazy" or "Lazy Days" with extra spaces. Each of these should find all the songs containing the words "days" and "lazy".
I agree. I would really like the search box contents to be treated as a list of space-separated search terms rather than as one search term. As well as allowing the order to change as above, I think each term should be searched for globally, across all fields. For example, to locate the album "true" by the artist "trinity roots", it would (depending on what else is in your library) probably be enough to enter roots true, or maybe even roo tru. I found this form of searching the most valuable feature of iTunes. Because the results update as you type, you can interactively adjust what you're typing to narrow in on what you want. I found this interactive search experience a genuinely new feature in computer interfaces, and I think it's great. I'm not sure whether the search update in Rhythmbox would be fast enough for this kind of behaviour, but it would be good to have anyway, for the reasons outlined above. I'm also not sure this should really be called 'fuzzy' searching, since to me that implies Madman-style search interpretation (matching spelling mistakes, providing a best match, etc.), which is not what I would like at all...
It would also be great if I could find something like "Die Ärzte" by typing "die arzte" (just leaving out the umlaut on a foreign keyboard) or "die aerzte" (which is how German speakers would write it without the umlaut).
I agree that this is a good idea. I think the current behaviour is non-intuitive. Users are used to the convention that searching for:
The Rolling Stones
will search for three terms, and that if they want to search for a number of words in a row, searching for:
"The Rolling Stones"
will search for one term. I find the current implementation limiting. For example, say I have Bob Dylan's "Like a Rolling Stone" plus a number of Rolling Stones tracks, and I go to find the Dylan track via the search bar. Searching for:
Rolling Stone
finds a number of tracks, but searching for:
Rolling Stone Dylan
won't find any tracks.
Someone started working on this a few days ago, with some guidance via IRC. They were using the Levenshtein distance algorithm, which means it can match even when letters have been inserted, deleted or substituted. Last time we talked they had it mostly working. It was fairly slow, but there are a couple of obvious optimizations we can make, which will provide order-of-magnitude speed improvements. I'll post an update next time I hear from them.
Created attachment 50729
patch

I haven't heard from them in about a week, so I'm attaching the most up-to-date copy of the patch that I have, so it doesn't get lost. The patch does a number of things:

* adds functions to rb-string-helpers which compute the "distance" between two strings, using some "Party Pattern" code written by Benjamin Otte two years ago (using the Levenshtein distance algorithm)
* stores the PartyPatterns in RBRefStrings, so they don't have to be regenerated every time a query is run
* adds a new operator, RHYTHMDB_QUERY_PROP_FUZZY_MATCH, which does fuzzy matching
* adds a new property, RHYTHMDB_PROP_SEARCH_STRING, which (currently) can only be used by the fuzzy operators, and causes them to match against genre, artist, album and title (for the search box). This lets you search more than one of those at the same time.
* changes the search box to use RHYTHMDB_QUERY_PROP_FUZZY_MATCH and RHYTHMDB_PROP_SEARCH_STRING.

This implements fuzzy matching that ignores all punctuation and case differences, and matches words separately. The search box also does this with genre/artist/album/title at once, so "pris end" will match "Prisoner of Society" by "The Living End".

By fiddling with the values returned from party_pattern_cost_replace, party_pattern_cost_delete and party_pattern_cost_insert, as well as the threshold in evaluate_conjunctive_subquery, it can be made to match even with missing or extra characters. With the current (fairly arbitrary) values it will ignore one missing or extra character per search.

Although fuzzy matching is slower than looking for a substring, this doesn't feel much slower (I haven't done any profiling yet, so I have no numbers). This is probably because it does one match over all four properties (genre/album/artist/title) rather than doing a substring search four times.
There is another obvious optimisation that could be done: not rebuilding the PartyPattern for the search term for every entry.

Some things that need to be done:

* fix the copyright/licence stuff in rb-string-helpers
* rename some things, and fix some code style issues
* check whether it can match "è" with "e" and the like
* implement the above optimisation, then do some profiling to see what speed difference this patch makes
* probably fix some bugs, and make it more robust
Created attachment 53567
other patch

This is a variation on the earlier patch that doesn't do any "fuzzy matching" but can match against multiple properties, and hence is simpler. For example, entering "cou row liv" finds my tracks by "Counting Crows" from "Live at the Wiltern Theatre".
A better version of the second patch has been committed to CVS. If someone wants to add real "fuzzy matching" that ignores spelling mistakes and the like, it should be easy enough to add the necessary bits from the first patch.
I'd prefer to keep this bug open, but only to track the missing fuzziness bits. Maintainers, what are your thoughts?
Yeah, it's probably worth keeping open.
*** Bug 338824 has been marked as a duplicate of this bug. ***
I've unmarked bug 338824 as a duplicate and am re-titling this bug to cover the other part, since the two features are different and are likely to be done separately.
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/rhythmbox/issues/29.