Bug 139196 – Make searches ignore minor spelling differences


Summary:	Make searches ignore minor spelling differences


Status:	RESOLVED OBSOLETE

Product:	rhythmbox
Classification:	Other
Component:	User Interface
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal enhancement
Target Milestone:	---
Assigned To:	RhythmBox Maintainers
QA Contact:	RhythmBox Maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2004-04-05 22:46 UTC by Maynard Kuona
Modified:	2018-05-24 10:30 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
patch (18.54 KB, patch) 2005-08-15 16:23 UTC, James "Doc" Livingston	reviewed
other patch (9.60 KB, patch) 2005-10-17 02:01 UTC, James "Doc" Livingston	committed

Description Maynard Kuona 2004-04-05 22:46:23 UTC

The search function as implemented in rhythmbox searches for exact phrases, such
that a search for "Lazy Days" is different from a search for "Lazy  Days". I
think a better search method is to search for each word, with spaces and
whitespace ignored as immaterial. Then it wouldn't matter if someone searched
for "Lazy Days" or "Days Lazy" or "Lazy  Days". It should then search for all
the songs with the words "days" and "lazy" in it.
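
The behaviour requested above could be sketched as follows. This is a minimal Python model, not Rhythmbox's actual C code, and `matches_all_words` is a hypothetical helper name. It assumes that an empty query should match everything.

```python
def matches_all_words(query: str, text: str) -> bool:
    """Return True if every whitespace-separated word of query occurs in text.

    Word order, letter case, and the amount of whitespace are ignored.
    """
    haystack = text.casefold()
    # str.split() with no argument collapses runs of whitespace.
    words = query.casefold().split()
    return all(word in haystack for word in words)


titles = ["Lazy Days", "Days of Thunder", "Lazy Sunday"]
for q in ("Lazy Days", "Days Lazy", "Lazy  Days"):
    print(repr(q), "->", [t for t in titles if matches_all_words(q, t)])
```

All three queries select the same tracks, because the query is reduced to the set of words "lazy" and "days" before matching.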

Comment 1 Robin Ince 2004-12-07 07:14:52 UTC

I agree - I would really like to have the search box contents treated as a list
of space-separated search terms as opposed to one search term.

As well as ignoring word order as above, I think the different terms should be
searched for globally, i.e. across all fields rather than in just one.
So for example to locate the album "true" by artist "trinity roots" it would
(depending on other items in your library) probably be enough to enter

roots true

or maybe even roo tru.

I found that this form of searching was the most valuable feature for me in
iTunes. As the search updates, it allows you to interactively adjust what you're
typing to narrow in on what you want. I found this interactive search experience
to be a genuinely new feature in computer interfaces, and I think it's great.
I'm not sure if the search update in rhythmbox would be fast enough for this kind
of behaviour, but it would be good to have anyway, for the reasons outlined above.

Not sure if this should really be called 'fuzzy' searching though - since to me
this implies Madman-style search interpretation (matching spelling mistakes and
providing best match etc.) which is not what I would like at all...

Comment 2 Sven Herzberg 2005-07-01 17:53:06 UTC

It would also be great if I could find stuff like "Die Ärzte" by typing
"die arzte" (just leaving out the umlaut on a foreign keyboard) or "die aerzte"
(which is how German people would write it).
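
One way to get both spellings to match is to compare accent-stripped and transliterated forms of the library string. The sketch below uses Python's standard `unicodedata` module. The helper names and the `GERMAN_EXPANSIONS` table are illustrative assumptions, and the transliteration rules are specific to German.

```python
import unicodedata

# Assumed, illustrative German transliterations; other languages differ.
# "ß" needs no entry because str.casefold() already turns it into "ss".
GERMAN_EXPANSIONS = {"ä": "ae", "ö": "oe", "ü": "ue"}


def strip_accents(text: str) -> str:
    """Case-fold and drop combining marks, so "Ärzte" becomes "arzte"."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def expand_german(text: str) -> str:
    """Case-fold and transliterate umlauts, so "Ärzte" becomes "aerzte"."""
    folded = unicodedata.normalize("NFC", text).casefold()
    for char, replacement in GERMAN_EXPANSIONS.items():
        folded = folded.replace(char, replacement)
    return strip_accents(folded)


def accent_insensitive_match(query: str, text: str) -> bool:
    """Substring match against both folded forms of the library string."""
    needle = strip_accents(query)
    return any(needle in key for key in (strip_accents(text), expand_german(text)))


for q in ("die arzte", "die aerzte", "Die Ärzte"):
    print(q, "->", accent_insensitive_match(q, "Die Ärzte"))
```

Keeping two keys per library string avoids having to guess which convention the person typing the query follows.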

Comment 3 lexual 2005-08-03 08:16:40 UTC

I agree that this is a good idea. I think the current behaviour is non-intuitive.

Users are used to the concept that searching for:

The Rolling Stones
Will search for 3 terms.

And if they want to search for a number of words in a row, it's:

"The Rolling Stones"
Will search for 1 term.

I find the current implementation limiting.
E.g. let's say I have Bob Dylan's "Like a Rolling Stone", plus a number of Rolling
Stones tracks. I go to find the Dylan track via the search-bar.

Searching for:
Rolling Stone
will find a number of tracks.

Searching for:
Rolling Stone Dylan
won't find any tracks.
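
The convention described above (bare words are separate terms, a double-quoted run is one term) might be parsed along these lines. This is a sketch only: `parse_terms` and `matches` are hypothetical names, and each track is assumed to be represented by a single string combining artist and title.

```python
import re

# A double-quoted phrase, or else a run of non-whitespace characters.
_TERM = re.compile(r'"([^"]*)"|(\S+)')


def parse_terms(query: str) -> list[str]:
    """Split a query into case-folded terms; quoted text stays one term."""
    terms = []
    for phrase, word in _TERM.findall(query):
        # Collapse whitespace inside phrases; drop stray unbalanced quotes.
        term = " ".join(phrase.split()) if phrase else word.strip('"')
        if term:
            terms.append(term.casefold())
    return terms


def matches(query: str, text: str) -> bool:
    """True if every term of the query occurs somewhere in text."""
    haystack = " ".join(text.casefold().split())
    return all(term in haystack for term in parse_terms(query))


track = "Bob Dylan - Like a Rolling Stone"
print(parse_terms('The Rolling Stones'))    # three separate terms
print(parse_terms('"The Rolling Stones"'))  # one phrase term
print(matches('Rolling Stone Dylan', track))
print(matches('"Rolling Stone Dylan"', track))
```

With this parsing, the unquoted query `Rolling Stone Dylan` finds the Dylan track, while the quoted form still behaves like the exact-phrase search.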

Comment 4 James "Doc" Livingston 2005-08-03 08:26:53 UTC

Someone started working on this a few days ago, with some guidance via IRC. They
were using the Levenshtein distance algorithm, which means that it can match
even with extra letters inserted, or letters deleted or swapped.

Last time we talked they had it mostly working. It was fairly slow, but there
are a couple of obvious optimizations we can make, which will provide
order-of-magnitude improvements to speed.

I'll provide an update next time I hear from them.
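
For reference, the textbook Levenshtein distance mentioned above can be computed as in the sketch below. This is the standard dynamic-programming formulation written in Python, not the code from the work in progress. Plain Levenshtein distance counts two transposed adjacent letters as two edits.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn a into b (two-row dynamic programming)."""
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))
print(levenshtein("stones", "stoens"))
```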

Comment 5 James "Doc" Livingston 2005-08-15 16:23:36 UTC

Created attachment 50729
patch

I haven't heard from them in about a week, so I'm attaching the most up-to-date
copy of the patch that I had, so it doesn't get lost.

The patch does a number of things:
* adds functions to rb-string-helpers, which compute the "distance" between two
strings (using the Levenshtein distance algorithm), based on some "Party
Pattern" code written by Benjamin Otte two years ago.
* stores the PartyPatterns in RBRefStrings, so they don't have to be generated
every time a query is run
* adds a new operator RHYTHMDB_QUERY_PROP_FUZZY_MATCH, which does fuzzy
matching
* adds a new property RHYTHMDB_PROP_SEARCH_STRING, which can (currently) only
be used by the fuzzy operators, and causes them to match against genre, artist,
album and title (for the search box). This lets you specify more than one of
those at the same time.
* changes the search box to use RHYTHMDB_QUERY_PROP_FUZZY_MATCH and
RHYTHMDB_PROP_SEARCH_STRING.

This implements fuzzy matching which ignores all punctuation and case
differences, and matches words separately. The search box also does this
with genre/artist/album/title at once, so "pris end" will match "Prisoner of
Society" by "The Living End". By adjusting the values returned from
party_pattern_cost_replace, party_pattern_cost_delete and
party_pattern_cost_insert, as well as the threshold in
evaluate_conjunctive_subquery, it can be made to match even with missing
characters or extra ones inserted. With the current (fairly arbitrary) values
it will ignore one missing or extra character per search.
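
The interaction between the three costs and the threshold can be illustrated with a weighted approximate-substring match. This is a Python sketch under stated assumptions, not the patch's code: the cost and threshold values below are chosen only to reproduce the "one missing or extra character per search" behaviour described above, and every name is hypothetical.

```python
# Illustrative values; the patch's real costs and threshold are not given here.
COST_REPLACE = 2
COST_DELETE = 1   # the query has an extra character
COST_INSERT = 1   # the query is missing a character
THRESHOLD = 1


def normalize(text: str) -> str:
    """Ignore case and punctuation, as the description above states."""
    return "".join(ch for ch in text.casefold() if ch.isalnum() or ch.isspace())


def fuzzy_substring_cost(term: str, text: str) -> int:
    """Cheapest weighted edit cost of term against any substring of text."""
    previous = [0] * (len(text) + 1)  # a match may start anywhere in text
    for ct in term:
        current = [previous[0] + COST_DELETE]
        for j, cx in enumerate(text, start=1):
            current.append(min(
                previous[j] + COST_DELETE,
                current[j - 1] + COST_INSERT,
                previous[j - 1] + (0 if ct == cx else COST_REPLACE),
            ))
        previous = current
    return min(previous)  # a match may end anywhere in text


def fuzzy_search(query: str, text: str) -> bool:
    """Sum the cost of every query word and compare with the threshold."""
    haystack = normalize(text)
    total = sum(fuzzy_substring_cost(t, haystack) for t in normalize(query).split())
    return total <= THRESHOLD


entry = "Prisoner of Society - The Living End"
print(fuzzy_search("pris end", entry))
print(fuzzy_search("prisonner end", entry))
print(fuzzy_search("prisner livng", entry))
```

Raising `THRESHOLD` or lowering a cost makes the match more tolerant. With these values a very short word (a single character) always fits within the threshold, which is one reason such values would need tuning.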

Although doing fuzzy matching is slower than looking for a substring, this
doesn't feel that much slower (I haven't done profiling yet, so I have no
numbers). This is probably because it does one match for all four properties
(genre/album/artist/title) rather than doing a substring search four times.
There is an additional obvious optimisation that could be done: not rebuilding
the PartyPattern of the search term for every entry.
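
The shape of that optimisation, preparing the search term once per query and caching the prepared form of each library string, might look like the sketch below. It is a Python stand-in only: `PreparedQuery` and `folded` are hypothetical names, and a plain substring test takes the place of the PartyPattern comparison.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def folded(text: str) -> str:
    """Normalise a library string once; repeat calls hit the cache."""
    return " ".join(text.casefold().split())


class PreparedQuery:
    """Holds the parsed search terms so they are built once per query."""

    def __init__(self, query: str) -> None:
        self.terms = tuple(query.casefold().split())

    def matches(self, text: str) -> bool:
        haystack = folded(text)
        return all(term in haystack for term in self.terms)


def search(query: str, library: list[str]) -> list[str]:
    prepared = PreparedQuery(query)  # built once, outside the per-entry loop
    return [entry for entry in library if prepared.matches(entry)]


library = ["Prisoner of Society The Living End", "Lazy Days"]
print(search("pris end", library))
```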


Some things that need to be done:
* fix the copyright/licence stuff in rb-string-helpers.
* rename some things, and fix some code style issues.
* check whether it can match "è" with "e" and the like
* implement the above optimisation, and then do some profiling to see what
speed difference this patch causes.
* probably fix some bugs, and make it more robust

Comment 6 James "Doc" Livingston 2005-10-17 02:01:13 UTC

Created attachment 53567
other patch

This is a variation on the earlier patch, which doesn't have any "fuzzy
matching" but can match against multiple properties, and hence is simpler. For
example, entering "cou row liv" will find my tracks by "Counting Crows"
from "Live at the Wiltern Theatre".
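
Matching each term against several properties at once, without any fuzziness, can be sketched as follows. This is a Python model rather than the committed C code. The `Track` type and helper names are hypothetical, and the genre and title values are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    genre: str
    artist: str
    album: str
    title: str


def search_string(track: Track) -> str:
    """Combine the searchable properties into one case-folded string."""
    fields = (track.genre, track.artist, track.album, track.title)
    # A newline separator keeps a term from matching across two fields.
    return "\n".join(field.casefold() for field in fields)


def matches(query: str, track: Track) -> bool:
    """True if every term is a substring of at least one property."""
    haystack = search_string(track)
    return all(term in haystack for term in query.casefold().split())


# Genre and title are placeholders; artist and album come from the comment.
track = Track("Rock", "Counting Crows", "Live at the Wiltern Theatre", "Untitled")
print(matches("cou row liv", track))
print(matches("cou row dylan", track))
```

Here "cou" and "row" are found in the artist and "liv" in the album, so a single query can narrow by several properties at the same time.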

Comment 7 James "Doc" Livingston 2005-10-29 08:03:30 UTC

A better version of the second patch has been committed to CVS.

If someone wants to add real "fuzzy matching" that ignores spelling mistakes and
the like, it should be easy enough to add the necessary bits from the first patch.

Comment 8 Sven Herzberg 2005-11-03 23:09:58 UTC

I'd prefer to keep this bug open, but just to track the missing fuzziness
bits; maintainers, your thoughts?

Comment 9 James "Doc" Livingston 2005-11-04 02:21:12 UTC

Yeah, it's probably worth keeping open.

Comment 10 Alex Lancaster 2006-04-18 08:25:53 UTC

*** Bug 338824 has been marked as a duplicate of this bug. ***

Comment 11 James "Doc" Livingston 2006-04-19 04:17:24 UTC

I've unmarked 338824 as a duplicate, and am re-titling this bug to cover the other part, since the two features are different and are likely to be done separately.

Comment 12 GNOME Infrastructure Team 2018-05-24 10:30:43 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe to and participate further in the new bug via this link to our GitLab instance: https://gitlab.gnome.org/GNOME/rhythmbox/issues/29.