GNOME Bugzilla – Bug 787452
Stop words cause other words in search to be ignored
Last modified: 2021-05-26 22:26:08 UTC
I was trying to persuade Tracker to find some music I just bought.

$ tracker search new model
Search term 'new' is a stop word. Stop words are common words which may be ignored during the indexing process.

Results:

But if I simply search for 'model' then I can find it:

Results:

  file:///home/sam/Downloads/man_f5d8236-4_pm01122_7-08.pdf
    ...vary depending on the model of wireless card...

  file:///home/sam/Downloads/PERTURBATOR%20-%20New%20Model
    PERTURBATOR - New Model

  file:///home/sam/Downloads/PERTURBATOR%20-%20New%20Model/PERTURBATOR%20-%20New%20Model%20-%2006%20God%20Complex.m4a
    PERTURBATOR - New Model - 06 God Complex.m4a

  file:///home/sam/Downloads/PERTURBATOR%20-%20New%20Model/PERTURBATOR%20-%20New%20Model%20-%2005%20Corrupted%20by%20Design.m4a
    PERTURBATOR - New Model - 05 Corrupted by Design.m4a

  file:///home/sam/Downloads/PERTURBATOR%20-%20New%20Model/PERTURBATOR%20-%20New%20Model%20-%2004%20Tainted%20Empire.m4a
    PERTURBATOR - New Model - 04 Tainted Empire.m4a

  file:///home/sam/Downloads/PERTURBATOR%20-%20New%20Model/PERTURBATOR%20-%20New%20Model%20-%2003%20Vantablack.m4a
    PERTURBATOR - New Model - 03 Vantablack.m4a

  file:///home/sam/Downloads/PERTURBATOR%20-%20New%20Model/PERTURBATOR%20-%20New%20Model%20-%2002%20Tactical%20Precision%20Disarray.m4a
    PERTURBATOR - New Model - 02 Tactical Precision Disarray...

  file:///home/sam/Downloads/PERTURBATOR%20-%20New%20Model/PERTURBATOR%20-%20New%20Model%20-%2001%20Birth%20of%20the%20New%20Model.m4a
    PERTURBATOR - New Model - 01 Birth of the New Model.m4a
Whoops, that last command was the output of 'tracker search model'.
Created attachment 359421
libtracker-fts: Only let stop words go through on prefix queries

Commit 63e507865 made stop words go through when tokenizing FTS5 query search terms, in order to still provide matches for incompletely typed search terms that happen to match a stop word.

This, however, had the side effect that searching for a stop word in combination with other terms renders the latter ineffective. As the stop word has no tokens in the FTS5 table to match with, the whole query brings no results.

Since that commit, SQLite fixed FTS5_TOKENIZE_PREFIX to work as advertised, so limit the bypass to prefix queries (e.g. "onto*"), since it only makes sense there.

Also, invert the way we look for stop words (i.e. always look those up in search terms as per config, and do the bypass once we know we deal with a stop word) for the sake of readability.
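
For readers who want to see the effect described in the patch, here is a rough Python model of the behaviour. It is not Tracker's code and does not use SQLite. The stop-word list, the sample documents, and the names `index_tokens`, `query_terms` and `search` are all made up for illustration. It only assumes what the patch description says: stop words are never indexed, and all terms in a query must match (FTS5 treats space-separated terms as an implicit AND).

```python
# Hypothetical stop-word list and documents, for illustration only.
STOP_WORDS = {"new", "the", "of", "onto"}

DOCUMENTS = {
    1: "PERTURBATOR - New Model",
    2: "vary depending on the model of wireless card",
    3: "Weekly newsletter",
}


def index_tokens(text):
    """Tokenize text for indexing; stop words never reach the index."""
    words = [w.strip(".,-").lower() for w in text.split()]
    return {w for w in words if w and w not in STOP_WORDS}


INDEX = {doc_id: index_tokens(text) for doc_id, text in DOCUMENTS.items()}


def query_terms(query, bypass_only_on_prefix):
    """Split a search string into (term, is_prefix) pairs.

    bypass_only_on_prefix=False models the behaviour after commit 63e507865:
    stop words always go through to the query.
    bypass_only_on_prefix=True models the patch: a stop word goes through
    only when it is written as a prefix query such as "onto*".
    """
    terms = []
    for raw in query.lower().split():
        is_prefix = raw.endswith("*")
        term = raw.rstrip("*")
        if term in STOP_WORDS and bypass_only_on_prefix and not is_prefix:
            continue  # plain stop word: drop it from the query
        terms.append((term, is_prefix))
    return terms


def term_matches(tokens, term, is_prefix):
    """True if any indexed token equals the term (or starts with it)."""
    if is_prefix:
        return any(token.startswith(term) for token in tokens)
    return term in tokens


def search(query, bypass_only_on_prefix):
    """Return ids of documents matching every query term (implicit AND)."""
    terms = query_terms(query, bypass_only_on_prefix)
    if not terms:
        return []
    return [
        doc_id
        for doc_id, tokens in sorted(INDEX.items())
        if all(term_matches(tokens, term, pfx) for term, pfx in terms)
    ]


if __name__ == "__main__":
    print(search("new model", bypass_only_on_prefix=False))
    print(search("new model", bypass_only_on_prefix=True))
    print(search("new*", bypass_only_on_prefix=True))
```

In this model, "new model" finds nothing while the stop word goes through unconditionally, because no document has a "new" token to satisfy the AND. With the bypass limited to prefix queries, "new" is dropped and "model" matches on its own, while "new*" can still match longer words such as "newsletter".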
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new enhancement request ticket at https://gitlab.gnome.org/GNOME/tracker/-/issues/ Thank you for your understanding and your help.