GNOME Bugzilla – Bug 143708
The stemmer should do the right thing for documents in languages other than English
Last modified: 2018-07-03 09:50:56 UTC
It seems that words like 'is' are not in the index, which is fine. But if I search for 'your name is' with 'best', it won't find the file containing that text. Searching for 'your name' finds it without problems.
The same happens for other words of that kind. Best should remove these from the search string before searching. But how does this work with other languages? For instance, 'is' in Danish means 'ice cream', and I might want to search for that.
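To make the request concrete, here is a minimal sketch of stripping stop words from a query per language. This is not Beagle's or Lucene's actual code; the `STOP_WORDS` table and the `strip_stop_words` helper are hypothetical names, and the word lists are deliberately tiny.

```python
# Hypothetical, deliberately tiny stop-word lists; real analyzers ship much larger ones.
STOP_WORDS = {
    "en": {"is", "a", "the", "of"},
    "da": {"er", "en", "et", "og"},  # 'is' (ice cream) is not a Danish stop word
}

def strip_stop_words(query, language):
    """Return the query terms with the stop words of `language` removed."""
    stop = STOP_WORDS.get(language, set())  # unknown language: remove nothing
    return [term for term in query.lower().split() if term not in stop]

print(strip_stop_words("your name is", "en"))  # ['your', 'name']
print(strip_stop_words("is", "da"))            # ['is']
```

With something like this, the English query reduces to 'your name', while a Danish search for 'is' keeps its only term.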
The main bug (improper handling of little words like 'is') is fixed in CVS. I'm changing the bug into a wishlist item for proper support of stemming in other languages. Lucene can support stemmers/analyzers for languages other than English. The trickier part is automatically figuring out which language a document is in... which is actually doable with fairly simple algorithms, but it is almost impossible to do the right thing with documents in multiple languages. At the very least, the user should be able to set a default language. (Or maybe it could be set based on the locale?)
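The comment does not say which "fairly simple algorithms" are meant. One possibility, sketched here purely as an assumption, is to count hits against short lists of common words per language. `COMMON_WORDS` and `guess_language` are hypothetical names, and the word lists are far too small for real use.

```python
# Hypothetical mini word lists; a usable detector would need far larger ones.
COMMON_WORDS = {
    "en": {"the", "and", "is", "of", "to", "your"},
    "da": {"og", "er", "det", "jeg", "ikke", "til"},
}

def guess_language(text, default="en"):
    """Pick the language whose common words occur most often in `text`."""
    words = text.lower().split()
    scores = {lang: sum(w in vocab for w in words)
              for lang, vocab in COMMON_WORDS.items()}
    best = max(scores, key=scores.get)
    # With no hits at all there is no evidence, so fall back to the default.
    return best if scores[best] > 0 else default

print(guess_language("jeg er ikke til is"))    # da
print(guess_language("your name is the key"))  # en
```

Note that this assigns a single language to the whole text, which is exactly why mixed-language documents stay hard.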
Um, why not both: a default based on the locale and an option to switch? :)
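A rough sketch of "both" could look like the following, assuming POSIX-style locale names such as 'da_DK.UTF-8'. The helper names `language_from_locale` and `pick_language` are made up for illustration and are not part of Beagle.

```python
def language_from_locale(locale_name):
    """Extract a two-letter language code from a name like 'da_DK.UTF-8'."""
    if not locale_name:
        return None
    code = locale_name.split(".")[0].split("_")[0].lower()
    # 'C' and 'POSIX' carry no language information; only two-letter
    # codes are accepted in this sketch.
    return code if len(code) == 2 and code.isalpha() else None

def pick_language(user_choice=None, locale_name=None, fallback="en"):
    """An explicit user setting wins, then the locale, then the fallback."""
    return user_choice or language_from_locale(locale_name) or fallback

print(pick_language(locale_name="da_DK.UTF-8"))                    # da
print(pick_language(user_choice="en", locale_name="da_DK.UTF-8"))  # en
print(pick_language(locale_name="C"))                              # en
```

In a real program the `locale_name` argument could come from something like Python's `locale.getlocale()[0]`, which may be `None` when no locale is configured.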
*** Bug 449149 has been marked as a duplicate of this bug. ***
Beagle is not under active development anymore and had its last code changes in early 2011. Its codebase has been archived (see bug 796735): https://gitlab.gnome.org/Archive/beagle/commits/master "tracker" is an available alternative. Closing this report as WONTFIX as part of Bugzilla Housekeeping to reflect reality. Please feel free to reopen this ticket (or rather transfer the project to GNOME Gitlab, as GNOME Bugzilla is deprecated) if anyone takes the responsibility for active development again.