Bug 168189 - Strip diacritics and split ligature characters when searching
Status: RESOLVED WONTFIX
Product: beagle
Classification: Other
Component: General
Version: 0.2.13
Hardware: Other All
Importance: Normal enhancement
Target Milestone: Milestone 2
Assigned To: Beagle Bugs
QA Contact: Beagle Bugs
gnome[unmaintained]
Duplicates: 482567 525911
Depends on: 354742
Blocks:
 
 
Reported: 2005-02-22 20:37 UTC by Federico Mena Quintero
Modified: 2018-07-03 09:53 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Strip Diacritics when searching (437 bytes, patch)
2006-08-13 02:55 UTC, Kevin Kubasik
rejected

Description Federico Mena Quintero 2005-02-22 20:37:39 UTC
If you search for "aeiou" in Google, it will also match pages with "áèïôu".
This is useful because it lets you find pages with bad spelling, but that
nonetheless contain the information you want.  Beagle should do the same.  This
page describes a quick-n-dirty way to strip diacritics:

http://weblogs.asp.net/michkap/archive/2005/02/19/376617.aspx
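
One common way to strip diacritics is to decompose the text (Unicode normalization form D) and then drop the nonspacing combining marks. The following is a minimal Python sketch of that idea, not the code from the linked post and not anything in Beagle; the function name `strip_diacritics` is made up for illustration.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with combining diacritical marks removed."""
    # NFD splits a precomposed letter such as "é" into "e" + U+0301.
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" is "Mark, nonspacing": accents, diaereses, and so on.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


if __name__ == "__main__":
    print(strip_diacritics("áéíóú"))  # aeiou
```

Applying the same function to both the indexed text and the query would make an unaccented query match accented text, at the cost of no longer distinguishing words that differ only by their accents.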
Comment 1 Kevin Kubasik 2006-05-10 21:36:51 UTC
Unfortunately, this example requires .NET 2.0 (or a compatible API), which Mono does not support yet.
Comment 2 Kevin Kubasik 2006-08-13 02:55:26 UTC
Created attachment 70798
Strip Diacritics when searching

Ok, the attached patch does that, but at the same time, doesn't this ruin any chance at real international support?

But maybe I just did it the wrong way. Regardless, I just did this in the beagle-search frontend, since other frontends may choose differently, and it's not hard to implement.
Comment 3 Kevin Kubasik 2006-08-13 03:39:45 UTC
Scratch that, this causes XMLSerialization to die horribly.
Comment 4 Joe Shaw 2006-08-14 18:19:46 UTC
The right way to do this is to add an additional filter to the analyzer that makes these modifications for you.  Cases of changing things like é to e are easy, but for ü you probably want to support both u and ue.  That's a little trickier.
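
To make the "both u and ue" case concrete, here is a rough Python sketch of what such a filter could emit for each token: the original term, a form with marks stripped, and a form with German-style digraphs. This is not the Lucene analyzer API; `expand_token`, `strip_marks`, and the `DIGRAPHS` table are hypothetical names, and the table contents are an assumption about which expansions are wanted.

```python
import unicodedata

# Assumed German-style expansions; a real filter might make this configurable.
DIGRAPHS = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}


def strip_marks(text: str) -> str:
    """Drop nonspacing combining marks after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def expand_token(token: str) -> list[str]:
    """Return the distinct terms to index for one token, original first."""
    # Compose first so that "u" + U+0308 is seen as the single letter "ü".
    token = unicodedata.normalize("NFC", token)
    with_digraphs = "".join(DIGRAPHS.get(ch, ch) for ch in token)
    candidates = [token, strip_marks(token), strip_marks(with_digraphs)]
    terms = []
    for term in candidates:
        if term not in terms:
            terms.append(term)
    return terms


if __name__ == "__main__":
    print(expand_token("für"))   # ['für', 'fur', 'fuer']
    print(expand_token("café"))  # ['café', 'cafe']
```

Indexing all variants at the same position keeps queries simple: the query side only needs the same folding, and a search for either "fur" or "fuer" finds "für".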
Comment 5 Joe Shaw 2006-10-02 16:20:45 UTC
Lucene.Net 1.9.1 has support for this; we should add it once it's checked in.
Comment 6 Kevin Kubasik 2006-11-28 17:11:50 UTC
Marking NEW, since Lucene.Net 1.9.1 is now checked in and merged...
Comment 7 Kevin Kubasik 2006-11-28 17:14:23 UTC
Just adding some bug-intertwine, we need some sort of language detection for this to work.
Comment 8 Joe Shaw 2006-11-29 01:01:17 UTC
I don't think this bug depends on 354742.  We probably want to strip diacritics regardless of language.  There is a Lucene filter (new in 2.0, I believe) for stripping them; we should look into using it:

http://cvs.gnome.org/viewcvs/*checkout*/beagle/beagled/Lucene.Net/Analysis/ISOLatin1AccentFilter.cs
Comment 9 Debajyoti Bera 2007-05-03 18:10:02 UTC
FYI, there is another one in the making, a StripLatinDiacriticsFilter
https://issues.apache.org/jira/browse/LUCENENET-38
Comment 10 Joe Shaw 2007-10-02 20:37:42 UTC
*** Bug 482567 has been marked as a duplicate of this bug. ***
Comment 11 Joe Shaw 2008-04-10 14:09:12 UTC
*** Bug 525911 has been marked as a duplicate of this bug. ***
Comment 12 Joe Shaw 2008-04-10 14:10:59 UTC
Bug 525911, which I just marked as a dup, is a slight variation on the diacritic problem.  It has ligature characters like a combined "fi" that should be searchable with the regular, individual "fi" characters.  Attached to that bug is a PDF which illustrates the behavior.
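
For the ligature case, one option is Unicode compatibility decomposition (NFKD), which maps presentation ligatures such as U+FB01 to their individual letters. The sketch below is Python for illustration only, with a hypothetical function name `fold_for_search`; it is not taken from Beagle or Lucene.

```python
import unicodedata


def fold_for_search(text: str) -> str:
    """Split compatibility ligatures and strip combining marks."""
    # NFKD turns U+FB01 (fi ligature) into "f" + "i" and also decomposes
    # accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


if __name__ == "__main__":
    print(fold_for_search("\ufb01le"))    # file
    print(fold_for_search("o\ufb03ce"))   # office
```

NFKD also folds other compatibility characters (superscript digits and full-width forms, for example), which may or may not be wanted in a search index. Letters such as æ and œ have no Unicode decomposition, so they would need an explicit mapping table.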
Comment 13 André Klapper 2018-07-03 09:53:08 UTC
Beagle is not under active development anymore and had its last code changes in early 2011. Its codebase has been archived (see bug 796735):
https://gitlab.gnome.org/Archive/beagle/commits/master

"tracker" is an available alternative.

Closing this report as WONTFIX as part of Bugzilla Housekeeping to reflect
reality. Please feel free to reopen this ticket (or rather transfer the project
to GNOME GitLab, as GNOME Bugzilla is deprecated) if anyone takes the
responsibility for active development again.