Bug 588753 - Tracker should search Arabic characters with 2 options, or in a different way.
Status: RESOLVED OBSOLETE
Product: tracker
Classification: Core
Component: FTS
Version: 0.9.x
Hardware: Other All
Importance: Normal major
Target Milestone: ---
Assigned To: tracker-general
Jamie McCracken
Depends on:
Blocks:
Reported: 2009-07-16 08:14 UTC by AlNaimi
Modified: 2021-05-26 22:25 UTC
See Also:
GNOME target: ---
GNOME version: ---

Description AlNaimi 2009-07-16 08:14:09 UTC
First option, and the international way (used by Google Desktop Search and Google's web engine, by MSN's desktop search and web engine, and by almost all Arabic programs):
Ignore the difference between these characters:
أ - ا - إ -آ
ه - ة
ي - ى
The second option:
Find the exact phrase as written.
As I said before, the first option is the regular way to search in Arabic.
I hope that you can add the two options, or make the first one the default.
Comment 1 AlNaimi 2009-07-24 05:36:38 UTC
Also, I forgot to mention that the Tracker search engine should also ignore:
ّ - َ - ً - ُ - ٌ - ِ - ٍ - ~ - ْ
These characters are usually used together with the main Arabic characters.
Comment 2 Aleksander Morgado 2010-06-01 21:25:53 UTC
We currently use your suggested 1st approach for "unaccenting" words, meaning that all combining diacritical marks are removed from the NFKD-normalized string (for FTS). Combining diacritical marks are considered those in these Unicode ranges:
 * 0x0300 to 0x036F
 * 0x1DC0 to 0x1DFF
 * 0x20D0 to 0x20FF
 * 0xFE20 to 0xFE2F

It seems that the combining marks you suggest ignoring during text search do not fall into any of those ranges, so I guess your suggestion actually means that Tracker should not consider *any* combining mark (diacritic or not) during text search. I don't currently know whether this is a good idea, so suggestions are welcome...
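The unaccenting step described in this comment can be sketched roughly as follows. This is a Python illustration only, not Tracker's actual C code; the names `COMBINING_DIACRITIC_RANGES`, `is_combining_diacritic` and `unaccent` are made up for the example. It also shows why the Arabic marks are left untouched: the Arabic vowel marks sit at U+064B to U+0652, outside the four ranges listed above.

```python
import unicodedata

# The four "combining diacritical marks" ranges quoted in this comment.
COMBINING_DIACRITIC_RANGES = (
    (0x0300, 0x036F),
    (0x1DC0, 0x1DFF),
    (0x20D0, 0x20FF),
    (0xFE20, 0xFE2F),
)


def is_combining_diacritic(ch):
    """Return True if ch falls inside one of the ranges above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in COMBINING_DIACRITIC_RANGES)


def unaccent(text):
    """NFKD-normalize text, then drop marks that fall in the listed ranges."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not is_combining_diacritic(ch))


# Latin accents are removed: "café" becomes "cafe".
latin = unaccent("caf\u00e9")

# Arabic "kataba" written with three fatha marks (U+064E) is returned
# unchanged, because U+064E is a combining mark outside the listed ranges.
arabic = "\u0643\u064e\u062a\u064e\u0628\u064e"
arabic_unchanged = unaccent(arabic) == arabic
```

Widening the filter to every character with a non-zero `unicodedata.combining()` class would catch the Arabic marks too, which is essentially the "ignore any combining mark" reading raised in this comment.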
Comment 3 Aleksander Morgado 2010-06-02 10:45:06 UTC
Moved to Store category, as this is really an FTS issue.
Comment 4 Aleksander Morgado 2010-06-08 15:00:26 UTC
Moving to FTS component
Comment 5 AlNaimi 2010-06-09 16:43:08 UTC
Thanks guys...
But that's how all Arabic search engines work... either Google desktop search
Comment 6 Aleksander Morgado 2010-06-10 09:52:20 UTC
(In reply to comment #5)
> Thanks guys...
> But that's how all Arabic search engines work... either Google desktop search

We should probably do it, but we just don't know what the proper way to do it is. Are the cases noted above the only ones applicable to Arabic?
Comment 7 AlNaimi 2010-06-11 18:58:34 UTC
Let's see:
First: ignore the differences, and only the differences, between these characters:
أ - ا - إ -آ  >It should only refer to: ا
ه - ة >It should only refer to: ه
ي - ى >It should only refer to: ى
Second: always ignore these characters:
ّ - َ - ً - ُ - ٌ - ِ - ٍ - ~ - ْ - ـ

This option should be the default, but if the option [find the exact phrase as written] is on or checked, the engine should take the differences into account.
And I'm very sorry, but I don't know anything about programming...
Thanks again
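The two rules requested in this comment could be sketched like this. It is a Python illustration under stated assumptions, not anything Tracker implements: `fold_arabic` and its `exact` flag are hypothetical names, and the `~` in the list is ambiguous, so the sketch assumes it means MADDAH ABOVE (U+0653).

```python
import unicodedata

# Rule 1: fold letter variants onto a single letter, as requested above.
FOLD = {
    0x0622: 0x0627,  # alef with madda above  -> alef
    0x0623: 0x0627,  # alef with hamza above  -> alef
    0x0625: 0x0627,  # alef with hamza below  -> alef
    0x0629: 0x0647,  # teh marbuta            -> heh
    0x064A: 0x0649,  # yeh                    -> alef maksura
}

# Rule 2: characters that are always ignored.
IGNORED = list(range(0x064B, 0x0653))  # fathatan .. sukun, includes shadda
IGNORED.append(0x0653)  # ASSUMPTION: the "~" in the list means maddah above
IGNORED.append(0x0640)  # tatweel (kashida)

TABLE = dict(FOLD)
TABLE.update({cp: None for cp in IGNORED})  # None deletes the character


def fold_arabic(text, exact=False):
    """Fold Arabic text for matching; exact=True keeps it as written."""
    if exact:
        return text
    # NFC first, so that alef followed by a combining hamza is seen as the
    # precomposed letter and folded by rule 1.
    return unicodedata.normalize("NFC", text).translate(TABLE)


# "Ahmad" with alef-hamza and "Ahmad" with bare alef fold to the same key.
same_key = fold_arabic("\u0623\u062d\u0645\u062f") == fold_arabic("\u0627\u062d\u0645\u062f")
```

Applying the same function to both the indexed text and the query gives the default behaviour asked for here, while passing `exact=True` on both sides corresponds to the "find the exact phrase as written" option.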
Comment 8 Sam Thursfield 2021-05-26 22:25:27 UTC
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org.
As part of that, we are mass-closing older open tickets in bugzilla.gnome.org
which have not seen updates for a longer time (resources are unfortunately
quite limited so not every ticket can get handled).

If you can still reproduce the situation described in this ticket in a recent
and supported software version, then please follow
  https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines
and create a new enhancement request ticket at
  https://gitlab.gnome.org/GNOME/tracker/-/issues/

Thank you for your understanding and your help.