Bug 588753 - Tracker should search Arabic characters with 2 options, or in a different way.
Status: NEW
Product: tracker
Classification: Core
Component: FTS
Version: 0.9.x
Hardware: Other All
Importance: Normal major
Target Milestone: ---
Assigned To: tracker-general
Jamie McCracken
Depends on:
Blocks:
Reported: 2009-07-16 08:14 UTC by AlNaimi
Modified: 2013-03-20 12:26 UTC
CC List: 3 users

See Also:
GNOME target: ---
GNOME version: ---



Description AlNaimi 2009-07-16 08:14:09 UTC
First option, and the international way (used by Google Desktop Search and the Google web engine, by MSN for both desktop and web search, and by almost all Arabic programs):
Ignore the differences between these characters:
أ - ا - إ -آ
ه - ة
ي - ى
The second option:
Find the exact phrase as written.
As I said before, the first option is the usual way to search in Arabic.
I hope that you can add the two options, or make the first one the default.
Comment 1 AlNaimi 2009-07-24 05:36:38 UTC
Also, I forgot to mention that the Tracker search engine should also ignore:
ّ - َ - ً - ُ - ٌ - ِ - ٍ - ~ - ْ
These characters are usually used together with the main Arabic characters.
Comment 2 Aleksander Morgado 2010-06-01 21:25:53 UTC
We currently use your suggested 1st approach for "unaccenting" words, meaning that all combining diacritical marks are removed from the NFKD-normalized string (for FTS). Combining diacritical marks are considered those in these Unicode ranges:
 * 0x0300 to 0x036F
 * 0x1DC0 to 0x1DFF
 * 0x20D0 to 0x20FF
 * 0xFE20 to 0xFE2F
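
The behaviour described above can be sketched roughly as follows. This is only an illustration in Python, not Tracker's actual code; the function names `unaccent` and `is_listed_diacritic` are made up for the example, and only the four ranges listed above are taken from the comment.

```python
import unicodedata

# Inclusive code point ranges quoted in the comment above.
COMBINING_DIACRITIC_RANGES = (
    (0x0300, 0x036F),
    (0x1DC0, 0x1DFF),
    (0x20D0, 0x20FF),
    (0xFE20, 0xFE2F),
)


def is_listed_diacritic(ch):
    """Return True if ch falls in one of the listed ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in COMBINING_DIACRITIC_RANGES)


def unaccent(text):
    """NFKD-normalize text, then drop marks from the listed ranges only."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not is_listed_diacritic(ch))


print(unaccent("caf\u00e9"))  # cafe

# KAF followed by FATHA (U+064E): the mark is outside the listed
# ranges, so it is kept.
print("\u064e" in unaccent("\u0643\u064e"))  # True
```

With this approach Latin accents are removed, while the Arabic marks from comment 1 pass through unchanged.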

It seems that the combining marks you suggest ignoring during text search do not fall within the ranges above, so I guess your suggestion actually means that Tracker should not consider *any* combining mark (diacritic or not) during text search. I don't currently know whether this is a good idea, so suggestions are welcome...
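
For discussion, a minimal Python sketch of the "ignore any combining mark" idea could look like the following. It assumes that "combining mark" means Unicode general category Mn (nonspacing mark); that reading, and the name `strip_all_marks`, are assumptions made for the example and not anything Tracker does today.

```python
import unicodedata


def strip_all_marks(text):
    """NFKD-normalize text and drop every nonspacing mark (category Mn)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )


# KAF + FATHA, TEH + SHADDA: both marks are removed.
assert strip_all_marks("\u0643\u064e\u062a\u0651") == "\u0643\u062a"

# ALEF WITH HAMZA ABOVE (U+0623) decomposes to ALEF + U+0654,
# so it folds to plain ALEF (U+0627).
assert strip_all_marks("\u0623") == "\u0627"

# TEH MARBUTA (U+0629) and TATWEEL (U+0640) are not nonspacing marks,
# so they are left untouched.
assert strip_all_marks("\u0629\u0640") == "\u0629\u0640"
```

Under this assumption the alef variants fold to plain alef as a side effect of the decomposition, but the ه / ة and ي / ى pairs would still need explicit handling. Dropping every Mn character may also be too aggressive for other scripts (Devanagari vowel signs, for example, are nonspacing marks), which is part of why it is unclear whether this is a good idea.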
Comment 3 Aleksander Morgado 2010-06-02 10:45:06 UTC
Moved to Store category, as this is really an FTS issue.
Comment 4 Aleksander Morgado 2010-06-08 15:00:26 UTC
Moving to FTS component
Comment 5 AlNaimi 2010-06-09 16:43:08 UTC
Thanks guys...
But that's how all Arabic search engines work... either Google desktop search
Comment 6 Aleksander Morgado 2010-06-10 09:52:20 UTC
(In reply to comment #5)
> Thanks guys...
> But that's how all Arabic search engines work... either Google desktop search

We should probably do it, but we just don't know what the proper way to do it is. Are the cases noted above the only ones applicable to Arabic?
Comment 7 AlNaimi 2010-06-11 18:58:34 UTC
Let's see:
First: ignore the differences, and only the differences, between these characters:
أ - ا - إ -آ  >It should only refer to: ا
ه - ة >It should only refer to: ه
ي - ى >It should only refer to: ى
Second: always ignore these characters:
ّ - َ - ً - ُ - ٌ - ِ - ٍ - ~ - ْ - ـ

This should be the default option, but if the option [find the exact phrase as written] is on or checked, the engine should consider the differences.
And I'm very sorry, but I don't know anything about programming...
Thanks again
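
To make the proposal in comment 7 concrete, here is a rough Python sketch of the two options. It is only an illustration under assumptions, not Tracker code: the names `fold_arabic` and `matches` are hypothetical, the ignored marks are assumed to be the code points U+064B to U+0652 plus the tatweel (U+0640), and the "~" entry is left out because its intended code point is not clear from the report.

```python
# Letter variants folded to a single form, following comment 7.
FOLD = {
    0x0622: 0x0627,  # ALEF WITH MADDA ABOVE -> ALEF
    0x0623: 0x0627,  # ALEF WITH HAMZA ABOVE -> ALEF
    0x0625: 0x0627,  # ALEF WITH HAMZA BELOW -> ALEF
    0x0629: 0x0647,  # TEH MARBUTA -> HEH
    0x064A: 0x0649,  # YEH -> ALEF MAKSURA
}

# Characters that are always ignored (assumed reading of the list):
# FATHATAN..SUKUN (U+064B..U+0652) and TATWEEL (U+0640).
IGNORED = {cp: None for cp in range(0x064B, 0x0653)}
IGNORED[0x0640] = None

TABLE = {**FOLD, **IGNORED}


def fold_arabic(text, exact=False):
    """Return the search key for text; exact=True keeps it as written."""
    if exact:
        return text
    return text.translate(TABLE)


def matches(query, document, exact=False):
    """Substring match on folded keys (default) or on the raw text."""
    return fold_arabic(query, exact) in fold_arabic(document, exact)


stored = "\u0623\u062d\u0645\u062f"  # written with ALEF WITH HAMZA ABOVE
typed = "\u0627\u062d\u0645\u062f"   # typed with plain ALEF

print(matches(typed, stored))              # True
print(matches(typed, stored, exact=True))  # False
```

The default call corresponds to the first option, and `exact=True` corresponds to the [find the exact phrase as written] option. The sketch only handles precomposed letters; text that stores the hamza or madda as a separate combining character would need a normalization step first.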
