GNOME Bugzilla – Bug 588753
Tracker should search Arabic characters with 2 options, or in a different way.
Last modified: 2021-05-26 22:25:27 UTC
The first option is the international way (used by Google Desktop Search and the Google web engine, by MSN for both desktop and web search, and by almost all Arabic programs): ignore the differences between these characters:
أ - ا - إ - آ
ه - ة
ي - ى
The second option is to find the exact phrase as written. As I said before, the first option is the regular way to search in Arabic. I hope that you can add the two options, or make the first one the default.
Also, I forgot to mention that the Tracker search engine should also ignore: ّ - َ - ً - ُ - ٌ - ِ - ٍ - ~ - ْ These characters are usually used with the main Arabic characters.
We currently use your suggested 1st approach for "unaccenting" words, meaning that all combining diacritical marks are removed from the NFKD-normalized string (for FTS). Combining diacritical marks are considered to be those in these Unicode ranges:
* 0x0300 to 0x036F
* 0x1DC0 to 0x1DFF
* 0x20D0 to 0x20FF
* 0xFE20 to 0xFE2F
It seems that the combining marks you suggest ignoring during text search do not fall into those ranges, so I guess your suggestion actually means that Tracker should not consider *any* combining mark (whether diacritic or not) during text search. I don't currently know if this is a good idea, so suggestions are welcome...
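The unaccenting described in this comment can be sketched roughly as follows. This is only an illustrative Python sketch, not Tracker's actual implementation; the names `COMBINING_RANGES`, `in_listed_ranges` and `unaccent` are hypothetical. It shows why the Arabic marks from the report are not stripped: they lie outside the four listed ranges.

```python
import unicodedata

# The four ranges listed in the comment above.
COMBINING_RANGES = (
    (0x0300, 0x036F),
    (0x1DC0, 0x1DFF),
    (0x20D0, 0x20FF),
    (0xFE20, 0xFE2F),
)


def in_listed_ranges(ch):
    """Return True if the character falls in one of the listed ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in COMBINING_RANGES)


def unaccent(text):
    """NFKD-normalize, then drop characters in the listed ranges (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not in_listed_ranges(ch))


# Latin accents are removed: U+0301 is inside 0x0300-0x036F.
latin = unaccent("caf\u00e9")

# Arabic FATHA (U+064E) is a combining mark but lies outside the listed
# ranges, so it survives unaccenting.
kataba = "\u0643\u064e\u062a\u064e\u0628\u064e"
arabic = unaccent(kataba)
```

Here `latin` is `"cafe"`, while `arabic` is returned unchanged, even though `unicodedata.combining("\u064e")` is non-zero. Extending the check to any character with a non-zero combining class would be one way to cover the reported marks, at the cost of also affecting other scripts.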
Moved to Store category, as this is really an FTS issue.
Moving to FTS component
Thanks guys... But it's how all Arabic engine search works... either Google desktop search
(In reply to comment #5)
> Thanks guys...
> But it's how all Arabic engine search works... either Google desktop search
We should probably do it, but we just don't know what the proper way to do it is. Are the cases noted above the only ones applicable to Arabic?
Let's see:
First: ignore the differences, and only the differences, between these characters:
أ - ا - إ - آ > should only refer to: ا
ه - ة > should only refer to: ه
ي - ى > should only refer to: ى
Second: always ignore these characters: ّ - َ - ً - ُ - ٌ - ِ - ٍ - ~ - ْ - ـ
This should be the default option, but if the option [find the exact phrase as written] is on or checked, the engine should consider the differences. And I'm very sorry, but I don't know anything about programming... Thanks again
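The rules requested in this comment could be sketched like this. It is a hypothetical Python illustration, not anything Tracker implements; `FOLD_TABLE`, `fold_arabic` and the `exact` flag are invented names. Two assumptions are made: the "~" in the list is taken to mean ARABIC MADDAH ABOVE (U+0653), and the input is first brought to NFC so that alef variants appear as single precomposed letters.

```python
import unicodedata

# Letter folding requested by the reporter (variant -> base letter).
LETTER_FOLDS = {
    "\u0623": "\u0627",  # ALEF WITH HAMZA ABOVE -> ALEF
    "\u0625": "\u0627",  # ALEF WITH HAMZA BELOW -> ALEF
    "\u0622": "\u0627",  # ALEF WITH MADDA ABOVE -> ALEF
    "\u0629": "\u0647",  # TEH MARBUTA -> HEH
    "\u064a": "\u0649",  # YEH -> ALEF MAKSURA (direction as requested)
}

# Characters to drop entirely.
IGNORED = [
    "\u0651",  # SHADDA
    "\u064e",  # FATHA
    "\u064b",  # FATHATAN
    "\u064f",  # DAMMA
    "\u064c",  # DAMMATAN
    "\u0650",  # KASRA
    "\u064d",  # KASRATAN
    "\u0653",  # MADDAH ABOVE (assumed meaning of "~" in the report)
    "\u0652",  # SUKUN
    "\u0640",  # TATWEEL
]

FOLD_TABLE = str.maketrans({**LETTER_FOLDS, **{ch: None for ch in IGNORED}})


def fold_arabic(text, exact=False):
    """Fold Arabic variants for search; exact=True keeps the text as written."""
    if exact:
        return text
    return unicodedata.normalize("NFC", text).translate(FOLD_TABLE)


# Index terms and queries would both pass through the same folding.
ahmad = fold_arabic("\u0623\u062d\u0645\u062f")
school = fold_arabic("\u0645\u062f\u0631\u0633\u0629")
vocalized = fold_arabic("\u0643\u064e\u062a\u064e\u0628\u064e")
```

With folding enabled, a query and an indexed word compare equal whenever they differ only in the listed characters; passing `exact=True` corresponds to the "find the exact phrase as written" option.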
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new enhancement request ticket at https://gitlab.gnome.org/GNOME/tracker/-/issues/ Thank you for your understanding and your help.