Bug 619244 - Use a custom unaccenting method instead of libunac
Status: RESOLVED FIXED
Product: tracker
Classification: Core
Component: FTS
Version: 0.9.x
Hardware: Other Linux
Importance: Normal enhancement
Target Milestone: ---
Assigned To: tracker-general
QA Contact: Jamie McCracken
Depends on:
Blocks:
 
 
Reported: 2010-05-20 22:57 UTC by Aleksander Morgado
Modified: 2010-06-15 11:12 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Custom unaccent instead of libunac (30.80 KB, patch)
2010-05-26 13:10 UTC, Aleksander Morgado
Rebased patch with master (32.95 KB, patch)
2010-06-01 14:40 UTC, Aleksander Morgado

Description Aleksander Morgado 2010-05-20 22:57:39 UTC
libunac does not seem to be a good fit for Tracker, mainly because it requires converting the input to UTF-16.

libunac can be replaced with our own custom unaccenting method, which just does the following:
 * Take the input UTF-8 encoded string.
 * Apply a compatibility decomposition (i.e. an NFKD normalization).
 * Iterate over the whole normalized string and remove every combining diacritical mark character found. This can even be done in place, without an extra buffer, since it only removes characters from the string.
 * Output the UTF-8 NFKD-normalized string without combining diacritical marks.
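
The steps above can be sketched roughly as follows. This is an illustrative Python sketch, not the attached patch: the function names are hypothetical, and treating "combining diacritical mark" as the four Unicode blocks listed in the code is an assumption. The second function shows the in-place idea on a UTF-8 byte buffer, assuming the buffer holds valid, already NFKD-normalized UTF-8.

```python
import unicodedata


def is_combining_diacritical_mark(ch: str) -> bool:
    """Assumed definition: code point lies in one of the Unicode
    'Combining Diacritical Marks' blocks."""
    cp = ord(ch)
    return (
        0x0300 <= cp <= 0x036F      # Combining Diacritical Marks
        or 0x1DC0 <= cp <= 0x1DFF   # Combining Diacritical Marks Supplement
        or 0x20D0 <= cp <= 0x20FF   # Combining Diacritical Marks for Symbols
        or 0xFE20 <= cp <= 0xFE2F   # Combining Half Marks
    )


def unaccent(text: str) -> str:
    """NFKD-normalize, then drop every combining diacritical mark."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed
                   if not is_combining_diacritical_mark(ch))


def strip_marks_in_place(buf: bytearray) -> None:
    """Remove marks from NFKD-normalized UTF-8 bytes without a second
    full-size buffer: kept characters are shifted left, then the tail is cut."""
    read = write = 0
    while read < len(buf):
        lead = buf[read]
        # Length of the UTF-8 sequence, derived from its lead byte.
        size = 1 if lead < 0x80 else 2 if lead < 0xE0 else 3 if lead < 0xF0 else 4
        ch = buf[read:read + size].decode("utf-8")
        if not is_combining_diacritical_mark(ch):
            buf[write:write + size] = buf[read:read + size]
            write += size
        read += size
    del buf[write:]
```

Because removing marks only ever shrinks the string, the write position never overtakes the read position, which is what makes the in-place variant safe.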

This method fits especially well with the libunistring-based parser: its casefolding method can normalize the output to NFKD in the same call, so only one extra character-by-character iteration is needed to look for the mark characters.
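
To illustrate how casefolding and NFKD normalization combine with the mark-stripping pass, here is a small Python sketch. It is only an analogy: Python needs two separate calls (`str.casefold()` and `unicodedata.normalize()`) where the text describes a single libunistring call, and the function name and the mark ranges below are assumptions made for the example.

```python
import unicodedata

# Assumed set of ranges treated as combining diacritical marks.
_MARK_RANGES = (
    (0x0300, 0x036F),
    (0x1DC0, 0x1DFF),
    (0x20D0, 0x20FF),
    (0xFE20, 0xFE2F),
)


def fold_and_unaccent(word: str) -> str:
    """Casefold, normalize to NFKD, then strip marks in one extra pass."""
    folded = unicodedata.normalize("NFKD", word.casefold())
    return "".join(
        ch for ch in folded
        if not any(lo <= ord(ch) <= hi for lo, hi in _MARK_RANGES)
    )
```

With the folding and normalization already done by the parser, the only added cost is the final per-character filter.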
Comment 1 Aleksander Morgado 2010-05-26 13:10:44 UTC
Created attachment 162016
Custom unaccent instead of libunac
Comment 2 Aleksander Morgado 2010-06-01 14:40:03 UTC
Created attachment 162463
Rebased patch with master
Comment 3 Aleksander Morgado 2010-06-07 09:11:42 UTC
Pushed the patch to a new 'drop-unac' branch in GNOME git; it should be easier to rebase and review there.
Comment 4 Aleksander Morgado 2010-06-07 10:31:43 UTC
Comment on attachment 162463
Rebased patch with master

Marking the patch as obsolete, as it's now managed in a separate GNOME git branch.
Comment 5 Aleksander Morgado 2010-06-08 15:00:52 UTC
Moving to FTS component
Comment 6 Martyn Russell 2010-06-15 11:12:58 UTC
This problem has been fixed in the development version. The fix will be available in the next major software release. Thank you for your bug report.