GNOME Bugzilla – Bug 619244
Use a custom unaccenting method instead of libunac
Last modified: 2010-06-15 11:12:58 UTC
libunac does not seem to be a good fit for Tracker, mainly because it requires converting to UTF-16. It can be replaced with our own custom unaccenting method, which just does the following:
* Get the input UTF-8 encoded string.
* Apply a compatibility decomposition (just an NFKD normalization).
* Iterate over the whole normalized string and remove every combining diacritical mark character found. This can even be done in place, without an extra buffer, since it only removes characters from the string.
* Output the UTF-8 NFKD-normalized string without combining diacritical marks.

This method fits especially well with the libunistring-based parser, as its casefolding method can normalize the output to NFKD in the same call, so only one extra character-by-character iteration is needed to look for the mark characters.
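The steps described above can be sketched roughly as follows. This is only an illustration in Python, not the attached patch: the function names are hypothetical, and it assumes that "combining diacritical mark" means the Unicode block U+0300..U+036F, whereas a real implementation may need to cover more ranges.

```python
import unicodedata

# Assumption: only the "Combining Diacritical Marks" block is stripped.
# Other combining blocks (supplements, marks for symbols) are left alone.
COMBINING_FIRST = 0x0300
COMBINING_LAST = 0x036F


def is_combining_diacritical_mark(ch: str) -> bool:
    """Return True if ch lies in the assumed combining-mark range."""
    return COMBINING_FIRST <= ord(ch) <= COMBINING_LAST


def unaccent(text: str) -> str:
    """Hypothetical helper: NFKD-normalize, then drop combining marks."""
    # Compatibility decomposition splits e.g. U+00E9 into "e" + U+0301.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep every character that is not a combining diacritical mark.
    return "".join(
        ch for ch in decomposed if not is_combining_diacritical_mark(ch)
    )
```

With this sketch, `unaccent("café")` gives `"cafe"`. Python strings are immutable, so the in-place removal mentioned above is not shown here; on a mutable UTF-8 buffer the same loop could compact the string as it goes, since the output is never longer than the normalized input.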
Created attachment 162016: Custom unaccent instead of libunac
Created attachment 162463: Rebased patch with master
Pushed the patch to a new 'drop-unac' branch in gnome git, which should make it easier to rebase and review.
Comment on attachment 162463 (Rebased patch with master): Marking the patch as obsolete, as it is now managed in a separate gnome git branch.
Moving to FTS component
This problem has been fixed in the development version. The fix will be available in the next major software release. Thank you for your bug report.