Bug 616836 - Use libunistring's u8_normalize() instead of GLib's g_utf8_normalize()
Status: RESOLVED FIXED
Product: tracker
Classification: Core
Component: General
Version: 0.9.x
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: tracker-indexer
Jamie McCracken
Reported: 2010-04-26 10:56 UTC by Aleksander Morgado
Modified: 2010-05-20 17:02 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Aleksander Morgado 2010-04-26 10:56:24 UTC
GLib's Unicode normalization functions rely heavily on heap allocations to perform the normalization.

libunistring's normalization functions can write into an output buffer supplied by the caller, so even stack-allocated memory can be used to perform the normalization:
http://www.gnu.org/software/libunistring/manual/libunistring.html#Normalization-of-strings

Thus, linking against libunistring for Unicode normalization could noticeably improve the performance of the parsing operations.

Also:
 * The whole text could be normalized at once, instead of word by word.
 * The same libunistring approach could be applied to the case folding done just before normalization.

Note: instead of libunistring, libicu is also probably a good choice:
http://bugs.icu-project.org/trac/browser/icu/trunk/source/common/unicode/unorm2.h
Comment 1 Aleksander Morgado 2010-05-11 12:41:32 UTC
This issue is now addressed in the "parser-unicode-libs-review" branch in GNOME Git.

The branch offers both libunistring and libicu as options.
Comment 2 Martyn Russell 2010-05-17 13:33:49 UTC
Moving "Indexer" component bugs to "General", since "Indexer" refers to the old 0.6 architecture.
Comment 3 Martyn Russell 2010-05-20 17:02:15 UTC
This problem has been fixed in the development version. The fix will be available in the next major software release. Thank you for your bug report.