GNOME Bugzilla – Bug 616836
Use libunistring's u8_normalize() instead of GLib's g_utf8_normalize()
Last modified: 2010-05-20 17:02:15 UTC
GLib's Unicode normalization functions rely heavily on heap allocations: g_utf8_normalize() always returns a newly allocated string. libunistring's normalization functions instead accept an output buffer supplied by the caller, and only allocate one themselves if the result does not fit. This means even stack-allocated memory can be used to perform the normalization: http://www.gnu.org/software/libunistring/manual/libunistring.html#Normalization-of-strings
Linking to libunistring for Unicode normalization could therefore improve the performance of the parsing operations by avoiding most of these heap allocations. Also:
* The whole text could be normalized at once, instead of word by word.
* The same libunistring approach could be applied to the case folding done just before normalization.
Note: libicu is probably also a good alternative to libunistring: http://bugs.icu-project.org/trac/browser/icu/trunk/source/common/unicode/unorm2.h
This issue is now addressed in the "parser-unicode-libs-review" branch in gnome git. Both libunistring and libicu are supported as options.
Moving "Indexer" component bugs to "General", since "Indexer" refers to the old 0.6 architecture.
This problem has been fixed in the development version. The fix will be available in the next major software release. Thank you for your bug report.