GNOME Bugzilla – Bug 636074
libunistring and glib collations break fn:starts-with
Last modified: 2011-02-02 17:28:11 UTC
Current fn:starts-with processing assumes that U+10FFFD (the last possible valid Unicode code point) always sorts last when collating: http://git.gnome.org/browse/tracker/commit/?id=de4a44f8c51cbc6026837f0ab017def565baf6d7 But for collators based on strcoll(), such as the glib and libunistring ones, this is not true. With those collators, undefined code points such as U+10FFFD are completely ignored during collation, which is wrong according to the Unicode rules (section 7.1.2 of UTS#10, http://unicode.org/reports/tr10/#Unassigned_And_Other). The collation methods in glibc seem to have been broken with respect to this part of the Unicode specification for ages.
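To illustrate the failure mode, here is a minimal Python sketch. It assumes the current approach boils down to a range check of roughly the form prefix <= value <= prefix + U+10FFFD; the exact form used in the commit above is not reproduced here. Both key functions are toy stand-ins invented for this example, not the real ICU, glib or libunistring collators.

```python
LAST = "\U0010FFFD"  # upper-bound sentinel the range check relies on

def codepoint_key(s):
    # Toy collator: plain code point order, so U+10FFFD sorts after everything.
    return [ord(c) for c in s]

def ignoring_key(s):
    # Toy stand-in for a strcoll()-style collator that gives U+10FFFD no weight.
    return [ord(c) for c in s if c != LAST]

def starts_with_by_range(value, prefix, key):
    # Assumed range form of starts-with: prefix <= value <= prefix + U+10FFFD.
    return key(prefix) <= key(value) <= key(prefix + LAST)

ok = starts_with_by_range("foobar", "foo", codepoint_key)
broken = starts_with_by_range("foobar", "foo", ignoring_key)
print(ok, broken)  # True False
```

When the sentinel is ignored, the upper bound collapses onto the prefix itself, so only an exact match falls inside the range.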
In my opinion, we should add a SQLite function that implements fn:starts-with semantics in a way that works for glibc-based collators, possibly based on prefix matching of collation keys. As that is slower than the current approach (it can't use indices), we should only fall back to that SQLite function if tracker is being built with the glib or libunistring collators.
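A rough sketch of what such a function could look like, written in Python with the standard sqlite3 module rather than in tracker's own code. The names starts_with_collated and toy_collation_key are hypothetical, and the key function is a deliberately simple single-level stand-in. Real collation keys are typically multi-level, so a plain byte-prefix test on them may not be exact and would need more care.

```python
import sqlite3

def toy_collation_key(s):
    # Hypothetical single-level key: case-insensitive, U+10FFFD carries no weight.
    return s.replace("\U0010FFFD", "").casefold()

def starts_with_collated(value, prefix):
    # Prefix match on collation keys instead of a range comparison.
    if value is None or prefix is None:
        return None
    return int(toy_collation_key(value).startswith(toy_collation_key(prefix)))

conn = sqlite3.connect(":memory:")
# Register the Python callable as a two-argument SQL function.
conn.create_function("starts_with_collated", 2, starts_with_collated)
conn.execute("CREATE TABLE names (name TEXT)")
conn.executemany("INSERT INTO names VALUES (?)",
                 [("Foobar",), ("food",), ("bar",)])

rows = conn.execute(
    "SELECT name FROM names "
    "WHERE starts_with_collated(name, 'foo') ORDER BY name"
).fetchall()
matches = [row[0] for row in rows]
print(matches)  # ['Foobar', 'food']
```

Because the function has to be evaluated for every row, SQLite cannot satisfy the condition from an index, which is the cost mentioned above.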
As a temporary measure, I have fixed distcheck by disabling the test case when !libicu. This should be reverted once we have a solution for this bug. For the commit in question see: http://git.gnome.org/browse/tracker/commit/?id=c71aaa644e876b85bbadda7a8d42d1ede46b06ee
Raising importance, as we shouldn't release any stable version until this gets a fix...
This is now fixed in master.