GNOME Bugzilla – Bug 785146
Collation not taking into account articles followed by an apostrophe rather than a space
Last modified: 2017-07-24 08:34:37 UTC
I noticed that tracker always assumes that an article is followed by a space when determining collation in src/libtracker-data/tracker-collation.c (tracker_collation_utf8_title), as can be seen in this commit: https://git.gnome.org/browse/tracker/commit/?id=80606e70475409f1e3dea47aee15a4c3b038aef9 This is not always a safe assumption to make for languages other than English, though. It is not true at least in Italian and French, and I suspect in other Latin languages too, like Spanish. You have items such as "l'enfant prodige" or "un'altra cosa" which both contain an elision of a vowel with the next word (through the apostrophe). No space is present there.
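To illustrate the report above, here is a minimal sketch of a space-only article check. It is not the Tracker code: the helper name `starts_with_article_and_space` is hypothetical, and it uses plain C with POSIX `strncasecmp` rather than GLib's UTF-8 functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>  /* strncasecmp (POSIX) */

/* Hypothetical helper: returns true only when `title` starts with
 * `article` (compared case-insensitively) immediately followed by
 * a space character. */
static bool
starts_with_article_and_space (const char *title, const char *article)
{
    size_t len = strlen (article);

    return strncasecmp (title, article, len) == 0 && title[len] == ' ';
}
```

With a check of this shape, "The Wall" matches the article "the", but "l'enfant prodige" does not match "l" and "un'altra cosa" does not match "un", because the character after the article is an apostrophe. Both titles would therefore be collated under their article.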
Additionally, see https://bugzilla.gnome.org/show_bug.cgi?id=771348: there was a reason we started using '|'. I didn't check that part of the patch, my bad.
Created attachment 355982
libtracker-data: Use '|' as separator for title articles.

As suggested in https://bugzilla.gnome.org/show_bug.cgi?id=785146 and following https://bugzilla.gnome.org/show_bug.cgi?id=771348, change the separator used for common articles. Translators, apologies for the moving target.
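As a rough illustration of a '|'-separated article list, the sketch below walks such a string entry by entry. The function name `next_article` is hypothetical, and the list "the|a|an" used in the usage note is an assumed example, not necessarily the string the patch ships. Empty entries are not handled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical iterator over a '|'-separated list.
 * Stores the start of the current entry in `*entry`, advances `*cursor`
 * past the entry and its trailing '|' (if any), and returns the entry
 * length. A return value of 0 means the end of the list was reached. */
static size_t
next_article (const char **cursor, const char **entry)
{
    const char *start = *cursor;
    size_t len = strcspn (start, "|");

    *entry = start;
    *cursor = start + len;
    if (**cursor == '|')
        (*cursor)++;

    return len;
}
```

Starting with a cursor on "the|a|an", successive calls yield entries of length 3 ("the"), 1 ("a") and 2 ("an"), and then 0.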
Created attachment 355983
libtracker-data: Don't rely on spaces as separators on title collation

Check the break type of the next char following the prefix match, instead of assuming it is a space.
Comment on attachment 355982
libtracker-data: Use '|' as separator for title articles.

Attachment 355982 pushed as 07ef768 - libtracker-data: Use '|' as separator for title articles.
(In reply to Carlos Garnacho from comment #3)
> Created attachment 355983
> libtracker-data: Don't rely on spaces as separators on title collation
>
> Check the break type of the next char following the prefix match, instead
> of assuming it is a space.

I'll leave this one brewing a bit here. AFAICT it should be generic enough.

(In reply to Matteo Settenvini from comment #0)
> This is not always a safe assumption to make for languages other than
> English, though. It is not true at least in Italian and French, and I
> suspect in other Latin languages too, like Spanish.

Not Spanish, that'd be too embarrassing. Just about every other official and non-official language in Spain, though.
Created attachment 356063
libtracker-data: Don't rely on spaces as separators on title collation

Skip non-alphanumeric characters both at the beginning of titles and after the prefix match. Of course, require at least one such non-alphanumeric character after the prefix match, in order to avoid matching beginnings of words.
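The behaviour described in this patch can be sketched roughly as follows. This is an illustration under stated assumptions, not the code that was pushed: the names `is_word_byte`, `skip_non_word` and `skip_article` are hypothetical, and instead of decoding UTF-8 with GLib it treats every non-ASCII byte as part of a word.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>  /* strncasecmp (POSIX) */

/* Simplification: ASCII letters and digits, plus any non-ASCII byte,
 * count as part of a word. Real code would decode UTF-8 instead. */
static int
is_word_byte (unsigned char c)
{
    return c >= 0x80 || isalnum (c);
}

/* Returns a pointer to the first word byte in `s`, or to its end. */
static const char *
skip_non_word (const char *s)
{
    while (*s != '\0' && !is_word_byte ((unsigned char) *s))
        s++;
    return s;
}

/* Returns the position in `title` where collation should start
 * when `article` is the candidate prefix. */
static const char *
skip_article (const char *title, const char *article)
{
    const char *start = skip_non_word (title);
    size_t len = strlen (article);
    const char *after;

    if (len == 0 || strncasecmp (start, article, len) != 0)
        return start;

    after = start + len;
    /* Require at least one non-alphanumeric character after the prefix,
     * so that "the" does not match the beginning of "Theory". */
    if (*after == '\0' || is_word_byte ((unsigned char) *after))
        return start;

    return skip_non_word (after);
}
```

Under these assumptions, "l'enfant prodige" with the article "l" collates from "enfant prodige", "un'altra cosa" with "un" collates from "altra cosa", and "Theory of Everything" with "the" is left unchanged.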
The last patch skips non-alphanumeric characters both at the beginning of the title and after the prefix match (if any), after Marinus pointed out that something similar was additionally being done in the gnome-music queries. The patch seems to work as expected in my testing, so I just pushed it to master.

Attachment 356063 pushed as 0dab836 - libtracker-data: Don't rely on spaces as separators on title collation