Bug 785146 - Collation not taking in account articles not followed by a space but rather by an apostrophe
Status: RESOLVED FIXED
Product: tracker
Classification: Core
Component: General
Version: git master
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: tracker-general
QA Contact: tracker-general
Depends on:
Blocks:
Reported: 2017-07-19 19:18 UTC by Matteo Settenvini
Modified: 2017-07-24 08:34 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
libtracker-data: Use '|' as separator for title articles. (1.69 KB, patch)
2017-07-19 21:08 UTC, Carlos Garnacho
Status: committed
libtracker-data: Don't rely on spaces as separators on title collation (2.36 KB, patch)
2017-07-19 21:08 UTC, Carlos Garnacho
Status: none
libtracker-data: Don't rely on spaces as separators on title collation (3.65 KB, patch)
2017-07-20 17:44 UTC, Carlos Garnacho
Status: committed

Description Matteo Settenvini 2017-07-19 19:18:09 UTC
I noticed that tracker always assumes that an article is followed by a space when determining collation in src/libtracker-data/tracker-collation.c (tracker_collation_utf8_title), as can be seen in this commit:

https://git.gnome.org/browse/tracker/commit/?id=80606e70475409f1e3dea47aee15a4c3b038aef9

This is not always a safe assumption to make for languages other than English, though. It is not true at least in Italian and French, and I suspect in other Latin languages too, like Spanish. 

You have items such as "l'enfant prodige" or "un'altra cosa" which both contain an elision of a vowel with the next word (through the apostrophe). 

No space is present there.
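
A minimal sketch of the problem, written in Python rather than the project's C. It only imitates the space-only assumption described above and is not Tracker's actual code; the article list and the helper name `strip_article_space_only` are hypothetical.

```python
# Hypothetical article list; Tracker's real list comes from translations.
ARTICLES = ["the", "a", "an", "l'", "un'"]


def strip_article_space_only(title, articles=ARTICLES):
    """Drop a leading article only when it is followed by a space."""
    for article in articles:
        prefix = article + " "
        # Case-insensitive comparison of the leading characters.
        if title[:len(prefix)].lower() == prefix:
            return title[len(prefix):]
    # No "article + space" prefix found: the title is left untouched.
    return title
```

With this logic "The Beatles" loses its article, while "l'enfant prodige" and "un'altra cosa" are returned unchanged, because the elided article is followed by an apostrophe and never by a space.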
Comment 1 Marinus Schraal 2017-07-19 20:09:01 UTC
Additionally, see https://bugzilla.gnome.org/show_bug.cgi?id=771348: there was a reason we started using '|'. I didn't check that part of the patch, my bad.
Comment 2 Carlos Garnacho 2017-07-19 21:08:10 UTC
Created attachment 355982
libtracker-data: Use '|' as separator for title articles.

As suggested in https://bugzilla.gnome.org/show_bug.cgi?id=785146
and following https://bugzilla.gnome.org/show_bug.cgi?id=771348,
change the separator used for common articles.

Translators, apologies for the moving target.
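
As a rough illustration of what a '|'-separated article list looks like to the consumer, here is a small Python sketch. The example strings and the helper name `parse_articles` are assumptions made for illustration; the real translatable string and its parsing live in Tracker's C sources and are not reproduced here.

```python
def parse_articles(translated, separator="|"):
    """Split a translated article list on the separator, dropping empty entries."""
    return [entry.lower() for entry in translated.split(separator) if entry]
```

For instance, a hypothetical translation `"the|a|an"` yields three entries, and a stray trailing separator is ignored.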
Comment 3 Carlos Garnacho 2017-07-19 21:08:16 UTC
Created attachment 355983
libtracker-data: Don't rely on spaces as separators on title collation

Check the break type of the next char following the prefix match, instead
of relying on it being a space.
Comment 4 Carlos Garnacho 2017-07-19 21:09:57 UTC
Comment on attachment 355982
libtracker-data: Use '|' as separator for title articles.

Attachment 355982 pushed as 07ef768 - libtracker-data: Use '|' as separator for title articles.
Comment 5 Carlos Garnacho 2017-07-19 21:18:26 UTC
(In reply to Carlos Garnacho from comment #3)
> Created attachment 355983
> libtracker-data: Don't rely on spaces as separators on title collation
> 
> Check the break type of the next char following the prefix match, instead
> of relying on it being a space.

I'll leave this one brewing a bit here. AFAICT it should be generic enough.

(In reply to Matteo Settenvini from comment #0)
> This is not always a safe assumption to make for languages other than
> English, though. It is not true at least in Italian and French, and I
> suspect in other Latin languages too, like Spanish. 

Not Spanish, that'd be too embarrassing. Just about every other official and non-official language in Spain, though.
Comment 6 Carlos Garnacho 2017-07-20 17:44:13 UTC
Created attachment 356063
libtracker-data: Don't rely on spaces as separators on title collation

Skip non-alphanumeric characters both at the beginning of titles, and after
the prefix match. Of course, require at least one such non-alphanumeric
character after the prefix match, in order to avoid matching beginnings of
words.
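
The rule described in this patch can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the committed C code: the function name `collation_start`, the article list, and the choice to keep the title intact when nothing follows the article are all hypothetical, and `str.isalnum` stands in for whatever character classification the patch uses.

```python
def collation_start(title, articles):
    """Return the part of the title that sorting should start from."""

    def skip_non_alnum(text, pos):
        # Advance past punctuation, spaces, quotes and similar characters.
        while pos < len(text) and not text[pos].isalnum():
            pos += 1
        return pos

    # Skip leading non-alphanumeric characters, e.g. an opening quote.
    start = skip_non_alnum(title, 0)
    for article in articles:
        end = start + len(article)
        if title[start:end].lower() != article.lower():
            continue
        after = skip_non_alnum(title, end)
        # Require at least one separator after the article, so that "the"
        # does not match "Theory". Assumption of this sketch: if nothing
        # follows the article, keep it rather than return an empty string.
        if end < after < len(title):
            return title[after:]
    return title[start:]


# Hypothetical article list, written without the apostrophe because the
# apostrophe is treated as the separator that follows the article.
ARTICLES = ["the", "a", "an", "l", "un"]
```

Under these assumptions "l'enfant prodige" sorts as "enfant prodige" and "un'altra cosa" as "altra cosa", while "Theory of Everything" is left alone because no separator follows the matched prefix.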
Comment 7 Carlos Garnacho 2017-07-24 08:34:33 UTC
The last patch skips non-alphanumeric characters both at the beginning of
titles and after the prefix match (if any); Marinus pointed out that something
similar was also being done in the gnome-music queries.

The patch seems to work as expected from my testing, so I just pushed it to
master.

Attachment 356063 pushed as 0dab836 - libtracker-data: Don't rely on spaces as separators on title collation