GNOME Bugzilla – Bug 739924
In search, normalize characters with accents/diacritics
Last modified: 2015-12-17 12:55:58 UTC
In most other media players, searching for "Bjork", for example, returns music by "Björk". This is not the case in GNOME Music.
That would be rather complicated, but seems feasible. Probably the most universal way would be to adjust the tracker queries.
I'm not familiar with languages that need normalization, so I didn't notice the problem before. We could use tracker:case-fold, tracker:normalize and tracker:unaccent on tracker keys before comparing them to strings that have been passed through GLib.utf8_normalize and GLib.utf8_casefold.
I am currently working on it. May I get this bug assigned to me?
I don't think anybody cares about assignees, but here you go. Thanks for working on this!
I tried to fix this bug following Arnel's suggestion of "cleaning up" accents via tracker:unaccent, tracker:normalize and tracker:case-fold on the tracker side, and using GLib.utf8_casefold and GLib.utf8_normalize on the string the user is searching for. However, this way accents are only really cleaned up on the tracker side: if the user searches for "Bjork", both "Björk" and "Bjork" are found, while no matches are found if the user searches for "Björk" directly. The output of GLib.utf8_casefold(GLib.utf8_normalize("Björk")) with any normalization mode is still "björk", which does not match "bjork", the result of the tracker:unaccent function. I found an equivalent bug report for gnome-documents, https://bugzilla.gnome.org/show_bug.cgi?id=722254, which led to the development of the tracker:unaccent function. That function seems to do some additional computation on top of a normalization operation. Intuitively, I'd expect an equivalent GLib.unaccent function to exist; however, there is no such function, and yet gnome-documents handles accented search fine.
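To illustrate the mismatch described in this comment, here is a minimal sketch that uses Python's standard unicodedata module as a stand-in for GLib.utf8_normalize and GLib.utf8_casefold. The helper names (normalize_and_casefold, strip_combining_marks) are hypothetical, and the accent-stripping step is only an assumption about the kind of extra work an "unaccent" operation does on top of normalization, not Tracker's actual implementation.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalize_and_casefold(text: str, form: str) -> str:
    """Normalize, then case-fold. Accents are not removed by either step."""
    return unicodedata.normalize(form, text).casefold()


def strip_combining_marks(text: str) -> str:
    """Decompose, then drop combining marks such as U+0308 (diaeresis).

    Hypothetical helper: an assumed approximation of an 'unaccent' step.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


for form in FORMS:
    folded = normalize_and_casefold("Björk", form)
    # In every form the diaeresis survives, either precomposed (U+00F6)
    # or as a separate combining character (U+0308).
    print(form, [hex(ord(ch)) for ch in folded], strip_combining_marks(folded))
```

Under these assumptions, normalization only changes how the accent is encoded, so an accented search term never equals the unaccented key unless the combining marks are also dropped.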
Created attachment 317503
query: Find strings with accents
When searching for an artist/track/album with accented letters, no match is found. This patch changes the queries to find both accented and unaccented strings when an equivalent unaccented string is searched for, while only accented strings are found when an accented string is searched.
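The matching rule described for this patch can be sketched in plain Python. This is an illustration of the intended semantics only, not the patch itself (which changes the tracker queries); the function names fold, unaccent and matches are hypothetical, and unicodedata stands in for the GLib and tracker functions.

```python
import unicodedata


def fold(text: str) -> str:
    """Decompose (NFKD) and case-fold, keeping any accents."""
    return unicodedata.normalize("NFKD", text).casefold()


def unaccent(text: str) -> str:
    """Drop combining marks from an already decomposed string."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))


def matches(term: str, title: str) -> bool:
    """True if the search term matches the title, accented or unaccented.

    The term is compared against both the folded title and its
    unaccented form, so "Bjork" finds "Björk", but "Björk" does not
    find "Bjork".
    """
    needle = fold(term)
    haystack = fold(title)
    return needle in haystack or needle in unaccent(haystack)


for term in ("Bjork", "Björk"):
    for title in ("Bjork", "Björk"):
        print(f"{term!r} vs {title!r}: {matches(term, title)}")
```

Only the stored title is unaccented, never the search term, which is what makes an accented search stricter than an unaccented one.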
Comment on attachment 317503
query: Find strings with accents
This looks correct to me given the current gnome-music queries. In the future, though, it might be worth looking into redesigning the gnome-music queries. There are two things that could help:
- Using fts:match. By default it searches all the full-text-search capable properties of the matched elements, but it could be tweaked to look into specific fields, as these queries do. This should bring a performance boost, because the way we currently match strings comes down to a linear table scan, so it is bound to grow slower with the number of elements. Besides, FTS support already unaccents by default.
- Using SPARQL 1.1's BIND (new in tracker 1.7.x/master), which would allow us to at least perform certain common operations once for each match, e.g.:
    SELECT ... WHERE {
      ?album a nmm:MusicAlbum .
      BIND (tracker:normalize(nie:title(?album), 'nfkd') AS ?normalized) .
      FILTER (fn:contains (tracker:case-fold (tracker:unaccent (?normalized)), "foo") ||
              fn:contains (tracker:case-fold (?normalized), "foo"))
    }
  That would allow doing the tracker:normalize(...) operation once at the query level, instead of twice (once in each fn:contains).
Nonetheless, I set the patch as "reviewed", but it counts as "accepted-commitnow" to me. I don't think the redesign belongs in this bug anyway, but rather in a new enhancement one.
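As a rough sketch of how the BIND suggestion might look from the Python side, the function below assembles the query from this comment as a string. The function name build_album_query is hypothetical, the SELECT list (?album) is a placeholder because the comment elides it, the normalization form is written 'nfkd', and the escaping covers only backslashes and double quotes. It only builds text and has not been run against tracker.

```python
import unicodedata


def build_album_query(term: str) -> str:
    """Build a SPARQL query that normalizes each album title only once.

    Hypothetical helper: the SELECT list is a placeholder and the
    escaping is deliberately minimal.
    """
    # Fold the user's term the same way the query folds the title.
    folded = unicodedata.normalize("NFKD", term).casefold()
    # Escape backslashes first, then double quotes, for a SPARQL literal.
    literal = folded.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?album WHERE {\n"
        "  ?album a nmm:MusicAlbum .\n"
        "  BIND (tracker:normalize(nie:title(?album), 'nfkd') AS ?normalized) .\n"
        "  FILTER (\n"
        f'    fn:contains(tracker:case-fold(tracker:unaccent(?normalized)), "{literal}") ||\n'
        f'    fn:contains(tracker:case-fold(?normalized), "{literal}")\n'
        "  )\n"
        "}"
    )


print(build_album_query("Bjork"))
```

The point of the sketch is that tracker:normalize appears once, inside BIND, and both fn:contains calls reuse ?normalized.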
Comment on attachment 317503
query: Find strings with accents
(In reply to Mattia Cerrato from comment #6)
> Created attachment 317503
> query: Find strings with accents
>
> When searching for an artist/track/album w. accented
> letters, no match is found. This patch changes the queries
> to find both accented and unaccented strings when an equivalent
> unaccented string is searched for, while only accented strings
> are found when an accented string is searched.
After Carlos Garnacho's ack, sure.
Pushed to master as https://git.gnome.org/browse/gnome-music/commit/?id=bcd19561ebacb8c2f645694f4642942f1d7cf626
I opened a new bug for the query refactoring/enhancement: https://bugzilla.gnome.org/show_bug.cgi?id=759587