Bug 739924 - In search, normalize characters with accents/diacritics
Status: RESOLVED FIXED
Product: gnome-music
Classification: Applications
Component: general
Version: unspecified
Hardware: Other Linux
Importance: Normal enhancement
Target Milestone: 3.18
Assigned To: Mattia Cerrato
QA Contact: gnome-music-maint
Reported: 2014-11-10 21:18 UTC by Keir Lawson
Modified: 2015-12-17 12:55 UTC

Attachments
query: Find strings with accents (12.79 KB, patch)
2015-12-16 15:14 UTC, Mattia Cerrato
committed

Description Keir Lawson 2014-11-10 21:18:28 UTC
In most other media players, searching for "Bjork", for example, returns music by "Björk"; this is not the case in GNOME Music.
Comment 1 Vadim Rutkovsky 2014-11-12 20:36:58 UTC
That would be rather complicated, but it seems feasible. Probably the most universal way would be to adjust the tracker queries.
Comment 2 Arnel Borja 2014-11-16 16:08:51 UTC
I'm not familiar with languages that need normalization, so I didn't notice the problem before. We could use tracker:case-fold, tracker:normalize and tracker:unaccent on tracker keys before comparing them to strings that will undergo GLib.utf8_normalize and GLib.utf8_casefold.
Comment 3 Mattia Cerrato 2015-12-11 17:52:03 UTC
I am currently working on it. May I get this bug assigned to me?
Comment 4 André Klapper 2015-12-12 12:03:54 UTC
I don't think anybody cares about assignees, but here you go. 
Thanks for working on this!
Comment 5 Mattia Cerrato 2015-12-15 10:57:35 UTC
I tried to fix this bug following Arnel's suggestion of "cleaning up" accents via tracker:unaccent, tracker:normalize and tracker:case-fold on the tracker side, and using GLib.utf8_casefold and GLib.utf8_normalize on the string the user is searching for.
However, this way accents are only really cleaned up on the tracker side: if the user searches for "Bjork", both "Björk" and "Bjork" are found, while no matches are found if the user searches for "Björk" directly. The output for GLib.utf8_casefold(GLib.utf8_normalize("Björk")) with any normalization mode is still "björk". This is not a match for "bjork", which is the result of the tracker:unaccent function.

I found an equivalent bug report for gnome-documents: https://bugzilla.gnome.org/show_bug.cgi?id=722254, which led to the development of the tracker:unaccent function. In the function, there seems to be some additional computation done on top of a normalization operation.

Intuitively, I'd expect an equivalent GLib.unaccent function; however, there is no such function, and yet gnome-documents handles accented search fine.
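
A minimal sketch of the mismatch described above. It uses Python's standard unicodedata module as a stand-in for GLib.utf8_normalize and GLib.utf8_casefold (an assumption made so the snippet runs without GLib). The unaccent helper is hypothetical: it only approximates what tracker:unaccent is described as doing (normalization plus an extra step), here by dropping combining marks after decomposition.

```python
import unicodedata


def normalize_casefold(text, form="NFC"):
    """Stand-in for GLib.utf8_casefold(GLib.utf8_normalize(text))."""
    return unicodedata.normalize(form, text).casefold()


def unaccent(text):
    """Hypothetical helper: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


query = "Björk"

# Normalization alone keeps the diaeresis, whichever form is used,
# so the result never equals the unaccented "bjork".
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    print(form, normalize_casefold(query, form) == "bjork")

# Stripping the combining marks is the extra step that makes them equal.
print(unaccent(query).casefold() == "bjork")
```

Normalization only changes how the accented character is encoded (precomposed or base letter plus combining mark); it never removes the accent, which is why the search string still fails to match the unaccented tracker value.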
Comment 6 Mattia Cerrato 2015-12-16 15:14:46 UTC
Created attachment 317503
query: Find strings with accents

When searching for an artist/track/album w. accented
letters, no match is found. This patch changes the queries
to find both accented and unaccented strings when an equivalent
unaccented string is searched for, while only accented strings
are found when an accented string is searched.
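
The matching behaviour described in this commit message can be sketched in Python as follows. This is an illustration only, not the patch itself: the real change is made in the tracker queries, and the function names here (fold, strip_marks, matches) are hypothetical.

```python
import unicodedata


def fold(text):
    """Decompose and case-fold, keeping any accents as combining marks."""
    return unicodedata.normalize("NFKD", text).casefold()


def strip_marks(text):
    """Drop combining marks from an already decomposed string."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))


def matches(title, query):
    """True if query matches title as typed, or matches its unaccented form."""
    folded_title = fold(title)  # computed once, reused for both checks
    folded_query = fold(query)
    return (folded_query in folded_title
            or folded_query in strip_marks(folded_title))


print(matches("Björk", "Bjork"))  # unaccented query finds the accented title
print(matches("Bjork", "Bjork"))  # ...and the unaccented title
print(matches("Björk", "Björk"))  # accented query finds the accented title
print(matches("Bjork", "Björk"))  # ...but not the unaccented one
```

Only the stored title is unaccented, never the query, which is what makes an accented search stricter than an unaccented one.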
Comment 7 Carlos Garnacho 2015-12-17 12:06:17 UTC
Comment on attachment 317503
query: Find strings with accents

This looks correct to me given the current gnome-music queries.

Anyhow, in the future it might be worth looking into redesigning the gnome-music queries. There are two things that could help:

- Using fts:match. By default it searches on all the full-text-search capable properties for the matched elements. However, it could be tweaked to look into specific fields, as these queries are doing. This should bring a performance boost, because the way we're currently matching strings comes down to a linear table scan, so it is bound to get slower as the number of elements grows.

  Besides, FTS support already unaccents by default.

- Using SPARQL 1.1's BIND (new in tracker 1.7.x/master), which would allow us to perform at least certain common operations once for each match, e.g.:

  SELECT ... WHERE {
    ?album a nmm:MusicAlbum .
    BIND (tracker:normalize(nie:title(?album), 'nfkd') AS ?normalized) .
    FILTER (fn:contains (tracker:case-fold (tracker:unaccent (?normalized)), "foo") ||
            fn:contains (tracker:case-fold (?normalized), "foo"))
  }

  That would allow doing the tracker:normalize(...) operation once at the query level, instead of twice (once in each fn:contains).


Nonetheless, I set the patch as "reviewed", but it counts as "accepted-commitnow" to me. I don't think the query redesign belongs to this bug anyway, but rather to a new enhancement one.
Comment 8 Felipe Borges 2015-12-17 12:52:16 UTC
Comment on attachment 317503
query: Find strings with accents

(In reply to Mattia Cerrato from comment #6)
> Created attachment 317503
> query: Find strings with accents
> 
> When searching for an artist/track/album w. accented
> letters, no match is found. This patch changes the queries
> to find both accented and unaccented strings when an equivalent
> unaccented string is searched for, while only accented strings
> are found when an accented string is searched.

After Carlos Garnacho's ack, sure.
Comment 9 Felipe Borges 2015-12-17 12:55:13 UTC
Pushed to master as https://git.gnome.org/browse/gnome-music/commit/?id=bcd19561ebacb8c2f645694f4642942f1d7cf626

I opened a new bug for the queries refactoring/enhancement:
https://bugzilla.gnome.org/show_bug.cgi?id=759587