After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 748423 - thetvdb: Fuzzy series name matching
thetvdb: Fuzzy series name matching
Status: RESOLVED FIXED
Product: grilo
Classification: Other
Component: plugins
git master
Other Linux
: Normal normal
: ---
Assigned To: Victor Toso
grilo-maint
Depends on: 753563
Blocks:
 
 
Reported: 2015-04-24 16:38 UTC by Bastien Nocera
Modified: 2015-08-25 14:11 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
thetvdb: avoid cache miss with fuzzy series name (6.18 KB, patch)
2015-04-26 19:21 UTC, Victor Toso
none Details | Review
thetvdb: avoid cache miss with fuzzy series name (6.68 KB, patch)
2015-04-27 21:15 UTC, Victor Toso
none Details | Review
thetvdb: use define for gom table names (2.70 KB, patch)
2015-07-31 22:59 UTC, Victor Toso
committed Details | Review
thetvdb: avoid cache miss with fuzzy series name (16.17 KB, patch)
2015-07-31 22:59 UTC, Victor Toso
none Details | Review
avoid cache miss (16.25 KB, patch)
2015-08-12 15:21 UTC, Victor Toso
none Details | Review
thetvdb: avoid cache miss with fuzzy series name (16.88 KB, patch)
2015-08-25 12:33 UTC, Victor Toso
committed Details | Review
tests: improve helper function with cache only arg (3.28 KB, patch)
2015-08-25 12:33 UTC, Victor Toso
committed Details | Review
tests: thetvdb with fuzzy name in show name (83.12 KB, patch)
2015-08-25 12:33 UTC, Victor Toso
committed Details | Review
thetvdb: Always search show names in all languages (3.02 KB, patch)
2015-08-25 13:28 UTC, Bastien Nocera
committed Details | Review
tests: Add another alias test with a translated series name (2.82 KB, patch)
2015-08-25 13:28 UTC, Bastien Nocera
committed Details | Review

Description Bastien Nocera 2015-04-24 16:38:29 UTC
Adding "CSI - Miami" as a series name fails to match any existing series.

(test_thetvdb_resolve_shows:12135): Grilo-WARNING **: [thetvdb] grl-thetvdb.c:1234: Resolve operation failed due 'Could not find mock content Key file does not have group 'http://thetvdb.com/api/GetSeries.php?seriesname=CSI - Miami''
Comment 1 Victor Toso 2015-04-26 00:21:23 UTC
Okay, verified with grl-launch:

1) empty thetvdb database
2) $GRL_DEBUG=thetvdb:* grl-launch-0.2 -S -k description -C tokens.conf resolve grlvideo://grl-thetvdb/?show=CSI%20-%20Miami
3) thetvdb finds 'CSI: Miami'; all data are stored in cache;
4) a new request with "CSI - Miami" will fail to get from cache;

The approach I like most is a new entry in the database with the show name provided by application and then we always search there as well. 
This would avoid dealing with all kinds of names by just mapping what the user has to what thetvdb provided.

I'll try to make a patch for it.
Comment 2 Victor Toso 2015-04-26 19:21:31 UTC
Created attachment 302393 [details] [review]
thetvdb: avoid cache miss with fuzzy series name

Including the requested show name in the database we can map it with
what thetvdb returned. By looking in both columns we can avoid the cache
miss. e.g:

thetvdb: "CSI: Miami"
application: "CSI - Miami"
Comment 3 Victor Toso 2015-04-26 19:25:24 UTC
Not sure what I'm doing wrong in the gom db migration. From older version to new one, the new column/property is not created.

On new database, this patch works here.
Comment 4 Bastien Nocera 2015-04-27 09:12:54 UTC
(In reply to Victor Toso from comment #3)
> Not sure what I'm doing wrong in the gom db migration. From older version to
> new one, the new column/property is not created.
> 
> On new database, this patch works here.

Can you file a bug against gom with that? Even better would be to integrate a test case into gom.
Comment 5 Bastien Nocera 2015-04-27 09:14:44 UTC
Review of attachment 302393 [details] [review]:

You forgot to bump the DB version in the gom_repository_automatic_migrate_async() call...
Comment 6 Bastien Nocera 2015-04-27 09:16:18 UTC
Review of attachment 302393 [details] [review]:

::: src/thetvdb/grl-thetvdb.c
@@ +737,3 @@
+   * one from application as well in order to avoid cache-miss. */
+  if (show != NULL && requested_show != NULL
+      && g_ascii_strcasecmp (show, requested_show) != 0) {

Why not use g_strcmp0() which will handle UTF-8 show names?
Comment 7 Victor Toso 2015-04-27 09:34:36 UTC
(In reply to Bastien Nocera from comment #5)
> Review of attachment 302393 [details] [review] [review]:
> 
> You forgot to bump the DB version in the
> gom_repository_automatic_migrate_async() call...

Thanks, I'll fix it and test it before resend.

(In reply to Bastien Nocera from comment #6)
> Review of attachment 302393 [details] [review] [review]:
> 
> ::: src/thetvdb/grl-thetvdb.c
> @@ +737,3 @@
> +   * one from application as well in order to avoid cache-miss. */
> +  if (show != NULL && requested_show != NULL
> +      && g_ascii_strcasecmp (show, requested_show) != 0) {
> 
> Why not use g_strcmp0() which will handle UTF-8 show names?

I did not think about non-ascii show names. I'll fix it, thanks!
Comment 8 Victor Toso 2015-04-27 21:15:42 UTC
Created attachment 302485 [details] [review]
thetvdb: avoid cache miss with fuzzy series name

Including the requested show name in the database we can map it with
what thetvdb returned. By looking in both columns we can avoid the cache
miss. e.g:

thetvdb: "CSI: Miami"
application: "CSI - Miami"
Comment 9 Victor Toso 2015-04-27 21:18:03 UTC
(In reply to Bastien Nocera from comment #5)
> Review of attachment 302393 [details] [review] [review]:
> 
> You forgot to bump the DB version in the
> gom_repository_automatic_migrate_async() call...

Silly me, I thought I only need to put what is new on the resource. Thanks :-)
Comment 10 Bastien Nocera 2015-04-28 09:21:55 UTC
Review of attachment 302485 [details] [review]:

::: src/thetvdb/grl-thetvdb.c
@@ +395,3 @@
   tables = g_list_prepend (NULL, GINT_TO_POINTER (SERIES_TYPE_RESOURCE));
   tables = g_list_prepend (tables, GINT_TO_POINTER (EPISODE_TYPE_RESOURCE));
+  gom_repository_automatic_migrate_async (source->priv->repository, 3, tables,

Would be nice to put that in a #define.

::: src/thetvdb/thetvdb-resources-series.c
@@ +338,3 @@
                                    specs[PROP_URL_POSTER]);
+  /* New in version 2 */
+  specs[PROP_OTHER_NAMES] = g_param_spec_string (SERIES_COLUMN_OTHER_NAMES,

Hmm, this really isn't too great. That means that we can only store a single alternate name for each series, which is bound to cause problems.
Comment 11 Victor Toso 2015-04-28 09:29:38 UTC
(In reply to Bastien Nocera from comment #10)
> Review of attachment 302485 [details] [review] [review]:
> 
> ::: src/thetvdb/grl-thetvdb.c
> @@ +395,3 @@
>    tables = g_list_prepend (NULL, GINT_TO_POINTER (SERIES_TYPE_RESOURCE));
>    tables = g_list_prepend (tables, GINT_TO_POINTER (EPISODE_TYPE_RESOURCE));
> +  gom_repository_automatic_migrate_async (source->priv->repository, 3,
> tables,
> 
> Would be nice to put that in a #define.

Sure.

> 
> ::: src/thetvdb/thetvdb-resources-series.c
> @@ +338,3 @@
>                                     specs[PROP_URL_POSTER]);
> +  /* New in version 2 */
> +  specs[PROP_OTHER_NAMES] = g_param_spec_string (SERIES_COLUMN_OTHER_NAMES,
> 
> Hmm, this really isn't too great. That means that we can only store a single
> alternate name for each series, which is bound to cause problems.

We can have multiple names as well, maybe split by '|' like we have in Actors/Directors. This shoudn't affect the sql query and it's not hard to set after the web request.

But I'm not sure how useful it will be, do you expect several folders/names for the same series?
Comment 12 Bastien Nocera 2015-04-28 09:46:58 UTC
I would add a separate table with name (the various matches from the web service) and tvdb-id columns. The tvdb-id column would have a reference to the primary key in the real series DB.

See gom_resource_class_set_reference().

(Better than "|" as a separator)
Comment 13 Victor Toso 2015-07-31 22:59:20 UTC
Created attachment 308589 [details] [review]
thetvdb: use define for gom table names

This is useful when referencing external tables as we are going to do in
the next commit.
Comment 14 Victor Toso 2015-07-31 22:59:34 UTC
Created attachment 308590 [details] [review]
thetvdb: avoid cache miss with fuzzy series name

Including the requested show name in the database we can map it with
what thetvdb returned. By looking in both columns we can avoid the cache
miss. e.g:

thetvdb: "CSI: Miami"
application: "CSI - Miami"
Comment 15 Victor Toso 2015-07-31 23:03:11 UTC
Those patches are working but I did not test yet for leak and migration.
I think including a test case for migration would be good.

Anyway, I think I'm not using the "gom_resource_class_set_reference()" correctly, I was expecting that a query on the auxiliar table I could get the GomResource for the main table
e.g query fuzzy-name in the fuzzy-table I would get row for main series table.

Sorry for the delay
Comment 16 Bastien Nocera 2015-08-03 13:57:03 UTC
Review of attachment 308589 [details] [review]:

Looks good.
Comment 17 Bastien Nocera 2015-08-03 13:59:58 UTC
Review of attachment 308590 [details] [review]:

Looks good, do you have test updates for that feature?
Comment 18 Victor Toso 2015-08-12 15:20:32 UTC
(In reply to Bastien Nocera from comment #17)
> Review of attachment 308590 [details] [review] [review]:
> 
> Looks good, do you have test updates for that feature?

Yup. Current vertion (to be attached) works fine on new databases but when migrating from old database it does not create the fuzzy naming database.
Bug created #753563

Following patch changed only the bits related to saving the show-name and requested-show-name

The reviewed patch would *always* save the show-name for every requested-show-name. Current patch only insert show-name when it succeed to insert in series table.

Either this or making series-name unique on fuzzy-series-name database.

<snip>

  gom_resource_save_sync (GOM_RESOURCE (sres), &error);
  if (error != NULL) {
    GRL_DEBUG ("Failed to store series '%s' due %s",
                show, error->message);
    g_error_free (error);

  } else if (series_id != NULL) {
    /* This is a new series to our db. Keep it on fuzzy naming db as well */
      cache_save_fuzzy_series_names (repository, show, series_id);
  }

  if (series_id != NULL && requested_show != NULL &&
      g_strcmp0 (show, requested_show) != 0) { 
    /* Always save the user's requested show to our fuzzy naming db */
    cache_save_fuzzy_series_names (repository, requested_show, series_id);
  }

</snip>
Comment 19 Victor Toso 2015-08-12 15:21:46 UTC
Created attachment 309163 [details] [review]
avoid cache miss
Comment 20 Bastien Nocera 2015-08-24 22:15:44 UTC
The gom bug is fixed, just waiting on a test for this feature.
Comment 21 Victor Toso 2015-08-25 12:33:24 UTC
Created attachment 309952 [details] [review]
thetvdb: avoid cache miss with fuzzy series name

Including the requested show name in the database we can map it with
what thetvdb returned. By looking in both columns we can avoid the cache
miss. e.g:

thetvdb: "CSI: Miami"
application: "CSI - Miami"
Comment 22 Victor Toso 2015-08-25 12:33:34 UTC
Created attachment 309953 [details] [review]
tests: improve helper function with cache only arg

And also check if out args are not NULL; both improvements will be used
in the next commit
Comment 23 Victor Toso 2015-08-25 12:33:43 UTC
Created attachment 309954 [details] [review]
tests: thetvdb with fuzzy name in show name
Comment 24 Bastien Nocera 2015-08-25 13:28:17 UTC
Created attachment 309963 [details] [review]
thetvdb: Always search show names in all languages

Also update the tests' mock data to match.
Comment 25 Bastien Nocera 2015-08-25 13:28:25 UTC
Created attachment 309964 [details] [review]
tests: Add another alias test with a translated series name
Comment 26 Bastien Nocera 2015-08-25 13:31:46 UTC
Review of attachment 309952 [details] [review]:

Looks good.
Comment 27 Bastien Nocera 2015-08-25 13:32:20 UTC
Review of attachment 309953 [details] [review]:

Sure.
Comment 28 Bastien Nocera 2015-08-25 13:35:20 UTC
Review of attachment 309954 [details] [review]:

Looks good.
Comment 29 Victor Toso 2015-08-25 14:06:20 UTC
Review of attachment 309963 [details] [review]:

Yes!
Comment 30 Victor Toso 2015-08-25 14:06:32 UTC
Review of attachment 309964 [details] [review]:

Yep!
Comment 31 Bastien Nocera 2015-08-25 14:10:12 UTC
Attachment 308589 [details] pushed as 66710d9 - thetvdb: use define for gom table names
Attachment 309952 [details] pushed as 510f88a - thetvdb: avoid cache miss with fuzzy series name
Attachment 309953 [details] pushed as 2a6dcfc - tests: improve helper function with cache only arg
Attachment 309954 [details] pushed as 060472c - tests: thetvdb with fuzzy name in show name
Attachment 309963 [details] pushed as 2b90f1b - thetvdb: Always search show names in all languages
Attachment 309964 [details] pushed as bd39ef8 - tests: Add another alias test with a translated series name