GNOME Bugzilla – Bug 748423
thetvdb: Fuzzy series name matching
Last modified: 2015-08-25 14:11:02 UTC
Adding "CSI - Miami" as a series name fails to match any existing series. (test_thetvdb_resolve_shows:12135): Grilo-WARNING **: [thetvdb] grl-thetvdb.c:1234: Resolve operation failed due 'Could not find mock content Key file does not have group 'http://thetvdb.com/api/GetSeries.php?seriesname=CSI - Miami''
Okay, verified with grl-launch: 1) empty thetvdb database 2) $GRL_DEBUG=thetvdb:* grl-launch-0.2 -S -k description -C tokens.conf resolve grlvideo://grl-thetvdb/?show=CSI%20-%20Miami 3) thetvdb finds 'CSI: Miami'; all data are stored in cache; 4) a new request with "CSI - Miami" will fail to get from cache; The approach I like most is a new entry in the database with the show name provided by application and then we always search there as well. This would avoid dealing with all kinds of names by just mapping what the user has to what thetvdb provided. I'll try to make a patch for it.
Created attachment 302393 [details] [review] thetvdb: avoid cache miss with fuzzy series name Including the requested show name in the database we can map it with what thetvdb returned. By looking in both columns we can avoid the cache miss. e.g: thetvdb: "CSI: Miami" application: "CSI - Miami"
Not sure what I'm doing wrong in the gom db migration. From older version to new one, the new column/property is not created. On new database, this patch works here.
(In reply to Victor Toso from comment #3) > Not sure what I'm doing wrong in the gom db migration. From older version to > new one, the new column/property is not created. > > On new database, this patch works here. Can you file a bug against gom with that? Even better would be to integrate a test case into gom.
Review of attachment 302393 [details] [review]: You forgot to bump the DB version in the gom_repository_automatic_migrate_async() call...
Review of attachment 302393 [details] [review]: ::: src/thetvdb/grl-thetvdb.c @@ +737,3 @@ + * one from application as well in order to avoid cache-miss. */ + if (show != NULL && requested_show != NULL + && g_ascii_strcasecmp (show, requested_show) != 0) { Why not use g_strcmp0() which will handle UTF-8 show names?
(In reply to Bastien Nocera from comment #5) > Review of attachment 302393 [details] [review] [review]: > > You forgot to bump the DB version in the > gom_repository_automatic_migrate_async() call... Thanks, I'll fix it and test it before resend. (In reply to Bastien Nocera from comment #6) > Review of attachment 302393 [details] [review] [review]: > > ::: src/thetvdb/grl-thetvdb.c > @@ +737,3 @@ > + * one from application as well in order to avoid cache-miss. */ > + if (show != NULL && requested_show != NULL > + && g_ascii_strcasecmp (show, requested_show) != 0) { > > Why not use g_strcmp0() which will handle UTF-8 show names? I did not think about non-ascii show names. I'll fix it, thanks!
Created attachment 302485 [details] [review] thetvdb: avoid cache miss with fuzzy series name Including the requested show name in the database we can map it with what thetvdb returned. By looking in both columns we can avoid the cache miss. e.g: thetvdb: "CSI: Miami" application: "CSI - Miami"
(In reply to Bastien Nocera from comment #5) > Review of attachment 302393 [details] [review] [review]: > > You forgot to bump the DB version in the > gom_repository_automatic_migrate_async() call... Silly me, I thought I only need to put what is new on the resource. Thanks :-)
Review of attachment 302485 [details] [review]: ::: src/thetvdb/grl-thetvdb.c @@ +395,3 @@ tables = g_list_prepend (NULL, GINT_TO_POINTER (SERIES_TYPE_RESOURCE)); tables = g_list_prepend (tables, GINT_TO_POINTER (EPISODE_TYPE_RESOURCE)); + gom_repository_automatic_migrate_async (source->priv->repository, 3, tables, Would be nice to put that in a #define. ::: src/thetvdb/thetvdb-resources-series.c @@ +338,3 @@ specs[PROP_URL_POSTER]); + /* New in version 2 */ + specs[PROP_OTHER_NAMES] = g_param_spec_string (SERIES_COLUMN_OTHER_NAMES, Hmm, this really isn't too great. That means that we can only store a single alternate name for each series, which is bound to cause problems.
(In reply to Bastien Nocera from comment #10) > Review of attachment 302485 [details] [review] [review]: > > ::: src/thetvdb/grl-thetvdb.c > @@ +395,3 @@ > tables = g_list_prepend (NULL, GINT_TO_POINTER (SERIES_TYPE_RESOURCE)); > tables = g_list_prepend (tables, GINT_TO_POINTER (EPISODE_TYPE_RESOURCE)); > + gom_repository_automatic_migrate_async (source->priv->repository, 3, > tables, > > Would be nice to put that in a #define. Sure. > > ::: src/thetvdb/thetvdb-resources-series.c > @@ +338,3 @@ > specs[PROP_URL_POSTER]); > + /* New in version 2 */ > + specs[PROP_OTHER_NAMES] = g_param_spec_string (SERIES_COLUMN_OTHER_NAMES, > > Hmm, this really isn't too great. That means that we can only store a single > alternate name for each series, which is bound to cause problems. We can have multiple names as well, maybe split by '|' like we have in Actors/Directors. This shoudn't affect the sql query and it's not hard to set after the web request. But I'm not sure how useful it will be, do you expect several folders/names for the same series?
I would add a separate table with name (the various matches from the web service) and tvdb-id columns. The tvdb-id column would have a reference to the primary key in the real series DB. See gom_resource_class_set_reference(). (Better than "|" as a separator)
Created attachment 308589 [details] [review] thetvdb: use define for gom table names This is useful when referencing external tables as we are going to do in the next commit.
Created attachment 308590 [details] [review] thetvdb: avoid cache miss with fuzzy series name Including the requested show name in the database we can map it with what thetvdb returned. By looking in both columns we can avoid the cache miss. e.g: thetvdb: "CSI: Miami" application: "CSI - Miami"
Those patches are working but I did not test yet for leak and migration. I think including a test case for migration would be good. Anyway, I think I'm not using the "gom_resource_class_set_reference()" correctly, I was expecting that a query on the auxiliar table I could get the GomResource for the main table e.g query fuzzy-name in the fuzzy-table I would get row for main series table. Sorry for the delay
Review of attachment 308589 [details] [review]: Looks good.
Review of attachment 308590 [details] [review]: Looks good, do you have test updates for that feature?
(In reply to Bastien Nocera from comment #17) > Review of attachment 308590 [details] [review] [review]: > > Looks good, do you have test updates for that feature? Yup. Current vertion (to be attached) works fine on new databases but when migrating from old database it does not create the fuzzy naming database. Bug created #753563 Following patch changed only the bits related to saving the show-name and requested-show-name The reviewed patch would *always* save the show-name for every requested-show-name. Current patch only insert show-name when it succeed to insert in series table. Either this or making series-name unique on fuzzy-series-name database. <snip> gom_resource_save_sync (GOM_RESOURCE (sres), &error); if (error != NULL) { GRL_DEBUG ("Failed to store series '%s' due %s", show, error->message); g_error_free (error); } else if (series_id != NULL) { /* This is a new series to our db. Keep it on fuzzy naming db as well */ cache_save_fuzzy_series_names (repository, show, series_id); } if (series_id != NULL && requested_show != NULL && g_strcmp0 (show, requested_show) != 0) { /* Always save the user's requested show to our fuzzy naming db */ cache_save_fuzzy_series_names (repository, requested_show, series_id); } </snip>
Created attachment 309163 [details] [review] avoid cache miss
The gom bug is fixed, just waiting on a test for this feature.
Created attachment 309952 [details] [review] thetvdb: avoid cache miss with fuzzy series name Including the requested show name in the database we can map it with what thetvdb returned. By looking in both columns we can avoid the cache miss. e.g: thetvdb: "CSI: Miami" application: "CSI - Miami"
Created attachment 309953 [details] [review] tests: improve helper function with cache only arg And also check if out args are not NULL; both improvements will be used in the next commit
Created attachment 309954 [details] [review] tests: thetvdb with fuzzy name in show name
Created attachment 309963 [details] [review] thetvdb: Always search show names in all languages Also update the tests' mock data to match.
Created attachment 309964 [details] [review] tests: Add another alias test with a translated series name
Review of attachment 309952 [details] [review]: Looks good.
Review of attachment 309953 [details] [review]: Sure.
Review of attachment 309954 [details] [review]: Looks good.
Review of attachment 309963 [details] [review]: Yes!
Review of attachment 309964 [details] [review]: Yep!
Attachment 308589 [details] pushed as 66710d9 - thetvdb: use define for gom table names Attachment 309952 [details] pushed as 510f88a - thetvdb: avoid cache miss with fuzzy series name Attachment 309953 [details] pushed as 2a6dcfc - tests: improve helper function with cache only arg Attachment 309954 [details] pushed as 060472c - tests: thetvdb with fuzzy name in show name Attachment 309963 [details] pushed as 2b90f1b - thetvdb: Always search show names in all languages Attachment 309964 [details] pushed as bd39ef8 - tests: Add another alias test with a translated series name