GNOME Bugzilla – Bug 672933
Add thetvdb.com metadata
Last modified: 2014-11-27 14:55:52 UTC
Links: http://thetvdb.com/ API: http://thetvdb.com/wiki/index.php?title=Programmers_API Filename parsing: https://code.google.com/p/mptvseries/wiki/FAQ#File_Naming_Conventions Python implementation: http://code.mythtv.org/doxygen/ttvdb_8py_source.html
*** Bug 723113 has been marked as a duplicate of this bug. ***
Created attachment 277126 [details] [review] Include source The TVDB This is the first version of the source, still needs some work. At the moment, I'm fetching once to get SeriesID and another fetch to get all Episodes in a zip file. I'll create a DB with sqlite to insert all fetched data in order to be accessible offline. I'll create a few GrlKeyID (the same way tmdb does) to provide more metadata fields. At the moment, for easiness, I've harded coded the API-KEY and the language ('en') as well. Doubt: how the best way to get application language preference? (TMDB has a lot of data translated)
Created attachment 277129 [details] [review] Include tests of The TVDB source Just a few basic tests at the moment. Check the title from a specific show at specific season and episode.
Review of attachment 277126 [details] [review]: ::: src/thetvdb/grl-thetvdb.c @@ +59,3 @@ +/* --- Helpers --- */ +#define TOSO_APIKEY "3F476CEF2FBD0FB0" +#define XMLCAST_CHARPTR (const xmlChar *) /usr/include/libxml2/libxml/xmlstring.h:#define BAD_CAST (xmlChar *) @@ +299,3 @@ + EpisodeEntry *epi; + + if (title == NULL && season_number == 0 && episode_number == 0) Is that supposed to be "or" instead? @@ +546,3 @@ + +static gboolean +unzip_all_series_data (gchar *zip, We should at least move this to a copy/paste file to be shared between the lua-factory and this code.
Review of attachment 277129 [details] [review]: It will definitely need some offline tests, like tmdb has.
(In reply to comment #2) > Created an attachment (id=277126) [details] [review] > Include source The TVDB > > This is the first version of the source, still needs some work. > > At the moment, I'm fetching once to get SeriesID and another fetch to get all > Episodes in a zip file. That's probably not that much of a problem, but just how big are those files? Is this going to cause significant delays the first time it's accessed? What worries me is multiple concurrent resolution requests would currently cause a pretty big problem with concurrent initial downloads of the series ID file, and the series files. > I'll create a DB with sqlite to insert all fetched data in order to be > accessible offline. Use gom! That's what it's there for. > I'll create a few GrlKeyID (the same way tmdb does) to provide more metadata > fields. > > At the moment, for easiness, I've harded coded the API-KEY and the language > ('en') as well. Doubt: how the best way to get application language preference? > (TMDB has a lot of data translated) I assume the list of supported languages is quite small. Make sure that your database schema supports multiple languages, just in case. I would always get the language data for the current UI language.
> /usr/include/libxml2/libxml/xmlstring.h:#define BAD_CAST (xmlChar *) Thanks! > @@ +299,3 @@ > + EpisodeEntry *epi; > + > + if (title == NULL && season_number == 0 && episode_number == 0) > > Is that supposed to be "or" instead? Yes, thanks! To specify an episode of the show, we need the title or season and episode number. Fixed. (In reply to comment #5) > Review of attachment 277129 [details] [review]: > > It will definitely need some offline tests, like tmdb has. Included! > That's probably not that much of a problem, but just how big are those files? > Is this going to cause significant delays the first time it's accessed? > > What worries me is multiple concurrent resolution requests would currently > cause a pretty big problem with concurrent initial downloads of the series ID > file, and the series files. I have included a few of those files in the offline tests. The biggest so far is 56K (all metadata for House) > > I'll create a DB with sqlite to insert all fetched data in order to be > > accessible offline. > > Use gom! That's what it's there for. Yes, I'm playing with it. Should be ready by Tuesday as +1 patch. > I assume the list of supported languages is quite small. Make sure that your > database schema supports multiple languages, just in case. Ok! http://thetvdb.com/wiki/index.php?title=Multi_Language#Available_Languages > I would always get the language data for the current UI language. Shouldn't this be an option in grl_resolve_operation? From the source, how could I know the UI language? (g_get_language_names () ??)
Created attachment 277709 [details] [review] Include source: The TVDB
Created attachment 277710 [details] [review] Include tests to The TVDB source
(In reply to comment #7) > > @@ +299,3 @@ > > + EpisodeEntry *epi; > > + > > + if (title == NULL && season_number == 0 && episode_number == 0) > > > > Is that supposed to be "or" instead? > > Yes, thanks! To specify an episode of the show, we need the title or season and > episode number. Fixed. No, it's "||" because you need the title and the season and the episode number :) > (In reply to comment #5) > > Review of attachment 277129 [details] [review] [details]: > > That's probably not that much of a problem, but just how big are those files? > > Is this going to cause significant delays the first time it's accessed? > > > > What worries me is multiple concurrent resolution requests would currently > > cause a pretty big problem with concurrent initial downloads of the series ID > > file, and the series files. > > I have included a few of those files in the offline tests. The biggest so far > is 56K (all metadata for House) But how big is the list of series? Is it possible that multiple resolve() requests, when launched the first time, would download that file multiple times, try to insert it multiple times in the database, etc. ? > > > I'll create a DB with sqlite to insert all fetched data in order to be > > > accessible offline. > > > > Use gom! That's what it's there for. > > Yes, I'm playing with it. Should be ready by Tuesday as +1 patch. Cool. Let me know if you run into any problems. > > I assume the list of supported languages is quite small. Make sure that your > > database schema supports multiple languages, just in case. > > Ok! > http://thetvdb.com/wiki/index.php?title=Multi_Language#Available_Languages Yes, it was more about the database itself supporting those. So you don't have code saying "hey, the database is already populated, but I can't find anything for your language". > > I would always get the language data for the current UI language. > > Shouldn't this be an option in grl_resolve_operation? From the source, how > could I know the UI language? (g_get_language_names () ??) I'd use g_get_language_names(), yes. An option might be nice, but this should be the default behaviour.
Review of attachment 277709 [details] [review]: Looks good overall, waiting on the caching (with gom) and possibly some way to avoid problems on the first query. ::: src/thetvdb/grl-thetvdb.c @@ +126,3 @@ + gchar *imdb_id; + gchar *first_aired; + gdouble rating; If you're going to line up variable names, make sure that the names are lined up. @@ +139,3 @@ + gchar **actor_names; + gchar **alias_names; + gchar **genres; Ditto. @@ +152,3 @@ + gchar *url_episode_screen; + gchar **director_names; + gchar **guest_stars_names; Ditto.
Review of attachment 277710 [details] [review]: Looking good!
(In reply to comment #10) > > > > I'll create a DB with sqlite to insert all fetched data in order to be > > > > accessible offline. > > > > > > Use gom! That's what it's there for. > > > > Yes, I'm playing with it. Should be ready by Tuesday as +1 patch. > > Cool. Let me know if you run into any problems. And I did. The https://github.com/chergert/gom.git is the right branch with the latest fixes? Basically I had a few problems when using multiples GomFilters in one query (the idea was using the gom_repository_find_one_sync with 'series_id' == num, and information to get an episode. I got double_free inside gom with it. Backtrace: http://fpaste.org/108266/90241140/ So, for now I'm geting fetching all episodes into GomResourceGroup and iterating until I find the right episode. I plan to fix that afterwords. Also, there is two TODOs regarding database updates [0] and working-around with concurrency. [0] http://thetvdb.com/wiki/index.php?title=API:Update_Records It is quite faster with GOM even comparing with the offline data.
Created attachment 278172 [details] [review] Include source: The TVDB With gom and new metadata-keys
Created attachment 278173 [details] [review] Include tests to The TVDB source No new changes in the tests.
(In reply to comment #13) > (In reply to comment #10) > > > > > I'll create a DB with sqlite to insert all fetched data in order to be > > > > > accessible offline. > > > > > > > > Use gom! That's what it's there for. > > > > > > Yes, I'm playing with it. Should be ready by Tuesday as +1 patch. > > > > Cool. Let me know if you run into any problems. > > And I did. The https://github.com/chergert/gom.git is the right branch with the > latest fixes? No, it's on git.gnome.org. > Basically I had a few problems when using multiples GomFilters in one query > (the idea was using the gom_repository_find_one_sync with 'series_id' == num, > and information to get an episode. I got double_free inside gom with it. > > Backtrace: http://fpaste.org/108266/90241140/ If that still happens, please create a reproducer, or modify one of the tests in gom/tests and we'll look at it. > So, for now I'm geting fetching all episodes into GomResourceGroup and > iterating until I find the right episode. I plan to fix that afterwords. OK. > Also, there is two TODOs regarding database updates [0] and working-around with > concurrency. > > [0] http://thetvdb.com/wiki/index.php?title=API:Update_Records > > It is quite faster with GOM even comparing with the offline data. Cool.
Review of attachment 278172 [details] [review]: Looks good overall, I'm waiting on the test case for that crasher you mentioned and your ideas on how to solve concurrency. ::: src/thetvdb/grl-thetvdb.c @@ +160,3 @@ + GRL_DEBUG ("thetvdb_plugin_init"); + + config = GRL_CONFIG (configs->data); That makes applications that don't pass any configuration crash on startup. @@ +320,3 @@ + * thetvdb.com/wiki/index.php?title=Multi_Language#Available_Languages */ + ht = g_hash_table_new_full (g_str_hash, g_str_equal, NULL, NULL); + g_hash_table_insert (ht, "en", GINT_TO_POINTER (7)); I don't like this, I'd rather have a static table: static { int lid; const char *name; } languages[] = { { 7, "en" }, }; etc. @@ +367,3 @@ + tables = g_list_prepend (tables, GINT_TO_POINTER (EPISODE_TYPE_RESOURCE)); + gom_repository_automatic_migrate_async (source->priv->repository, 2, tables, + thetvdb_migrate_db_done, source); The source should probably disappear from the list of sources, or we'll need to add more guards around the other entry points of the source like resolve(). @@ +422,3 @@ + gchar lang[3]; + + g_strlcpy (lang, strv[i], 3); What's this supposed to do? Get the language code? There are plenty of 3 letter languages in /usr/share/locale/. @@ +466,3 @@ + continue; + + if (!xmlStrcasecmp (child->name, THETVDB_LANGUAGE)) I don't like "!strcmp". It makes it look like a negation. I'd prefer == 0. And multi-line conditionals should have curly braces {}. @@ +704,3 @@ + + /* Do not insert empty strings */ + if (g_strcmp0 (strv[i], "") == 0) if (strv[i] && *strv[i] == '\0') continue. @@ +762,3 @@ + for (it = keys; it; it = g_list_next (it)) { + key_id = GRLPOINTER_TO_KEYID (it->data); + num = -1; You don't need num outside this loop, please move all the variables that are only used inside the loop to be declared at the beginning of the loop. @@ +1222,3 @@ + + tvdb_source = GRL_THETVDB_SOURCE (os->source); + group = gom_repository_find_sync (tvdb_source->priv->repository, That will obviously need fixing.
Review of attachment 278173 [details] [review]: That looks fine to me.
> ::: src/thetvdb/grl-thetvdb.c > @@ +160,3 @@ > + GRL_DEBUG ("thetvdb_plugin_init"); > + > + config = GRL_CONFIG (configs->data); > > That makes applications that don't pass any configuration crash on startup. Fixed. > > @@ +320,3 @@ > + * thetvdb.com/wiki/index.php?title=Multi_Language#Available_Languages */ > + ht = g_hash_table_new_full (g_str_hash, g_str_equal, NULL, NULL); > + g_hash_table_insert (ht, "en", GINT_TO_POINTER (7)); > > I don't like this, I'd rather have a static table: > static { > int lid; > const char *name; > } languages[] = { > { 7, "en" }, > }; etc. Indeed, way better. Thanks! > > @@ +367,3 @@ > + tables = g_list_prepend (tables, GINT_TO_POINTER (EPISODE_TYPE_RESOURCE)); > + gom_repository_automatic_migrate_async (source->priv->repository, 2, tables, > + thetvdb_migrate_db_done, source); > > The source should probably disappear from the list of sources, or we'll need to > add more guards around the other entry points of the source like resolve(). If it fails, I've removed the source from the list now. > @@ +422,3 @@ > + gchar lang[3]; > + > + g_strlcpy (lang, strv[i], 3); > > What's this supposed to do? Get the language code? > There are plenty of 3 letter languages in /usr/share/locale/. Fixed. > > @@ +466,3 @@ > + continue; > + > + if (!xmlStrcasecmp (child->name, THETVDB_LANGUAGE)) > > I don't like "!strcmp". It makes it look like a negation. I'd prefer == 0. > > And multi-line conditionals should have curly braces {}. Done! > > @@ +704,3 @@ > + > + /* Do not insert empty strings */ > + if (g_strcmp0 (strv[i], "") == 0) > > if (strv[i] && *strv[i] == '\0') > continue. Fixed! > @@ +762,3 @@ > + for (it = keys; it; it = g_list_next (it)) { > + key_id = GRLPOINTER_TO_KEYID (it->data); > + num = -1; > > You don't need num outside this loop, please move all the variables that are > only used inside the loop to be declared at the beginning of the loop. Fixed. > @@ +1222,3 @@ > + > + tvdb_source = GRL_THETVDB_SOURCE (os->source); > + group = gom_repository_find_sync (tvdb_source->priv->repository, > > That will obviously need fixing. I'm not using group anymore, thanks to the fix at https://bugzilla.gnome.org/show_bug.cgi?id=731701 I am searching the data on cache async but saving it sync. I'm dealing with concurrency inserting new request to a wait_list and when the fetch is done I resolve all requests from cache. I included a stress test to verify this, fetching two seasons of house. The only problem I know at the moment is duplication of values. I thougth that setting the unique table id of the new element with the same value of the old element it would update the row. It does not. Steps to verify with currents tests: 1-) execute test_thetvdb_resolve_episodes 2-) remove episode 8x22 of house 3-) execute test_thetvdb_resolve_episodes
Created attachment 278649 [details] [review] Include source: The TVDB
Created attachment 278650 [details] [review] Include tests to The TVDB source
Review of attachment 278649 [details] [review]: Needs rebasing.
Review of attachment 278650 [details] [review]: Needs rebasing as well (especially given the fact that it contains binary files that patch won't handle).
(In reply to comment #19) <snip> > The only problem I know at the moment is duplication of values. I thougth that > setting the unique table id of the new element with the same value of the old > element it would update the row. It does not. > Steps to verify with currents tests: > 1-) execute test_thetvdb_resolve_episodes > 2-) remove episode 8x22 of house > 3-) execute test_thetvdb_resolve_episodes I don't understand what's getting duplicated here. If you need a particular column in the database to be unique, use the unique constraint (see tests/test-gom-constraints.c)
(In reply to comment #24) > (In reply to comment #19) > <snip> > > The only problem I know at the moment is duplication of values. I thougth that > > setting the unique table id of the new element with the same value of the old > > element it would update the row. It does not. > > Steps to verify with currents tests: > > 1-) execute test_thetvdb_resolve_episodes > > 2-) remove episode 8x22 of house > > 3-) execute test_thetvdb_resolve_episodes > > I don't understand what's getting duplicated here. If you need a particular > column in the database to be unique, use the unique constraint (see > tests/test-gom-constraints.c) Thanks, It worked. (In reply to comment #23) > Review of attachment 278650 [details] [review]: > > Needs rebasing as well (especially given the fact that it contains binary files > that patch won't handle). Sorry, I think I've made a confusion with my local branches last time. I'm sure it is rebased from master now - dc1d904a8d9b41f97d
Created attachment 278729 [details] [review] Include the tvdb source
Created attachment 278730 [details] [review] include tests to the tvdb source
Review of attachment 278729 [details] [review]: Looks good to commit after fixing those small problems! ::: src/thetvdb/grl-thetvdb.c @@ +110,3 @@ + gchar *api_key; + GList *supported_keys; + GHashTable *ht_wait_list; Please mention the types of the keys and values. @@ +134,3 @@ + const gchar *name; +} supported_languages[] = { + { 7, "en" }, { 8, "sv" }, { 9, "no" }, one per line is fine. @@ +431,3 @@ + g_list_free (os->keys); + if (os->serie_resource) + g_object_unref (os->serie_resource); g_clear_object (&os->serie_resource); @@ +475,3 @@ + /* Get ptr to <Serie> */ + node = node->xmlChildrenNode; + /* Interate in the <Episode> nodes */ Iterate @@ +606,3 @@ + /* Get ptr to <Serie> */ + node = node->xmlChildrenNode; + /* Interate in the <Serie> */ Iterate @@ +640,3 @@ + /* Get ptr to <Serie> */ + node = node->xmlChildrenNode; + /* Interate in the <Serie> */ Iterate @@ +735,3 @@ + + strv_len = g_strv_length (strv); + for (i = 0; i < strv_len; i++) { for (i = 0; strv[i] != NULL; i++) { No need to traverse the list twice. @@ +1060,3 @@ + + wait_list = g_hash_table_lookup (tvdb_source->priv->ht_wait_list, show); + for (it = wait_list; it; it = g_list_next (it)) { it = wait_list; it != NULL; it = it->next @@ +1123,3 @@ + g_free (target_xml); + if (r != ARCHIVE_OK) { + if (r != ARCHIVE_EOF && r == ARCHIVE_FATAL) { r == ARCHIVE_FATAL directly? ::: src/thetvdb/thetvdb-resources-episodes.c @@ +21,3 @@ + */ + +#include <glib/gi18n.h> Doesn't looks like you need this include. ::: src/thetvdb/thetvdb-resources-series.c @@ +21,3 @@ + */ + +#include <glib/gi18n.h> Here neither.
Review of attachment 278730 [details] [review]: Looks good.
One thing to commit at the same time as this, an API key to setup grilo-test-ui-0.2, so it works out of the box. In the future, I'd like a small application to populate the database once and for all. Useful for those that will want to fetch all that data offline. At least the series to series ID table looks useful.
Created attachment 278971 [details] [review] Include API-KEY for tools/test-ui This is the api-key I was using to develop the source.
Review of attachment 278971 [details] [review]: ++
Review of attachment 278729 [details] [review]: up
Review of attachment 278730 [details] [review]: up
Review of attachment 278971 [details] [review]: up
That got merged!