GNOME Bugzilla – Bug 551402
Cannot update list of episodes in podcasts
Last modified: 2010-03-20 12:01:04 UTC
Please describe the problem: The list of episodes of a podcast is not updated automatically, nor can I update it via the "Update podcast provider" (or something like that, I am using the Spanish translation) context menu option. However, If I remove the podcast and add it again, I see the new episodes. Steps to reproduce: 1. Add a podcast. For example http://www.rtve.es/podcast/SSATEL.xml. 2. Wait a few days 3. Check that the podcast does have new episodes (using another client, like wget or firefox dynamic bookmarks) 4. Notice that rhythmbox does not show any new episode. 5. Try to update using the context menu, and see that nothing happens. Actual results: Nothing. That's the problem ;-) Expected results: New episodes should appear. Does this happen every time? As far as I can see, it does. Caveat: I have only tried with feeds from rtve.es for now. Other information: The new episodes do appear if you remove and then add the podcast again.
Could you please get the output of "rhythmbox -d" when reproducing the problem, and mention which version of Rhythmbox you're using?
Created attachment 118365 [details] rhythmbox -d output
I am using version 0.11.6 (from rhythmbox-0.11.6-2.fc9.x86_64 rpm). About the submitted trace: - At 17:53:32 I select the podcast. Eight entries appear in the list of episodes. - At 17:54:06 I select the update action from the context menu (actually I finish clicking it at 17:54:12). Still eight entries only. - At 17:55:14 I remove the podcast. Shortly after that, I add the same podcast again. When I select it now, it has nine episodes (that is, it now has last night's program too). I have noticed that the feeds from rtve.es are not entirely correct, I think. At least the dates are not encoded using RFC 822, hence they cannot be parsed correctly by rhythmbox. But I think that should not make any difference with respect to this bug.
The dates are parsed fine. It seems that it doesn't see any new entries, probably because the file is still the old one. Do you use a proxy of any kind? Does your ISP use transparent proxy?
No, the dates are not parsed correctly. But this is not rhythmbox's fault, as the format used by rtve.es is braindead: "dd-mm-yyyy hh:mm" which is undistinguishable with "mm-dd-yyyy hh:mm" (that's what rhythmbox picks, I get the month and the day reversed). Anyway, I think this is not related at all with the problem, and I don't care that much about the dates (I have already sent a "bug report" to rtve.es, fwiw). Sorry from bringing this up in the previous comment, it was offtopic. I am not using a proxy, and I doubt that there is any transparent proxy involved (I cannot know for sure, but I have never had any problem that could be explained by a transparent proxy). Moreover, wget does always get an up to date file. Firefox dynamic bookmarks work correctly too with these feeds. And remember that even rhythmbox does get the updated file if I remove the podcast and add it again, only not when clicking the update context menu option. Thank you for your help.
(In reply to comment #5) > No, the dates are not parsed correctly. But this is not rhythmbox's fault, as > the format used by rtve.es is braindead: "dd-mm-yyyy hh:mm" which is > undistinguishable with "mm-dd-yyyy hh:mm" (that's what rhythmbox picks, I get > the month and the day reversed). Anyway, I think this is not related at all > with the problem, and I don't care that much about the dates (I have already > sent a "bug report" to rtve.es, fwiw). Sorry from bringing this up in the > previous comment, it was offtopic. They *are* being parsed properly, using totem-pl-parser (which is what Rhythmbox uses): $ ./test-parser http://www.rtve.es/podcast/SSATEL.xml | grep publication-date publication-date = '2008-09-09T03:21:00Z' (1220930460/'09-09-2008 03:21') publication-date = '2008-08-09T03:23:00Z' (1218252180/'08-09-2008 03:23') publication-date = '2008-05-09T03:22:00Z' (1210303320/'05-09-2008 03:22') publication-date = '2008-04-09T03:22:00Z' (1207711320/'04-09-2008 03:22') publication-date = '2008-03-09T03:22:00Z' (1205032920/'03-09-2008 03:22') publication-date = '2008-02-09T03:22:00Z' (1202527320/'02-09-2008 03:22') publication-date = '2008-01-09T03:22:00Z' (1199848920/'01-09-2008 03:22') publication-date = '2010-05-08T04:51:00Z' (1273294260/'29-08-2008 04:51') publication-date = '2008-01-07T11:55:00Z' (1199706900/'01-07-2008 11:55') > I am not using a proxy, and I doubt that there is any transparent proxy > involved <snip> There's another case for which Rhythmbox doesn't add entries: when the newest post's publication-date isn't newer than the last one we've seen from the podcast. So please check whether the dates are parsed properly, and if they're not, mention which version of evolution-data-server and totem-pl-parser you're using.
> They *are* being parsed properly, using totem-pl-parser (which is what > Rhythmbox uses): No, they aren't. The parser may not detect any error, but the month and the day are reversed. For example: '08-09-2008 03:23' is interpreted as '2008-08-09T03:23:00Z', but should be '2008-09-08T03:23:00Z'. Or worse: '29-08-2008 04:51' is interpreted as '2010-05-08T04:51:00Z', but should be '2008-08-29T04:51:00Z'. This one is obviously wrong. There is no way to write a computer program to correctly parse these dates (in Spanish format, dd-mm-yyyy) and still be able to correctly parse the American format (mm-dd-yyyy) without extra information, because some dates are ambiguous. > There's another case for which Rhythmbox doesn't add entries: when the newest > post's publication-date isn't newer than the last one we've seen from the > podcast. I think this is a bug, because dates may be incorrect (as this bug shows). You should not trust the dates provided by the feed. Also, I don't think that feeds are _required_ to only add entries in chronological order (and even if they were, some feeds would not respect the requirement). > So please check whether the dates are parsed properly, and if they're not, > mention which version of evolution-data-server and totem-pl-parser you're > using. I am using evolution-data-server-2.22.3-2.fc9.x86_64 and totem-pl-parser-2.22.3-1.fc9.x86_64. I think that, to fix this bug, rhythmbox should add all entries in the feed regardless of their dates (as long as they have not been added already, of course).
I'm seeing this problem too in git master (3b8c0f2cfeb8c533264ff5f8d03d1894a702ad5d), that is, version 0.12.2.91. For the following feed: http://rss.conversationsnetwork.org/series/stackoverflow.xml Rhythmbox does not, when updating the feed on Fri, 3 Jul 2009 14:09:00 CEST, add the latest episode (#60 as of writing) that was published on Wed, 1 Jul 2009 00:00:00 CDT (see rhythmbox_feedupdate.log). When afterwards deleting/re-adding the feed, it adds episode #60 (see rhythmbox_feedadd.log). However, the following feed has not yet had this problem as far as I recall: http://feeds.feedburner.com/linuxoutlaws Using: totem-plparser 2.26.2 evolution-data-server 2.26.2
Created attachment 137786 [details] rhythmbox_feedupdate.log
Created attachment 137787 [details] rhythmbox_feedadd.log
Created attachment 152459 [details] [review] Syncs podcasts to exactly what is in the feed This patch changes the way Rhythmbox updates its podcast feeds. This patch always syncs the podcasts shown on screen with what the feed contains. Previously, the code would only add a new entry if the post-time for that entry happened after the last-overall-update time for the feed. This scenario didn't hold true for a lot of feeds. Furthermore, even if it did hold true some of the time for a feed, this condition would only be hit consistently if you kept Rhythmbox always running. If not, you could miss some podcast entries. My patch does away with all of that. Simply put, after my patch is applied, you will always see every podcast entry that is currently in the feed.
Review of attachment 152459 [details] [review]: ::: podcast/rb-podcast-manager.c @@ +712,3 @@ + rb_debug ("got file info results for %s", + get_remote_location (data->entry)); Please don't reformat code you haven't modified. It's harder to review patches when half of the patch doesn't have any actual changes in it. @@ +1107,3 @@ + RhythmDBQueryModel *existing_entries = NULL; + existing_entries = rhythmdb_query_model_new_empty (db); + rhythmdb_do_full_query (db, RHYTHMDB_QUERY_RESULTS (existing_entries), This seems like it wouldn't be a very reliable way of identifying duplicates. What are we trying to catch here? The same episode showing up in the feed with multiple URIs? @@ +1121,3 @@ + filesize, + RHYTHMDB_QUERY_END); + if (existing_entries != NULL) { existing_entries cannot be NULL here, and even if it was, the previous call to rhythmdb_do_full_query would have crashed, so this check is useless. @@ +1989,3 @@ + download_entries = g_list_prepend (download_entries, post_entry); + new_last_post = item->pub_date; + } you seem to have reformatted this entire block of code, which makes it very hard to see what you've actually changed.
Thanks for the review. Would it make sense for me to redo my changes by modifying the least amount of code possible? If so, I can do that and submit another patch. Chad
Created attachment 152609 [details] [review] Syncs podcasts to exactly what is in the feed (Rev 2) How does this revision look? I agree that the extra check I've added to rb_podcast_manager_add_post is a bit clunky. What is really needed is a rhythmdb_entry_lookup_by_mount_point function, since the unique identifier for a podcast-entry (the location) changes to the mount_point attribute after the podcast file is downloaded. Therefore, a data entry would have a different location (the url) than the same db entry (the local file path). Thus, duplicates will occur for all downloaded entries. What do you think?
Review of attachment 152609 [details] [review]: ::: podcast/rb-podcast-manager.c @@ +1108,3 @@ + * this check is necessary since after an entry's file is downloaded, + * the location stored in the db changes to the local file path + * instead of the url. The url moves to the mount-point attribute. Then this should query by mount point, rather than messing around with the title and file size. @@ +1972,3 @@ + * because that check was not a great + * indicator of whether or not to insert a post_entry. + * For some podcasts, this condition was never met. Well, that's one perspective. My view on this is that it's stupid for podcasts to post entries out of order, and that this patch is working around that. This removes the ability for the user to delete podcast entries, since they'll just show up again next time the feed is updated. We could possibly hide entries, rather than deleting them.
(In reply to comment #15) > Review of attachment 152609 [details] [review]: > > ::: podcast/rb-podcast-manager.c > @@ +1108,3 @@ > + * this check is necessary since after an entry's file is downloaded, > + * the location stored in the db changes to the local file path > + * instead of the url. The url moves to the mount-point attribute. > > Then this should query by mount point, rather than messing around with the > title and file size. > > @@ +1972,3 @@ > + * because that check was not a great > + * indicator of whether or not to insert a post_entry. > + * For some podcasts, this condition was never met. > > Well, that's one perspective. My view on this is that it's stupid for podcasts > to post entries out of order, and that this patch is working around that. > > This removes the ability for the user to delete podcast entries, since they'll > just show up again next time the feed is updated. We could possibly hide > entries, rather than deleting them. What would your ideal solution be? Querying by MountPoint and hiding deleted entries? If so, I can go after that.
I think that would work and would not break anything else, so yes.
Created attachment 154903 [details] [review] Sets deleted podcast posts to hidden Per your last review, this patch sets deleted podcast posts to hidden so they do not return after the user does an update.
Review of attachment 154903 [details] [review]: I think this is roughly in the right shape now, but there are some style issues and the like that need to be fixed. ::: podcast/rb-podcast-manager.c @@ +1120,3 @@ + RHYTHMDB_QUERY_PROP_EQUALS, + RHYTHMDB_PROP_SUBTITLE, + subtitle, Why do we still need to check the subtitle here? @@ +1126,3 @@ + RHYTHMDB_QUERY_PROP_EQUALS, + RHYTHMDB_PROP_HIDDEN, + FALSE, Why are we excluding hidden entries here? @@ +1701,3 @@ RHYTHMDB_QUERY_END); + GtkTreeIter iter; Variable declarations go at the start of the block. Move this back where it was. @@ +1969,3 @@ + /* + * Always try to add a post_entry. + * Removed the check of whether or not the entry's post_time > Don't comment on code that isn't there any more. Just explain what it does now. ::: rhythmdb/rhythmdb-query-model.h @@ +100,2 @@ RhythmDBQueryModel * rhythmdb_query_model_new_empty (RhythmDB *db); +RhythmDBQueryModel * rhythmdb_query_model_new_empty_with_hidden (RhythmDB *db); I don't think we want a new constructor for this. If you want a query model that includes hidden entries, you just need to set the show-hidden property on it after creating it. ::: rhythmdb/rhythmdb-tree.c @@ +1949,3 @@ + return FALSE; + } + RHYTHMDB_PROPERTY_COMPARE (!=) No need for all of this, just change the strcmp in RHYTHMDB_PROPERTY_COMPARE to g_strcmp0. ::: rhythmdb/rhythmdb.h @@ +390,2 @@ void rhythmdb_entry_delete (RhythmDB *db, RhythmDBEntry *entry); +void rhythmdb_entry_set_visibility (RhythmDB *db, RhythmDBEntry *entry, gboolean visible); This is an internal rhythmdb function, it shouldn't be exposed. ::: sources/rb-podcast-source.c @@ +942,3 @@ + /* set podcast entries to invisible instead of deleted so they will + * not reappear after the podcast has been updated*/ + rhythmdb_entry_set_visibility (source->priv->db, l->data, FALSE); use this instead: GValue v = {0,}; g_value_init (&v, G_TYPE_BOOLEAN); g_value_set_boolean (&v, TRUE); rhythmdb_entry_set (source->priv->db, l->data, RHYTHMDB_PROP_HIDDEN, &v); g_value_unset (&v); rhythmdb_entry_set_visibility should only be used inside rhythmdb itself. Yes, it's more verbose, but we have to deal with that everywhere.
Created attachment 156078 [details] [review] cleaned up - set deleted podcast posts to hidden I fixed up all of the style issues from my previous patch. Most were fairly easy fixes. The only thing that tripped me up was setting a query model's show_hidden property. To me it seems like to do this directly requires exposing private fields and/or functions. Instead, I created a simple "setter" method called "rhythmdb_query_model_set_show_hidden". It is used in rb-podcast-manager.c on lines 1692 and 1859. Please let me know if this is bad style. If so, please help me out with the direction I should take on this. To me, it seems like any other solution breaks encapsulation in some way. I could be missing something, though. Any help is appreciated. Thank you for taking the time to review my code and provide feedback. I hope we are close to a final solution. Chad
Review of attachment 156078 [details] [review]: Just one last thing and I think this is ready. ::: rhythmdb/rhythmdb-query-model.h @@ +142,3 @@ const char *plural); +void rhythmdb_query_model_set_show_hidden (RhythmDBQueryModel *model, + gboolean hidden); Sorry, I should have explained this better. 'show-hidden' is exposed as GObject property, so to set it, all you need to do is: g_object_set (model, "show-hidden", TRUE, NULL); See the GObject documentation for more information on properties: http://library.gnome.org/devel/gobject/stable/gobject-properties.html
Created attachment 156089 [details] [review] cleaned up (2) - set deleted podcast posts to hidden now using g_object_set. Thanks for the explanation.
*** Bug 558391 has been marked as a duplicate of this bug. ***
*** Bug 588170 has been marked as a duplicate of this bug. ***
I cleaned the patch up a bit (tabs vs spaces, minor comment formatting things) and pushed it as commit 99f6d8a. Thanks for working on this, Chad.