GNOME Bugzilla – Bug 484768
Use totem's playlist parser for Podcast parsing
Last modified: 2008-01-05 06:01:16 UTC
The feature is in current Totem trunk.
Created attachment 97641 [details] [review] rb-use-newer-plparser.patch First patch to use the new playlist parser all over.
Created attachment 97738 [details] [review] rb-use-newer-plparser-2.patch Port the podcast parsing to using totem-pl-parser. Adds support for Atom podcasts.
Created attachment 97740 [details] [review] rb-use-newer-plparser-3.patch Add support for itpc schemes, and itunes podcasts. I need to: - move the podcast detection code to the URI helpers - make the shell detect and handle RSS and Atom feeds, so they get added to the Podcast manager instead of as playlists (eek!) - once that's added, make sure we can't create duplicate feeds
Created attachment 97787 [details] [review] rb-use-newer-plparser-4.patch Updated to support OPML files.
Created attachment 97789 [details] [review] rb-use-newer-plparser-5.patch Add rb_uri_could_be_podcast() and use it.
Created attachment 97791 [details] [review] rb-use-newer-plparser-6.patch - Fix detection for itms: URLs (needs Totem trunk) Only the command-line integration left to do...
Created attachment 97831 [details] [review] rb-use-newer-plparser-7.patch Passing http://planet.lugradio.org/hashlugradio/releases/episodes/feed/atom/ on the command-line doesn't work as expected (adds an iRadio). So I added a hack for it in rb-file-helpers.c. This adds command-line support. It will also switch to the Podcast source when added. Once installed, you should be able to go to: http://www.itunes.com/podcast?id=207870198 and click on "I have iTunes" to make it load directly in Rhythmbox It's ready to commit as far as I'm concerned, please test thoroughly.
The only thing that seems to be missing, is better error reporting when an itms, or itunes music store URL is passed that isn't a Podcast... Also, would it make sense to have the "might be podcast" function to totem's playlist parser?
This doesn't work for most of the podcasts I'm subscribed to, as they are mostly detected as application/xml (rather than application/rss+xml), and totem_pl_parser_add_xml_feed just returns TOTEM_PL_PARSER_RESULT_ERROR. I added a dumb implementation of that function - read the feed, if is_rss() { add_rss() }, etc. - to get past that. A number are also detected as text/plain, so they don't work at all: http://dot.cult.bg/notitle/wp/?cat=10&feed=rss2 http://www.npr.org/rss/podcast.php?id=35 http://thestreamguide.com/xml.php?channelid=58 http://www.mashit.com/?feed=rss2&category_name=podcast I haven't tried anything relating to atom or opml.
Which version of Totem are you using, and which version of shared-mime-info? I've committed a fix to shared-mime-info to up the priority of the RSS magic, so it doesn't get detected as an XML file. I could parse all those podcasts with test-parser, in totem 2.21.1. Could you let me know if those work now?
I'm using current svn totem, shared-mime-info 0.22. I've copied the rss-related bits from shared-mime-info cvs (and run update-mime-database) and it doesn't seem to have helped. I'll debug further tomorrow, hopefully.
Setting the 'force' property on the parser seems to help. With that and the current rss bits from shared-mime-info (and without my add_xml_feed hack), almost all the podcasts I use are identified and parsed correctly. Feeds served from feedburner.com include too much xml junk at the start for the magic in shared-mime-info to work: <?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet href="http://feeds.feedburner.com/~d/styles/rss2enclosuresfull.xsl" type="text/xsl" media="screen"?><?xml-stylesheet href="http: //feeds.feedburner.com/~d/styles/itemcontent.css" type="text/css" media="screen"?><rss ... which is apparently 267 characters, but the magic only checks the first 256. Ugh. The old parser handles the 'smart quote' entities (&#xxxx; encoding) in this feed properly, but the totem parser leaves them encoded: http://www.mashit.com/?feed=rss2&category_name=podcast
(In reply to comment #12) > Setting the 'force' property on the parser seems to help. With that and the > current rss bits from shared-mime-info (and without my add_xml_feed hack), > almost all the podcasts I use are identified and parsed correctly. > > Feeds served from feedburner.com include too much xml junk at the start for the > magic in shared-mime-info to work: > > <?xml version="1.0" encoding="UTF-8"?> > <?xml-stylesheet > href="http://feeds.feedburner.com/~d/styles/rss2enclosuresfull.xsl" > type="text/xsl" media="screen"?><?xml-stylesheet href="http: > //feeds.feedburner.com/~d/styles/itemcontent.css" type="text/css" > media="screen"?><rss ... > > which is apparently 267 characters, but the magic only checks the first 256. > Ugh. But they should be recognised as XML, and then tested. is_xml_feed in totem's plparser needs to be implemented. Do you have a test URL please? > The old parser handles the 'smart quote' entities (&#xxxx; encoding) in this > feed properly, but the totem parser leaves them encoded: > > http://www.mashit.com/?feed=rss2&category_name=podcast Fixed: 2007-11-05 Bastien Nocera <hadess@hadess.net> * src/plparse/xmllexer.c: (lexer_get_token), (lexer_decode_entities): Fix decoding of decimal entities, useful for Podcasts (see #484768)
Here's one: http://feeds.feedburner.com/dppodcasts
(In reply to comment #14) > Here's one: http://feeds.feedburner.com/dppodcasts Fix in totem: 2007-11-06 Bastien Nocera <hadess@hadess.net> * src/plparse/totem-pl-parser-podcast.c: (totem_pl_parser_add_xml_feed): Implement, fixes parsing of feeds from Feedburner.com (see #484768) Better?
Thanks, they're working now.
It's not exactly a real-world test case, but I just hacked up a perl script to create an OPML file from rhythmdb.xml, then used that to add the feeds to a new db file, and everything came through correctly.
(In reply to comment #17) > It's not exactly a real-world test case, but I just hacked up a perl script to > create an OPML file from rhythmdb.xml, then used that to add the feeds to a new > db file, and everything came through correctly. Great. Would OPML save support in totem-pl-parser be useful then?
I think it would be useful. As a slightly more relevant test, I found that the OPML from here: http://share.opml.org/toppodcasts/ was handled correctly.
(In reply to comment #19) > I think it would be useful. Ok. I'll leave it to bug #489856 though, as I'd rather get this code released before putting more work into it. > As a slightly more relevant test, I found that the OPML from here: > http://share.opml.org/toppodcasts/ was handled correctly. Cool!
Created attachment 99916 [details] [review] rb-use-newer-plparser-8.patch - Set the locale in the test-parser, so we can output UTF-8 - Fix stupid bug that made the podcasts never update, even when forced
Created attachment 99919 [details] [review] rb-use-newer-plparser-9.patch With the gecko plugin from bug 489874 integrated.
The only remaining question I have about this is about dependencies. If we depend on totem 2.21+, will it still be practical to build rhythmbox trunk on recent desktop distribution releases (fedora 8, ubuntu gutsy, etc.)? I've seen from some recent totem changes that you're planning to split the playlist parser out into a separate module. I definitely wouldn't have a problem with this going in once that's done. Is this likely to happen soon?
2007-12-03 Bastien Nocera <hadess@hadess.net> * configure.ac: Require totem-pl-parser 2.21.4 * lib/rb-file-helpers.c: (rb_uri_could_be_podcast): * lib/rb-file-helpers.h: Add helper function to detect whether it could be a podcast from the URI passed * data/rhythmbox.schemas: Handle feed://, itpc:// and itms:// URIs * podcast/Makefile.am: * podcast/rb-podcast-manager.c: (rb_podcast_manager_subscribe_feed), (rb_podcast_manager_parse_complete_cb), (rb_podcast_manager_thread_parse_feed), (rb_podcast_manager_save_metadata), (rb_podcast_manager_insert_feed): * podcast/rb-podcast-parse.c: (playlist_metadata_foreach), (playlist_started), (playlist_ended), (entry_metadata_foreach), (entry_parsed), (rb_podcast_parse_load_feed), (rb_podcast_parse_channel_free): * podcast/rb-podcast-parse.h: Use totem-pl-parser to parse Podcast files (including Atom files) (Closes: #355984), as well as OPML files * shell/rb-playlist-manager.c: (handle_playlist_entry_cb), (playlist_load_started_cb), (playlist_load_ended_cb), (rb_playlist_manager_parse_file), (_is_dirty_playlist), (rb_playlist_manager_save_playlists): * shell/rb-shell-player.c: (rb_shell_player_class_init), (rb_shell_player_handle_eos_unlocked), (rb_shell_player_init), (playlist_entry_cb), (open_location_thread), (rb_shell_player_open_location), (rb_shell_player_sync_play_order), (rb_shell_player_do_previous), (rb_shell_player_do_next_internal), (rb_shell_player_entry_activated_cb): * shell/rb-shell.c: (handle_playlist_entry_cb), (rb_shell_load_uri): Update API to totem-pl-parser 2.21.4, detect podcasts and OPML files (Closes: #489520) and add them to the podcast source * podcast/test-podcast-parse.c: (main): Call setlocale so we print UTF-8 characters (Closes: #484768)
(In reply to comment #23) > The only remaining question I have about this is about dependencies. If we > depend on totem 2.21+, will it still be practical to build rhythmbox trunk on > recent desktop distribution releases (fedora 8, ubuntu gutsy, etc.)? > > I've seen from some recent totem changes that you're planning to split the > playlist parser out into a separate module. I definitely wouldn't have a > problem with this going in once that's done. Is this likely to happen soon? > Was there ever an answer to this? I've been playing with various iPod bugs and patches on my Ubuntu Gutsy desktop recently and basically backed out the -9 patch to get trunk to build. I don't use Podcasts personally, and Bastien's patch currently backs out quite well, but it's not really a solution for current "Desktop" users or if any major work goes into the podcast parsing in the near future (ie. before GNOME 2.22).
(In reply to comment #25) > Was there ever an answer to this? The totem/totem-plparser split occurred just before the patch was committed. If it hadn't, I'm sure I would have objected by now. > I've been playing with various iPod bugs and patches on my Ubuntu Gutsy desktop > recently and basically backed out the -9 patch to get trunk to build. > > I don't use Podcasts personally, and Bastien's patch currently backs out quite > well, but it's not really a solution for current "Desktop" users or if any > major work goes into the podcast parsing in the near future (ie. before GNOME > 2.22). If you're already building rhythmbox from source, you can certainly build totem-plparser too. It's a very simple library that has few dependencies.