GNOME Bugzilla – Bug 320507
Deal with HTML in Podcast Descriptions
Last modified: 2010-04-05 11:30:52 UTC
Distribution/Version: Fedora Core Rawhide
Right now if you click on the properties of a Podcast Entry, you get the raw description source, so if there is HTML, it looks like crap.
As far as I can tell, the description element isn't supposed to contain HTML. Although none of them are "official", several pages about podcast feeds say that you can't put anything besides plain text in them (such as http://phobos.apple.com/static/iTunesRSS.html). If we did want to support HTML in descriptions we'd have to pull in an HTML renderer as a dependency, which I don't think is something we should do.
The description element is an RSS thing, not a podcast thing and HTML is definitely allowed. See http://blogs.law.harvard.edu/tech/rss
James, how about simply stripping out the HTML? Skadz, could you provide a link to an RSS feed with that problem?
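For reference, a minimal sketch of what "simply stripping out the HTML" could look like. This is plain C with no GLib dependency; strip_tags() is a hypothetical helper name, not something from the Rhythmbox source, and it leaves entities such as &amp; untouched.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy src into dst, dropping every "<...>" span.
 * dst must have room for at least strlen(src) + 1 bytes.
 * A '<' with no later '>' is treated as ordinary text. */
void strip_tags(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        const char *end = (*src == '<') ? strchr(src, '>') : NULL;
        if (end != NULL) {
            src = end + 1;      /* skip the whole "<...>" span */
        } else {
            *dst++ = *src++;    /* ordinary text, or a lone '<' */
        }
    }
    *dst = '\0';
}
```

Note that table markup collapses badly under this approach: the cell contents are kept but run together with nothing separating them.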
Stripping sorta works, but a feed like this one (http://www.coverville.com/index.xml) has tables, which still won't translate all that well even with the HTML stripped out.
Thanks for the link.
http://phobos.apple.com/static/iTunesRSS.html#_Toc526931676 seems pretty clear that this shouldn't be allowed. WONTFIX?
Incorrect. First off, it only says that CDATA sections are strongly discouraged; it does not forbid them, and CDATA can contain HTML. Second, though this may be true for the iTunes elements, it is very much not true for RSS feeds in general, and since we use the RSS description element, it can include HTML with no problem.
Alternatively, could you have simple gtkhtml support? Better than nothing? How many podcasts actually use HTML in the description?
copy me
Change to "Podcast" Component
This bug has also been described at https://bugs.launchpad.net/distros/ubuntu/+source/rhythmbox/+bug/64917. Thanks
Wade: Quite a few, actually. Some highlights are both LUGRadio and the Linux Action Show.
Created attachment 97917
use webkit?
This more or less works (with the current webkit-gtk packages in debian unstable), but:
- links need to be handled externally (gnome_vfs_url_show or something) rather than showing the new page in the podcast properties dialog
- probably need to do something with font settings
- maybe we could use a "user" css file to set the background to match the dialog? (I guess this could do fonts too)
Memory usage impact is pretty small. Virtual size seems to be 30MB bigger, but resident size is unchanged until it actually gets used, at which point it grows about 3MB.
There's some useful stuff that could be done here - webkit/gtk svn has the ability to set a transparent background, for one, aside from the items I already mentioned.
(In reply to comment #13)
> Created an attachment (id=97917)
> use webkit?
>
> This more or less works (with the current webkit-gtk packages in debian
> unstable), but:
> - links need to be handled externally (gnome_vfs_url_show or something) rather
> than showing the new page in the podcast properties dialog
You can see how to do this in Blam SVN.
> - probably need to do something with font settings
Can be done with WebSettings.
> - maybe we could use a "user" css file to set the background to match the
> dialog? (I guess this could do fonts too)
Real transparent background should be in by 2.24.
The WebKit API is stable now if you want to take a shot at landing this (it'll need forward-porting to the current WebView-based API).
(In reply to comment #15)
<snip>
Can't we go the easy route and use some tags (a, b, i, em, p, br, lists, etc) and strip out everything else (like tables) and just ignore css altogether? How much do we need anyway?
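To make the "keep a few tags, strip the rest" idea concrete, here is a rough sketch in plain C. The allowlist contents and the names tag_is_allowed() and filter_tags() are assumptions for illustration only. It does not sanitize attributes on the tags it keeps, and it keeps the text inside dropped elements.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Assumed allowlist, loosely following the tags named in the comment. */
static const char *const allowed_tags[] = {
    "a", "b", "i", "em", "p", "br", "ul", "ol", "li"
};

/* Case-insensitive comparison of name[0..len) against a lowercase name. */
static int name_matches(const char *name, size_t len, const char *want)
{
    size_t i;
    if (strlen(want) != len)
        return 0;
    for (i = 0; i < len; i++)
        if (tolower((unsigned char)name[i]) != want[i])
            return 0;
    return 1;
}

/* tag points at '<'. Returns 1 if the element name is on the allowlist. */
int tag_is_allowed(const char *tag)
{
    const char *name = tag + 1;
    size_t len = 0, i;
    if (*name == '/')
        name++;                 /* closing tag: look at the name after '/' */
    while (isalnum((unsigned char)name[len]))
        len++;
    for (i = 0; i < sizeof allowed_tags / sizeof allowed_tags[0]; i++)
        if (name_matches(name, len, allowed_tags[i]))
            return 1;
    return 0;
}

/* Copy src into dst, keeping allowed tags verbatim and dropping all others.
 * dst must have room for at least strlen(src) + 1 bytes. */
void filter_tags(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        const char *end = (*src == '<') ? strchr(src, '>') : NULL;
        if (end == NULL) {
            *dst++ = *src++;    /* ordinary text, or a lone '<' */
        } else {
            size_t n = (size_t)(end - src) + 1;
            if (tag_is_allowed(src)) {
                memcpy(dst, src, n);
                dst += n;
            }
            src = end + 1;
        }
    }
    *dst = '\0';
}
```

Something still has to render the tags that survive, and the table structure is simply lost.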
(In reply to comment #16)
<snip>
> Can't we go the easy route and use some tags (a, b, i, em, p, br, lists, etc)
> and strip out everything else (like tables) and just ignore css altogether?
Because it would look arse, would be just as hard as using WebKit (if not harder to do nicely), and we'd be losing information.
*** Bug 551429 has been marked as a duplicate of this bug. ***
Using webkit, available as a compile-time option, would be the way to go.
Does this patch put the description outside the properties dialog? It would look nice in the main window imo, especially for podcasts that use more than just a few lines to describe the episode or edition. For example, this one has images and some formatting: http://vocalfruits.com/develCuy/w090d7/
This doesn't change the way the podcast description is accessed. There's another bug (possibly more than one) elsewhere about that.
I've updated the patch; now it sort of looks like it's properly transparent, and navigation requests are handled using gtk_show_uri().
A fair number of podcasts I'm subscribed to use preformatted plain text descriptions, which look like crap when interpreted as HTML. We probably need to process the text a bit in that case. g_content_type_guess() doesn't seem to be very helpful here: if I don't give it a filename, it always returns text/plain, and if I give it 'x.html', it always returns text/html. Maybe I'll have to write a specialised bit of type guessing code for this situation.
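One possible way to "process the text a bit" for preformatted plain text descriptions, sketched in plain C: escape the characters HTML treats specially and turn newlines into line breaks before handing the text to the renderer. plain_text_to_html() is a hypothetical name, and this is only an illustration of the idea, not what the patch does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated HTML fragment in which
 * '&', '<' and '>' are escaped and each newline becomes "<br>\n".
 * Returns NULL on allocation failure. The caller frees the result. */
char *plain_text_to_html(const char *text)
{
    char *out, *p;

    assert(text != NULL);
    /* Worst case: every input byte expands to 5 output bytes. */
    out = malloc(strlen(text) * 5 + 1);
    if (out == NULL)
        return NULL;

    p = out;
    for (; *text != '\0'; text++) {
        const char *rep = NULL;
        switch (*text) {
        case '&':  rep = "&amp;";  break;
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '\n': rep = "<br>\n"; break;
        default:   break;
        }
        if (rep != NULL) {
            size_t n = strlen(rep);
            memcpy(p, rep, n);
            p += n;
        } else {
            *p++ = *text;
        }
    }
    *p = '\0';
    return out;
}
```

Wrapping the escaped text in a pre element instead of inserting br tags would keep column alignment, at the cost of losing line wrapping in the dialog.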
(In reply to comment #21)
<snip>
> g_content_type_guess doesn't seem to be very helpful here - if I don't give it
> a filename, it always returns text/plain, and if I give it 'x.html', it always
> returns text/html. Maybe I'll have to write a specialised bit of type guessing
> code for this situation.
That sounds like a bug in g_content_type_guess(), or the way you're calling it (data_size correct?). Can you make some reproducer data available?
Created attachment 120852
guess description content type in test-parser
This totem-pl-parser patch makes the test-parser program try to guess the content type for the description field. It almost always says text/plain; occasionally text/x-vhdl (!) and sometimes text/html. I guess the problem is that the descriptions we're dealing with aren't anywhere near being correct html documents, they're just fragments.
I updated this a bit, implemented some custom content type checking (because nothing else would ever work), made webkit behave itself, and pushed the result in commit f2b94b4.
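For anyone curious what custom content type checking for HTML fragments might involve, here is a rough heuristic in plain C. It is not the check from the commit above; looks_like_html() is a made-up name, and the rule (a '<' followed by a letter or "/letter", with a '>' somewhere after it) is an assumption chosen for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical heuristic: returns 1 if text contains something shaped like
 * an HTML tag, 0 otherwise. Plain text such as "5 < 6" is not matched
 * because the '<' is not followed directly by a letter. */
int looks_like_html(const char *text)
{
    const char *p;

    assert(text != NULL);
    for (p = strchr(text, '<'); p != NULL; p = strchr(p + 1, '<')) {
        const char *name = p + 1;
        if (*name == '/')
            name++;             /* closing tag such as "</p>" */
        if (isalpha((unsigned char)*name) && strchr(name, '>') != NULL)
            return 1;
    }
    return 0;
}
```

This can still misfire on plain text like "a <b and c> d", so a real check would likely want to compare the name against known elements as well.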