After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 320507 - Deal with HTML in Podcast Descriptions
Deal with HTML in Podcast Descriptions
Status: RESOLVED FIXED
Product: rhythmbox
Classification: Other
Component: Podcast
HEAD
Other Linux
: Normal normal
: ---
Assigned To: RhythmBox Maintainers
RhythmBox Maintainers
: 551429 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2005-11-02 14:11 UTC by Ryan P Skadberg
Modified: 2010-04-05 11:30 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
use webkit? (57.86 KB, patch)
2007-10-26 13:26 UTC, Jonathan Matthew
needs-work Details | Review
guess description content type in test-parser (864 bytes, patch)
2008-10-19 03:15 UTC, Jonathan Matthew
none Details | Review

Description Ryan P Skadberg 2005-11-02 14:11:09 UTC
Distribution/Version: Fedora Core Rawhide

Right now if you click on the properties of a Podcast Entry, you get the source,
so if there is HTML, it looks like crap.
Comment 1 James "Doc" Livingston 2005-11-03 06:10:20 UTC
As far as I can tell, the description element isn't supposed to contain HTML.
Although none of them are "official", several pages about podcast feeds say that
you can't put anything besides plain text in them (such as
http://phobos.apple.com/static/iTunesRSS.html).

If we did want to support html-in-descriptions we'd have to pull in a HTML
renderer as a dependency, which I don't think is something we should do.
Comment 2 Ryan P Skadberg 2005-11-03 14:49:23 UTC
The description element is an RSS thing, not a podcast thing and HTML is
definitely allowed.  See http://blogs.law.harvard.edu/tech/rss
Comment 3 Bastien Nocera 2005-11-13 17:42:30 UTC
James, how about simply stripping out the HTML?
Skadz, could you provide a link to an RSS feed with that problem?
Comment 4 Ryan P Skadberg 2005-11-13 18:14:44 UTC
Stripping sorta works, but something like this one
(http://www.coverville.com/index.xml), it has tables, which still won't
translate all that well even with the HTML stripped out.
Comment 5 Bastien Nocera 2005-11-13 21:52:28 UTC
Thanks for the link.
Comment 6 William Jon McCann 2005-12-12 15:32:37 UTC
http://phobos.apple.com/static/iTunesRSS.html#_Toc526931676 seems pretty clear
that this shouldn't be allowed.  WONTFIX?
Comment 7 Ryan P Skadberg 2005-12-12 15:50:48 UTC
Incorrect.

First off, it says:

CDATA sections are strongly discouraged.

CDATA can contain HTML and it does not forbid it.

Second, though this may be true for the iTunes elements, it is very much not
true for RSS feeds in general and since we use the RSS description element, it
can include HTML with no problem.
Comment 8 Wade Mealing 2005-12-23 01:52:17 UTC
Alternatively, could you have simple gtkhtml support ? Better than nothing ?

How many podcasts actually use html in the description ?
Comment 9 Baptiste Mille-Mathias 2006-01-03 21:37:30 UTC
copy me
Comment 10 Baptiste Mille-Mathias 2006-01-17 16:19:34 UTC
Change to "Podcast" Component
Comment 11 Carlo Facci 2006-11-30 20:33:35 UTC
This bug has been described also at
https://bugs.launchpad.net/distros/ubuntu/+source/rhythmbox/+bug/64917

Thanks
Comment 12 Martin Ahnelöv 2007-08-01 12:17:20 UTC
Wade: Quiet many, actually. Some highlights are both LUGRadio and the Linux Action Show.
Comment 13 Jonathan Matthew 2007-10-26 13:26:40 UTC
Created attachment 97917 [details] [review]
use webkit?

This more or less works (with the current webkit-gtk packages in debian unstable), but:
- links need to be handled externally (gnome_vfs_url_show or something) rather than showing the new page in the podcast properties dialog
- probably need to do something with font settings
- maybe we could use a "user" css file to set the background to match the dialog? (I guess this could do fonts too)

Memory usage impact is pretty small.  Virtual size seems to be 30MB bigger, but resident size is unchanged until it actually get used, at which point it grows about 3MB.
Comment 14 Jonathan Matthew 2008-04-08 12:27:51 UTC
There's some useful stuff that could be done here - webkit/gtk svn has the ability to set a transparent background, for one, aside from the items I already mentioned.
Comment 15 alp 2008-06-17 10:55:06 UTC
(In reply to comment #13)
> Created an attachment (id=97917) [edit]
> use webkit?
> 
> This more or less works (with the current webkit-gtk packages in debian
> unstable), but:
> - links need to be handled externally (gnome_vfs_url_show or something) rather
> than showing the new page in the podcast properties dialog

You can see how to do this in Blam SVN.

> - probably need to do something with font settings

Can be done with WebSettings.

> - maybe we could use a "user" css file to set the background to match the
> dialog? (I guess this could do fonts too)

Real transparent background should be in by 2.24.

The WebKit API is stable now if you want to take a shot at landing this (it'll need forward-porting to the current WebView-based API).
Comment 16 Martin Ahnelöv 2008-06-17 11:08:16 UTC
(In reply to comment #15)
> (In reply to comment #13)
> > Created an attachment (id=97917) [edit]
> > use webkit?
> > 
> > This more or less works (with the current webkit-gtk packages in debian
> > unstable), but:
> > - links need to be handled externally (gnome_vfs_url_show or something) rather
> > than showing the new page in the podcast properties dialog
> 
> You can see how to do this in Blam SVN.
> 
> > - probably need to do something with font settings
> 
> Can be done with WebSettings.
> 
> > - maybe we could use a "user" css file to set the background to match the
> > dialog? (I guess this could do fonts too)
> 
> Real transparent background should be in by 2.24.
> 
> The WebKit API is stable now if you want to take a shot at landing this (it'll
> need forward-porting to the current WebView-based API).
> 

Can't we go the easy route and use some tags (a, b, i, em, p, br, lists, etc) and strip out everything else (like tables) and just ignore css all togeather? How much do we need anyway?
Comment 17 Bastien Nocera 2008-06-17 13:25:28 UTC
(In reply to comment #16)
<snip>
> Can't we go the easy route and use some tags (a, b, i, em, p, br, lists, etc)
> and strip out everything else (like tables) and just ignore css all togeather?

Because it would look arse, would be just as hard as using WebKit (if not harder to do nicely), and we'd be losing information.
Comment 18 Baptiste Mille-Mathias 2008-09-08 21:44:27 UTC
*** Bug 551429 has been marked as a duplicate of this bug. ***
Comment 19 Bastien Nocera 2008-09-08 22:06:43 UTC
Using webkit, available as a compile-time option, would be the way to go.
Comment 20 Diego Escalante Urrelo (not reading bugmail) 2008-10-13 02:34:27 UTC
Does this patch put the description outside the properties dialog? It would look nice in the main window imo. Specially with podcasts using more than just some lines to describe the episode or edition. For example this one has images and some formatting:
  http://vocalfruits.com/develCuy/w090d7/
Comment 21 Jonathan Matthew 2008-10-18 13:56:26 UTC
This doesn't change the way the podcast description is accessed.  There's another bug (possibly more than one..) elsewhere about that.

I've updated the patch; now it sort of looks like it's properly transparent, and navigation requests are handled using gtk_show_uri().

A fair number of podcasts I'm subscribed to use preformatted plain text descriptions, which look like crap when interpreted as HTML.  We probably need to process the text a bit in that case..

g_content_guess_type doesn't seem to be very helpful here - if I don't give it a filename, it always returns text/plain, and if I give it 'x.html', it always returns text/html.  Maybe I'll have to write a specialised bit of type guessing code for this situation.
Comment 22 Bastien Nocera 2008-10-18 19:58:26 UTC
(In reply to comment #21)
<snip>
> g_content_guess_type doesn't seem to be very helpful here - if I don't give it
> a filename, it always returns text/plain, and if I give it 'x.html', it always
> returns text/html.  Maybe I'll have to write a specialised bit of type guessing
> code for this situation.

That sounds like a bug in g_content_guess_type(), or the way you're calling it (data_size correct?). Can you make some reproducer data available?
Comment 23 Jonathan Matthew 2008-10-19 03:15:56 UTC
Created attachment 120852 [details] [review]
guess description content type in test-parser

This totem-pl-parser patch makes the test-parser program try to guess the content type for the description field.  It almost always says text/plain; occasionally text/x-vhdl (!) and sometimes text/html.

I guess the problem is that the descriptions we're dealing with aren't anywhere near being correct html documents, they're just fragments.
Comment 24 Jonathan Matthew 2010-04-05 11:30:52 UTC
I updated this a bit, implemented some custom content type checking (because nothing else would ever work), made webkit behave itself, and pushed the result in commit f2b94b4.