GNOME Bugzilla – Bug 302579
Strip and unescape HTML in titles
Last modified: 2005-05-15 11:55:07 UTC
At least with Atom feeds, it's legal for titles to contain HTML content, including escaped character entities (e.g. the escaped forms of ", &, ’, etc.). At present Blam! just stuffs the content of the title into a string without unescaping it. This results in titles like "I Swear I’ Stop Soon." Titles should be unescaped when the feed is updated. I have provided a patch. It looks a little odd because of a namespace clash between Imendio.Blam.Utils and Atom.Utils, but I've been using it here for the last few days and it seems to work. The same could be done for RSS titles, if you think it's appropriate. This might fix bug 169815 and some instances of bug 156269. I can easily modify the patch if desired.
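To illustrate the unescaping step, here is a minimal Python sketch. Blam! itself is written in C# on Mono, and the patch's code isn't reproduced in this thread. The function name `clean_title` is hypothetical, and Python's standard `html.unescape` stands in for whatever decoder the patch actually uses. The sketch assumes the title has already been extracted from the XML, so any HTML entities are still present as text.

```python
import html


def clean_title(raw_title: str) -> str:
    """Decode HTML character entities in a feed title (illustrative sketch).

    Assumes raw_title is the title text after XML parsing, which may still
    contain HTML entities such as &amp; or &#8217;.
    """
    return html.unescape(raw_title)


print(clean_title("Tom &amp; Jerry"))  # prints: Tom & Jerry
```

Calling this when the feed is updated, rather than when the title is displayed, means every view of the title sees the decoded text.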
Created attachment 45876: Patch to unescape HTML entities in Atom feed and entry titles
We also need to strip away any HTML, since the GTK+ widgets will just display the tags literally. Adding Patch keyword.
Just a comment to show I haven't forgotten this bug: I'm currently working on a much better version of this patch that strips HTML tags using a regex and unescapes all the entities for both RSS and Atom feeds. I should finish it tomorrow or the next day, assuming I continue to progress at the same pace.
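When both stripping and unescaping are done, the order matters. The following Python sketch is illustrative only and is not the patch's code. `TAG_RE` and `strip_and_unescape` are hypothetical names, and the naive `<[^>]*>` pattern is a deliberately simple assumption. Tags are removed first, and entities are decoded afterwards. If the order were reversed, escaped text such as `&lt;` would turn into a literal `<` and could be swallowed by the tag pattern.

```python
import html
import re

# Naive tag pattern (assumption): anything from '<' up to the next '>'.
TAG_RE = re.compile(r"<[^>]*>")


def strip_and_unescape(title: str) -> str:
    # 1. Remove markup while entities are still escaped, so "&lt;" text
    #    cannot be mistaken for the start of a tag.
    without_tags = TAG_RE.sub("", title)
    # 2. Decode entities such as &lt; and &amp; into plain characters.
    return html.unescape(without_tags)


print(strip_and_unescape("<b>5 &lt; 6</b> &amp; more"))  # prints: 5 < 6 & more
```

Unescaping first would turn the same input into `<b>5 < 6</b> & more`, and the pattern would then delete `< 6</b>`, leaving `5  & more`.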
That would be really awesome!
*** Bug 156269 has been marked as a duplicate of this bug. ***
I'm attaching an initial patch that decodes entities and strips HTML in titles. I'm using a fairly complicated regex to remove XHTML/HTML tags and comments, and as a result Blam! should leave titles like <rant>GRRR!</rant> alone. It seems to work for me, and I've created some pretty obscene titles in the last couple of days, but it could still use some testing. Two notes about the patch:
1. The patch is against CVS as of Saturday. I would have updated it against HEAD, but anonymous CVS is being unruly. The patch applies cleanly against the 1.8.0 tarball, though, so I'm guessing it will work with CVS HEAD. Let me know if it needs fixing.
2. The patch removes the need for the EncodeUnicode method in ItemView.cs, but I didn't remove the method from the source. If the current patch works, EncodeUnicode should probably be removed.
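A regex can leave a pseudo-tag like <rant> alone by matching only real HTML element names and comments. The patch's actual regex isn't shown in this thread, so the following is a hypothetical Python sketch of that idea. The abbreviated `HTML_ELEMENTS` list and the names `HTML_TAG_RE` and `sanitize_title` are assumptions made for illustration. A real version would list every element it intends to strip.

```python
import html
import re

# Hypothetical, abbreviated list of HTML element names to strip.
HTML_ELEMENTS = ["a", "abbr", "b", "br", "code", "em", "i", "img",
                 "p", "span", "strong", "sub", "sup", "u"]

# Matches an HTML comment, or an opening/closing/self-closing tag whose
# name is in HTML_ELEMENTS (case-insensitive), with optional attributes.
# The \b keeps "b" from matching the start of an unknown name like "blink".
HTML_TAG_RE = re.compile(
    r"<!--.*?-->"
    r"|</?(?:" + "|".join(HTML_ELEMENTS) + r")\b[^>]*>",
    re.IGNORECASE | re.DOTALL,
)


def sanitize_title(title: str) -> str:
    """Remove known HTML tags and comments, then decode entities."""
    return html.unescape(HTML_TAG_RE.sub("", title))


print(sanitize_title("<b>Hello</b> <rant>GRRR!</rant>"))
# prints: Hello <rant>GRRR!</rant>
```

Unknown names such as `rant` never match the alternation, so those pseudo-tags pass through unchanged. The trade-off is that markup using an element missing from the list will also survive.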
Created attachment 46210: Patch that strips HTML tags and entities from Item titles.
Just to let you know, I haven't forgotten this, just had a stressful week.
Heath suggests that this also solves bug 170644.
Thanks, I've committed the patch, and it worked like a charm. But shouldn't something like the HtmlUtils class you produced exist inside Mono?