Bug 302579 - Strip and unescape HTML in titles
Status: RESOLVED FIXED
Product: blam
Classification: Other
Component: General
Version: 1.6.x
Hardware: Other All
Importance: Normal normal
Target Milestone: ---
Assigned To: Mikael Hallendal
QA Contact: Mikael Hallendal
Duplicates: 156269
Depends on:
Blocks:
 
 
Reported: 2005-04-30 20:52 UTC by Heath Harrelson
Modified: 2005-05-15 11:55 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Patch to unescape HTML entities in Atom feed and entry titles (1.53 KB, patch)
2005-04-30 20:53 UTC, Heath Harrelson
Patch that strips HTML tags and entities from Item titles. (5.87 KB, patch)
2005-05-09 03:19 UTC, Heath Harrelson

Description Heath Harrelson 2005-04-30 20:52:37 UTC
At least with Atom feeds, it's legal for titles to contain HTML content,
including escaped character entities (e.g. &quot;, &amp;, &#8217;).  At
present Blam! is just stuffing the content of the title into a string without
unescaping it.  This results in titles like "I Swear I&#8217; Stop Soon."

Titles should be unescaped when the feed is updated.  I have provided a patch. 
It's a little funny looking because of namespace clashing between
Imendio.Blam.Utils and Atom.Utils, but I've been using it here for the last few
days, and it seems to work.

This could also be done to RSS titles, if you think it's appropriate.  This
might fix bug 169815 and some instances of 156269.  I can easily modify the
patch, if desired.
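
For readers without the attachment, the unescaping step described above can be sketched roughly as follows. This is an illustration in Python using the standard library's html.unescape, not the C# code from the patch; the function name unescape_title is hypothetical.

```python
import html


def unescape_title(raw_title: str) -> str:
    """Decode HTML character references (named and numeric) in a feed title.

    Hypothetical helper for illustration; the actual patch is written in C#
    and is not reproduced in this report.
    """
    return html.unescape(raw_title)


if __name__ == "__main__":
    # A title that still carries escaped entities, as it might arrive from a feed.
    print(unescape_title("Tom &amp; Jerry &quot;live&quot;"))
```

Applying the decoding once, when the feed is updated, keeps the stored title clean so the UI does not need to decode it again on display.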
Comment 1 Heath Harrelson 2005-04-30 20:53:40 UTC
Created attachment 45876
Patch to unescape HTML entities in Atom feed and entry titles
Comment 2 Mikael Hallendal 2005-04-30 22:10:47 UTC
We also need to strip out any HTML, since the GTK+ widgets will otherwise display the raw markup.
Adding Patch keyword.
Comment 3 Heath Harrelson 2005-05-03 05:57:02 UTC
Just a comment to show I haven't forgotten this bug -- I'm currently working on
a much better version of this patch that strips HTML tags using a regex and
unescapes all the entities for both RSS and Atom feeds.  I should finish it
tomorrow or the next day, assuming I continue to progress at the same pace.
Comment 4 Mikael Hallendal 2005-05-03 06:20:34 UTC
That would be really awesome!
Comment 5 Mikael Hallendal 2005-05-08 10:31:55 UTC
*** Bug 156269 has been marked as a duplicate of this bug. ***
Comment 6 Heath Harrelson 2005-05-09 03:18:29 UTC
I'm attaching an initial patch that decodes entities and strips HTML in titles.
 I'm using a pretty complicated regex to remove XHTML/HTML tags and comments,
and as a result, blam should leave alone titles like <rant>GRRR!</rant>.  It
seems to work for me, and I've created some pretty obscene titles in the last
couple of days, but it could use some testing anyway.
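
The general idea (remove comments and recognised HTML tags, leave unknown tags such as <rant> alone, then decode entities) might look something like the sketch below. It is written in Python for illustration only; the regex in the attached patch is more elaborate and is not reproduced here, and the tag list KNOWN_TAGS and the name clean_title are assumptions.

```python
import html
import re

# Hypothetical list of tag names treated as real HTML. The list used by the
# actual patch is not shown in this report.
KNOWN_TAGS = ("a", "abbr", "b", "br", "code", "em", "i", "img",
              "p", "span", "strong", "sub", "sup", "u")

# HTML comments, possibly spanning several lines.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)

# Opening, closing, or self-closing tags whose name is in KNOWN_TAGS.
# The \b stops "b" from matching the start of a longer, unknown name.
TAG_RE = re.compile(r"</?(?:%s)\b[^<>]*>" % "|".join(KNOWN_TAGS),
                    re.IGNORECASE)


def clean_title(raw_title: str) -> str:
    """Strip comments and known HTML tags, then decode entities."""
    text = COMMENT_RE.sub("", raw_title)
    text = TAG_RE.sub("", text)
    # Decode entities last, so that escaped markup such as &lt;b&gt; is
    # kept as literal text instead of being stripped as a tag.
    return html.unescape(text).strip()


if __name__ == "__main__":
    print(clean_title("<b>Bold</b> &amp; <i>italic</i>"))
    print(clean_title("<rant>GRRR!</rant>"))
```

Matching only a fixed set of tag names is what lets a title like <rant>GRRR!</rant> pass through untouched, at the cost of having to maintain that list.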

Two notes about the patch: 

The patch is against the CVS as of Saturday.  I would have updated it against
HEAD, but anon CVS is being unruly.  My patch applies cleanly against the 1.8.0
tarball though, so I'm guessing it will work with CVS HEAD.  Let me know if it
needs fixing.

The patch removes the need for the EncodeUnicode method in ItemView.cs, but I
didn't remove the method from the source.  If the current patch works,
EncodeUnicode should probably be removed.
Comment 7 Heath Harrelson 2005-05-09 03:19:51 UTC
Created attachment 46210
Patch that strips HTML tags and entities from Item titles.
Comment 8 Mikael Hallendal 2005-05-13 06:43:15 UTC
Just to let you know, I haven't forgotten this, just had a stressful week.
Comment 9 Mikael Hallendal 2005-05-13 06:45:05 UTC
Heath suggests that this also solves bug 170644
Comment 10 Mikael Hallendal 2005-05-15 11:55:07 UTC
Thanks, I've committed the patch; it worked like a charm. But shouldn't something like the HtmlUtils class 
you produced exist inside of Mono?