After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 599640 - Messages are not correctly escaped for adium themes
Messages are not correctly escaped for adium themes
Status: RESOLVED FIXED
Product: empathy
Classification: Core
Component: Chat themes
unspecified
Other Linux
: Normal normal
: ---
Assigned To: empathy-maint
Depends on:
Blocks: 525576 597049
 
 
Reported: 2009-10-26 11:05 UTC by Xavier Claessens
Modified: 2009-11-26 09:29 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Xavier Claessens 2009-10-26 11:05:45 UTC
This is probably a big security issue: If smiley are disabled for chats, incoming messages are not correctly escaped, so html code can be sent to webkit for rendering. For example it makes possible for remote contact to send <frame> tag that load an external website inside empathy chat window.

However this can _maybe_ be mitigated by the fact that urls are prefixed with /home/<where adium theme is installed>. I'm not sure if there is a way for an attacker to workaround that to load internet page.

This issue shows lots of issues in the message parser we have for adium themes. It is not as easy as we could think in the first place because:

1) links, smileys and newlines must be replaced by html code <a>, <img/> and <br/>. So we can't escape the message *after* replacement.
2) if we escape *before* replacement, some links and smileys will be broken. For example ">:-)" for devil face won't work anymore because '>' is escaped to &gt; and that kind of message won't work neither:

hello check think link: "www.test.com"

because the " will be first replaced by &quot; which is not recognized as excluded from URL regex.

The solution is to escape only text between replacements. This is what my branch do.

Branch fixing those issues: http://git.collabora.co.uk/?p=user/xclaesse/empathy.git;a=shortlog;h=refs/heads/escape
Comment 1 Xavier Claessens 2009-10-26 11:09:54 UTC
Note that this patch could be a bit too much for GNOME 2.28. I could just escape before replacement as trivial fix for the security issue, even if that will break some smiley/link, but not that much...
Comment 2 Guillaume Desmottes 2009-10-26 11:17:17 UTC
Agreed; It's prefer to have a less intrusive patch for 2.28.
Comment 3 Guillaume Desmottes 2009-10-26 14:57:44 UTC
I pushed a smaller branch fixing the security issue in master and gnome-2.28.

You should probably rebase your branch on top of the new master.
Comment 4 Xavier Claessens 2009-10-26 15:39:20 UTC
rebased
Comment 5 Guillaume Desmottes 2009-11-09 13:54:24 UTC
Comments after a first quick and incomplete review:

• add a comment documenting the return type of empathy_smiley_manager_parse_len
• empathy_smiley_manager_parse_len: I don't understand the for loop: "*cur_str && cur_str - text < len"  Why are you substracting pointers?
• please document the different fields of EmpathySmileyHit
• Shouldn't EmpathySmileyHit.{start, end} be guint?
• empathy_smiley_manager_parse_len: please comment, this function is a bit cryptic
• len = G_MAXSSIZE;  why not use strlen (or g_utf8_strlen ?)

I'd be much more confident to merge such big refactoring if we had some tests...
Comment 6 Xavier Claessens 2009-11-09 14:32:19 UTC
(In reply to comment #5)

> • empathy_smiley_manager_parse_len: I don't understand the for loop: "*cur_str
> && cur_str - text < len"  Why are you substracting pointers?
cur_str is a pointer inside the string where we are atm, it start with cur_str==text. len is the number of char we want to parse after text. so if cur_str - text < len we have parsed enough and we can leave.

I will add comments to that function.

> • Shouldn't EmpathySmileyHit.{start, end} be guint?
I just copied GRegex that gives positions with gint. But guint could be better indeed.

> • len = G_MAXSSIZE;  why not use strlen (or g_utf8_strlen ?)
Because strlen needs to go through the whole string to find the '\0'. That's useless because the loop after we already do that.
Comment 7 Xavier Claessens 2009-11-16 08:28:39 UTC
Branch updated
Comment 8 Guillaume Desmottes 2009-11-16 12:41:59 UTC
As said, I'd really like to have some tests for empathy_smiley_manager_parse_len testing the different cases.
Comment 9 Xavier Claessens 2009-11-16 12:57:43 UTC
I tested with the logs of a test account I have. I made there all kind of messages that I try each time I change something. It's working fine.

I won't have time to write a proper test case, so if you think that's blocker please do it yourself, or I'll try to do it, but that won't be possible anytime soon.
Comment 10 Xavier Claessens 2009-11-24 14:57:01 UTC
I updated my branch with more stuff:
 - Made a better API for parsers.
 - Port EmpathyChatTextView to new parsers, now that the generic API make it possible.
 - Add tests framework for parsers. It needs adding more test cases, but that's made really easy.

Review needed :)
Comment 11 Gustavo Noronha (kov) 2009-11-25 17:51:51 UTC
Awesome work, just went through all the commits! +1 from me for merging, with the following comments:

		/* Replace smileys by a <img/> tag */
		g_string_append (string, "<abbr title=\"");
		g_string_append_len (string, text, len);
		g_string_append_printf (string, "\"><img src=\"%s\" alt=\"",
					hit->path);
		g_string_append_len (string, text, len);
		g_string_append (string, "\"/></abbr>");

I think we should use the title attribute for img here instead of adding an abbr tag.

----------------------------------

About SmileyManagerTree/SmileyHit:

I feel accessing hit->pixbuf directly is inefficient, because you end up allocating a pixbuf you will not use, when using adium themes. Can we use something other than tree->pixbuf to control whether we found a smiley, and a get method that loads the pixbuf lazily? I can work on a patch if you agree in principle.

----------------------------------

 +#define SCHEMES   "([a-zA-Z\\+]+)"

You also need [0-9] here. Some IANA-registered schemes do contain numbers in them. '.' and '-' are also accepted. Perhaps we need something like:

#define SCHEMES "([a-zA-Z][a-zA-Z\\+\\.-]+)"

The initial [a-z][A-Z] is here because schemes need to start with a letter.
Comment 12 Xavier Claessens 2009-11-26 08:53:43 UTC
(In reply to comment #11)
> Awesome work
Thanks :)

> just went through all the commits! +1 from me for merging, with
> the following comments:
> 
>         /* Replace smileys by a <img/> tag */
>         g_string_append (string, "<abbr title=\"");
>         g_string_append_len (string, text, len);
>         g_string_append_printf (string, "\"><img src=\"%s\" alt=\"",
>                     hit->path);
>         g_string_append_len (string, text, len);
>         g_string_append (string, "\"/></abbr>");
> 
> I think we should use the title attribute for img here instead of adding an
> abbr tag.

Ok, I'm fixing this.

> About SmileyManagerTree/SmileyHit:
> 
> I feel accessing hit->pixbuf directly is inefficient, because you end up
> allocating a pixbuf you will not use, when using adium themes. Can we use
> something other than tree->pixbuf to control whether we found a smiley, and a
> get method that loads the pixbuf lazily? I can work on a patch if you agree in
> principle.

I feel like it's a really ugly hack to write the path of icons like that. Do you know if it's possible to insert directly the pixbuf into webkit, or something? Will webkit cache the image data if it has the same path many times?

If it's not possible to do better, then indeed we should add a boolean in the tree node that tells if it's a smiley, and also a string with the icon-name. Then lazy init the pixbuf/path.

Note that with custom smileys (when implemented) we could have only the pixbuf with no icon-name, that's why I prefer having a boolean than relying on the icon name field.

Feel free to propose a patch ;)

>  +#define SCHEMES   "([a-zA-Z\\+]+)"
> 
> You also need [0-9] here. Some IANA-registered schemes do contain numbers in
> them. '.' and '-' are also accepted. Perhaps we need something like:
> 
> #define SCHEMES "([a-zA-Z][a-zA-Z\\+\\.-]+)"
> 
> The initial [a-z][A-Z] is here because schemes need to start with a letter.

You forgot 0-9 in your proposal. Please propose a patch with some tests cases.

Thanks :)