GNOME Bugzilla – Bug 599640
Messages are not correctly escaped for adium themes
Last modified: 2009-11-26 09:29:36 UTC
This is probably a big security issue: If smiley are disabled for chats, incoming messages are not correctly escaped, so html code can be sent to webkit for rendering. For example it makes possible for remote contact to send <frame> tag that load an external website inside empathy chat window. However this can _maybe_ be mitigated by the fact that urls are prefixed with /home/<where adium theme is installed>. I'm not sure if there is a way for an attacker to workaround that to load internet page. This issue shows lots of issues in the message parser we have for adium themes. It is not as easy as we could think in the first place because: 1) links, smileys and newlines must be replaced by html code <a>, <img/> and <br/>. So we can't escape the message *after* replacement. 2) if we escape *before* replacement, some links and smileys will be broken. For example ">:-)" for devil face won't work anymore because '>' is escaped to > and that kind of message won't work neither: hello check think link: "www.test.com" because the " will be first replaced by " which is not recognized as excluded from URL regex. The solution is to escape only text between replacements. This is what my branch do. Branch fixing those issues: http://git.collabora.co.uk/?p=user/xclaesse/empathy.git;a=shortlog;h=refs/heads/escape
Note that this patch could be a bit too much for GNOME 2.28. I could just escape before replacement as trivial fix for the security issue, even if that will break some smiley/link, but not that much...
Agreed; It's prefer to have a less intrusive patch for 2.28.
I pushed a smaller branch fixing the security issue in master and gnome-2.28. You should probably rebase your branch on top of the new master.
rebased
Comments after a first quick and incomplete review: • add a comment documenting the return type of empathy_smiley_manager_parse_len • empathy_smiley_manager_parse_len: I don't understand the for loop: "*cur_str && cur_str - text < len" Why are you substracting pointers? • please document the different fields of EmpathySmileyHit • Shouldn't EmpathySmileyHit.{start, end} be guint? • empathy_smiley_manager_parse_len: please comment, this function is a bit cryptic • len = G_MAXSSIZE; why not use strlen (or g_utf8_strlen ?) I'd be much more confident to merge such big refactoring if we had some tests...
(In reply to comment #5) > • empathy_smiley_manager_parse_len: I don't understand the for loop: "*cur_str > && cur_str - text < len" Why are you substracting pointers? cur_str is a pointer inside the string where we are atm, it start with cur_str==text. len is the number of char we want to parse after text. so if cur_str - text < len we have parsed enough and we can leave. I will add comments to that function. > • Shouldn't EmpathySmileyHit.{start, end} be guint? I just copied GRegex that gives positions with gint. But guint could be better indeed. > • len = G_MAXSSIZE; why not use strlen (or g_utf8_strlen ?) Because strlen needs to go through the whole string to find the '\0'. That's useless because the loop after we already do that.
Branch updated
As said, I'd really like to have some tests for empathy_smiley_manager_parse_len testing the different cases.
I tested with the logs of a test account I have. I made there all kind of messages that I try each time I change something. It's working fine. I won't have time to write a proper test case, so if you think that's blocker please do it yourself, or I'll try to do it, but that won't be possible anytime soon.
I updated my branch with more stuff: - Made a better API for parsers. - Port EmpathyChatTextView to new parsers, now that the generic API make it possible. - Add tests framework for parsers. It needs adding more test cases, but that's made really easy. Review needed :)
Awesome work, just went through all the commits! +1 from me for merging, with the following comments: /* Replace smileys by a <img/> tag */ g_string_append (string, "<abbr title=\""); g_string_append_len (string, text, len); g_string_append_printf (string, "\"><img src=\"%s\" alt=\"", hit->path); g_string_append_len (string, text, len); g_string_append (string, "\"/></abbr>"); I think we should use the title attribute for img here instead of adding an abbr tag. ---------------------------------- About SmileyManagerTree/SmileyHit: I feel accessing hit->pixbuf directly is inefficient, because you end up allocating a pixbuf you will not use, when using adium themes. Can we use something other than tree->pixbuf to control whether we found a smiley, and a get method that loads the pixbuf lazily? I can work on a patch if you agree in principle. ---------------------------------- +#define SCHEMES "([a-zA-Z\\+]+)" You also need [0-9] here. Some IANA-registered schemes do contain numbers in them. '.' and '-' are also accepted. Perhaps we need something like: #define SCHEMES "([a-zA-Z][a-zA-Z\\+\\.-]+)" The initial [a-z][A-Z] is here because schemes need to start with a letter.
(In reply to comment #11) > Awesome work Thanks :) > just went through all the commits! +1 from me for merging, with > the following comments: > > /* Replace smileys by a <img/> tag */ > g_string_append (string, "<abbr title=\""); > g_string_append_len (string, text, len); > g_string_append_printf (string, "\"><img src=\"%s\" alt=\"", > hit->path); > g_string_append_len (string, text, len); > g_string_append (string, "\"/></abbr>"); > > I think we should use the title attribute for img here instead of adding an > abbr tag. Ok, I'm fixing this. > About SmileyManagerTree/SmileyHit: > > I feel accessing hit->pixbuf directly is inefficient, because you end up > allocating a pixbuf you will not use, when using adium themes. Can we use > something other than tree->pixbuf to control whether we found a smiley, and a > get method that loads the pixbuf lazily? I can work on a patch if you agree in > principle. I feel like it's a really ugly hack to write the path of icons like that. Do you know if it's possible to insert directly the pixbuf into webkit, or something? Will webkit cache the image data if it has the same path many times? If it's not possible to do better, then indeed we should add a boolean in the tree node that tells if it's a smiley, and also a string with the icon-name. Then lazy init the pixbuf/path. Note that with custom smileys (when implemented) we could have only the pixbuf with no icon-name, that's why I prefer having a boolean than relying on the icon name field. Feel free to propose a patch ;) > +#define SCHEMES "([a-zA-Z\\+]+)" > > You also need [0-9] here. Some IANA-registered schemes do contain numbers in > them. '.' and '-' are also accepted. Perhaps we need something like: > > #define SCHEMES "([a-zA-Z][a-zA-Z\\+\\.-]+)" > > The initial [a-z][A-Z] is here because schemes need to start with a letter. You forgot 0-9 in your proposal. Please propose a patch with some tests cases. Thanks :)