After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 791168 - Add protection against IDN homograph attacks
Add protection against IDN homograph attacks
Status: RESOLVED FIXED
Product: epiphany
Classification: Core
Component: General
git master
Other Linux
: Normal normal
: ---
Assigned To: Gabriel Ivașcu
Epiphany Maintainers
Depends on:
Blocks:
 
 
Reported: 2017-12-03 15:05 UTC by Gabriel Ivașcu
Modified: 2018-01-06 17:08 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
uri-helpers: Implement Mozilla's IDN display algorithm (4.11 KB, patch)
2017-12-09 09:49 UTC, Gabriel Ivașcu
none Details | Review
uri-helpers: Implement Mozilla's IDN display algorithm (10.59 KB, patch)
2017-12-17 15:32 UTC, Gabriel Ivașcu
committed Details | Review

Description Gabriel Ivașcu 2017-12-03 15:05:52 UTC
From https://bugs.webkit.org/show_bug.cgi?id=174816#c0:

"""If we add this property, we would need to include a defense against IDN homograph attacks. Epiphany does not currently have such a defense. So some research on IDN homograph defenses will be needed. I like the Firefox approach better than Chrome's approach. (Chrome completely bans characters that look like Latin characters, essentially banning Cyrillic URIs entirely. I'd rather have lower security than harm Cyrillic users like that.)"""
Comment 1 Gabriel Ivașcu 2017-12-09 09:49:13 UTC
Created attachment 365286 [details] [review]
uri-helpers: Implement Mozilla's IDN display algorithm

https://wiki.mozilla.org/IDN_Display_Algorithm#Algorithm
Comment 2 Michael Catanzaro 2017-12-11 00:29:18 UTC
Review of attachment 365286 [details] [review]:

Can you add a couple test cases in ephy-uri-helpers-test.c? It's not often that we have the ability to easily test some component of Epiphany; let's take advantage of it.

::: lib/ephy-uri-helpers.c
@@ +261,3 @@
+
+  value = g_hash_table_lookup (table, GINT_TO_POINTER (script));
+  new_value = GINT_TO_POINTER (GPOINTER_TO_INT (value) + 1);

I see it's a table tracking how many times a character of a particular script appears in the domain. But this required too much squinting for me to figure out, due to all the int/pointer conversions that are needed to use GHashTable properly. So please add a comment to explain how the table works.

@@ +276,3 @@
+
+static gboolean
+domain_is_safe (const char *domain)

What is "safe?" It needs a quick documentation comment, and a link to the Mozilla wiki, as well.

@@ +286,3 @@
+  g_assert (domain);
+
+  if (!g_utf8_validate (domain, -1, NULL))

This seems like a useful check to perform... but the domain you're passing into domain_is_safe() is still percent-encoded. I think it would be better to percent-decode it before checking if it is valid UTF-8. In fact, we surely need to percent-decode before running any of the checks in this function, otherwise it's possible to defeat the checking by percent-encoding a character. (Percent-encoding is another way to encode special characters in URIs... e.g. %20 is space.)

That will require a bit of thinking as to how to reorganize ephy_uri_decode(). Right now you're able to nicely choose to just throw out the punycode translation in the case where domain_is_safe() returns FALSE, by not assigning the decoded hostname to uri->host. That won't be possible anymore. I think you need to save the original hostname somewhere, percent decode the URI, then run your domain_is_safe() checks, and then revert back to the original hostname if it's not considered safe.

@@ +290,3 @@
+
+  if (!table)
+    table = g_hash_table_new (g_direct_hash, g_direct_equal);

I was thinking that you'll need to use GOnce here, because this function is supposed to be threadsafe. That's why we have a mutex around the initialization of the UIDNA object.

But...

@@ +292,3 @@
+    table = g_hash_table_new (g_direct_hash, g_direct_equal);
+  else
+    g_hash_table_remove_all (table);

...so you delete everything from the table each time this function is called? That means it doesn't have to be static at all, then, right? That would simplify things nicely, so you won't need a mutex.

@@ +316,3 @@
+  /* Single script, allow. */
+  if (g_hash_table_size (table) < 2)
+    return TRUE;

With the hash table no longer static, this would become:

if (...) {
  result = TRUE;
  goto out;
}

And then out would be at the bottom:

out:
  g_hash_table_destroy (table);
  return result;

@@ +347,3 @@
+    return FALSE;
+
+  /* Allow Latin + any other single script. */

OK, one final complaint. The algorithm is supposed to be run on a URI label, not the entire domain. A label is any component of the URI separated from other components by a dot. So, for example, the domain bugzilla.gnome.org (SoupURI host component) actually has three labels, that should be considered separately by the algorithm. One label might be considered safe, while another might not be. So that, too, will require a little more work in ephy_uri_decode() to get right.
Comment 3 Gabriel Ivașcu 2017-12-12 20:26:30 UTC
(In reply to Michael Catanzaro from comment #2)
> Can you add a couple test cases in ephy-uri-helpers-test.c? It's not often
> that we have the ability to easily test some component of Epiphany; let's
> take advantage of it.

OK.

> @@ +276,3 @@
> +
> +static gboolean
> +domain_is_safe (const char *domain)
> 
> What is "safe?" It needs a quick documentation comment, and a link to the
> Mozilla wiki, as well.

I meant safe as in safe for display. I guess we can come up with a better name though.

> @@ +286,3 @@
> +  g_assert (domain);
> +
> +  if (!g_utf8_validate (domain, -1, NULL))
> 
> This seems like a useful check to perform... but the domain you're passing
> into domain_is_safe() is still percent-encoded. I think it would be better
> to percent-decode it before checking if it is valid UTF-8. In fact, we
> surely need to percent-decode before running any of the checks in this
> function, otherwise it's possible to defeat the checking by percent-encoding
> a character. (Percent-encoding is another way to encode special characters
> in URIs... e.g. %20 is space.)

You're right, thanks for pointing it out.
 
> @@ +347,3 @@
> +    return FALSE;
> +
> +  /* Allow Latin + any other single script. */
> 
> OK, one final complaint. The algorithm is supposed to be run on a URI label,
> not the entire domain. A label is any component of the URI separated from
> other components by a dot. So, for example, the domain bugzilla.gnome.org
> (SoupURI host component) actually has three labels, that should be
> considered separately by the algorithm. One label might be considered safe,
> while another might not be. So that, too, will require a little more work in
> ephy_uri_decode() to get right.

Ah, I missed that. OK.
Comment 4 Gabriel Ivașcu 2017-12-17 15:32:10 UTC
Created attachment 365640 [details] [review]
uri-helpers: Implement Mozilla's IDN display algorithm

https://wiki.mozilla.org/IDN_Display_Algorithm#Algorithm
Comment 5 Michael Catanzaro 2017-12-17 18:36:38 UTC
Review of attachment 365640 [details] [review]:

OK, good. Thanks!

You can move this into Epiphany now, but next step is to move all this code into WebKit. I failed to get a recommendation from Carlos Garcia as to where he wants it, so I would just add a new file (WebKitURIUtilities.h) and a new section in the documentation "URI Utilities" for them. We can always reorganize those later; the exact layout of the headers is not API.

::: lib/ephy-uri-helpers.h
@@ +30,3 @@
+char      *ephy_uri_normalize             (const char *uri);
+char      *ephy_uri_to_security_origin    (const char *uri);
+gboolean   ephy_uri_validate_label        (const char *label);

I think this should be a private static function. You can rewrite the test to just use ephy_uri_decode() and check which labels in the URI were not decoded, right?
Comment 6 Gabriel Ivașcu 2017-12-17 19:11:45 UTC
(In reply to Michael Catanzaro from comment #5)
> Review of attachment 365640 [details] [review] [review]:
>  
> ::: lib/ephy-uri-helpers.h
> @@ +30,3 @@
> +char      *ephy_uri_normalize             (const char *uri);
> +char      *ephy_uri_to_security_origin    (const char *uri);
> +gboolean   ephy_uri_validate_label        (const char *label);
> 
> I think this should be a private static function. You can rewrite the test
> to just use ephy_uri_decode() and check which labels in the URI were not
> decoded, right?

OK.
Comment 7 Gabriel Ivașcu 2017-12-17 19:12:36 UTC
Attachment 365640 [details] pushed as 0459be7 - uri-helpers: Implement Mozilla's IDN display algorithm
Comment 8 Michael Catanzaro 2017-12-17 20:10:12 UTC
(In reply to Michael Catanzaro from comment #5)
> I think this should be a private static function. You can rewrite the test
> to just use ephy_uri_decode() and check which labels in the URI were not
> decoded, right?

You need to remove it from the header now that you've marked it as static, and not try to use it in the test. That's not going to build. ;)
Comment 9 Gabriel Ivașcu 2017-12-17 20:15:45 UTC
(In reply to Michael Catanzaro from comment #8)
> (In reply to Michael Catanzaro from comment #5)
> > I think this should be a private static function. You can rewrite the test
> > to just use ephy_uri_decode() and check which labels in the URI were not
> > decoded, right?
> 
> You need to remove it from the header now that you've marked it as static,
> and not try to use it in the test. That's not going to build. ;)

I did. I edited the patch before pushing it (https://git.gnome.org/browse/epiphany/commit/?id=0459be7). I only forgot to make the function static in the source file, so I made a follow-up commit.
Comment 10 Michael Catanzaro 2017-12-17 20:22:40 UTC
Ah, OK then. Thanks.