After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 612078 - Epiphany does not properly lowercase host names
Epiphany does not properly lowercase host names
Status: RESOLVED OBSOLETE
Product: epiphany
Classification: Core
Component: General
2.29.x
Other Linux
: Normal normal
: ---
Assigned To: Epiphany Maintainers
Epiphany Maintainers
Depends on:
Blocks:
 
 
Reported: 2010-03-07 12:22 UTC by Wouter Bolsterlee (uws)
Modified: 2015-02-07 16:40 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Patch against Epiphany 2.30 (1.24 KB, patch)
2010-04-06 22:22 UTC, Alexandre Martani
reviewed Details | Review
Test (2.51 KB, patch)
2010-04-09 03:44 UTC, Alexandre Martani
none Details | Review
Test (2.62 KB, patch)
2010-04-09 21:44 UTC, Alexandre Martani
needs-work Details | Review
Patch against Epiphany 2.30.2 (1.25 KB, patch)
2010-04-09 23:29 UTC, Alexandre Martani
needs-work Details | Review
Patch against Epiphany 2.30.2 (1.10 KB, patch)
2010-04-12 03:17 UTC, Alexandre Martani
none Details | Review
Testcase (2.92 KB, patch)
2010-04-12 03:19 UTC, Alexandre Martani
none Details | Review
Patch against Epiphany 2.30.2 (1.10 KB, patch)
2010-04-12 19:19 UTC, Alexandre Martani
needs-work Details | Review
Testcase (3.28 KB, patch)
2010-04-12 19:24 UTC, Alexandre Martani
none Details | Review
Testcase (3.27 KB, patch)
2010-04-13 04:37 UTC, Alexandre Martani
needs-work Details | Review
Patch against Epiphany 2.30.2 (5.08 KB, patch)
2010-04-19 03:54 UTC, Alexandre Martani
none Details | Review

Description Wouter Bolsterlee (uws) 2010-03-07 12:22:34 UTC
Epiphany does not convert URLs like http://PLANET.GNOME.ORG/ into the proper lowercase spelling (hostname only, of course). The result is a) messed up history and b) ugly. :)
Comment 1 Alexandre Martani 2010-04-06 22:22:11 UTC
Created attachment 158087 [details] [review]
Patch against Epiphany 2.30

Note that this is my first patch on a serious C project, sorry for mistakes.
Comment 2 Gustavo Noronha (kov) 2010-04-08 22:00:10 UTC
Review of attachment 158087 [details] [review]:

Overall looks good to me. I do wonder if we shouldn't fix this in libsoup instead, though. I would like to have danw's opinion.

::: embed/ephy-embed-utils.c
@@ +118,3 @@
+	host = g_ascii_strdown (soup_uri->host, -1);
+	soup_uri_set_host (soup_uri, host);
+	g_free (effective_address);

I would suggest freeing the resources right after their last usage, and add some blank lines. It makes the code more readable, some times.
Comment 3 Gustavo Noronha (kov) 2010-04-08 22:00:47 UTC
What do you think, Dan?
Comment 4 Gustavo Noronha (kov) 2010-04-08 22:01:43 UTC
hmmm... on second thought, fixing this in libsoup wouldn't do us any good, would it? So perhaps we should go ahead with this.
Comment 5 Alexandre Martani 2010-04-09 01:24:13 UTC
I think it might break some non http(s) urls, especifically javascript: links. I am writing a test case for this.
Comment 6 Dan Winship 2010-04-09 02:10:39 UTC
yeah, i don't think it makes sense to do it in libsoup; the uppercase version is still correct from a network point-of-view, it's just ugly for display/history
Comment 7 Alexandre Martani 2010-04-09 03:42:57 UTC
javascript: links and bookmarklets still work with this patch, which means these address doesn't reach this code. But I don't know if there is any other scheme which could get corrupted.
Comment 8 Alexandre Martani 2010-04-09 03:44:55 UTC
Created attachment 158252 [details] [review]
Test

I have never used glib test framework before, I don't know if there is some way to make it cleaner. My initial patch passes it.
Comment 9 Diego Escalante Urrelo (not reading bugmail) 2010-04-09 07:29:11 UTC
Review of attachment 158252 [details] [review]:

Nice! Our first contributed test! :-).

As a side project (after this bug) wouldn't you like to expand this test to cover more API of embed utils? blink-blink.

It looks good overall, perhaps someone has comments about what urls to test.

::: tests/testephyembedutils.c
@@ +33,3 @@
+    "http://www.example.com/path/to/some/file.php",
+    "www.example.com",
+    "exemple.com/path/",

Typo

@@ +41,3 @@
+    "http://www.example.com/path/to/some/file.php",
+    "http://www.example.com/",
+    "http://exemple.com/path/",

Typo
Comment 10 Alexandre Martani 2010-04-09 21:44:26 UTC
Created attachment 158331 [details] [review]
Test

Fixed typo and added url to test lowercase.

I'll look at the other embed utils API after this :)
Comment 11 Alexandre Martani 2010-04-09 23:29:51 UTC
Created attachment 158342 [details] [review]
Patch against Epiphany 2.30.2

Applying suggestion from Gustavo Noronha.
Comment 12 Diego Escalante Urrelo (not reading bugmail) 2010-04-11 05:23:52 UTC
Alexandre, for the records you can put later unit tests work here: bug #613756. If that one fills too much maybe we can open another one, but for now that bug works fine.
Comment 13 Diego Escalante Urrelo (not reading bugmail) 2010-04-11 05:27:43 UTC
Review of attachment 158342 [details] [review]:

Two comments.

::: embed/ephy-embed-utils.c
@@ +29,2 @@
 #include <glib/gi18n.h>
+#include <gio/gio.h>

This is a mistake I guess, I don't see any use of gio stuff.

@@ +123,3 @@
+	g_free (effective_address);
+
+	host = g_ascii_strdown (soup_uri->host, -1);

AFAIK UTF8 is allowed in domain names. So "ñu.cl" would be unpredictable here, for example.
Comment 14 Diego Escalante Urrelo (not reading bugmail) 2010-04-11 05:31:18 UTC
Review of attachment 158331 [details] [review]:

Take into account UTF8 domains, it's possible to register ágape.es as a domain name in some TLDs.
Even further, I think you could throw some intentionally broken stuff in there, stuff that Soup doesn't recognize so that you make sure we are handling that case also (for example "HELLO HELLO").

::: tests/testephyembedutils.c
@@ +38,3 @@
+    "WwW.ExAmPlE.CoM/Path/Is/Case/Sensitive.HTml",
+    ""
+  };

You need UTF8 examples here. Same below.
Comment 15 Alexandre Martani 2010-04-12 03:07:10 UTC
UTF8 domains should be handled as rfc3490 [1]. Currently, Epiphany does this. I don't know if this is done by webkit.

With my patch, UTF-8 domains got altered by libsoup (I think it encodes it like GET parameters):
"áéíóúâêîôûüöäÜÖÄß.example" becomes "http://%C3%A1%C3%A9%C3%AD%C3%B3%C3%BA%C3%A2%C3%AA%C3%AE%C3%B4%C3%BB%C3%BC%C3%B6%C3%A4%C3%9C%C3%96%C3%84%C3%9F.example/"

It still works, although I don't know if it is the right behavior. Should I do the test anyway, using this encoded string as an expected output?

[1] http://www.ietf.org/rfc/rfc3490.txt
Comment 16 Alexandre Martani 2010-04-12 03:17:36 UTC
Created attachment 158457 [details] [review]
Patch against Epiphany 2.30.2

Removed gio.h include and changed lowercase function to g_utf8_strdown.
Comment 17 Alexandre Martani 2010-04-12 03:19:59 UTC
Created attachment 158458 [details] [review]
Testcase

Test UTF-8 domain name.
Comment 18 Diego Escalante Urrelo (not reading bugmail) 2010-04-12 04:09:06 UTC
Mmmm. Not sure now about the utf8 thing. I think you are right. Dan or Xan should know. Try to ping them on #epiphany tomorrow.
Comment 19 Dan Winship 2010-04-12 13:41:43 UTC
Yeah, URIs are supposed to be ASCII-only. If it has UTF-8 in it then it's an IRI (Internationalized Resource Identifier). SoupURI will accept IRIs on input, but always outputs URIs.
Comment 20 Alexandre Martani 2010-04-12 17:42:19 UTC
So, should we use libidn stringprep functions?

http://www.gnu.org/software/libidn/manual/libidn.html#Stringprep-Functions
Comment 21 Dan Winship 2010-04-12 18:12:02 UTC
No. You could use glib's (g_hostname_to_ascii(), etc) if you wanted.

Maybe it would be better if, rather than checking what "áéíóúâêîôûüöäÜÖÄß.example" actually normalizes to, you just verify that it normalizes to the same thing as "áéíóúâêîôûüöäüöäß.example" does
Comment 22 Alexandre Martani 2010-04-12 19:19:24 UTC
Created attachment 158523 [details] [review]
Patch against Epiphany 2.30.2

Using g_hostname_to_ascii. Note that this converts UTF8 hostnames to IDN encoding, what used to be done by webkit. Personally, after writing this patch, I think it is inconsistent of webkit to handle one thing and not another (handle IDN but not convert hostnames to lowercase).
Comment 23 Alexandre Martani 2010-04-12 19:24:36 UTC
Created attachment 158525 [details] [review]
Testcase

Change tests as recommended by Dan.
Comment 24 Alexandre Martani 2010-04-13 04:37:41 UTC
Created attachment 158571 [details] [review]
Testcase

Declare strings as const.
Comment 25 Diego Escalante Urrelo (not reading bugmail) 2010-04-15 03:41:07 UTC
Review of attachment 158523 [details] [review]:

Cool, thanks for this.

First comment: use git format-patch instead of only git diff. That way we can download the patch and do git am instead of patch -p#. Also it works better with git bz (http://blog.fishsoup.net/2008/11/16/git-bz-bugzilla-subcommand-for-git/) (althought that's in gnome's git nowadays I think).

Then, check out that crasher when soup_uri is NULL (you should handle that case gracefully)

::: embed/ephy-embed-utils.c
@@ +119,3 @@
+
+	/* Convert hostname to lowercase */
+	soup_uri = soup_uri_new (effective_address);

You need to check if soup_uri is not NULL here, when effective_address is about:blank, you get a crash.
In general, SoupURI must be checked for non NULL since that's its way to communicate it failed to be created.

@@ +122,3 @@
+	g_free (effective_address);
+
+	host = g_hostname_to_ascii (soup_uri->host);

Note that this removes trailing dots. I think that's ok and desirable though.
Comment 26 Diego Escalante Urrelo (not reading bugmail) 2010-04-15 03:41:10 UTC
Review of attachment 158523 [details] [review]:

Cool, thanks for this.

First comment: use git format-patch instead of only git diff. That way we can download the patch and do git am instead of patch -p#. Also it works better with git bz (http://blog.fishsoup.net/2008/11/16/git-bz-bugzilla-subcommand-for-git/) (althought that's in gnome's git nowadays I think).

Then, check out that crasher when soup_uri is NULL (you should handle that case gracefully)

::: embed/ephy-embed-utils.c
@@ +119,3 @@
+
+	/* Convert hostname to lowercase */
+	soup_uri = soup_uri_new (effective_address);

You need to check if soup_uri is not NULL here, when effective_address is about:blank, you get a crash.
In general, SoupURI must be checked for non NULL since that's its way to communicate it failed to be created.

@@ +122,3 @@
+	g_free (effective_address);
+
+	host = g_hostname_to_ascii (soup_uri->host);

Note that this removes trailing dots. I think that's ok and desirable though.
Comment 27 Diego Escalante Urrelo (not reading bugmail) 2010-04-15 03:48:07 UTC
Review of attachment 158571 [details] [review]:

Looks good, thanks for that extra test. We are almost done :-).

::: tests/testephyembedutils.c
@@ +72,3 @@
+  result2 = ephy_embed_utils_normalize_address (utf8_domain_lower);
+
+  g_assert_cmpstr (result1, ==, result2);

I think you want to confirm against utf8_domain_lower too. For example let's say normalize starts eating the last char in every string it returns. This would be TRUE, but still wrong.

I would say you want to compare:
result1 == utf8_domain_lower

then
result2 == utf8_domain_lower
Comment 28 Alexandre Martani 2010-04-15 18:16:08 UTC
(In reply to comment #27)
> I think you want to confirm against utf8_domain_lower too. For example let's
> say normalize starts eating the last char in every string it returns. This
> would be TRUE, but still wrong.
> 
> I would say you want to compare:
> result1 == utf8_domain_lower
> 
> then
> result2 == utf8_domain_lower

The problem is that now, g_hostname_to_ascii not only lowercase the domains, but also converts to IDN, so that the result is not anymore equal to the lowercase domain name. See comment 21.
Comment 29 Alexandre Martani 2010-04-19 03:54:25 UTC
Created attachment 159047 [details] [review]
Patch against Epiphany 2.30.2

Checks for soup_uri or soup_uri->hostname is NULL. Adds test for "about:" uri. Patch created using git format-patch.
Comment 30 Michael Catanzaro 2015-02-07 16:40:38 UTC
Looks like this got implemented at some point in the past.