GNOME Bugzilla – Bug 612078
Epiphany does not properly lowercase host names
Last modified: 2015-02-07 16:40:38 UTC
Epiphany does not convert URLs like http://PLANET.GNOME.ORG/ into the proper lowercase spelling (hostname only, of course). The result is a) messed up history and b) ugly. :)
Created attachment 158087 [details] [review] Patch against Epiphany 2.30 Note that this is my first patch on a serious C project, sorry for mistakes.
Review of attachment 158087 [details] [review]: Overall looks good to me. I do wonder if we shouldn't fix this in libsoup instead, though. I would like to have danw's opinion. ::: embed/ephy-embed-utils.c @@ +118,3 @@ + host = g_ascii_strdown (soup_uri->host, -1); + soup_uri_set_host (soup_uri, host); + g_free (effective_address); I would suggest freeing the resources right after their last usage, and add some blank lines. It makes the code more readable, some times.
What do you think, Dan?
hmmm... on second thought, fixing this in libsoup wouldn't do us any good, would it? So perhaps we should go ahead with this.
I think it might break some non http(s) urls, especifically javascript: links. I am writing a test case for this.
yeah, i don't think it makes sense to do it in libsoup; the uppercase version is still correct from a network point-of-view, it's just ugly for display/history
javascript: links and bookmarklets still work with this patch, which means these address doesn't reach this code. But I don't know if there is any other scheme which could get corrupted.
Created attachment 158252 [details] [review] Test I have never used glib test framework before, I don't know if there is some way to make it cleaner. My initial patch passes it.
Review of attachment 158252 [details] [review]: Nice! Our first contributed test! :-). As a side project (after this bug) wouldn't you like to expand this test to cover more API of embed utils? blink-blink. It looks good overall, perhaps someone has comments about what urls to test. ::: tests/testephyembedutils.c @@ +33,3 @@ + "http://www.example.com/path/to/some/file.php", + "www.example.com", + "exemple.com/path/", Typo @@ +41,3 @@ + "http://www.example.com/path/to/some/file.php", + "http://www.example.com/", + "http://exemple.com/path/", Typo
Created attachment 158331 [details] [review] Test Fixed typo and added url to test lowercase. I'll look at the other embed utils API after this :)
Created attachment 158342 [details] [review] Patch against Epiphany 2.30.2 Applying suggestion from Gustavo Noronha.
Alexandre, for the records you can put later unit tests work here: bug #613756. If that one fills too much maybe we can open another one, but for now that bug works fine.
Review of attachment 158342 [details] [review]: Two comments. ::: embed/ephy-embed-utils.c @@ +29,2 @@ #include <glib/gi18n.h> +#include <gio/gio.h> This is a mistake I guess, I don't see any use of gio stuff. @@ +123,3 @@ + g_free (effective_address); + + host = g_ascii_strdown (soup_uri->host, -1); AFAIK UTF8 is allowed in domain names. So "ñu.cl" would be unpredictable here, for example.
Review of attachment 158331 [details] [review]: Take into account UTF8 domains, it's possible to register ágape.es as a domain name in some TLDs. Even further, I think you could throw some intentionally broken stuff in there, stuff that Soup doesn't recognize so that you make sure we are handling that case also (for example "HELLO HELLO"). ::: tests/testephyembedutils.c @@ +38,3 @@ + "WwW.ExAmPlE.CoM/Path/Is/Case/Sensitive.HTml", + "" + }; You need UTF8 examples here. Same below.
UTF8 domains should be handled as rfc3490 [1]. Currently, Epiphany does this. I don't know if this is done by webkit. With my patch, UTF-8 domains got altered by libsoup (I think it encodes it like GET parameters): "áéíóúâêîôûüöäÜÖÄß.example" becomes "http://%C3%A1%C3%A9%C3%AD%C3%B3%C3%BA%C3%A2%C3%AA%C3%AE%C3%B4%C3%BB%C3%BC%C3%B6%C3%A4%C3%9C%C3%96%C3%84%C3%9F.example/" It still works, although I don't know if it is the right behavior. Should I do the test anyway, using this encoded string as an expected output? [1] http://www.ietf.org/rfc/rfc3490.txt
Created attachment 158457 [details] [review] Patch against Epiphany 2.30.2 Removed gio.h include and changed lowercase function to g_utf8_strdown.
Created attachment 158458 [details] [review] Testcase Test UTF-8 domain name.
Mmmm. Not sure now about the utf8 thing. I think you are right. Dan or Xan should know. Try to ping them on #epiphany tomorrow.
Yeah, URIs are supposed to be ASCII-only. If it has UTF-8 in it then it's an IRI (Internationalized Resource Identifier). SoupURI will accept IRIs on input, but always outputs URIs.
So, should we use libidn stringprep functions? http://www.gnu.org/software/libidn/manual/libidn.html#Stringprep-Functions
No. You could use glib's (g_hostname_to_ascii(), etc) if you wanted. Maybe it would be better if, rather than checking what "áéíóúâêîôûüöäÜÖÄß.example" actually normalizes to, you just verify that it normalizes to the same thing as "áéíóúâêîôûüöäüöäß.example" does
Created attachment 158523 [details] [review] Patch against Epiphany 2.30.2 Using g_hostname_to_ascii. Note that this converts UTF8 hostnames to IDN encoding, what used to be done by webkit. Personally, after writing this patch, I think it is inconsistent of webkit to handle one thing and not another (handle IDN but not convert hostnames to lowercase).
Created attachment 158525 [details] [review] Testcase Change tests as recommended by Dan.
Created attachment 158571 [details] [review] Testcase Declare strings as const.
Review of attachment 158523 [details] [review]: Cool, thanks for this. First comment: use git format-patch instead of only git diff. That way we can download the patch and do git am instead of patch -p#. Also it works better with git bz (http://blog.fishsoup.net/2008/11/16/git-bz-bugzilla-subcommand-for-git/) (althought that's in gnome's git nowadays I think). Then, check out that crasher when soup_uri is NULL (you should handle that case gracefully) ::: embed/ephy-embed-utils.c @@ +119,3 @@ + + /* Convert hostname to lowercase */ + soup_uri = soup_uri_new (effective_address); You need to check if soup_uri is not NULL here, when effective_address is about:blank, you get a crash. In general, SoupURI must be checked for non NULL since that's its way to communicate it failed to be created. @@ +122,3 @@ + g_free (effective_address); + + host = g_hostname_to_ascii (soup_uri->host); Note that this removes trailing dots. I think that's ok and desirable though.
Review of attachment 158571 [details] [review]: Looks good, thanks for that extra test. We are almost done :-). ::: tests/testephyembedutils.c @@ +72,3 @@ + result2 = ephy_embed_utils_normalize_address (utf8_domain_lower); + + g_assert_cmpstr (result1, ==, result2); I think you want to confirm against utf8_domain_lower too. For example let's say normalize starts eating the last char in every string it returns. This would be TRUE, but still wrong. I would say you want to compare: result1 == utf8_domain_lower then result2 == utf8_domain_lower
(In reply to comment #27) > I think you want to confirm against utf8_domain_lower too. For example let's > say normalize starts eating the last char in every string it returns. This > would be TRUE, but still wrong. > > I would say you want to compare: > result1 == utf8_domain_lower > > then > result2 == utf8_domain_lower The problem is that now, g_hostname_to_ascii not only lowercase the domains, but also converts to IDN, so that the result is not anymore equal to the lowercase domain name. See comment 21.
Created attachment 159047 [details] [review] Patch against Epiphany 2.30.2 Checks for soup_uri or soup_uri->hostname is NULL. Adds test for "about:" uri. Patch created using git format-patch.
Looks like this got implemented at some point in the past.