GNOME Bugzilla – Bug 681085
SoupTLD not correctly detecting the public TLDs
Last modified: 2012-08-07 10:12:32 UTC
The patch for https://bugzilla.gnome.org/show_bug.cgi?id=679230 basically broke the is_public_suffix() method and our tests didn't detect that. So we basically have to fix this and also improve the unit tests in order to allow us to catch those regressions.
Created attachment 220164 [details] [review] Patch
Comment on attachment 220164 [details] [review] Patch >Both soup_tld_is_public_suffix() and soup_tld_get_base_domain() were not >working properly after cbae89f4 "working properly" according to what definition? In particular, why is it that you want them to behave differently from the mozilla reference implementation? >used was not cheking for exact matches in the case of is_public_suffix() ^ typo
(In reply to comment #2) > (From update of attachment 220164 [details] [review]) > >Both soup_tld_is_public_suffix() and soup_tld_get_base_domain() were not > >working properly after cbae89f4 > > "working properly" according to what definition? In particular, why is it that > you want them to behave differently from the mozilla reference implementation? Basically I asked Sergio yesterday why soup_tld_is_public_suffix(".com") was returning FALSE, assuming it was broken. He seemed to agree with this. If this is working as specified how are we meant to use the API to figure out that (simple?) question. Maybe it's just the documentation being very unclear, but it seemed to me that call should return TRUE.
(In reply to comment #2) > (From update of attachment 220164 [details] [review]) > >Both soup_tld_is_public_suffix() and soup_tld_get_base_domain() were not > >working properly after cbae89f4 > > "working properly" according to what definition? In particular, why is it that > you want them to behave differently from the mozilla reference implementation? According to our definition :). I meant, as Xan said things like (".com") are clearly public suffixes, the original implementation was returning TRUE for it and after the change I mentioned it started to return FALSE. Thing is that they were working as expected in the original code, then we regressed them but didn't notice because the tests were not good enough, and weren't able to detect such regressions. Regarding the Mozilla implementation, the thing is that they were always different because they don't even have the same API as we have. Mozilla provides a GetPublicSuffix() method instead of a is_public_suffix() as we have, which does not exactly the same. On the other hand that method does not make sense on its own, in the sense that it is only meant to be used by their cookie service, which is the one that performs some additional checks in order to know if they can set a cookie for a particular domain or not.
(In reply to comment #3) > Basically I asked Sergio yesterday why soup_tld_is_public_suffix(".com") was > returning FALSE, assuming it was broken. He seemed to agree with this. ah, yeah, that's definitely a bug But it seems weird to say that the base domain of "a.b.example.local" is "example.local". Why not "local"? Or "b.example.local"? That seems completely arbitrary. I think for things that aren't in the TLDs list, we should stick with the Mozilla behavior of saying that they have no base domain.
(In reply to comment #5) > But it seems weird to say that the base domain of "a.b.example.local" is > "example.local". Why not "local"? Or "b.example.local"? That seems completely > arbitrary. I think for things that aren't in the TLDs list, we should stick > with the Mozilla behavior of saying that they have no base domain. Do not let tests mislead you Dan ;), the tests cases I used for our unit tests do not come from Mozilla code but from the publicsuffix.org website. They're just published there to help implementors. Back to the point, in this case, I think we should not base our decision in the Mozilla implementation. Thing is that Mozilla behavior is exactly what this last patch does. Basically the base domain is defined in the code as the tld (even if it isn't public) plus an additional domain. Furthermore, the Mozilla implementation will return "local" as the public suffix for "a.local" even if "local" is not listed. What you said makes much more sense to me, deciding what's the base domain of a local url is absolutely arbitrary, so I'd say that we should keep the is_public_domain() fix and return NULL for get_base_domain() of local url's as you mentioned. We might add some comments to the docs in order to make that explicit. Does it make sense?
Created attachment 220427 [details] [review] Patch
(In reply to comment #6) > Thing is that Mozilla behavior is exactly what this > last patch does. Basically the base domain is defined in the code as the tld > (even if it isn't public) plus an additional domain. Furthermore, the Mozilla > implementation will return "local" as the public suffix for "a.local" even if > "local" is not listed. Hm... I suppose the idea is that as ICANN adds more pointless TLDs, we shouldn't *completely* lose. OK, keep the patch as it is currently. Beware that I just updated effective_tld_names.dat, so be sure to run tld-test again after rebasing to make sure it still works with the current data.
Pushed to master 82084a0