GNOME Bugzilla – Bug 599562
Fails to load (and doesn't even try) some URLs
Last modified: 2010-07-05 22:27:22 UTC
Hi, epiphany with webkit/gtk 1.1.15.2 fails to load (and doesn't even try) some URLs. Example would be the GstPad link here: http://gstreamer.freedesktop.org/data/doc/gstreamer/head/gstreamer/html/ Directly loading http://gstreamer.freedesktop.org/data/doc/gstreamer/head/gstreamer/html/GstPad.html doesn't work either The address entry becomes empty immediately and nothing happens. This is an epiphany bug, not a webkit bug, because with Midori (and the same webkit version) it works. I have many other example URLs that don't work... if you need more examples just ask.
This is with epiphany 2.28.1 btw
Hm, seems to be caused by the adblock extension
I mean, it works when disabling the extension ;)
Yeah, I bet it is caused by the 'ad' in 'Pad'. To be honest, this extension needs to be rewritten to use the new Adblock Plus format, otherwise it'll be craziness.
It looks like the patterns are intended to be used case-insensitively. Specifically, the cause of random "...ad" matches looks to be the [^a-z\d=+%@] bit in the longest pattern; adding A-Z into that character class seems to restore sanity. (I just ran into this on a URL containing 'Lead' -- googling for it is sufficient)
I'm pretty sure I have fixed this in epiphany-extensions already. Can you confirm you still see it with epiphany-extensions from master?
I just tried again with master epiphany-extensions, and I am pretty sure 7043ca8689d5ebb96eeb1f7a87f3aa8e862a84dd fixes this problem. Can you check that your /usr/share/epiphany-extensions/adblock-patterns file is up-to-date with that?
Yes, I've just confirmed the version I'm using matches that in master. Your change adds a [^/] before the 'ads' part... ignoring most of the optional subgroups, that leaves the pattern as: [^a-z\d=+%@](?!\d{5,})[^/]ads?\d*(?!\.org)[\W_](?!\w+\.(ac\.|edu)|astra|aware|adurl=|block|login|nl/|sears/|.*(&sbc|\.(wmv|rm))) further ignoring the negative lookaheads: [^a-z\d=+%@][^/]ads?\d*[\W_] Which will match an uppercase letter, followed by anything that isn't '/', followed by 'ad' or 'ads', followed by optional digits, and a non-alphanumeric. So while it will no longer match 'Pad', it will match 'Lead'. It'll also match a numer of other valid-seeming URLs: http://example.org/sad.html Looking closer, your change appears likely to have neutered this rule --- the rule appears to be trying to match various specific names of likely ad sources immediately following a separator character; your rule now requires it to find such matches only following a sequence of a separator character (or uppercase letter) and a non-slash. For example, it no longer matches http://example.org/ads/evil.png, which it certainly seems to intend to. So, I now believe that the rule should be reverted to its state prior to 7043ca8, and then either: A-Z should be added to the negative character class (presumably for all four rules that use a-z) or, better, I think that line 74 of adblock-pattern.c should change from: regex = g_regex_new (line, G_REGEX_OPTIMIZE, 0, &error); to: regex = g_regex_new (line, G_REGEX_CASELESS | G_REGEX_OPTIMIZE, 0, &error); I'm happy to attach a patch if that would be preferable.
It doesn't block Lead for me. Interesting. Well, I think your proposal makes sense, a patch would be most welcome.
Created attachment 154304 [details] [review] Treat patterns case-insensitively
Review of attachment 154304 [details] [review]: Looks good, but I'd be more confident in having the patch committed after more people have tested it a bit. Sebastian? =)
I've been bitten by this repeatedly; I'll give the patch a test drive.
Yup, this patch seems to work fine, and ads still seem to be blocked as they should.
*** Bug 612016 has been marked as a duplicate of this bug. ***
Comment on attachment 154304 [details] [review] Treat patterns case-insensitively Pushed as ff90315c15dc7e2711a36db10ed296bae7b4fbaf. Thanks!
*** Bug 599407 has been marked as a duplicate of this bug. ***