After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 599562 - Fails to load (and doesn't even try) some URLs
Fails to load (and doesn't even try) some URLs
Status: RESOLVED FIXED
Product: epiphany
Classification: Core
Component: Backend
2.28.x
Other Linux
: Normal normal
: ---
Assigned To: Xan Lopez
Epiphany Maintainers
: 599407 612016 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2009-10-25 15:33 UTC by Sebastian Dröge (slomo)
Modified: 2010-07-05 22:27 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Treat patterns case-insensitively (3.88 KB, patch)
2010-02-21 04:27 UTC, Matthew Draper
committed Details | Review

Description Sebastian Dröge (slomo) 2009-10-25 15:33:27 UTC
Hi,
epiphany with webkit/gtk 1.1.15.2 fails to load (and doesn't even try) some URLs. Example would be the GstPad link here: http://gstreamer.freedesktop.org/data/doc/gstreamer/head/gstreamer/html/

Directly loading http://gstreamer.freedesktop.org/data/doc/gstreamer/head/gstreamer/html/GstPad.html doesn't work either

The address entry becomes empty immediately and nothing happens.

This is an epiphany bug, not a webkit bug, because with Midori (and the same webkit version) it works.

I have many other example URLs that don't work... if you need more examples just ask.
Comment 1 Sebastian Dröge (slomo) 2009-10-25 15:33:50 UTC
This is with epiphany 2.28.1 btw
Comment 2 Sebastian Dröge (slomo) 2009-10-25 20:07:03 UTC
Hm, seems to be caused by the adblock extension
Comment 3 Sebastian Dröge (slomo) 2009-10-25 20:07:26 UTC
I mean, it works when disabling the extension ;)
Comment 4 Gustavo Noronha (kov) 2009-10-26 14:16:13 UTC
Yeah, I bet it is caused by the 'ad' in 'Pad'. To be honest, this extension needs to be rewritten to use the new Adblock Plus format, otherwise it'll be craziness.
Comment 5 Matthew Draper 2010-02-17 04:22:12 UTC
It looks like the patterns are intended to be used case-insensitively.

Specifically, the cause of random "...ad" matches looks to be the [^a-z\d=+%@] bit in the longest pattern; adding A-Z into that character class seems to restore sanity.

(I just ran into this on a URL containing 'Lead' -- googling for it is sufficient)
Comment 6 Gustavo Noronha (kov) 2010-02-18 17:41:10 UTC
I'm pretty sure I have fixed this in epiphany-extensions already. Can you confirm you still see it with epiphany-extensions from master?
Comment 7 Gustavo Noronha (kov) 2010-02-18 17:49:21 UTC
I just tried again with master epiphany-extensions, and I am pretty sure 7043ca8689d5ebb96eeb1f7a87f3aa8e862a84dd fixes this problem. Can you check that your /usr/share/epiphany-extensions/adblock-patterns file is up-to-date with that?
Comment 8 Matthew Draper 2010-02-19 01:59:54 UTC
Yes, I've just confirmed the version I'm using matches that in master.

Your change adds a [^/] before the 'ads' part... ignoring most of the optional subgroups, that leaves the pattern as:

[^a-z\d=+%@](?!\d{5,})[^/]ads?\d*(?!\.org)[\W_](?!\w+\.(ac\.|edu)|astra|aware|adurl=|block|login|nl/|sears/|.*(&sbc|\.(wmv|rm)))

further ignoring the negative lookaheads:

[^a-z\d=+%@][^/]ads?\d*[\W_]

Which will match an uppercase letter, followed by anything that isn't '/', followed by 'ad' or 'ads', followed by optional digits, and a non-alphanumeric.

So while it will no longer match 'Pad', it will match 'Lead'. It'll also match a numer of other valid-seeming URLs: http://example.org/sad.html


Looking closer, your change appears likely to have neutered this rule --- the rule appears to be trying to match various specific names of likely ad sources immediately following a separator character; your rule now requires it to find such matches only following a sequence of a separator character (or uppercase letter) and a non-slash.

For example, it no longer matches http://example.org/ads/evil.png, which it certainly seems to intend to.


So, I now believe that the rule should be reverted to its state prior to 7043ca8, and then either:

A-Z should be added to the negative character class (presumably for all four rules that use a-z)

or, better,

I think that line 74 of adblock-pattern.c should change from:
		regex = g_regex_new (line, G_REGEX_OPTIMIZE, 0, &error);
to:
		regex = g_regex_new (line, G_REGEX_CASELESS | G_REGEX_OPTIMIZE, 0, &error);


I'm happy to attach a patch if that would be preferable.
Comment 9 Gustavo Noronha (kov) 2010-02-19 03:00:58 UTC
It doesn't block Lead for me. Interesting. Well, I think your proposal makes sense, a patch would be most welcome.
Comment 10 Matthew Draper 2010-02-21 04:27:08 UTC
Created attachment 154304 [details] [review]
Treat patterns case-insensitively
Comment 11 Gustavo Noronha (kov) 2010-02-24 13:55:45 UTC
Review of attachment 154304 [details] [review]:

Looks good, but I'd be more confident in having the patch committed after more people have tested it a bit. Sebastian? =)
Comment 12 David Weinehall 2010-03-04 02:13:05 UTC
I've been bitten by this repeatedly; I'll give the patch a test drive.
Comment 13 David Weinehall 2010-03-04 02:27:30 UTC
Yup, this patch seems to work fine, and ads still seem to be blocked as they should.
Comment 14 Gustavo Noronha (kov) 2010-03-10 15:40:44 UTC
*** Bug 612016 has been marked as a duplicate of this bug. ***
Comment 15 Gustavo Noronha (kov) 2010-03-10 15:43:08 UTC
Comment on attachment 154304 [details] [review]
Treat patterns case-insensitively

Pushed as ff90315c15dc7e2711a36db10ed296bae7b4fbaf. Thanks!
Comment 16 Michael Kuhn 2010-07-05 22:27:22 UTC
*** Bug 599407 has been marked as a duplicate of this bug. ***