After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 776286 - Wasted download of adblock filter lists
Wasted download of adblock filter lists
Status: RESOLVED FIXED
Product: epiphany
Classification: Core
Component: General
unspecified
Other Linux
: Normal normal
: ---
Assigned To: Epiphany Maintainers
Epiphany Maintainers
Depends on:
Blocks: 776514
 
 
Reported: 2016-12-19 16:32 UTC by Allison Karlitskaya (desrt)
Modified: 2017-01-23 19:38 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Only download adblock filters if adblocker is actually enabled (5.12 KB, patch)
2017-01-23 04:25 UTC, Michael Catanzaro
committed Details | Review

Description Allison Karlitskaya (desrt) 2016-12-19 16:32:49 UTC
I use epiphany only for webapps, and none of the webapps I use it with contain any advertisements, so I have no interest in the adblock filter list being downloaded or kept on my disk.

This is really two bugs, but they're related:

First: ephy-embed-shell.c contains this comment:

  /* The filters list is large, so we don't want to store a separate copy per
   * web app, but users should otherwise be able to configure different filters
   * per profile directory. */

but for some reason I end up with multiple copies of the filter (eg: /home/allison/.config/epiphany/app-epiphany-irccloud-1640676f13868a66001a01b118e91add4d711353/adblock/8567481f1721260913812bf14bdb073b).  From what I understand, that is not the intended behaviour.

Second: the adblock file is downloaded even if the preference is disabled.  We should skip the download if we will never end up using the file.
Comment 1 Allison Karlitskaya (desrt) 2016-12-19 16:33:26 UTC
Sorry for not mentioning: this is Debian testing, and the ephy version is 3.22.3-1.
Comment 2 Michael Catanzaro 2016-12-19 16:43:29 UTC
If running 3.23.2 or newer, you shouldn't have any filters under your web app profile dir, indeed. However, this behavior changed quite recently (bug #755379), so since you are running a stable release of Epiphany, you can expect to see those filters there. I decided not to backport this change to 3.22 because it would break anyone who has configured separate adblock filters for a web app.

> Second: the adblock file is downloaded even if the preference is disabled. 
> We should skip the download if we will never end up using the file.

Hm, that shouldn't happen regardless.
Comment 3 Michael Catanzaro 2016-12-19 17:05:36 UTC
(In reply to Michael Catanzaro from comment #2)
> Hm, that shouldn't happen regardless.

OK, it's broken in all versions.

The architecture of the code in 3.22, where filters are downloaded in the web process and then just race with each other to see which web process manages to save the file last, makes this awkward to fix there, and the extra file isn't really hurting anything, so I'd rather leave it be and fix only in master.

To fix this in master, we need to watch the setting in ephy-embed-shell and call ephy_embed_shell_update_adblock_filter_file if it ever toggles from off to on. We already have a time check there to ensure we won't download the file again and again if toggling the setting repeatedly, so no worry there. Then we need to add a D-Bus method to tell all the web processes to reload patterns using the newly-downloaded filters... similar to calling ephy_uri_tester_load again, except we'd need to add a way to tell it to perform the load async instead of sync to not block the web processes. None of this is hard, it's just a little work.
Comment 4 Michael Catanzaro 2017-01-23 04:24:48 UTC
(In reply to Michael Catanzaro from comment #3)
> Then we need to add a D-Bus method to tell all the web processes to reload
> patterns using the newly-downloaded filters...

No... just watching the setting suffices.
Comment 5 Michael Catanzaro 2017-01-23 04:24:57 UTC
The following fix has been pushed:
b360172 Only download adblock filters if adblocker is actually enabled
Comment 6 Michael Catanzaro 2017-01-23 04:25:02 UTC
Created attachment 344011 [details] [review]
Only download adblock filters if adblocker is actually enabled
Comment 7 Michael Catanzaro 2017-01-23 06:00:25 UTC
By the way, that trick that settings changed notifications don't work unless you read the setting after connecting to the signal... that's pretty nasty. :(
Comment 8 Carlos Garcia Campos 2017-01-23 06:55:10 UTC
Review of attachment 344011 [details] [review]:

I have some comments about other patches landed recently, since there are no bugs for all of them I'll comment here.

 * I don't understand the change to the default adblock filter url, if the file is exactly the same, changing the url is causing a download for everybody for now reason, and that the old filter list will be leaked probably.
 * I don't like the idea of reusing the filters setting for the easy privacy. Since we are adding the filter hardcoded, and it has nothing to do with ads, I would not use any setting, simply use the filter url hardcoded in the code.
 * In 2ec4cef600d57c79d8bacd27ecb6cdb24924fa4f. The uri is leaked in the warning message, g_file_get_uri returns a new allocated string. You are keeping a ref of the source file just to get the uri, why don't you keep the uri instead in the AdblockFilterRetrieveData struct?
 * In 2408bcfd8fe0663266512b2d5f1a2e809a775c8f. Wy do we need that? because you changed the default url? If we delete the filters when the setting changes we could know what to delete. I guess you can change the setting with ephy closed, though. In any case I don't think we want to do that cleanup task synchronously in the main thread. Yu are also leaking the result of g_file_get_path in that patch.

::: embed/ephy-embed-shell.c
@@ +864,3 @@
 
+  if (!g_settings_get_boolean (EPHY_SETTINGS_WEB, EPHY_PREFS_WEB_ENABLE_ADBLOCK)) {
+    ephy_embed_shell_remove_old_adblock_filters (shell, files);

I don't like this. Since we don't have a way to whitelist sites yet, sometimes, the only way I can access some contents is by disabling the adblocker -> realodng the page -> enabling adblocker. I don't want to remove and download the files again every time this happens. We should only remove the filters when removed from the filters setting.
Comment 9 Michael Catanzaro 2017-01-23 14:02:44 UTC
Thanks for the review.

(In reply to Carlos Garcia Campos from comment #8)
> Review of attachment 344011 [details] [review] [review]:
> 
> I have some comments about other patches landed recently, since there are no
> bugs for all of them I'll comment here.
> 
>  * I don't understand the change to the default adblock filter url, if the
> file is exactly the same, changing the url is causing a download for
> everybody for now reason, and that the old filter list will be leaked
> probably.

It's a one-time download. It provides very slightly newer filters, and parallels the location of the new EasyPrivacy filters, and we will want to make more filters from here (like the country-specific filters) available when we have a configuration UI. The old filter list will be deleted as it now cleans all files in the adblock directory unless the filename is the hash of a configured filter.

>  * I don't like the idea of reusing the filters setting for the easy
> privacy. Since we are adding the filter hardcoded, and it has nothing to do
> with ads, I would not use any setting, simply use the filter url hardcoded
> in the code.

I don't agree. There's no point in using a privacy blacklist unless you are also blocking ads, because ads track you. So the setting should be dependent on the adblock setting.

>  * In 2ec4cef600d57c79d8bacd27ecb6cdb24924fa4f. The uri is leaked in the
> warning message, g_file_get_uri returns a new allocated string. You are
> keeping a ref of the source file just to get the uri, why don't you keep the
> uri instead in the AdblockFilterRetrieveData struct?

Good catch, thanks!

All I need in the struct is the URI, so I'll change that.

>  * In 2408bcfd8fe0663266512b2d5f1a2e809a775c8f. Wy do we need that? because
> you changed the default url? If we delete the filters when the setting
> changes we could know what to delete. I guess you can change the setting
> with ephy closed, though. In any case I don't think we want to do that
> cleanup task synchronously in the main thread. Yu are also leaking the
> result of g_file_get_path in that patch.

When I wrote that patch yesterday it was OK to do the cleanup synchronously because it only occurred once when starting the browser. Now it is no longer OK, because it occurs when changing the setting. I'll try to change it.

Thanks for finding the leak. If only I had used Vala, then I wouldn't have to remember what return values I need to free. ;)

> ::: embed/ephy-embed-shell.c
> @@ +864,3 @@
>  
> +  if (!g_settings_get_boolean (EPHY_SETTINGS_WEB,
> EPHY_PREFS_WEB_ENABLE_ADBLOCK)) {
> +    ephy_embed_shell_remove_old_adblock_filters (shell, files);
> 
> I don't like this. Since we don't have a way to whitelist sites yet,
> sometimes, the only way I can access some contents is by disabling the
> adblocker -> realodng the page -> enabling adblocker. I don't want to remove
> and download the files again every time this happens. We should only remove
> the filters when removed from the filters setting.

OK, I'll change it.
Comment 10 Carlos Garcia Campos 2017-01-23 17:03:03 UTC
(In reply to Michael Catanzaro from comment #9)
> Thanks for the review.
> 
> (In reply to Carlos Garcia Campos from comment #8)
> > Review of attachment 344011 [details] [review] [review] [review]:
> > 
> > I have some comments about other patches landed recently, since there are no
> > bugs for all of them I'll comment here.
> > 
> >  * I don't understand the change to the default adblock filter url, if the
> > file is exactly the same, changing the url is causing a download for
> > everybody for now reason, and that the old filter list will be leaked
> > probably.
> 
> It's a one-time download. It provides very slightly newer filters, and
> parallels the location of the new EasyPrivacy filters, and we will want to
> make more filters from here (like the country-specific filters) available
> when we have a configuration UI. The old filter list will be deleted as it
> now cleans all files in the adblock directory unless the filename is the
> hash of a configured filter.
> 
> >  * I don't like the idea of reusing the filters setting for the easy
> > privacy. Since we are adding the filter hardcoded, and it has nothing to do
> > with ads, I would not use any setting, simply use the filter url hardcoded
> > in the code.
> 
> I don't agree. There's no point in using a privacy blacklist unless you are
> also blocking ads, because ads track you. So the setting should be dependent
> on the adblock setting.

But it doesn't depend on adblock setting, but on DNT setting, no? If you don't want to hardcode it, add a different setting for easy privacy filters, but using the same setting for filters, and a different one to enable/disable is a mess. Or did I misunderstand how it works?

> >  * In 2ec4cef600d57c79d8bacd27ecb6cdb24924fa4f. The uri is leaked in the
> > warning message, g_file_get_uri returns a new allocated string. You are
> > keeping a ref of the source file just to get the uri, why don't you keep the
> > uri instead in the AdblockFilterRetrieveData struct?
> 
> Good catch, thanks!
> 
> All I need in the struct is the URI, so I'll change that.
> 
> >  * In 2408bcfd8fe0663266512b2d5f1a2e809a775c8f. Wy do we need that? because
> > you changed the default url? If we delete the filters when the setting
> > changes we could know what to delete. I guess you can change the setting
> > with ephy closed, though. In any case I don't think we want to do that
> > cleanup task synchronously in the main thread. Yu are also leaking the
> > result of g_file_get_path in that patch.
> 
> When I wrote that patch yesterday it was OK to do the cleanup synchronously
> because it only occurred once when starting the browser. Now it is no longer
> OK, because it occurs when changing the setting. I'll try to change it.
> 
> Thanks for finding the leak. If only I had used Vala, then I wouldn't have
> to remember what return values I need to free. ;)
> 
> > ::: embed/ephy-embed-shell.c
> > @@ +864,3 @@
> >  
> > +  if (!g_settings_get_boolean (EPHY_SETTINGS_WEB,
> > EPHY_PREFS_WEB_ENABLE_ADBLOCK)) {
> > +    ephy_embed_shell_remove_old_adblock_filters (shell, files);
> > 
> > I don't like this. Since we don't have a way to whitelist sites yet,
> > sometimes, the only way I can access some contents is by disabling the
> > adblocker -> realodng the page -> enabling adblocker. I don't want to remove
> > and download the files again every time this happens. We should only remove
> > the filters when removed from the filters setting.
> 
> OK, I'll change it.
Comment 11 Michael Catanzaro 2017-01-23 19:37:06 UTC
(In reply to Carlos Garcia Campos from comment #10)
> But it doesn't depend on adblock setting, but on DNT setting, no? If you
> don't want to hardcode it, add a different setting for easy privacy filters,
> but using the same setting for filters, and a different one to
> enable/disable is a mess. Or did I misunderstand how it works?

It obviously depends on the enable-adblock setting, since that's the setting that controls whether the URI tester is used at all. And it obviously depends on the adblock-filters setting as well, since that setting controls the filters that are used. I think we definitely want to modify this setting rather than hardcoding the filter so that it appears properly in the future adblock configuration dialog. We just have to make sure they stay synced.

It does not actually depend on the DNT setting; however, enabling or disabling the DNT setting in the UI will cause the filters list to be modified accordingly. The goal is to have one setting in the UI to turn on all of the privacy features at once.

I know it's confusing and not really ideal, but this was the best I could come up with; do you have a better idea?
Comment 12 Michael Catanzaro 2017-01-23 19:38:36 UTC
In particular, I want to avoid having a setting named "tell websites I do not want to be tracked" that also just happens to cause tracking query removal.