After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 778578 - Snapshots not cleared with history
Snapshots not cleared with history
Status: RESOLVED FIXED
Product: epiphany
Classification: Core
Component: History
3.23.x
Other Linux
: High major
: ---
Assigned To: Michael Catanzaro
Epiphany Maintainers
Depends on:
Blocks:
 
 
Reported: 2017-02-14 02:35 UTC by Michael Catanzaro
Modified: 2018-01-12 15:56 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Drop gnome-desktop dependency by copying thumbnail factory bits (36.29 KB, patch)
2018-01-06 02:20 UTC, Michael Catanzaro
none Details | Review
Drop gnome-desktop dependency by copying thumbnail factory bits (36.29 KB, patch)
2018-01-06 23:02 UTC, Michael Catanzaro
committed Details | Review
Delete snapshot thumbnails when clearing history (5.52 KB, patch)
2018-01-06 23:02 UTC, Michael Catanzaro
committed Details | Review
Add migrator to delete desktop thumbnail database (2.59 KB, patch)
2018-01-06 23:02 UTC, Michael Catanzaro
committed Details | Review

Description Michael Catanzaro 2017-02-14 02:35:00 UTC
Even if the user clears all website data and history, snapshots of every webpage ever visited are left behind in ~/.cache/thumbnails. They're not even namespaced to epiphany, but jumbled in with thumbnails of all the images in every directory visited in nautilus, for example.

Fortunately it is just a cache, so it's perfectly legitimate for Epiphany to do a one-time deletion of everything in it. And the entire GnomeDesktopThumbnailFactory API is unstable, so I think it's fair game to change it so that thumbnails are namespaced on a per-application basis. As far as I know, only Epiphany and Nautilus are using this API, and neither expect to share thumbnails. Bastien, does this sound reasonable to you?

Note that there is a FIXME in thumbnail_failed_path() in gnome-desktop-thumbnail.c:

  /* XXX: appname is only used for failed thumbnails. Is this a mistake? */
  path = g_build_filename (g_get_user_cache_dir (),
                           "thumbnails",
                           "fail",
                           appname,
                           file,
                           NULL);

But appname is hardcoded to "gnome-desktop-thumbnail-factory", which is pointless.
Comment 1 Michael Catanzaro 2017-02-14 02:36:41 UTC
Then Epiphany would delete ~/.cache/epiphany/thumbnails whenever any history item is deleted, since unfortunately we don't have an association of thumbnails to history items. The downside is the overview would have all missing snapshots any time any unrelated history item is deleted. It would be even better to stop using GnomeDesktopThumbnailFactory and store thumbnails in the history DB instead.
Comment 2 Carlos Garcia Campos 2017-02-14 06:05:24 UTC
We use it in evince for the recent view too, and we don't expect them to be shared either. Maybe it's time to design a new API for thumbnails, and hopefully move it outside of libgnomedesktop.
Comment 3 Michael Catanzaro 2017-08-08 15:41:53 UTC
Same problem for the WebKit favicon database. I've moved it to ~/.cache/epiphany but we have no way to clear it currently. So we need to:

 * One-time clear ~/.cache/thumbnails as a migration, to ensure old thumbnails aren't leaked
 * Save future thumbnails under ~/.cache/epiphany instead
 * Expose some way to clear all of ~/.cache/epiphany in the UI
Comment 4 Michael Catanzaro 2018-01-06 02:15:19 UTC
(In reply to Carlos Garcia Campos from comment #2)
> We use it in evince for the recent view too, and we don't expect them to be
> shared either. Maybe it's time to design a new API for thumbnails, and
> hopefully move it outside of libgnomedesktop.

Mmm, I forgot about this. I just spent three hours bundling it into Epiphany. If evince only needs the same small subset of the code that Epiphany does, I suggest just copying it. I originally bundled the entire file, but after progressively deleting more and more unneeded stuff (that's why it took so long), I wound up with only a couple hundred lines that were needed.
Comment 5 Michael Catanzaro 2018-01-06 02:20:27 UTC
Created attachment 366404 [details] [review]
Drop gnome-desktop dependency by copying thumbnail factory bits

gnome-desktop is not supposed to be used by applications, that's why
it's not in the runtime. And we have an Epiphany-specific security
problem in the thumbnailer, which we cannot otherwise fix without adding
new gnome-desktop API that would only be useful for Epiphany.

The thumbnail factory is a lot of code, and we only need a small bit of
it.

In particular, get rid of the thumbnail mtime tracking code, which is
unnecessary now that we control the thumbnailing code. For the past
couple of years, it has only been used to convince the thumbnail factory
to not discard our requests for thumbnails.

Also, save the snapshots under ~/.cache/epiphany, so that we can delete
them. Right now, we store undeletable snapshots of all the websites you
view, which don't go away when clearing history (except, fortunately, in
incognito mode). Now we can fix that.

This commit also contains a minor memory leak fix, for good measure.
Comment 6 Michael Catanzaro 2018-01-06 02:25:21 UTC
Remaining work:

 * Write a migrator that just deletes ~/.cache/thumbnails
 * Delete ~/.cache/epiphany/thumbnails when clearing all history

Maybe we could also delete the thumbnail associated with a history entry when deleting individual history entries.
Comment 7 Michael Catanzaro 2018-01-06 02:28:11 UTC
(In reply to Michael Catanzaro from comment #3)
> Same problem for the WebKit favicon database.

I forgot about this too.

Maybe we need to instead:

>  * Expose some way to clear all of ~/.cache/epiphany in the UI

And the only sane way to do that would be to bring back the clear data dialog we had before WebKitWebsiteDataManager. That's been requested several times anyway, e.g. bug #782620. We don't have to get rid of the new clear data dialog, of course: we can find a way to have both.

We should probably additionally clear ~/.cache/epiphany when clearing all browser history, as a matter of course.
Comment 8 Michael Catanzaro 2018-01-06 22:29:45 UTC
(In reply to Michael Catanzaro from comment #7)
> And the only sane way to do that would be to bring back the clear data
> dialog we had before WebKitWebsiteDataManager. That's been requested several
> times anyway, e.g. bug #782620. We don't have to get rid of the new clear
> data dialog, of course: we can find a way to have both.
> 
> We should probably additionally clear ~/.cache/epiphany when clearing all
> browser history, as a matter of course.

No need for this... snapshots correspond 1:1 to a URL in the history database. We can clear them when clearing the associated history entry. That's no privacy issue, since the URL corresponding to the snapshot is guaranteed to be stored in the history database if the snapshot exists on disk.
Comment 9 Michael Catanzaro 2018-01-06 23:02:12 UTC
Created attachment 366436 [details] [review]
Drop gnome-desktop dependency by copying thumbnail factory bits

gnome-desktop is not supposed to be used by applications, that's why
it's not in the runtime. And we have an Epiphany-specific security
problem in the thumbnailer, which we cannot otherwise fix without adding
new gnome-desktop API that would only be useful for Epiphany.

The thumbnail factory is a lot of code, and we only need a small bit of
it.

In particular, get rid of the thumbnail mtime tracking code, which is
unnecessary now that we control the thumbnailing code. For the past
couple of years, it has only been used to convince the thumbnail factory
to not discard our requests for thumbnails.

Also, save the snapshots under ~/.cache/epiphany, so that we can delete
them. Right now, we store undeletable snapshots of all the websites you
view, which don't go away when clearing history (except, fortunately, in
incognito mode). Now we can fix that.

This commit also contains a minor memory leak fix, for good measure.
Comment 10 Michael Catanzaro 2018-01-06 23:02:16 UTC
Created attachment 366437 [details] [review]
Delete snapshot thumbnails when clearing history

Currently, you can clear your browser history and delete your entire
disk cache, but thumbnails of all the webpages you've visited will still
be stored on disk. Fix this by deleting thumbnails when the associated
page is removed from history.
Comment 11 Michael Catanzaro 2018-01-06 23:02:20 UTC
Created attachment 366438 [details] [review]
Add migrator to delete desktop thumbnail database
Comment 12 Michael Catanzaro 2018-01-08 17:59:31 UTC
Attachment 366436 [details] pushed as ec96718 - Drop gnome-desktop dependency by copying thumbnail factory bits
Attachment 366437 [details] pushed as 60ca7d5 - Delete snapshot thumbnails when clearing history
Attachment 366438 [details] pushed as 97cad40 - Add migrator to delete desktop thumbnail database
Comment 13 Bastien Nocera 2018-01-09 09:53:07 UTC
(In reply to Michael Catanzaro from comment #9)
> Created attachment 366436 [details] [review] [review]
> Drop gnome-desktop dependency by copying thumbnail factory bits
> 
> gnome-desktop is not supposed to be used by applications, that's why
> it's not in the runtime.

No, it's because the code in there is mostly bound to specific compositor and environment versions. Ideally, the thumbnailer code would live in GIO, or GTK+, but nobody has done that work.

Attachment 366438 [details] is completely incorrect. It's not your remit to delete people's thumbnails whether or not you think they leak privacy.
Comment 14 Michael Catanzaro 2018-01-09 15:36:26 UTC
How else do you propose we clean up the personal data leak? There is a large thumbnail of every single web page you have ever visited stored in that thumbnail cache, and there's no way to distinguish between thumbnails saved by Epiphany and other thumbnails. If they were namespaced, we could just delete the ones saved by Epiphany, but they're not.

Fortunately, the thumbnail cache is just a cache: everything in there should be regenerated when needed. So deleting it should be harmless, right?
Comment 15 Bastien Nocera 2018-01-09 15:44:45 UTC
(In reply to Michael Catanzaro from comment #14)
> How else do you propose we clean up the personal data leak?

Go through the URLs in the history, generate the thumbnail name, and remove that file.

> There is a large
> thumbnail of every single web page you have ever visited stored in that
> thumbnail cache, and there's no way to distinguish between thumbnails saved
> by Epiphany and other thumbnails. If they were namespaced, we could just
> delete the ones saved by Epiphany, but they're not.

They're not, on purpose. It's a shared directory so that applications and file managers can generate and use those thumbnails.

> Fortunately, the thumbnail cache is just a cache: everything in there should
> be regenerated when needed. So deleting it should be harmless, right?

It's not up to a single application to make that call. I think the right way to fix this would be to add a dialogue to empty the shared thumbnail repository in g-c-c, certainly not for one application to consider all thumbnails to be removable.

You don't know how those thumbnails were created, and whether you'd be able to regenerate them.
Comment 16 Michael Catanzaro 2018-01-09 15:53:27 UTC
(In reply to Bastien Nocera from comment #15)
> Go through the URLs in the history, generate the thumbnail name, and remove
> that file.

Good idea, that would be possible.

Still, IMO any application that's not prepared for ~/.cache to disappear is broken....
Comment 17 Claudio Saavedra 2018-01-09 16:01:21 UTC
I agree with Bastien here, one application shouldn't arbitrarily delete all thumbnails. That's a user decision.

Also I might remember wrong or this might have changed, but I don't think we used to create thumbnails for every single website visited. Only for those popping up in the overview.

GNOME should indeed have some UI to remove thumbnails. Maybe as part of a feature to clean up disk space by cleaning the cache, don't know. But that's a different story.
Comment 18 Claudio Saavedra 2018-01-09 16:04:42 UTC
(I think gpoo had written a script to delete thumbnails for files no longer available. But that was ages ago.)
Comment 19 Bastien Nocera 2018-01-09 16:09:58 UTC
That's the privacy panel bug:
https://bugzilla.gnome.org/show_bug.cgi?id=723905

I think advertising that epiphany used to store some oft-visited sites thumbnails in a shared repo and for users to nuke their thumbnail cache would be the best way forward, whether or not UI is added to g-c-c.
Comment 20 Michael Catanzaro 2018-01-09 16:28:52 UTC
We definitely store thumbnails for every webpage visited, not just popular pages. It's needed to make the overview work. Otherwise, when you delete pages from the overview, the new page that appears would never have a thumbnail.

Anyway, we could go through history and delete all the thumbnails found there, but then we still leak all your old thumbnails for any pages you visited before the last time you cleared your history. And we'd also leak thumbnails for every page you've manually deleted from history. IMO that is not acceptable. It's a one-time problem, only affects data under ~/.cache, and won't ever happen again going forward because our thumbnails will from now on be namespaced under ~/.cache/epiphany.

But we do store the URI in the thumbnail, under the key tEXt::Thumb::URI. So we could, for instance, delete just all the thumbnails with http:// and https:// URIs.
Comment 21 Michael Catanzaro 2018-01-09 16:32:38 UTC
(In reply to Bastien Nocera from comment #19)
> I think advertising that epiphany used to store some oft-visited sites
> thumbnails in a shared repo and for users to nuke their thumbnail cache
> would be the best way forward, whether or not UI is added to g-c-c.

This is weird, why would you want to advertise the existence of the thumbnail cache to users, when we can just do the right thing automatically?

Do you know of any non-Epiphany application that is unable to gracefully handle deletion of the thumbnail cache?
Comment 22 Bastien Nocera 2018-01-09 16:47:13 UTC
What's weird is that you think that removing a *shared* repository of cached data is within a *single* application's purview.

I don't share my computer, and I don't care if the visited website thumbnails. I do care that it's going to take hours of CPU time to regenerate thumbnails for files on my NAS, whether it's storing successful or failed thumbnails.

Somebody made a mistake storing application-specific data in a shared repository, I don't think that using the big hammer within your application is the right way.

Security/privacy conscious users are more likely to want the thumbnail repository to be purged on a regular basis. That functionality has its place desktop-wide, not within an application, without any feedback or questions.
Comment 23 Germán Poo-Caamaño 2018-01-09 18:29:04 UTC
(In reply to Claudio Saavedra from comment #18)
> (I think gpoo had written a script to delete thumbnails for files no longer
> available. But that was ages ago.)

https://calcifer.org/notes/2006/08/how-much-space-are-eating-your-thumbnails.html

The code lives in https://github.com/gpoo/wasteland and requires to update the path from:

        rootdir = os.path.join(homedir, '.thumbnails')
to:
        rootdir = os.path.join(homedir, '.cache', 'thumbnails')

The UI is totally unnecessary, though.
Comment 24 Michael Catanzaro 2018-01-09 18:44:08 UTC
(In reply to Claudio Saavedra from comment #18)
> (I think gpoo had written a script to delete thumbnails for files no longer
> available. But that was ages ago.)

I don't see how this could be useful to us.
Comment 25 Claudio Saavedra 2018-01-10 08:46:35 UTC
(In reply to Michael Catanzaro from comment #20)
> We definitely store thumbnails for every webpage visited, not just popular
> pages. It's needed to make the overview work. Otherwise, when you delete
> pages from the overview, the new page that appears would never have a
> thumbnail.

This is wasteful. Having a blank thumbnail temporarily, until the page in question is visited again, is much better than thumbnailing every single page visited.

(In reply to Michael Catanzaro from comment #21)
> 
> Do you know of any non-Epiphany application that is unable to gracefully
> handle deletion of the thumbnail cache?

Do you understand that this is not a technical problem but one of domain? An app shouldn't unilaterally delete other applications' cache.


Removing only the thumbnails with a http/https Thumb::URI should be enough.
Comment 26 Michael Catanzaro 2018-01-10 16:07:46 UTC
(In reply to Claudio Saavedra from comment #25)
> This is wasteful. Having a blank thumbnail temporarily, until the page in
> question is visited again, is much better than thumbnailing every single
> page visited.

Extremely wasteful, yes. I didn't write the code. ;)

Best solution would be to heuristically thumbnail pages when we think they'll enter the top 20 or so, that way we don't have to save PNGs for every webpage ever visited in the history database.

> (In reply to Michael Catanzaro from comment #21) 
> Do you understand that this is not a technical problem but one of domain? An
> app shouldn't unilaterally delete other applications' cache.

I believe it's well-understood that applications should be prepared for everything saved under ~/.cache to be deleted externally. But I'd still delete the cache even if that wasn't the case, because this is a severe privacy leak and the harm of leaving the cache intact very clearly outweighs the harm of deleting other applications' data. But anyway:

> Removing only the thumbnails with a http/https Thumb::URI should be enough.

Yes, let's do this. No doubt this will still remove other application's thumbnails, and it won't remove all of the Epiphany thumbnails, but it will be plenty good enough, and it will at least avoid deleting the entire thumbnail cache.
Comment 27 Bastien Nocera 2018-01-10 20:43:43 UTC
(In reply to Michael Catanzaro from comment #26)
<snip>
> > (In reply to Michael Catanzaro from comment #21) 
> > Do you understand that this is not a technical problem but one of domain? An
> > app shouldn't unilaterally delete other applications' cache.
> 
> I believe it's well-understood that applications should be prepared for
> everything saved under ~/.cache to be deleted externally. But I'd still
> delete the cache even if that wasn't the case, because this is a severe
> privacy leak and the harm of leaving the cache intact very clearly outweighs
> the harm of deleting other applications' data. But anyway:

Then do a blog post about it, and explain how those can be identified. Trying to retrospectively hide that bug by nuking my 500 megs of thumbnail cache isn't going to do anyone any favours.

Being able to gracefully handle missing thumbnails and recreating them doesn't mean that there won't be data loss, or extra CPU and I/O resources used up.

> > Removing only the thumbnails with a http/https Thumb::URI should be enough.
> 
> Yes, let's do this. No doubt this will still remove other application's
> thumbnails, and it won't remove all of the Epiphany thumbnails, but it will
> be plenty good enough, and it will at least avoid deleting the entire
> thumbnail cache.

It will remove all of the thumbnails created by totem for the majority of its integrated services (YouTube, UPnP, movie trailers, etc.).
Comment 28 Michael Catanzaro 2018-01-10 21:55:18 UTC
I filed bug #792414 to improve the thumbnailing strategy.

(In reply to Bastien Nocera from comment #27)
> It will remove all of the thumbnails created by totem for the majority of
> its integrated services (YouTube, UPnP, movie trailers, etc.).

OK, there's really no way to even guess at which are created by Epiphany, then, and I insist that Epiphany must automatically clean up all that it might have created, rather than expect users to read my blog. I apologize for this problem -- I know it's Epiphany's fault -- but having a clean cache is not the end of the world, and seems to be the only solution. I presume that totem is capable of regenerating thumbnails when required.
Comment 29 Bastien Nocera 2018-01-11 10:52:28 UTC
(In reply to Michael Catanzaro from comment #28)
> I filed bug #792414 to improve the thumbnailing strategy.
> 
> (In reply to Bastien Nocera from comment #27)
> > It will remove all of the thumbnails created by totem for the majority of
> > its integrated services (YouTube, UPnP, movie trailers, etc.).
> 
> OK, there's really no way to even guess at which are created by Epiphany,
> then, and I insist that Epiphany must automatically clean up all that it
> might have created, rather than expect users to read my blog. I apologize
> for this problem -- I know it's Epiphany's fault -- but having a clean cache
> is not the end of the world, and seems to be the only solution. I presume
> that totem is capable of regenerating thumbnails when required.

You're deleting user files in a directory shared between all of the user's applications without asking them. I still don't see how this can ever be a good thing to do.

It doesn't even put the files in the trash, it deletes them.
Comment 30 Bastien Nocera 2018-01-11 10:56:43 UTC
The work-around would be to run:
echo 26 > ~/.config/epiphany/.migrated
just before running a version of epiphany with this "feature". I hope I don't forget.
Comment 31 Claudio Saavedra 2018-01-11 11:34:43 UTC
No, you can't make this kind of decision. Please revert.

I don't know if there's any formal procedure to do such a change but I presume that you will need approval from the release team to do something like this.
Comment 32 Michael Catanzaro 2018-01-12 14:38:47 UTC
I've gotten more opinions, and everyone except me seems to think the cache should not be removed. So I'll revert the commit that added the migrator. I think that's a shame, because it will leak a huge amount of browsing history, but oh well.

For historical interest: originally only the pages that were actually in the overview received snapshots. That changed by accident in https://git.gnome.org/browse/epiphany/commit/?id=0433ac9dafd91664134ae1c300c6841e86a58d91; looks like that was just an oversight in a big refactor.
Comment 33 Germán Poo-Caamaño 2018-01-12 14:45:50 UTC
I wonder how this is a fix. It seems you only see black and white ("my solution or nothing").
Comment 34 Michael Catanzaro 2018-01-12 15:10:08 UTC
(In reply to Germán Poo-Caamaño from comment #33)
> I wonder how this is a fix. It seems you only see black and white ("my
> solution or nothing").

German, there is no way to know which thumbnails were generated by Ephy. We don't have a ton of options here. Proposed compromises were:

 * Delete thumbs with http:// and https:/// URIs in the metadata. This will delete Totem's thumbs, Bastien doesn't like it.
 * Delete all thumbnails corresponding to Ephy history. This sounds good, but it leaks thumbs for all history that has been deleted by the user, so it only deletes all the thumbs that Epiphany does not actually need to delete... it fails to delete all of the thumbs that actually should be deleted. (Storing thumbs for pages that are still in history is unproblematic.)

If you have a better idea, that would be great, but it looks like there is no good option other than 'rm -rf ~/.cache/thumbnails'.
Comment 35 Bastien Nocera 2018-01-12 15:16:21 UTC
- Tell the user why we want to delete a bunch of thumbnails and ask if we should do it.
Comment 36 Michael Catanzaro 2018-01-12 15:29:46 UTC
Then the user is confused: what is a cache? What does it look like? Should I delete it? What are the consequences of deleting it? I don't want my computer to break!

Let's pick for the user: either we should, or should not, delete the cache. After all, if something bad happens after automatically deleting the cache (CPU churns regenerating thumbnails, or applications fail to regenerate thumbnails altogether), then why should we allow users to choose to trigger that breakage? Either it's safe to do (as I suggest), or it's not, and consensus here seems to be not.

Anyway, it's at least fixed going forward.
Comment 37 Bastien Nocera 2018-01-12 15:56:24 UTC
In which case I think you shouldn't. Get a CVE assigned to this problem, advertise the problem and possible solutions, and let security conscious users fix it themselves.

In most cases, this would only be a problem for shared accounts, and on systems where you'd expect the storage to also play a part (the data is probably recoverable for forensic analysis, so unlink'ing it isn't enough to protect people against that).