GNOME Bugzilla – Bug 778578
Snapshots not cleared with history
Last modified: 2018-01-12 15:56:24 UTC
Even if the user clears all website data and history, snapshots of every webpage ever visited are left behind in ~/.cache/thumbnails. They're not even namespaced to epiphany, but jumbled in with thumbnails of all the images in every directory visited in nautilus, for example. Fortunately it is just a cache, so it's perfectly legitimate for Epiphany to do a one-time deletion of everything in it. And the entire GnomeDesktopThumbnailFactory API is unstable, so I think it's fair game to change it so that thumbnails are namespaced on a per-application basis. As far as I know, only Epiphany and Nautilus are using this API, and neither expect to share thumbnails. Bastien, does this sound reasonable to you? Note that there is a FIXME in thumbnail_failed_path() in gnome-desktop-thumbnail.c: /* XXX: appname is only used for failed thumbnails. Is this a mistake? */ path = g_build_filename (g_get_user_cache_dir (), "thumbnails", "fail", appname, file, NULL); But appname is hardcoded to "gnome-desktop-thumbnail-factory", which is pointless.
Then Epiphany would delete ~/.cache/epiphany/thumbnails whenever any history item is deleted, since unfortunately we don't have an association of thumbnails to history items. The downside is the overview would have all missing snapshots any time any unrelated history item is deleted. It would be even better to stop using GnomeDesktopThumbnailFactory and store thumbnails in the history DB instead.
We use it in evince for the recent view too, and we don't expect them to be shared either. Maybe it's time to design a new API for thumbnails, and hopefully move it outside of libgnomedesktop.
Same problem for the WebKit favicon database. I've moved it to ~/.cache/epiphany but we have no way to clear it currently. So we need to: * One-time clear ~/.cache/thumbnails as a migration, to ensure old thumbnails aren't leaked * Save future thumbnails under ~/.cache/epiphany instead * Expose some way to clear all of ~/.cache/epiphany in the UI
(In reply to Carlos Garcia Campos from comment #2) > We use it in evince for the recent view too, and we don't expect them to be > shared either. Maybe it's time to design a new API for thumbnails, and > hopefully move it outside of libgnomedesktop. Mmm, I forgot about this. I just spent three hours bundling it into Epiphany. If evince only needs the same small subset of the code that Epiphany does, I suggest just copying it. I originally bundled the entire file, but after progressively deleting more and more unneeded stuff (that's why it took so long), I wound up with only a couple hundred lines that were needed.
Created attachment 366404 [details] [review] Drop gnome-desktop dependency by copying thumbnail factory bits gnome-desktop is not supposed to be used by applications, that's why it's not in the runtime. And we have an Epiphany-specific security problem in the thumbnailer, which we cannot otherwise fix without adding new gnome-desktop API that would only be useful for Epiphany. The thumbnail factory is a lot of code, and we only need a small bit of it. In particular, get rid of the thumbnail mtime tracking code, which is unnecessary now that we control the thumbnailing code. For the past couple of years, it has only been used to convince the thumbnail factory to not discard our requests for thumbnails. Also, save the snapshots under ~/.cache/epiphany, so that we can delete them. Right now, we store undeletable snapshots of all the websites you view, which don't go away when clearing history (except, fortunately, in incognito mode). Now we can fix that. This commit also contains a minor memory leak fix, for good measure.
Remaining work: * Write a migrator that just deletes ~/.cache/thumbnails * Delete ~/.cache/epiphany/thumbnails when clearing all history Maybe we could also delete the thumbnail associated with a history entry when deleting individual history entries.
(In reply to Michael Catanzaro from comment #3) > Same problem for the WebKit favicon database. I forgot about this too. Maybe we need to instead: > * Expose some way to clear all of ~/.cache/epiphany in the UI And the only sane way to do that would be to bring back the clear data dialog we had before WebKitWebsiteDataManager. That's been requested several times anyway, e.g. bug #782620. We don't have to get rid of the new clear data dialog, of course: we can find a way to have both. We should probably additionally clear ~/.cache/epiphany when clearing all browser history, as a matter of course.
(In reply to Michael Catanzaro from comment #7) > And the only sane way to do that would be to bring back the clear data > dialog we had before WebKitWebsiteDataManager. That's been requested several > times anyway, e.g. bug #782620. We don't have to get rid of the new clear > data dialog, of course: we can find a way to have both. > > We should probably additionally clear ~/.cache/epiphany when clearing all > browser history, as a matter of course. No need for this... snapshots correspond 1:1 to a URL in the history database. We can clear them when clearing the associated history entry. That's no privacy issue, since the URL corresponding to the snapshot is guaranteed to be stored in the history database if the snapshot exists on disk.
Created attachment 366436 [details] [review] Drop gnome-desktop dependency by copying thumbnail factory bits gnome-desktop is not supposed to be used by applications, that's why it's not in the runtime. And we have an Epiphany-specific security problem in the thumbnailer, which we cannot otherwise fix without adding new gnome-desktop API that would only be useful for Epiphany. The thumbnail factory is a lot of code, and we only need a small bit of it. In particular, get rid of the thumbnail mtime tracking code, which is unnecessary now that we control the thumbnailing code. For the past couple of years, it has only been used to convince the thumbnail factory to not discard our requests for thumbnails. Also, save the snapshots under ~/.cache/epiphany, so that we can delete them. Right now, we store undeletable snapshots of all the websites you view, which don't go away when clearing history (except, fortunately, in incognito mode). Now we can fix that. This commit also contains a minor memory leak fix, for good measure.
Created attachment 366437 [details] [review] Delete snapshot thumbnails when clearing history Currently, you can clear your browser history and delete your entire disk cache, but thumbnails of all the webpages you've visited will still be stored on disk. Fix this by deleting thumbnails when the associated page is removed from history.
Created attachment 366438 [details] [review] Add migrator to delete desktop thumbnail database
Attachment 366436 [details] pushed as ec96718 - Drop gnome-desktop dependency by copying thumbnail factory bits Attachment 366437 [details] pushed as 60ca7d5 - Delete snapshot thumbnails when clearing history Attachment 366438 [details] pushed as 97cad40 - Add migrator to delete desktop thumbnail database
(In reply to Michael Catanzaro from comment #9) > Created attachment 366436 [details] [review] [review] > Drop gnome-desktop dependency by copying thumbnail factory bits > > gnome-desktop is not supposed to be used by applications, that's why > it's not in the runtime. No, it's because the code in there is mostly bound to specific compositor and environment versions. Ideally, the thumbnailer code would live in GIO, or GTK+, but nobody has done that work. Attachment 366438 [details] is completely incorrect. It's not your remit to delete people's thumbnails whether or not you think they leak privacy.
How else do you propose we clean up the personal data leak? There is a large thumbnail of every single web page you have ever visited stored in that thumbnail cache, and there's no way to distinguish between thumbnails saved by Epiphany and other thumbnails. If they were namespaced, we could just delete the ones saved by Epiphany, but they're not. Fortunately, the thumbnail cache is just a cache: everything in there should be regenerated when needed. So deleting it should be harmless, right?
(In reply to Michael Catanzaro from comment #14) > How else do you propose we clean up the personal data leak? Go through the URLs in the history, generate the thumbnail name, and remove that file. > There is a large > thumbnail of every single web page you have ever visited stored in that > thumbnail cache, and there's no way to distinguish between thumbnails saved > by Epiphany and other thumbnails. If they were namespaced, we could just > delete the ones saved by Epiphany, but they're not. They're not, on purpose. It's a shared directory so that applications and file managers can generate and use those thumbnails. > Fortunately, the thumbnail cache is just a cache: everything in there should > be regenerated when needed. So deleting it should be harmless, right? It's not up to a single application to make that call. I think the right way to fix this would be to add a dialogue to empty the shared thumbnail repository in g-c-c, certainly not for one application to consider all thumbnails to be removable. You don't know how those thumbnails were created, and whether you'd be able to regenerate them.
(In reply to Bastien Nocera from comment #15) > Go through the URLs in the history, generate the thumbnail name, and remove > that file. Good idea, that would be possible. Still, IMO any application that's not prepared for ~/.cache to disappear is broken....
I agree with Bastien here, one application shouldn't arbitrarily delete all thumbnails. That's a user decision. Also I might remember wrong or this might have changed, but I don't think we used to create thumbnails for every single website visited. Only for those popping up in the overview. GNOME should indeed have some UI to remove thumbnails. Maybe as part of a feature to clean up disk space by cleaning the cache, don't know. But that's a different story.
(I think gpoo had written a script to delete thumbnails for files no longer available. But that was ages ago.)
That's the privacy panel bug: https://bugzilla.gnome.org/show_bug.cgi?id=723905 I think advertising that epiphany used to store some oft-visited sites thumbnails in a shared repo and for users to nuke their thumbnail cache would be the best way forward, whether or not UI is added to g-c-c.
We definitely store thumbnails for every webpage visited, not just popular pages. It's needed to make the overview work. Otherwise, when you delete pages from the overview, the new page that appears would never have a thumbnail. Anyway, we could go through history and delete all the thumbnails found there, but then we still leak all your old thumbnails for any pages you visited before the last time you cleared your history. And we'd also leak thumbnails for every page you've manually deleted from history. IMO that is not acceptable. It's a one-time problem, only affects data under ~/.cache, and won't ever happen again going forward because our thumbnails will from now on be namespaced under ~/.cache/epiphany. But we do store the URI in the thumbnail, under the key tEXt::Thumb::URI. So we could, for instance, delete just all the thumbnails with http:// and https:// URIs.
(In reply to Bastien Nocera from comment #19) > I think advertising that epiphany used to store some oft-visited sites > thumbnails in a shared repo and for users to nuke their thumbnail cache > would be the best way forward, whether or not UI is added to g-c-c. This is weird, why would you want to advertise the existence of the thumbnail cache to users, when we can just do the right thing automatically? Do you know of any non-Epiphany application that is unable to gracefully handle deletion of the thumbnail cache?
What's weird is that you think that removing a *shared* repository of cached data is within a *single* application's purview. I don't share my computer, and I don't care if the visited website thumbnails. I do care that it's going to take hours of CPU time to regenerate thumbnails for files on my NAS, whether it's storing successful or failed thumbnails. Somebody made a mistake storing application-specific data in a shared repository, I don't think that using the big hammer within your application is the right way. Security/privacy conscious users are more likely to want the thumbnail repository to be purged on a regular basis. That functionality has its place desktop-wide, not within an application, without any feedback or questions.
(In reply to Claudio Saavedra from comment #18) > (I think gpoo had written a script to delete thumbnails for files no longer > available. But that was ages ago.) https://calcifer.org/notes/2006/08/how-much-space-are-eating-your-thumbnails.html The code lives in https://github.com/gpoo/wasteland and requires to update the path from: rootdir = os.path.join(homedir, '.thumbnails') to: rootdir = os.path.join(homedir, '.cache', 'thumbnails') The UI is totally unnecessary, though.
(In reply to Claudio Saavedra from comment #18) > (I think gpoo had written a script to delete thumbnails for files no longer > available. But that was ages ago.) I don't see how this could be useful to us.
(In reply to Michael Catanzaro from comment #20) > We definitely store thumbnails for every webpage visited, not just popular > pages. It's needed to make the overview work. Otherwise, when you delete > pages from the overview, the new page that appears would never have a > thumbnail. This is wasteful. Having a blank thumbnail temporarily, until the page in question is visited again, is much better than thumbnailing every single page visited. (In reply to Michael Catanzaro from comment #21) > > Do you know of any non-Epiphany application that is unable to gracefully > handle deletion of the thumbnail cache? Do you understand that this is not a technical problem but one of domain? An app shouldn't unilaterally delete other applications' cache. Removing only the thumbnails with a http/https Thumb::URI should be enough.
(In reply to Claudio Saavedra from comment #25) > This is wasteful. Having a blank thumbnail temporarily, until the page in > question is visited again, is much better than thumbnailing every single > page visited. Extremely wasteful, yes. I didn't write the code. ;) Best solution would be to heuristically thumbnail pages when we think they'll enter the top 20 or so, that way we don't have to save PNGs for every webpage ever visited in the history database. > (In reply to Michael Catanzaro from comment #21) > Do you understand that this is not a technical problem but one of domain? An > app shouldn't unilaterally delete other applications' cache. I believe it's well-understood that applications should be prepared for everything saved under ~/.cache to be deleted externally. But I'd still delete the cache even if that wasn't the case, because this is a severe privacy leak and the harm of leaving the cache intact very clearly outweighs the harm of deleting other applications' data. But anyway: > Removing only the thumbnails with a http/https Thumb::URI should be enough. Yes, let's do this. No doubt this will still remove other application's thumbnails, and it won't remove all of the Epiphany thumbnails, but it will be plenty good enough, and it will at least avoid deleting the entire thumbnail cache.
(In reply to Michael Catanzaro from comment #26) <snip> > > (In reply to Michael Catanzaro from comment #21) > > Do you understand that this is not a technical problem but one of domain? An > > app shouldn't unilaterally delete other applications' cache. > > I believe it's well-understood that applications should be prepared for > everything saved under ~/.cache to be deleted externally. But I'd still > delete the cache even if that wasn't the case, because this is a severe > privacy leak and the harm of leaving the cache intact very clearly outweighs > the harm of deleting other applications' data. But anyway: Then do a blog post about it, and explain how those can be identified. Trying to retrospectively hide that bug by nuking my 500 megs of thumbnail cache isn't going to do anyone any favours. Being able to gracefully handle missing thumbnails and recreating them doesn't mean that there won't be data loss, or extra CPU and I/O resources used up. > > Removing only the thumbnails with a http/https Thumb::URI should be enough. > > Yes, let's do this. No doubt this will still remove other application's > thumbnails, and it won't remove all of the Epiphany thumbnails, but it will > be plenty good enough, and it will at least avoid deleting the entire > thumbnail cache. It will remove all of the thumbnails created by totem for the majority of its integrated services (YouTube, UPnP, movie trailers, etc.).
I filed bug #792414 to improve the thumbnailing strategy. (In reply to Bastien Nocera from comment #27) > It will remove all of the thumbnails created by totem for the majority of > its integrated services (YouTube, UPnP, movie trailers, etc.). OK, there's really no way to even guess at which are created by Epiphany, then, and I insist that Epiphany must automatically clean up all that it might have created, rather than expect users to read my blog. I apologize for this problem -- I know it's Epiphany's fault -- but having a clean cache is not the end of the world, and seems to be the only solution. I presume that totem is capable of regenerating thumbnails when required.
(In reply to Michael Catanzaro from comment #28) > I filed bug #792414 to improve the thumbnailing strategy. > > (In reply to Bastien Nocera from comment #27) > > It will remove all of the thumbnails created by totem for the majority of > > its integrated services (YouTube, UPnP, movie trailers, etc.). > > OK, there's really no way to even guess at which are created by Epiphany, > then, and I insist that Epiphany must automatically clean up all that it > might have created, rather than expect users to read my blog. I apologize > for this problem -- I know it's Epiphany's fault -- but having a clean cache > is not the end of the world, and seems to be the only solution. I presume > that totem is capable of regenerating thumbnails when required. You're deleting user files in a directory shared between all of the user's applications without asking them. I still don't see how this can ever be a good thing to do. It doesn't even put the files in the trash, it deletes them.
The work-around would be to run: echo 26 > ~/.config/epiphany/.migrated just before running a version of epiphany with this "feature". I hope I don't forget.
No, you can't make this kind of decision. Please revert. I don't know if there's any formal procedure to do such a change but I presume that you will need approval from the release team to do something like this.
I've gotten more opinions, and everyone except me seems to think the cache should not be removed. So I'll revert the commit that added the migrator. I think that's a shame, because it will leak a huge amount of browsing history, but oh well. For historical interest: originally only the pages that were actually in the overview received snapshots. That changed by accident in https://git.gnome.org/browse/epiphany/commit/?id=0433ac9dafd91664134ae1c300c6841e86a58d91; looks like that was just an oversight in a big refactor.
I wonder how this is a fix. It seems you only see black and white ("my solution or nothing").
(In reply to Germán Poo-Caamaño from comment #33) > I wonder how this is a fix. It seems you only see black and white ("my > solution or nothing"). German, there is no way to know which thumbnails were generated by Ephy. We don't have a ton of options here. Proposed compromises were: * Delete thumbs with http:// and https:/// URIs in the metadata. This will delete Totem's thumbs, Bastien doesn't like it. * Delete all thumbnails corresponding to Ephy history. This sounds good, but it leaks thumbs for all history that has been deleted by the user, so it only deletes all the thumbs that Epiphany does not actually need to delete... it fails to delete all of the thumbs that actually should be deleted. (Storing thumbs for pages that are still in history is unproblematic.) If you have a better idea, that would be great, but it looks like there is no good option other than 'rm -rf ~/.cache/thumbnails'.
- Tell the user why we want to delete a bunch of thumbnails and ask if we should do it.
Then the user is confused: what is a cache? What does it look like? Should I delete it? What are the consequences of deleting it? I don't want my computer to break! Let's pick for the user: either we should, or should not, delete the cache. After all, if something bad happens after automatically deleting the cache (CPU churns regenerating thumbnails, or applications fail to regenerate thumbnails altogether), then why should we allow users to choose to trigger that breakage? Either it's safe to do (as I suggest), or it's not, and consensus here seems to be not. Anyway, it's at least fixed going forward.
In which case I think you shouldn't. Get a CVE assigned to this problem, advertise the problem and possible solutions, and let security conscious users fix it themselves. In most cases, this would only be a problem for shared accounts, and on systems where you'd expect the storage to also play a part (the data is probably recoverable for forensic analysis, so unlink'ing it isn't enough to protect people against that).