After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 690509 - When a on-disk backend is finalized, space disk is not freed
When a on-disk backend is finalized, space disk is not freed
Status: RESOLVED FIXED
Product: GEGL
Classification: Other
Component: GeglBuffer
git master
Other All
: Normal major
: ---
Assigned To: Jehan
Default Gegl Component Owner
Depends on:
Blocks:
 
 
Reported: 2012-12-19 17:16 UTC by Jehan
Modified: 2013-01-28 18:25 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Freeing swapped disk space upon on-disk tile backend finalization (4.17 KB, patch)
2012-12-19 17:16 UTC, Jehan
none Details | Review
Freeing swapped disk space upon on-disk tile backend finalization (4.99 KB, patch)
2013-01-26 10:54 UTC, Jehan
none Details | Review

Description Jehan 2012-12-19 17:16:43 UTC
Created attachment 231912 [details] [review]
Freeing swapped disk space upon on-disk tile backend finalization

When a GEGL buffer swaps to disk, then is later destroyed, the swap file is not deleted (I found 3 backends which may swap to disk file, file-async, and file-mapped, if not mistaken).
So if you open a bunch of buffer, then destroy them, your disk will get more and more filled up and *never* down (and it can actually go pretty wild fast if you process very huge images, and do a lot of processing, duplicating layers, merging, scaling, etc. without ever closing GIMP).
This lost disk space won't be retrieved until you end GIMP (will call gegl_exit() which cleans all swap files created by this PID).

1/ Run GIMP master (or any other code using GEGL), and open some big image.
2/ Check it swapped to disk (defaults to $XDG_CACHE_HOME, on Linux, you should find this there: ~/.cache/gegl-0.2/swap/). You'll see probably a bunch of files there once the image is opened.
3/ Close the image. The files are still there, yet the GEGL buffer has been destroyed, the path too, so this is lost disk space.
Actually you can `du -sh ~/.cache/gegl-0.2/swap/` and see that the space used continually goes up, with new files, never down.

The attached patch fixes this issue by checking the existence of the swap file and removing it when finalizing the associated backend.

As a side change, I ensured the configured swap path variable consistently never ends up with a directory separator, even when set by environment variable. This allows proper path comparison (because I check that a tile swap path is consistent before deletion for security).
Comment 1 Jehan 2013-01-19 06:41:16 UTC
Hey Mitch,

now that December is over ;-), I believe this ticket is pretty important (disk space going constantly up the roof). Maybe with huge hard drives nowadays, nobody noticed it; but working with a small SSD myself, I can tell you I noticed when it got filled pretty fast just by working with GIMP. So if you could have a look at my patch, that would be awesome.
Comment 2 Michael Natterer 2013-01-19 09:24:35 UTC
That looks pretty sane to me. However pippin keeps telling me about some
swap files being meant to talk to gegl buffers in other processes. I would
appreciate a look from pippin too.
Comment 3 Jehan 2013-01-20 07:30:30 UTC
Ok. Let's wait his opinion then.

In the current case, we don't remove swaps when a tile backend is finalized. That remaining file on disk is basically a "disk leak" (can you say this for disk space? :p) because the "link" to this swap is destroyed in the same time. I don't think there should be much problem in cleaning properly, and I don't see what kind of "talk" there could still be with swap files of destroyed tiles.
But let's wait pippin's opinion to be sure. :-)
Comment 4 Jon Nordby 2013-01-20 11:41:31 UTC
There is some code for freeing swap in gegl-init.c also. To avoid duplication I suggest moving that code out to a function, and call it from the different file backends.
In GeglBufferHeader there is a lock flag which processes are to take when using (only when writing?) a GeglBuffer file. One could check this to see if one can remove a file, though the current swap freeing code in gegl-init.c seems to ignore it.
Comment 5 Jehan 2013-01-20 13:46:38 UTC
Jon > code factorization, of course, that's rarely a bad idea. Would it be better in gegl-init.c or gegl-buffer.c?

Could you explain what is the lock flag you are talking about? I checked gegl/buffer/gegl-buffer-index.h (which is where GeglBufferHeader is defined) and could not understand what you meant by "taking a lock flag" from this code.

Also in general, why should swap freeing code take any thing else into account than the fact that there exist swap files on disk and that they won't be referenced anywhere else after the code ends (so that's disk space leak)? Maybe I miss something?
Comment 6 Jon Nordby 2013-01-20 14:50:39 UTC
If the code is to be called from the tile backends, in gegl-buffer.c would be better. 
In general, a GeglBuffer may be used from multiple applications, using the on-disk datastructure to communicate between the different processes. But we can probably safely assume that swap files are not used like that.

Only problem with using the swap directory to determine whether a backend is used for swap or not; what if the swap directory changes during the lifetime of the tile backend?
Comment 7 Jehan 2013-01-20 15:26:51 UTC
Jon > I don't use the swap directory to determine if a file is a swap file or not. That's only an additional security. Like just in case there were an unseen security flaw in GEGL where someone could switch a swap file path with a user file path for instance, hence have GEGL delete user files (either through a program bug, or a malevolent action). Basically I ensure we never delete a file which is not in the known swap directory.

Now about your worry, the swap directory cannot change in the life of the process. As you can see from gegl_swap_dir() function, the swap directory is only computed once (and saved in a static variable, reused each time the value is needed again), at the start of the program. So even if the user were to change environment variables in the process lifetime, or whatever, the swap directory won't change. So there is no problem here.

Or maybe you are talking about the multi-process buffer communication thing you talk about and a hypothetical case where 2 processes use a different swap directory, and they would share a buffer backend; and the first process would create the backend, and the swap file, whereas the other process would terminate it (but not the first one?). Is that what you are talking about? Is this even actually possible with GEGL?
If it is possible, I would say that this can indeed be a small problem. I would also say that a user who plays with environment variables so that one has several GEGL-using processes all using different swap directories really likes to play with fire. And finally I would conclude that if I have to choose, I prefer to give priority to the basic security check over the space leak for the fire-playing user, because nothing is worse than an application with a security bug which erases user files (whereas space leak, well you can just erase swap files by hand after. No loss).
Comment 8 Jon Nordby 2013-01-20 16:50:15 UTC
Case 1:
The swap directory will change if someone using the GEGL API (like GIMP), does g_object_set(gegl_config(), "swap", ...); However it does not seem that we handle this at all right now either, so perhaps that the swap file leak in this case is fine.

Case 2:
It is supposed to be possible (citing pippin) for two processes to share a buffer, but I don't see a usecase for this on the automatically generated swap files. Rather, I would expect to do this by something like this in 2 or more processes:
GObject *backend = g_object_new(GEGL_TYPE_TILE_BACKEND_FILE, "path", "/somewhere/shared.geglbuf", NULL);
GeglBuffer *buf = gegl_buffer_new_for_backend(NULL, backend);
... do things to buf ...

In this case, I would expect the backend to leave removing the file to me (or to properly handle multiple-process access). Your patch does that, as long as the file is not in the swap dir, which is OK with me.
Comment 9 Michael Natterer 2013-01-20 19:06:02 UTC
Case 1: not going to happen with GIMP, the resp. config property has
        the RESTART flag set, so will never change in one session.

Case 2: is not going to happen in the swap dir itself either, unless
        the user/app are trying to shoot themselves in the foot deliberately.

So I agree, this seems safe. Note that we still need some function that
kills orphaned-by-crash swap files on startup.
Comment 10 Jehan 2013-01-26 10:47:20 UTC
Michael > orphaned-by-crash swap files are already removed on startup. See the function swap_clean() in gegl-init.c. It is run as some hook function set up during gegl_init() and remove any swap file whose creator PID (determined by file name) is not running anymore (so it either crashed, or the developer forgot to call gegl_exit()).

I have also tested on master to be sure, so I confirm that's working. :-)
Comment 11 Jehan 2013-01-26 10:54:39 UTC
Created attachment 234468 [details] [review]
Freeing swapped disk space upon on-disk tile backend finalization

Attached is the new version of the patch. Basically the same code as before, except with just a little factorization: I created the function gegl_tile_backend_unlink_swap() to be used in any backend which may possibly swap on disk.
Also I settled to define it in gegl-tile-backend.c (instead of gegl-init.c or gegl-buffer.c mentioned above) because I figured swap management is specific to some backends (which may or may not have swapping mechanisms), not at buffer level.
Comment 12 Øyvind Kolås (pippin) 2013-01-28 18:25:16 UTC
This seems good, I do see the use case of multiple processes accessing the same buffer through swap.. when one wants to use an already existing buffer from a different process. Though it might be that creating an infrastructure similar to how GIMPs core and plug-ins communicate,. and a separate tile backend for that would work better than directly sharing mmapped swap files.

commit 88caa4a4aced4e35e297a54325ad5336b66f4fad
Author: Jehan <jehan@girinstud.io>
Date:   Wed Dec 19 14:11:24 2012 +0900

    Bug 690509 - gegl-tile-backend: free disk space when finalizing a tile backend whic
    
    gegl_tile_backend_unlink_swap() written for this purpose and now used
    on the 3 currently existing backends which may swap to disk.
    Accessorily I ensure any swap path from GEGL config does not end with
    a directory separator, to ensure proper string comparison, even when
    provided by the user through environment variable.