Bug 674284 – EOG memory leak on viewing many jpeg images

Summary:	EOG memory leak on viewing many jpeg images


Status:	RESOLVED OBSOLETE

Product:	eog
Classification:	Core
Component:	general
Version:	3.8.x
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	EOG Maintainers
QA Contact:	EOG Maintainers

Reported:	2012-04-17 20:15 UTC by Žygimantas Augilius
Modified:	2021-06-19 08:47 UTC

Attachments
example of jpeg file, used for viewing (29.16 KB, image/jpeg) 2012-04-17 20:15 UTC, Žygimantas Augilius
eog valgrind log (69.02 KB, application/zip) 2012-04-19 20:50 UTC, Žygimantas Augilius
eog major memory leaks (287.92 KB, image/jpeg) 2014-04-17 21:30 UTC, florin.arjocu
eog-list-store: unref a dangling EogImage reference (691 bytes, patch) 2018-05-22 10:11 UTC, Claudio Saavedra, status: committed
Proof of concept (582 bytes, patch) 2018-06-09 11:07 UTC, Felix Riemann, status: none

Description Žygimantas Augilius 2012-04-17 20:15:29 UTC

Created attachment 212237 [details]
example of jpeg file, used for viewing

-------
jeegiz@hiuvlet:~$ lsb_release -rd
Description:	Ubuntu precise (development branch)
Release:	12.04
-------
jeegiz@hiuvlet:~$ apt-cache showpkg eog
Package: eog
Versions:
3.4.0-0ubuntu1 (/var/lib/apt/lists/mirror.soften.ktu.lt_ubuntu_dists_precise_main_binary-amd64_Packages) (/var/lib/dpkg/status)
---------

When viewing multiple jpg images generated by the 'motion' (webcam monitoring) software, eog leaks memory. After viewing several hundred images, the computer starts to slow down. Output of top:

  PID USER    PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
15574 jeegiz  20   0  616m  39m  13m S    0  1.0  0:01.84 eog   <--- 1 jpg file
15574 jeegiz  20   0  619m  43m  14m S    0  1.1  0:24.56 eog   <--- 50 jpg files
15574 jeegiz  20   0  620m  51m  14m S    0  1.3  0:39.13 eog   <--- 100 jpg files
15574 jeegiz  20   0  684m 107m  14m S    0  2.7  0:47.83 eog   <--- 200 jpg files
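
To put a number on the growth, the RES values above can be turned into a rough per-image rate. This is only a sketch: it assumes that the "m" suffix in top means MiB and that the four rows are the only samples available.

```python
# (images viewed, resident memory in MiB) read from the RES column above
samples = [(1, 39), (50, 43), (100, 51), (200, 107)]


def growth_per_image(points):
    """Return the RES growth in MiB per image between consecutive samples."""
    return [(r1 - r0) / (n1 - n0)
            for (n0, r0), (n1, r1) in zip(points, points[1:])]


rates = growth_per_image(samples)
for (n0, _), (n1, _), rate in zip(samples, samples[1:], rates):
    print(f"images {n0:>3}-{n1:<3}: {rate:.2f} MiB per image")
```

With these samples the rate rises from interval to interval (about 0.08, 0.16 and 0.56 MiB per image), so the growth is not linear in the number of images viewed.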

Comment 1 André Klapper 2012-04-18 10:01:10 UTC

Any chance to provide valgrind logs?

Comment 2 Žygimantas Augilius 2012-04-19 20:50:47 UTC

Created attachment 212385 [details]
eog valgrind log

Attached eog valgrind log.

It was run according to the instructions at https://wiki.ubuntu.com/Valgrind:
G_SLICE=always-malloc G_DEBUG=gc-friendly valgrind -v --tool=memcheck --leak-check=full --num-callers=40 --log-file=valgrind.log /usr/bin/eog

I tried to view 300 jpeg images while running eog under Valgrind.

Comment 3 Felix Riemann 2012-04-24 16:22:24 UTC

(In reply to comment #2)
> Created an attachment (id=212385) [details]
> eog valgrind log
> 
==22977== LEAK SUMMARY:
==22977==    definitely lost: 49,169 bytes in 465 blocks
==22977==    indirectly lost: 83,016 bytes in 2,191 blocks
==22977==      possibly lost: 369,816 bytes in 5,584 blocks

That's not much.

==22977==    still reachable: 41,063,652 bytes in 129,869 blocks

Well, this is a bit much I'd say. Looks like some objects are not disposed of correctly. Strange that they should be still reachable. We have some static data that is supposed to show up here but I'd be surprised if it were that much.

Do you have any plugins activated? Can you rerun valgrind with "--show-reachable=yes"? I'll see if I can reproduce it too.
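
For comparing logs from different runs, the LEAK SUMMARY lines can be extracted with a few lines of Python. This is a minimal sketch; the helper name parse_leak_summary is made up here, and the sample text is the summary quoted above.

```python
import re

# Matches lines such as
# "==22977==    definitely lost: 49,169 bytes in 465 blocks"
LEAK_LINE = re.compile(
    r"==\d+==\s+(definitely lost|indirectly lost|possibly lost|still reachable):"
    r"\s+([\d,]+) bytes in ([\d,]+) blocks"
)


def parse_leak_summary(log_text):
    """Map each leak kind to a (bytes, blocks) tuple."""
    summary = {}
    for match in LEAK_LINE.finditer(log_text):
        kind, nbytes, blocks = match.groups()
        summary[kind] = (int(nbytes.replace(",", "")),
                         int(blocks.replace(",", "")))
    return summary


SAMPLE = """\
==22977== LEAK SUMMARY:
==22977==    definitely lost: 49,169 bytes in 465 blocks
==22977==    indirectly lost: 83,016 bytes in 2,191 blocks
==22977==      possibly lost: 369,816 bytes in 5,584 blocks
==22977==    still reachable: 41,063,652 bytes in 129,869 blocks
"""

summary = parse_leak_summary(SAMPLE)
lost = sum(summary[kind][0] for kind in
           ("definitely lost", "indirectly lost", "possibly lost"))
reachable = summary["still reachable"][0]
print(f"lost: {lost} bytes, still reachable: {reachable} bytes")
```

For this log the three "lost" kinds add up to 502,001 bytes, against roughly 41 MB that is still reachable, which is why the reachable blocks are the interesting part.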

Comment 4 Claudio Saavedra 2012-04-25 12:31:55 UTC

I don't want to start pointing fingers without proper investigation, but a quick glance at the valgrind log shows plenty of calls into libappmenu and other Ubuntu-specific bits. Perhaps the problem is somewhere there?

Comment 5 florin.arjocu 2014-04-17 21:30:24 UTC

Created attachment 274632 [details]
eog major memory leaks

A few versions passed and this is still here. Ubuntu 13.10, EOG 3.8.2 using 1.3 GB of memory after watching a few dozen files (18 megapixels). See the attached screenshot.

Comment 6 André Klapper 2014-04-18 05:51:11 UTC

Florin: Please provide valgrind logs if you can reproduce.

Comment 7 Furkan 2015-06-11 21:00:57 UTC

I run a slideshow on a Raspberry Pi (running Raspbian, with eog 3.4.2-1+build1). Right now it's cycling through 6 images. The RPi has 512MB of RAM. Initially eog uses about 9% of memory, and then after about 5 days it reaches over 85% and the system becomes nearly unresponsive to ssh.

Comment 8 André Klapper 2015-06-11 22:04:36 UTC

3.4 is several years old and ancient history. 
It's more interesting if this still happens in 3.16 or 3.14.

Comment 9 Furkan 2016-06-14 08:09:46 UTC

This probably warrants a new report by now, but I still have this issue on Raspbian Jessie, with eog 3.14. The slideshow runs 24/7, and even with 3-4 pictures (a mix of jpeg and png) the memory usage reaches over 80% within the span of 5-7 days. My workaround is a script that kills eog and restarts it every night.

If it will prove useful, I can run valgrind and share the output, but please let me know what arguments I should run it with.

Comment 10 André Klapper 2016-06-14 14:56:51 UTC

Testing with 3.20 (latest stable version) is very welcome. :)

Comment 11 Furkan 2016-06-14 17:52:57 UTC

I cloned the eog git repository, and ran autogen.sh, which resulted in this error:

configure: error: Package requirements (gtk+-3.0 >= 3.19.3              glib-2.0 >= 2.42.0              gio-2.0 >= 2.42.0              gio-unix-2.0 >= 2.42.0              gnome-desktop-3.0 >= 2.91.2              gdk-pixbuf-2.0 >= 2.30.0              gtk+-unix-print-3.0 >= 3.5.4              shared-mime-info >= 0.20              gsettings-desktop-schemas >= 2.91.92              libpeas-1.0 >= 0.7.4              libpeas-gtk-1.0 >= 0.7.4 libexif >= 0.6.14 lcms2 exempi-2.0 >= 1.99.5 librsvg-2.0 >= 2.36.2) were not met:

Requested 'gtk+-3.0 >= 3.19.3' but version of GTK+ is 3.14.5

So I guess compiling eog from git is out of the question on Raspbian, or at least not without putting significant effort into it.

Do you have any suggestions as to what I could do to find out if the source of the leak really is eog, or if it's one of the shared libraries as suggested in comment #4? If the cause is eog, I can do further testing when I get the chance to put Debian Stretch or Sid (which both seem to have eog 3.20) on a Raspberry Pi 2 or 3 (I don't currently have any spares on hand).

Comment 12 Martin Simon 2017-12-02 13:28:48 UTC

Hi,

I've run into the same error. After browsing hundreds of jpeg images in eog, the amount of RAM used by eog gets too high: it can easily allocate 3GB of RAM with only a few thousand images that take up about 400MB themselves. So it allocates a lot of memory for a single image (that need not be a problem), but it doesn't free it after going to the next image.

I'm running eog of version 3.26.2-1 on debian testing. I can provide logs, if that would help, just tell me how to generate/collect them.

Comment 13 Felix Riemann 2017-12-10 16:51:12 UTC

(In reply to Martin Simon from comment #12)
> 
> I'm running eog of version 3.26.2-1 on debian testing. I can provide logs,
> if that would help, just tell me how to generate/collect them.

Please see here https://wiki.gnome.org/Valgrind

Although I tend to set these envvars: 
G_DEBUG=resident-modules,gc-friendly
G_SLICE=always-malloc

Note that eog will become extremely slow while running in valgrind. Please install debuginfo packages beforehand so valgrind can resolve symbols.

Comment 14 Martin Simon 2018-02-03 15:51:59 UTC

Hi Felix,

Please find the generated log here: https://paste.gnome.org/pnbikukwl. I generated the log by running the command

G_DEBUG=resident-modules,gc-friendly G_SLICE=always-malloc valgrind --tool=memcheck --leak-check=full --leak-resolution=high --num-callers=20 --log-file=vgdump eog imgs/img0000001.jpg

and then browsed one by one from image 1 to image 1000 (approx.). Hope that helps somehow. I can still see that eog has about 250 MB allocated after browsing 1000 jpeg images with a resolution of 1280x720 and a size of about 250KB each.

Comment 15 Felix Riemann 2018-02-05 18:41:07 UTC

Hmm, again this situation:

==2832== LEAK SUMMARY:
==2832==    definitely lost: 92,992 bytes in 1,005 blocks
==2832==    indirectly lost: 195,848 bytes in 4,544 blocks
==2832==      possibly lost: 7,123 bytes in 71 blocks

There's not much lost.

==2832==    still reachable: 104,610,975 bytes in 101,665 blocks

This is a lot. However, I'm unable to reproduce such a huge number with about 200 images. I am always below 10 MB.

Could you recreate the log with the --show-leak-kinds=all parameter added? Also, which version and distribution is this? I am seeing some deltas between the line numbers in your log and the actual line numbers, so there could be some distro patching involved. Do you have any of eog's plugins activated?

Comment 16 Martin Simon 2018-02-26 17:31:09 UTC

Hi Felix,

Sorry for the delay. Please find the log here: https://paste.gnome.org/pbwnadaaw (sorry, I realized too late that maybe only a subset of images would have been enough, so I had to snip the middle a bit to make it fit into GNOME's pastebin).

I executed the very same command as above on the very same images, only with the extra "--show-leak-kinds=all" parameter. The version of eog is 3.26.2-3 from Debian testing, and so is the distribution: Debian testing (Linux debian 4.14.0-2-amd64 #1 SMP Debian 4.14.7-1 (2017-12-22) x86_64 GNU/Linux).

The only plugin activated in eog is "Fullscreen with double-click", as far as I can see in the Preferences menu.

Regards,
Martin

Comment 17 Felix Riemann 2018-02-26 19:02:58 UTC

Thanks!

==3009== 58,060,800 bytes in 21 blocks are still reachable in loss record 14,074 of 14,074
==3009==    at 0x4C2CB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==3009==    by 0x80F2FE2: gdk_pixbuf_new (in /usr/lib/x86_64-linux-gnu/libgdk_pixbuf-2.0.so.0.3611.0)
==3009==    by 0x1A8BF6C7: ??? (in /usr/lib/x86_64-linux-gnu/gdk-pixbuf-2.0/2.10.0/loaders/libpixbufloader-jpeg.so)
==3009==    by 0x80FAA93: gdk_pixbuf_loader_write (in /usr/lib/x86_64-linux-gnu/libgdk_pixbuf-2.0.so.0.3611.0)
==3009==    by 0x4E608E4: eog_image_real_load (eog-image.c:1045)
==3009==    by 0x4E608E4: eog_image_load (eog-image.c:1293)
==3009==    by 0x4E64E50: eog_job_load_run (eog-jobs.c:573)
==3009==    by 0x4E630E5: eog_job_process (eog-job-scheduler.c:153)
==3009==    by 0x4E630E5: eog_job_scheduler (eog-job-scheduler.c:128)
==3009==    by 0x62585F4: ??? (in /lib/x86_64-linux-gnu/libglib-2.0.so.0.5400.3)
==3009==    by 0x6501519: start_thread (pthread_create.c:465)
==3009==    by 0x680D3EE: clone (clone.S:95)

Well, this implies that the image data of 21 images is still loaded, which is 20 too many if you only have one window. I don't really see how this happens yet. :-/

Since you probably clicked through more than 21 images I think it might be some kind of race condition, but finding these is even harder.

How do you actually click through your images? Did you click on each image in the collection or did you click the Next/Prev arrows in the UI or did you use keyboard navigation?
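
The figure of 21 images can be cross-checked against the 1280x720 resolution reported in comment 14. The sketch below assumes 8-bit RGB pixel data without an alpha channel and rows padded to a multiple of 4 bytes; both are assumptions about the decoded buffers, not something the log states.

```python
def pixbuf_bytes(width, height, channels=3):
    """Size of an 8-bit-per-sample pixel buffer with rows padded to 4 bytes."""
    rowstride = (width * channels + 3) & ~3
    return rowstride * height


per_image = pixbuf_bytes(1280, 720)   # 2,764,800 bytes per decoded image
blocks = 58_060_800 // per_image      # size of the loss record quoted above
print(per_image, blocks, blocks * per_image == 58_060_800)
```

Under these assumptions the loss record is exactly 21 decoded 1280x720 images, which matches the block count valgrind reports.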

Comment 18 Martin Simon 2018-02-27 08:23:18 UTC

You're right, I have only one window of eog.

I usually open the first image (from the CLI here) and then move forward using the arrow keys on the keyboard. I don't open images one by one and generally don't use the Next GUI button with the mouse. In this reproducer I tried all of the following: holding the right arrow key for seconds, which normally behaves like a "movie" but here just jumps to the x-th image in the folder (due to the valgrind slowdown); then going one by one with single key presses after each image was fully loaded (and rendered). I even tried going back (no idea if this is relevant).

Basically, I went through more than 1000 images in the last reproducer, using only keyboard navigation, both waiting until each image was loaded and not waiting, by holding the right arrow key.

Whoa, such a long text about nothing. Please tell me how I can collect more information/logs and I will.

Regards,
Martin

Comment 19 Felix Riemann 2018-05-20 14:44:59 UTC

I was now able to make valgrind produce a backtrace similar to the one in comment 17.

However, it only works with small images (that load fast enough??) and apparently only with the slowdown caused by valgrind. It can be triggered by "fast-forwarding"/skipping over images. Since it only happens with small images, it may be a race condition between the loading job's completion and cancellation. Not sure how to debug this yet.

Additionally there could be a slight weakness in thumbnail handling especially if the collection pane is hidden. In this case it seems to be possible that the image's thumbnails are not cleared when no longer needed. If one assumes 128x78px thumbnails this could accumulate to 200MB RAM usage just for thumbnails in your 5238 image example. Will need to take a closer look at that too.
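
The thumbnail estimate can be reproduced with simple arithmetic. The sketch assumes 4 bytes per pixel (RGBA) and no per-object overhead; the comment does not state the pixel format, so that is a guess that happens to match the 200MB figure.

```python
def thumbnail_cache_bytes(count, width=128, height=78, bytes_per_pixel=4):
    """Raw pixel storage for `count` thumbnails of the given size."""
    return count * width * height * bytes_per_pixel


total = thumbnail_cache_bytes(5238)
print(total, f"{total / 2**20:.1f} MiB")
```

With 3 bytes per pixel the same calculation gives about 150 MiB instead, so the order of magnitude holds either way.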

Comment 20 Claudio Saavedra 2018-05-22 10:11:58 UTC

Created attachment 372330 [details] [review]
eog-list-store: unref a dangling EogImage reference

Comment 21 Claudio Saavedra 2018-05-22 10:13:29 UTC

One for old time's sake. I don't have much time to look into this but the comments above made me look at eog list store. I think at least the above EogImage reference is leaking. It would be good if someone who can reproduce the memory issue can check whether this helps at all.

Comment 22 Felix Riemann 2018-05-25 17:54:20 UTC

Review of attachment 372330 [details] [review]:

Nice catch! :)

This looks like a valid fix. However, it seems there's something else keeping the EogImage instances around:

==4931== 53,664 bytes in 258 blocks are still reachable in loss record 14,653 of 14,659
==4931==    at 0x50E1F73: g_type_create_instance (gtype.c:1845)
==4931==    by 0x50C4A84: g_object_new_internal (gobject.c:1805)
==4931==    by 0x50C60E4: g_object_new_with_properties (gobject.c:1973)
==4931==    by 0x50C6B18: g_object_new (gobject.c:1645)
==4931==    by 0x4E5D9C3: eog_image_new_file (eog-image.c:339)
==4931==    by 0x4E6593D: eog_list_store_append_image_from_file (eog-list-store.c:368)
==4931==    by 0x4E65A82: directory_visit (eog-list-store.c:498)
==4931==    by 0x4E65A82: eog_list_store_append_directory.constprop.1 (eog-list-store.c:534)
==4931==    by 0x4E65F6A: eog_list_store_add_files (eog-list-store.c:602)
==4931==    by 0x4E63226: eog_job_model_run (eog-jobs.c:814)
==4931==    by 0x4E62495: eog_job_process (eog-job-scheduler.c:153)
==4931==    by 0x4E62495: eog_job_scheduler (eog-job-scheduler.c:128)
==4931==    by 0x5374AA5: g_thread_proxy (gthread.c:784)
==4931==    by 0x64F8593: start_thread (pthread_create.c:463)
==4931==    by 0x6809FDE: clone (clone.S:95)

Comment 23 Felix Riemann 2018-05-25 18:03:59 UTC

So, I think I found something as well. It seems to be related to a race with cancelling EogJobs.

The setup is fast seeking through the collection, where images may get skipped. For every image an EogLoadJob is created. It is executed asynchronously by the job queue. Once the job finishes, it queues the "finished" signal to be called from the main loop.

However, it seems that if the timing is right and the images load fast enough, cancelling and running the job can race. It looks like the job has already run or is already running (although the former seems to be more common on my machine), but before the "finished" signal is executed by the main loop, the job's owner has cancelled it and disconnected itself from the job's signals. Due to the way the image data is handled in EogImage (having its own ref count), there is no simple way to clear the image data again, which effectively increases memory usage. Not sure how to fix that yet.
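
The ordering described above can be illustrated with a toy model. This is not eog code: the Image and LoadJob classes and the deque standing in for the main loop are hypothetical, and the two orderings are replayed deterministically instead of using real threads.

```python
from collections import deque


class Image:
    def __init__(self, name):
        self.name = name
        self.pixels = None              # None means "not loaded"

    def load(self):
        self.pixels = bytearray(1024)   # stand-in for decoded pixel data


class LoadJob:
    def __init__(self, image):
        self.image = image
        self.cancelled = False
        self.on_finished = None         # handler connected by the job's owner


def run_scenario(cancel_before_run):
    """Replay one ordering; return (data still loaded?, images displayed)."""
    main_loop = deque()                 # pending "finished" notifications
    displayed = []

    job = LoadJob(Image("a.jpg"))
    job.on_finished = lambda j: displayed.append(j.image.name)

    def worker_run():
        if job.cancelled:
            return                      # cancellation arrived in time
        job.image.load()
        main_loop.append(job)           # queue "finished" for the main loop

    def owner_cancel():
        job.cancelled = True
        job.on_finished = None          # owner disconnects from the job

    steps = [owner_cancel, worker_run] if cancel_before_run else [worker_run, owner_cancel]
    for step in steps:
        step()

    while main_loop:                    # main loop dispatches what is queued
        finished = main_loop.popleft()
        if finished.on_finished is not None:
            finished.on_finished(finished)

    return job.image.pixels is not None, displayed


print(run_scenario(cancel_before_run=True))    # (False, [])
print(run_scenario(cancel_before_run=False))   # (True, [])
```

In the second ordering the image is never displayed, yet its data stays loaded because nobody is left to release it, which is the situation the comment describes.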

Comment 24 Claudio Saavedra 2018-05-27 10:41:50 UTC

Comment on attachment 372330 [details] [review]
eog-list-store: unref a dangling EogImage reference

Attachment 372330 [details] pushed as 8032c8a - eog-list-store: unref a dangling EogImage reference

Comment 25 Claudio Saavedra 2018-05-27 11:00:11 UTC

From what you describe it might make sense to not disconnect from the finished callback at all. Once it's called, check if the operation had been cancelled, and in that case, discard the data. I think GAsyncResult is designed in that way.
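
A minimal sketch of that idea, using hypothetical Image and LoadJob classes (not eog's API): the handler stays connected, always runs, and decides what to do based on the cancellation flag.

```python
class Image:
    def __init__(self):
        self.pixels = None

    def load(self):
        self.pixels = bytearray(1024)   # stand-in for decoded pixel data

    def unload(self):
        self.pixels = None


class LoadJob:
    def __init__(self, image, on_finished):
        self.image = image
        self.cancelled = False
        self.on_finished = on_finished  # never disconnected

    def run(self):
        self.image.load()

    def cancel(self):
        self.cancelled = True           # only sets a flag

    def dispatch_finished(self):
        self.on_finished(self)          # always delivered to the owner


displayed = []


def on_finished(job):
    if job.cancelled:
        job.image.unload()              # result no longer wanted: discard it
    else:
        displayed.append(job.image)


skipped = LoadJob(Image(), on_finished)
skipped.run()
skipped.cancel()                        # cancellation arrives after loading
skipped.dispatch_finished()

wanted = LoadJob(Image(), on_finished)
wanted.run()
wanted.dispatch_finished()

print(skipped.image.pixels is None, len(displayed))   # True 1
```

The sketch assumes that discarding the data is a single call; whether that holds for EogImage is a separate question.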

Comment 26 Jan Claeys 2018-06-07 20:16:39 UTC

I also see this happen when you click on the "next image" or "previous image" overlay buttons, or the arrow keys, before the image is fully loaded, resized & displayed.  This is in a directory with large images (something like 8000×6000 pixels), so it takes a couple seconds to show the next image, and you can click the button several times during that period.

When I monitor memory usage of EOG, I see that when I wait until the next/previous image is shown, memory usage increases to load the new image, and then it goes down again as (I assume) it releases the memory of the previous image.

But when I click several times before the image is shown, it seems like it loads all these images (or at least allocates memory for them), but only releases memory for one of them.  The same seems to happen when you use the left/right arrow keys on the keyboard by pressing it repeatedly (or by keeping it pushed down).


[I just had to kill EOG because it was using nearly 60GB of memory... ugh!]

Comment 27 Felix Riemann 2018-06-09 10:55:39 UTC

(In reply to Claudio Saavedra from comment #25)
> From what you describe it might make sense to not disconnect from the
> finished callback at all. Once it's called, check if the operation had been
> cancelled, and in that case, discard the data. I think GAsyncResult is
> designed in that way.

I don't think that the cancellation is really the issue here. The problem lies in "discard the data". The image data (pixbuf, metadata) has its own ref counter (eog_image_data_ref/_unref). Loading the image data however doesn't earn you a reference. In the normal case the image view will ref the image data and unref it once you switch to the next image. That releases the image data.

Looking for eog_image_data_ref calls gives just a few users, so I have the feeling that the image data ref counter is a bit under-used and many users of the image data simply rely on the image's ref counter instead. And to make it slightly more complicated, one can choose to load only the metadata now and load the pixel data later (as used by the tooltip in the collection pane).

So, I fear a real solution for this might turn out slightly larger and involve changing a few APIs.
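
The ref-counting behaviour described above can be modelled in a few lines. This is a simplified sketch with a hypothetical Image class; data_ref and data_unref merely mimic the roles of eog_image_data_ref/_unref and are not their implementation.

```python
class Image:
    """Toy image whose decoded data has its own reference counter."""

    def __init__(self, name):
        self.name = name
        self.pixels = None
        self.data_refs = 0

    def load(self):
        # Loading fills the buffer but does not take a data reference.
        self.pixels = bytearray(1024)

    def data_ref(self):
        self.data_refs += 1

    def data_unref(self):
        self.data_refs -= 1
        if self.data_refs == 0:
            self.pixels = None          # last data user gone: free the buffer


# Normal case: the view refs the data while showing it, then unrefs it.
shown = Image("shown.jpg")
shown.load()
shown.data_ref()
shown.data_unref()

# Skipped case: the data was loaded, but no view ever took a reference.
skipped = Image("skipped.jpg")
skipped.load()

print(shown.pixels is None)             # True: released by the last unref
print(skipped.pixels is None)           # False: nothing ever triggers a release
```

Because the release only happens on the transition to zero inside data_unref, data that was loaded but never referenced has no path to being freed in this model.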

Comment 28 Felix Riemann 2018-06-09 11:02:51 UTC

(In reply to Jan Claeys from comment #26)
> I also see this happen when you click on the "next image" or "previous
> image" overlay buttons, or the arrow keys, before the image is fully loaded,
> resized & displayed.  This is in a directory with large images (something
> like 8000×6000 pixels), so it takes a couple seconds to show the next image,
> and you can click the button several times during that period.
> 
> When I monitor memory usage of EOG, I see that when I wait until the
> next/previous image is shown, memory usage increases to load the new image,
> and then it goes down again as (I assume) it releases the memory of the
> previous image.
> 
> But when I click several times before the image is shown, it seems like it
> loads all these images (or at least allocates memory for them), but only
> releases memory for one of them.  The same seems to happen when you use the
> left/right arrow keys on the keyboard by pressing it repeatedly (or by
> keeping it pushed down).

The moment you trigger changing the image, the window will disconnect from the already running job. Depending on how fast the cancellation reaches the loading thread, it may be that the job can no longer abort loading the image (it is set up to do so if it is cancelled while the image is still being read). This would also result in the loaded image data not being released.

Comment 29 Felix Riemann 2018-06-09 11:07:27 UTC

Created attachment 372620 [details] [review]
Proof of concept

Would it be possible for you to test this patch?

It's not a solution, but just to make sure we're on the right track here.

This is a proof-of-concept patch that makes the job take a reference on the image data. Note that it may cause a crash if it causes the image data or metadata to be released too early or if other users don't hold references themselves.
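
The general idea can be sketched with a toy model. This is not the attached patch: the Image and LoadJob classes below are hypothetical and only show what changes when the job holds a data reference for its own lifetime.

```python
class Image:
    """Toy image whose decoded data has its own reference counter."""

    def __init__(self):
        self.pixels = None
        self.data_refs = 0

    def load(self):
        self.pixels = bytearray(1024)   # stand-in for decoded pixel data

    def data_ref(self):
        self.data_refs += 1

    def data_unref(self):
        self.data_refs -= 1
        if self.data_refs == 0:
            self.pixels = None          # last data user gone: free the buffer


class LoadJob:
    def __init__(self, image):
        self.image = image

    def run(self):
        self.image.data_ref()           # the job itself holds the data
        self.image.load()

    def dispose(self):
        self.image.data_unref()         # dropped when the job goes away


# Skipped image: only the job ever referenced the data.
skipped = Image()
job = LoadJob(skipped)
job.run()
job.dispose()

# Displayed image: the view takes its own reference before the job is disposed.
shown = Image()
job = LoadJob(shown)
job.run()
shown.data_ref()
job.dispose()

print(skipped.pixels is None, shown.pixels is None)   # True False
```

The skipped case also shows the risk mentioned above: any user that reads the pixel data without holding its own data reference would find it gone once the job is disposed.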

Comment 30 André Klapper 2021-06-19 08:47:20 UTC

GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org.
As part of that, we are mass-closing older open tickets in bugzilla.gnome.org
which have not seen updates for a longer time (resources are unfortunately
quite limited so not every ticket can get handled).

If you can still reproduce the situation described in this ticket in a recent
and supported software version, then please follow
  https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines
and create a new ticket at
  https://gitlab.gnome.org/GNOME/eog/-/issues/

Thank you for your understanding and your help.