GNOME Bugzilla – Bug 674284
EOG memory leak on viewing many jpeg images
Last modified: 2021-06-19 08:47:20 UTC
Created attachment 212237 [details]
example of jpeg file, used for viewing

-------
jeegiz@hiuvlet:~$ lsb_release -rd
Description: Ubuntu precise (development branch)
Release: 12.04
-------
jeegiz@hiuvlet:~$ apt-cache showpkg eog
Package: eog
Versions:
3.4.0-0ubuntu1 (/var/lib/apt/lists/mirror.soften.ktu.lt_ubuntu_dists_precise_main_binary-amd64_Packages) (/var/lib/dpkg/status)
---------

When viewing multiple jpg images generated by the 'motion' (webcam monitoring) software, eog leaks memory. After viewing several hundred images, the computer starts to slow down.

Top command results:

  PID USER   PR NI VIRT  RES SHR S %CPU %MEM   TIME+ COMMAND
15574 jeegiz 20  0 616m  39m 13m S    0  1.0 0:01.84 eog   <---- 1 jpg file
15574 jeegiz 20  0 619m  43m 14m S    0  1.1 0:24.56 eog   <--- 50 jpg files
15574 jeegiz 20  0 620m  51m 14m S    0  1.3 0:39.13 eog   <--- 100 jpg files
15574 jeegiz 20  0 684m 107m 14m S    0  2.7 0:47.83 eog   <--- 200 jpg files
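To make the trend in the top samples easier to read, the resident size (RES) growth per viewed image can be worked out between consecutive samples. This is only a rough sketch: the sample values are transcribed from the report, and the function name `growth_per_image` is made up for illustration.

```python
# (images viewed, RES in MiB) as reported by top in the comment above
samples = [(1, 39), (50, 43), (100, 51), (200, 107)]


def growth_per_image(samples):
    """Return RES growth in MiB per image for each consecutive pair of samples."""
    rates = []
    for (n0, r0), (n1, r1) in zip(samples, samples[1:]):
        rates.append((r1 - r0) / (n1 - n0))
    return rates


rates = growth_per_image(samples)
for (n0, _), (n1, _), rate in zip(samples, samples[1:], rates):
    print(f"images {n0}-{n1}: {rate:.2f} MiB per image")
```

With these numbers the per-image growth rises from one interval to the next rather than staying constant, so the four samples alone do not pin down a fixed leak size per image.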
Any chance to provide valgrind logs?
Created attachment 212385 [details]
eog valgrind log

Attached eog valgrind log. It was run according to the instructions at https://wiki.ubuntu.com/Valgrind:

G_SLICE=always-malloc G_DEBUG=gc-friendly valgrind -v --tool=memcheck --leak-check=full --num-callers=40 --log-file=valgrind.log /usr/bin/eog

I tried to view 300 jpeg images while running eog under Valgrind.
(In reply to comment #2)
> Created an attachment (id=212385) [details]
> eog valgrind log
>

==22977== LEAK SUMMARY:
==22977==    definitely lost: 49,169 bytes in 465 blocks
==22977==    indirectly lost: 83,016 bytes in 2,191 blocks
==22977==      possibly lost: 369,816 bytes in 5,584 blocks

That's not much.

==22977==    still reachable: 41,063,652 bytes in 129,869 blocks

Well, this is a bit much I'd say. Looks like some objects are not disposed of correctly. Strange that they should be still reachable. We have some static data that is supposed to show up here but I'd be surprised if it were that much.

Do you have any plugins activated? Can you rerun valgrind with "--show-reachable=yes"? I'll see if I can reproduce it too.
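For comparing the leak categories across several logs, the summary lines can be pulled out with a small script. The following is a minimal sketch, not part of any eog or valgrind tooling; the regular expression and the function name `parse_leak_summary` are assumptions chosen to match the summary layout quoted above.

```python
import re

# Leak summary lines copied from the comment above.
LOG = """\
==22977== LEAK SUMMARY:
==22977==    definitely lost: 49,169 bytes in 465 blocks
==22977==    indirectly lost: 83,016 bytes in 2,191 blocks
==22977==      possibly lost: 369,816 bytes in 5,584 blocks
==22977==    still reachable: 41,063,652 bytes in 129,869 blocks
"""

# Matches e.g. "==22977==    definitely lost: 49,169 bytes in 465 blocks"
LINE = re.compile(r"==\d+==\s+([a-z ]+?):\s+([\d,]+) bytes in ([\d,]+) blocks")


def parse_leak_summary(text):
    """Map each leak kind to a (bytes, blocks) tuple."""
    summary = {}
    for match in LINE.finditer(text):
        kind = match.group(1).strip()
        size = int(match.group(2).replace(",", ""))
        blocks = int(match.group(3).replace(",", ""))
        summary[kind] = (size, blocks)
    return summary


summary = parse_leak_summary(LOG)
lost_bytes = sum(size for kind, (size, _) in summary.items() if kind.endswith("lost"))
reachable_bytes = summary["still reachable"][0]
print(f"lost: {lost_bytes} bytes, still reachable: {reachable_bytes} bytes")
```

Here the three "lost" categories together are far smaller than the "still reachable" figure, which is the imbalance the comment points at.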
Don't want to start pointing fingers without proper investigation, but a quick glance at the valgrind log shows plenty of calls into libappmenu and other Ubuntu-specific bits. Perhaps the problem is somewhere there?
Created attachment 274632 [details]
eog major memory leaks

A few versions passed and this is still here. Ubuntu 13.10, EOG 3.8.2 using 1.3 GB of memory after watching a few dozen files (18 megapixels). See the attached screenshot.
Florin: Please provide valgrind logs if you can reproduce.
I run a slideshow on a Raspberry Pi (running Raspbian, with eog 3.4.2-1+build1). Right now it's cycling through 6 images. The RPi has 512MB of RAM. Initially eog uses about 9% of memory, and then after about 5 days it reaches over 85% and the system becomes nearly unresponsive to ssh.
3.4 is several years old and ancient history. It's more interesting if this still happens in 3.16 or 3.14.
This probably warrants a new report by now, but I still have this issue on Raspbian Jessie, with eog 3.14. The slideshow runs 24/7, and even with 3-4 pictures (a mix of jpeg and png) the memory usage reaches over 80% within the span of 5-7 days. My workaround is a script that kills eog and restarts it every night. If it will prove useful, I can run valgrind and share the output, but please let me know what arguments I should run it with.
Testing with 3.20 (latest stable version) is very welcome. :)
I cloned the eog git repository, and ran autogen.sh, which resulted in this error:

configure: error: Package requirements (gtk+-3.0 >= 3.19.3
    glib-2.0 >= 2.42.0
    gio-2.0 >= 2.42.0
    gio-unix-2.0 >= 2.42.0
    gnome-desktop-3.0 >= 2.91.2
    gdk-pixbuf-2.0 >= 2.30.0
    gtk+-unix-print-3.0 >= 3.5.4
    shared-mime-info >= 0.20
    gsettings-desktop-schemas >= 2.91.92
    libpeas-1.0 >= 0.7.4
    libpeas-gtk-1.0 >= 0.7.4
    libexif >= 0.6.14
    lcms2
    exempi-2.0 >= 1.99.5
    librsvg-2.0 >= 2.36.2) were not met:

Requested 'gtk+-3.0 >= 3.19.3' but version of GTK+ is 3.14.5

So I guess compiling eog from git is out of the question on Raspbian, or at least not without putting significant effort into it.

Do you have any suggestions as to what I could do to find out if the source of the leak really is eog, or if it's one of the shared libraries as suggested in comment #4?

If the cause is eog, I can do further testing when I get the chance to put Debian Stretch or Sid (which both seem to have eog 3.20) on a Raspberry Pi 2 or 3 (I don't currently have any spares on hand).
Hi, I've run into the same error. After browsing hundreds of jpeg images in eog, the amount of RAM used by eog gets too high: it can easily allocate 3GB of RAM for only a few thousand images that together are something like 400MB in size. So, it allocates a lot of memory for a single image (which would not be a problem by itself), but it doesn't free it after going to the next image. I'm running eog version 3.26.2-1 on debian testing. I can provide logs if that would help, just tell me how to generate/collect them.
(In reply to Martin Simon from comment #12)
>
> I'm running eog of version 3.26.2-1 on debian testing. I can provide logs,
> if that would help, just tell me how to generate/collect them.

Please see here https://wiki.gnome.org/Valgrind

Although I tend to set these envvars:
G_DEBUG=resident-modules,gc-friendly G_SLICE=always-malloc

Note that eog will become extremely slow while running in valgrind. Please install debuginfo packages beforehand so valgrind can resolve symbols.
Hi Felix,

please find the generated log here: https://paste.gnome.org/pnbikukwl.

I generated the log by running the command

G_DEBUG=resident-modules,gc-friendly G_SLICE=always-malloc valgrind --tool=memcheck --leak-check=full --leak-resolution=high --num-callers=20 --log-file=vgdump eog imgs/img0000001.jpg

and then browsed one by one from image 1 to image 1000 (approx). Hope that helps somehow. I can still see that eog allocates about 250 MB after browsing 1000 jpeg images with a resolution of 1280x720 and a size of about 250KB.
Hmm, again this situation:

==2832== LEAK SUMMARY:
==2832==    definitely lost: 92,992 bytes in 1,005 blocks
==2832==    indirectly lost: 195,848 bytes in 4,544 blocks
==2832==      possibly lost: 7,123 bytes in 71 blocks

There's not much lost.

==2832==    still reachable: 104,610,975 bytes in 101,665 blocks

This is a lot. However, I'm unable to reproduce such a huge number with about 200 images. I am always below 10 MB. Could you recreate the log with the --show-leak-kinds=all parameter added?

Also, which version and distribution is this? I am seeing some deltas between the line numbers in your log and the actual line numbers, so there could be some distro patching involved. Do you have any of eog's plugins activated?
Hi Alex,

sorry for the delay. Please find the log here: https://paste.gnome.org/pbwnadaaw (sorry, I realized too late that maybe only a subset of images could be enough, so I had to snip the center a bit to make it fit into gnome's pastebin).

I executed the very same command as above on the very same images, only with the extra "--show-leak-kinds=all" parameter. The version of eog is 3.26.2-3 from debian testing, and so is the distribution: debian testing (Linux debian 4.14.0-2-amd64 #1 SMP Debian 4.14.7-1 (2017-12-22) x86_64 GNU/Linux). The only plugin activated in eog is "Fullscreen with double-click", as far as I can see in the Preferences menu.

Regards,
Martin
Thanks!

==3009== 58,060,800 bytes in 21 blocks are still reachable in loss record 14,074 of 14,074
==3009==    at 0x4C2CB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==3009==    by 0x80F2FE2: gdk_pixbuf_new (in /usr/lib/x86_64-linux-gnu/libgdk_pixbuf-2.0.so.0.3611.0)
==3009==    by 0x1A8BF6C7: ??? (in /usr/lib/x86_64-linux-gnu/gdk-pixbuf-2.0/2.10.0/loaders/libpixbufloader-jpeg.so)
==3009==    by 0x80FAA93: gdk_pixbuf_loader_write (in /usr/lib/x86_64-linux-gnu/libgdk_pixbuf-2.0.so.0.3611.0)
==3009==    by 0x4E608E4: eog_image_real_load (eog-image.c:1045)
==3009==    by 0x4E608E4: eog_image_load (eog-image.c:1293)
==3009==    by 0x4E64E50: eog_job_load_run (eog-jobs.c:573)
==3009==    by 0x4E630E5: eog_job_process (eog-job-scheduler.c:153)
==3009==    by 0x4E630E5: eog_job_scheduler (eog-job-scheduler.c:128)
==3009==    by 0x62585F4: ??? (in /lib/x86_64-linux-gnu/libglib-2.0.so.0.5400.3)
==3009==    by 0x6501519: start_thread (pthread_create.c:465)
==3009==    by 0x680D3EE: clone (clone.S:95)

Well, this implies that the image data of 21 images is still loaded, which is 20 too many if you only have one window. I don't really see how this happens yet. :-/
Since you probably clicked through more than 21 images I think it might be some kind of race condition, but finding these is even harder.

How do you actually click through your images? Did you click on each image in the collection or did you click the Next/Prev arrows in the UI or did you use keyboard navigation?
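The "21 images" reading of that loss record can be checked with a bit of arithmetic. The sketch below uses the byte and block counts from the record and the 1280x720 resolution reported earlier in the thread; the 3 bytes per pixel (8-bit RGB without alpha, no row padding) is an assumption about how the JPEG loader lays out the decoded buffer.

```python
# Figures from the thread: the valgrind loss record and the reported JPEG resolution.
reachable_bytes = 58_060_800
blocks = 21
width, height = 1280, 720

# Assumption: decoded JPEGs are stored as 8-bit RGB (no alpha), and a row of
# 1280 * 3 bytes needs no extra padding.
bytes_per_pixel = 3

decoded_size = width * height * bytes_per_pixel
print(decoded_size)                              # 2764800
print(reachable_bytes // blocks)                 # 2764800
print(decoded_size * blocks == reachable_bytes)  # True
```

Under that assumption each block is exactly the size of one decoded 1280x720 frame, which is consistent with 21 full pixel buffers being kept alive.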
You're right, I have only one eog window. I usually open the first image (from the CLI here) and then move forward with the arrow keys on the keyboard. I don't open images one by one and generally don't click the Next button in the GUI.

In this reproducer I tried all of the following: holding the right arrow key for several seconds, which normally behaves like a "movie" but here just jumps to the x-th image in the folder (due to the valgrind slowdown); then going one by one with single key presses after each image was fully loaded (and rendered). I even tried going back (no idea if this is relevant). Basically, I went through more than 1000 images in the last reproducer, using only keyboard navigation, both waiting until the image was loaded and not waiting, by holding the right arrow key.

Whoa, such a long text about nothing. Please tell me how I can collect more information/logs and I will.

Regards,
Martin
I was now able to make valgrind produce a backtrace similar to the one in comment 17. However, it only works with small images (that load fast enough??) and apparently only with the slowdown due to valgrind. It can be triggered by "fast-forwarding"/skipping over images. Since it only happens with small images it may be a race condition between the loading job's completion and cancellation. Not sure how to debug this yet.

Additionally there could be a slight weakness in thumbnail handling, especially if the collection pane is hidden. In this case it seems to be possible that the images' thumbnails are not cleared when no longer needed. If one assumes 128x78px thumbnails this could accumulate to 200MB RAM usage just for thumbnails in your 5238 image example. Will need to take a closer look at that too.
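The 200MB thumbnail estimate can be reproduced roughly as follows. The thumbnail dimensions and image count come from the comment; the 4 bytes per pixel (RGBA) is an assumption, and any per-object overhead is ignored.

```python
thumb_width, thumb_height = 128, 78   # thumbnail size assumed in the comment
image_count = 5238                    # size of the reporter's collection


def thumbnail_bytes(bytes_per_pixel):
    """Total pixel data held if every image keeps its thumbnail."""
    return thumb_width * thumb_height * bytes_per_pixel * image_count


total_rgba = thumbnail_bytes(4)   # assumption: RGBA, 4 bytes per pixel
total_rgb = thumbnail_bytes(3)    # for comparison: RGB, 3 bytes per pixel

print(f"RGBA: {total_rgba} bytes ({total_rgba / 2**20:.1f} MiB)")
print(f"RGB:  {total_rgb} bytes ({total_rgb / 2**20:.1f} MiB)")
```

With 4 bytes per pixel the total comes out just under 200 MiB, in line with the figure quoted above; with 3 bytes per pixel it would be about a quarter less.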
Created attachment 372330 [details] [review] eog-list-store: unref a dangling EogImage reference
One for old time's sake. I don't have much time to look into this but the comments above made me look at eog list store. I think at least the above EogImage reference is leaking. It would be good if someone who can reproduce the memory issue can check whether this helps at all.
Review of attachment 372330 [details] [review]:

Nice catch! :) This looks like a valid fix.

However, it seems there's something else keeping the EogImage instances around:

==4931== 53,664 bytes in 258 blocks are still reachable in loss record 14,653 of 14,659
==4931==    at 0x50E1F73: g_type_create_instance (gtype.c:1845)
==4931==    by 0x50C4A84: g_object_new_internal (gobject.c:1805)
==4931==    by 0x50C60E4: g_object_new_with_properties (gobject.c:1973)
==4931==    by 0x50C6B18: g_object_new (gobject.c:1645)
==4931==    by 0x4E5D9C3: eog_image_new_file (eog-image.c:339)
==4931==    by 0x4E6593D: eog_list_store_append_image_from_file (eog-list-store.c:368)
==4931==    by 0x4E65A82: directory_visit (eog-list-store.c:498)
==4931==    by 0x4E65A82: eog_list_store_append_directory.constprop.1 (eog-list-store.c:534)
==4931==    by 0x4E65F6A: eog_list_store_add_files (eog-list-store.c:602)
==4931==    by 0x4E63226: eog_job_model_run (eog-jobs.c:814)
==4931==    by 0x4E62495: eog_job_process (eog-job-scheduler.c:153)
==4931==    by 0x4E62495: eog_job_scheduler (eog-job-scheduler.c:128)
==4931==    by 0x5374AA5: g_thread_proxy (gthread.c:784)
==4931==    by 0x64F8593: start_thread (pthread_create.c:463)
==4931==    by 0x6809FDE: clone (clone.S:95)
So, I think I found something as well. It seems to be related to a race with cancelling EogJobs.

The setup is fast seeking through the collection, where images may get skipped. For every image an EogLoadJob is created. It is executed asynchronously by the job queue. Once the job finishes it queues the "finished" signal to be called from the main loop. However, it seems that if the timing is right and the images load fast enough, cancelling and running the job can race. It looks like the job has already run or is already running (although the former seems to be more common on my machine), but before the "finished" signal is executed by the main loop the job's owner has cancelled it and has disconnected itself from the job's signals.

Due to the way the image data is handled in EogImage (having its own ref count) there is no simple way to clear the image data again, effectively increasing the memory usage. Not sure how to fix that yet.
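The sequence described above can be replayed in a small single-threaded model. This is only an illustration of the ordering problem, not eog's code: the classes `Image`, `LoadJob` and `Window` and the function `dispatch_finished` are hypothetical stand-ins, and the worker thread and main loop are replaced by explicit calls in a fixed order.

```python
class Image:
    """Stand-in for an image object: pixel data lives until someone clears it."""
    def __init__(self, name):
        self.name = name
        self.pixels = None


class LoadJob:
    """Stand-in for a load job with a single 'finished' handler slot."""
    def __init__(self, image):
        self.image = image
        self.cancelled = False
        self.on_finished = None

    def run(self):
        # Worker side: cancellation only helps if it arrives before decoding.
        if not self.cancelled:
            self.image.pixels = bytearray(1280 * 720 * 3)


class Window:
    def __init__(self):
        self.shown = None
        self.job = None

    def request(self, image):
        # Switching images: cancel the old job and disconnect from it.
        if self.job is not None:
            self.job.cancelled = True
            self.job.on_finished = None
        self.job = LoadJob(image)
        self.job.on_finished = self.display
        return self.job

    def display(self, job):
        if self.shown is not None:
            self.shown.pixels = None  # release the previous image's data
        self.shown = job.image


def dispatch_finished(job):
    """Main-loop side: deliver the queued 'finished' signal, if anyone listens."""
    if job.on_finished is not None:
        job.on_finished(job)


window = Window()
a, b = Image("a.jpg"), Image("b.jpg")

job_a = window.request(a)
job_a.run()                # worker finishes decoding a.jpg ...
job_b = window.request(b)  # ... but the user skips ahead before the dispatch
dispatch_finished(job_a)   # nobody is connected any more
job_b.run()
dispatch_finished(job_b)

leaked = [img.name for img in (a, b)
          if img.pixels is not None and img is not window.shown]
print(leaked)  # ['a.jpg']
```

In this model the buffer of the skipped image stays allocated because the only code path that would release it is the handler that was disconnected.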
Comment on attachment 372330 [details] [review] eog-list-store: unref a dangling EogImage reference Attachment 372330 [details] pushed as 8032c8a - eog-list-store: unref a dangling EogImage reference
From what you describe it might make sense to not disconnect from the finished callback at all. Once it's called, check if the operation had been cancelled, and in that case, discard the data. I think GAsyncResult is designed in that way.
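A minimal sketch of that suggestion, in a simplified single-threaded model: the handler stays connected, and a cancelled job has its result discarded when "finished" is delivered. All class and method names here are hypothetical, and "discard" is modelled as a plain assignment, which ignores the separate image data ref count that eog uses.

```python
class Image:
    def __init__(self, name):
        self.name = name
        self.pixels = None


class LoadJob:
    def __init__(self, image, on_finished):
        self.image = image
        self.cancelled = False
        self.on_finished = on_finished  # never disconnected in this variant

    def run(self):
        # Worker side: decode unless cancellation arrived in time.
        if not self.cancelled:
            self.image.pixels = bytearray(16)

    def finish(self):
        # Main-loop side: always deliver 'finished' to the owner.
        self.on_finished(self)


class Window:
    def __init__(self):
        self.shown = None
        self.job = None

    def request(self, image):
        if self.job is not None:
            self.job.cancelled = True  # cancel, but stay connected
        self.job = LoadJob(image, self.on_finished)
        return self.job

    def on_finished(self, job):
        if job.cancelled:
            job.image.pixels = None    # discard the result of a cancelled load
            return
        if self.shown is not None:
            self.shown.pixels = None
        self.shown = job.image


window = Window()
a, b = Image("a.jpg"), Image("b.jpg")

job_a = window.request(a)
job_a.run()                # a.jpg is fully decoded
job_b = window.request(b)  # user skips ahead; job_a is cancelled too late
job_a.finish()             # handler still runs and drops a.jpg's data
job_b.run()
job_b.finish()

print(a.pixels is None, window.shown.name)
```

The design choice is that cleanup responsibility stays with whoever started the job, so a late cancellation still ends in a well-defined place.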
I also see this happen when you click on the "next image" or "previous image" overlay buttons, or the arrow keys, before the image is fully loaded, resized & displayed. This is in a directory with large images (something like 8000×6000 pixels), so it takes a couple seconds to show the next image, and you can click the button several times during that period. When I monitor memory usage of EOG, I see that when I wait until the next/previous image is shown, memory usage increases to load the new image, and then it goes down again as (I assume) it releases the memory of the previous image. But when I click several times before the image is shown, it seems like it loads all these images (or at least allocates memory for them), but only releases memory for one of them. The same seems to happen when you use the left/right arrow keys on the keyboard by pressing it repeatedly (or by keeping it pushed down). [I just had to kill EOG because it was using nearly 60GB of memory... ugh!]
(In reply to Claudio Saavedra from comment #25)
> From what you describe it might make sense to not disconnect from the
> finished callback at all. Once it's called, check if the operation had been
> cancelled, and in that case, discard the data. I think GAsyncResult is
> designed in that way.

I don't think that the cancellation is really the issue here. The problem lies in "discard the data". The image data (pixbuf, metadata) has its own ref counter (eog_image_data_ref/_unref). Loading the image data however doesn't earn you a reference. In the normal case the image view will ref the image data and unref it once you switch to the next image. That releases the image data.

Looking for eog_image_data_ref calls gives just a few users, so I have the feeling that the image data ref counter is a bit under-used and many users of the image data simply rely on the image's ref counter instead. And to make it slightly more complicated, one can choose to load only the metadata now and load the pixel data later (as used by the tooltip in the collection pane).

So, I fear a real solution for this might turn out slightly larger and involve changing a few APIs.
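The ref counting behaviour described above can be pictured with a toy model. This is a sketch under the assumptions stated in the comment (loading takes no data reference, the data is freed when the last data reference is dropped); the class `ImageModel` and its methods are hypothetical and only loosely mirror eog_image_data_ref/_unref.

```python
class ImageModel:
    """Toy image with a data ref counter separate from the object's lifetime."""
    def __init__(self, name):
        self.name = name
        self.pixels = None
        self.data_refs = 0

    def load(self):
        # Loading fills the buffer but does NOT take a data reference.
        self.pixels = bytearray(16)

    def data_ref(self):
        self.data_refs += 1

    def data_unref(self):
        if self.data_refs == 0:
            raise RuntimeError("data_unref without matching data_ref")
        self.data_refs -= 1
        if self.data_refs == 0:
            self.pixels = None  # last user gone: free the data


# Normal case: the view refs the data while showing it, then unrefs.
shown = ImageModel("shown.jpg")
shown.load()
shown.data_ref()
shown.data_unref()

# Skipped case: the load completed, but no one ever took a data reference,
# so no unref will ever arrive to free the buffer.
skipped = ImageModel("skipped.jpg")
skipped.load()

print(shown.pixels is None, skipped.pixels is None)
```

In this model the skipped image keeps its buffer with a data ref count of zero, which is why simply "discarding" it needs an API decision about who owns freshly loaded data.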
(In reply to Jan Claeys from comment #26)
> I also see this happen when you click on the "next image" or "previous
> image" overlay buttons, or the arrow keys, before the image is fully loaded,
> resized & displayed. This is in a directory with large images (something
> like 8000×6000 pixels), so it takes a couple seconds to show the next image,
> and you can click the button several times during that period.
>
> When I monitor memory usage of EOG, I see that when I wait until the
> next/previous image is shown, memory usage increases to load the new image,
> and then it goes down again as (I assume) it releases the memory of the
> previous image.
>
> But when I click several times before the image is shown, it seems like it
> loads all these images (or at least allocates memory for them), but only
> releases memory for one of them. The same seems to happen when you use the
> left/right arrow keys on the keyboard by pressing it repeatedly (or by
> keeping it pushed down).

The moment you trigger changing the image, the window will disconnect from the already running job. Depending on how fast the cancellation reaches the loading thread, it may be that the job cannot abort loading the image anymore (it is set up to abort only if it is cancelled while the image is still being read). This would likewise result in the loaded image data not being released.
Created attachment 372620 [details] [review] Proof of concept Would it be possible for you to test this patch? It's not a solution, but just to make sure we're on the right track here. This is a proof-of-concept patch that makes the job take a reference on the image data. Note that it may cause a crash if it causes the image data or metadata to release too early or if other users don't hold references themselves.
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new ticket at https://gitlab.gnome.org/GNOME/eog/-/issues/ Thank you for your understanding and your help.