Bug 720156 – Excessive memory consumption


Summary:	Excessive memory consumption


Status:	RESOLVED FIXED

Product:	frogr
Classification:	Other
Component:	general
Version:	0.8
Hardware:	Other Linux

Importance:	Normal major
Target Milestone:	---
Assigned To:	frogr maintainers
QA Contact:	frogr maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2013-12-09 23:37 UTC by Felipe Lessa
Modified:	2013-12-21 08:53 UTC

See Also:
GNOME target:	---
GNOME version:	---

Description Felipe Lessa 2013-12-09 23:37:21 UTC

frogr 0.8 seems to be using a lot of memory.  I don't know whether it's a coincidence, but the usage appears proportional to the amount of data being transferred to Flickr.

I'm transferring ~2000 pics, which sum to ~4 GiB (can't tell exact values since frogr updates the values along the way).  I didn't notice anything unusual at first since my machine has 16 GiB, but ~12h after frogr began the upload I saw that it was consuming quite a bit of memory:

$ ps up 23510
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
felipe   23510  2.8 29.3 7810948 4809800 pts/1 SLl+ 09:43  20:10 frogr

That's ~4.5 GiB, which coincidentally looks like the whole data I'm uploading plus ~10% overhead.  At the time I ran ps above there was still ~600 MiB to go.

Comment 1 Felipe Lessa 2013-12-09 23:45:55 UTC

So, about ten minutes later the memory usage is still climbing:

$ ps up 23510
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
felipe   23510  2.8 29.6 7885700 4870068 pts/1 SLl+ 09:43  20:21 frogr

That's perhaps ~5 MiB/min?
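
The figures in the two ps samples above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes ps reports RSS in KiB (as procps does on Linux), and it takes the interval between the samples from the two comment timestamps (23:37:21 and 23:45:55), which only approximates when ps was actually run. The function names are made up for illustration.

```python
from datetime import datetime

KIB_PER_MIB = 1024
KIB_PER_GIB = 1024 ** 2


def kib_to_gib(kib):
    """Convert a size in KiB (the unit assumed for ps RSS) to GiB."""
    return kib / KIB_PER_GIB


def growth_mib_per_min(rss_before_kib, rss_after_kib, minutes):
    """Average RSS growth between two samples, in MiB per minute."""
    return (rss_after_kib - rss_before_kib) / KIB_PER_MIB / minutes


# RSS values copied from the two ps outputs above.
rss_first_kib = 4809800
rss_second_kib = 4870068

# Assumption: the comment timestamps stand in for the sampling times.
first = datetime(2013, 12, 9, 23, 37, 21)
second = datetime(2013, 12, 9, 23, 45, 55)
elapsed_min = (second - first).total_seconds() / 60

print(f"first sample:  {kib_to_gib(rss_first_kib):.2f} GiB")
print(f"second sample: {kib_to_gib(rss_second_kib):.2f} GiB")
print(f"growth: {growth_mib_per_min(rss_first_kib, rss_second_kib, elapsed_min):.1f} MiB/min")
```

With these inputs the first sample works out to roughly 4.6 GiB and the growth to somewhere between 6 and 7 MiB/min, the same order of magnitude as the estimate above.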

It's still possible that it's related to my uploads (perhaps it's keeping a reference to every buffer ever sent?), especially since I remember that some files had to be sent more than once due to connection problems.  OTOH, I wouldn't rule out the possibility of another kind of leak.

Comment 2 Felipe Lessa 2013-12-10 00:29:29 UTC

It seems to have topped at almost 5 GiB:

$ ps up 23510
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
felipe   23510  2.7 31.4 8297476 5154984 pts/1 SLl+ 09:43  20:56 frogr

And now it's telling me "Error reading file for upload." while it prints at the terminal:

** (frogr:23510): WARNING **: Unable to get contents for file

Now it really looks like a leak while uploading, maybe a missing free() somewhere.

I attached gdb to the running process and called malloc_stats():

Arena 0:
system bytes     =  841326592
in use bytes     =  833919648
Arena 1:
system bytes     =    3710976
in use bytes     =     388448
Arena 2:
system bytes     =     368640
in use bytes     =      68640
Arena 3:
system bytes     =     139264
in use bytes     =      15904
Arena 4:
system bytes     =    3948544
in use bytes     =     247120
Arena 5:
system bytes     =     139264
in use bytes     =      10336
Arena 6:
system bytes     =     643072
in use bytes     =      76416
Arena 7:
system bytes     =     716800
in use bytes     =     182208
Arena 8:
system bytes     =     434176
in use bytes     =      55632
Arena 9:
system bytes     =     139264
in use bytes     =      29808
Arena 10:
system bytes     =     294912
in use bytes     =     170784
Arena 11:
system bytes     =     139264
in use bytes     =      20912
Arena 12:
system bytes     =     462848
in use bytes     =      99776
Arena 13:
system bytes     =   33619968
in use bytes     =    2943280
Total (incl. mmap):
system bytes     = 2731618304
in use bytes     = 2683763632
max mmap regions =         10
max mmap bytes   = 1845534720
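
The totals can be cross-checked against the per-arena figures. The sketch below assumes, as the numbers themselves suggest, that "Total (incl. mmap)" is the sum of the per-arena values plus the mmap bytes figure; that is an inference from this particular output, not a claim about how malloc_stats() is specified. All values are copied from the output above.

```python
# Per-arena (system bytes, in use bytes), arenas 0 through 13.
arenas = [
    (841326592, 833919648),
    (3710976, 388448),
    (368640, 68640),
    (139264, 15904),
    (3948544, 247120),
    (139264, 10336),
    (643072, 76416),
    (716800, 182208),
    (434176, 55632),
    (139264, 29808),
    (294912, 170784),
    (139264, 20912),
    (462848, 99776),
    (33619968, 2943280),
]
mmap_bytes = 1845534720          # "max mmap bytes" line
reported_system = 2731618304     # "Total (incl. mmap)" system bytes
reported_in_use = 2683763632     # "Total (incl. mmap)" in use bytes

# Assumed relationship: total = sum over arenas + mmap bytes.
system_total = sum(system for system, _ in arenas) + mmap_bytes
in_use_total = sum(in_use for _, in_use in arenas) + mmap_bytes

print("system bytes match:", system_total == reported_system)
print("in use bytes match:", in_use_total == reported_in_use)
print(f"mmap share of system bytes: {mmap_bytes / reported_system:.0%}")
```

Both sums agree with the reported totals, so about 1.7 GiB of the roughly 2.5 GiB reported here is accounted for by mmap regions rather than arena memory.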

However, it seems I must have stepped on frogr's toes while doing so because it segfaulted :).  It seems that systemd didn't save the coredump, though, probably because of its sheer size:

$ LC_ALL=C sudo systemd-coredumpctl gdb
TIME                                         PID   UID   GID SIG EXE
             Mon 2013-12-09 22:23:37 BRST  23510  1000   100  11 /usr/bin/frogr
Failed to retrieve COREDUMP field: No such file or directory

Comment 3 Felipe Lessa 2013-12-10 02:49:31 UTC

After a run of 74 images:

$ valgrind frogr
==3897== Memcheck, a memory error detector
==3897== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==3897== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==3897== Command: frogr
==3897== 
==3897== Invalid write of size 4
==3897==    at 0x8674F5B: ??? (in /usr/lib/libcairo.so.2.11200.16)
==3897==    by 0x8675581: ??? (in /usr/lib/libcairo.so.2.11200.16)
==3897==    by 0x861C076: ??? (in /usr/lib/libcairo.so.2.11200.16)
==3897==    by 0x868FAF7: ??? (in /usr/lib/libcairo.so.2.11200.16)
==3897==    by 0x8662833: ??? (in /usr/lib/libcairo.so.2.11200.16)
==3897==    by 0x8667401: ??? (in /usr/lib/libcairo.so.2.11200.16)
==3897==    by 0x8667DB6: ??? (in /usr/lib/libcairo.so.2.11200.16)
==3897==    by 0x8662833: ??? (in /usr/lib/libcairo.so.2.11200.16)
==3897==    by 0x862432B: ??? (in /usr/lib/libcairo.so.2.11200.16)
==3897==    by 0x861D808: ??? (in /usr/lib/libcairo.so.2.11200.16)
==3897==    by 0x8616EA4: cairo_fill (in /usr/lib/libcairo.so.2.11200.16)
==3897==    by 0x4F5A781: ??? (in /usr/lib/libgtk-3.so.0.1000.6)
==3897==  Address 0xffeffc5d8 is on thread 1's stack
==3897== 
==3897== Invalid read of size 4
==3897==    at 0x86724CE: ??? (in /usr/lib/libcairo.so.2.11200.16)
==3897==    by 0x8674243: ??? (in /usr/lib/libcairo.so.2.11200.16)
==3897==    by 0x8674F8B: ??? (in /usr/lib/libcairo.so.2.11200.16)
==3897==    by 0x8675581: ??? (in /usr/lib/libcairo.so.2.11200.16)
==3897==    by 0x861C076: ??? (in /usr/lib/libcairo.so.2.11200.16)
==3897==    by 0x868FAF7: ??? (in /usr/lib/libcairo.so.2.11200.16)
==3897==    by 0x8662833: ??? (in /usr/lib/libcairo.so.2.11200.16)
==3897==    by 0x8667401: ??? (in /usr/lib/libcairo.so.2.11200.16)
==3897==    by 0x8667DB6: ??? (in /usr/lib/libcairo.so.2.11200.16)
==3897==    by 0x8662833: ??? (in /usr/lib/libcairo.so.2.11200.16)
==3897==    by 0x862432B: ??? (in /usr/lib/libcairo.so.2.11200.16)
==3897==    by 0x861D808: ??? (in /usr/lib/libcairo.so.2.11200.16)
==3897==  Address 0xffeffc5d8 is on thread 1's stack
==3897== 
==3897== 
==3897== HEAP SUMMARY:
==3897==     in use at exit: 291,590,185 bytes in 118,384 blocks
==3897==   total heap usage: 5,001,589 allocs, 4,883,205 frees, 6,076,381,277 bytes allocated
==3897== 
==3897== LEAK SUMMARY:
==3897==    definitely lost: 69,632 bytes in 264 blocks
==3897==    indirectly lost: 40,643,754 bytes in 2,994 blocks
==3897==      possibly lost: 243,363,785 bytes in 1,412 blocks
==3897==    still reachable: 7,218,190 bytes in 112,033 blocks
==3897==         suppressed: 0 bytes in 0 blocks
==3897== Rerun with --leak-check=full to see details of leaked memory
==3897== 
==3897== For counts of detected and suppressed errors, rerun with: -v
==3897== ERROR SUMMARY: 304 errors from 2 contexts (suppressed: 2 from 2)

Comment 4 Mario Sánchez Prada 2013-12-21 08:53:22 UTC

After investigating this issue quite deeply, I think the main reason for this memory leak was that the SoupBuffer that was being created to form the multipart message was not being freed, causing that memory to be lost forever :/

Fortunately, it should be fixed now:
https://git.gnome.org/browse/frogr/commit/?id=03889efc5aafbc60505e57c159bcd3ef2961ac87

However, that was not the only issue. I also found some ref counting problems with the pictures: in many cases, instances of FrogrPicture never reached ref count 0 when being removed from the UI (either manually or as a result of uploading them), which was certainly another important problem in terms of memory management.

Again, this should be fixed now too:
https://git.gnome.org/browse/frogr/commit/?id=77709a158f34ec5a2044b7b46a0a600b645fd296
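
For readers unfamiliar with manual reference counting, the kind of imbalance described above can be modelled in a few lines. This is a toy sketch in Python, not frogr's code and not the GObject API: the class, the function, and the assumption that the leaked reference was the creator's initial one are all invented for illustration.

```python
class RefCounted:
    """Toy stand-in for a manually reference-counted object (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.ref_count = 1      # the creator owns the initial reference
        self.finalized = False

    def ref(self):
        self.ref_count += 1
        return self

    def unref(self):
        assert self.ref_count > 0, "unref on an already finalized object"
        self.ref_count -= 1
        if self.ref_count == 0:
            self.finalized = True   # real code would release memory here


def upload_and_remove(picture, drop_creator_ref):
    """Model a picture being shown in the UI, uploaded, then removed."""
    picture.ref()       # the UI takes its own reference
    # ... the upload would happen here ...
    picture.unref()     # the UI drops its reference on removal
    if drop_creator_ref:
        picture.unref() # balanced case: the creator releases its reference
    return picture


leaky = upload_and_remove(RefCounted("a.jpg"), drop_creator_ref=False)
fixed = upload_and_remove(RefCounted("b.jpg"), drop_creator_ref=True)

print(leaky.name, leaky.ref_count, leaky.finalized)
print(fixed.name, fixed.ref_count, fixed.finalized)
```

In the unbalanced case the count never drops to 0, so the object is never finalized even though nothing in the UI refers to it any more.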

Last, there was another problem with the ref counting of photosets and groups, which were not unreffed when closing the related dialogs. That should be fixed now as well:
https://git.gnome.org/browse/frogr/commit/?id=e3f5863c3c2823d009b01290136fee22766998fc

So, I'm resolving this bug now because I can no longer spot that memory problem after testing frogr with ~100 pictures.

Thanks a lot for this bug report, and apologies both for the delay in fixing this issue and for the issues themselves. The good news is that the next stable release of frogr (which I hope to make 2-3 weeks from now) will hopefully be better than ever, at least in terms of memory management :)

Thanks!