GNOME Bugzilla – Bug 577002
Camel leaks descriptors for .ev-journal files
Last modified: 2009-04-07 12:15:00 UTC
Please describe the problem: There appears to be a file descriptor leak which causes Evolution to fail between 1 and 2 days. Evolution has over 1300 open files when it fails. At this point it is impossible to exit Evolution (it has to be killed manually). Depending on the last action one may get an error message, e.g. Error saving to autosave because "Too many open files". I have one imap account, one local mail spool and a news (nntp) account. This is with Evolution 2.24.3 and Gnome 2.24.1 on Ubuntu Intrepid Ibex 8.10. Many of the open files appear to be on "*.ev-journal" files. These seem to account for roughly half the number of open files. I realize this is limited information, but I can reproduce the problem at will. Just tell me how to debug this and I will add that information. Steps to reproduce: 1. 2. 3. Actual results: Expected results: Does this happen every time? Yes, between 1 to 2 days probably depending on the amount of mail I receive. Other information: 499 pilux:~$ ps -ef | grep evol han 7161 1 0 10:17 ? 00:00:00 /opt/evo/libexec/evolution/2.24/evolution-exchange-storage --oaf-activate-iid=OAFIID:GNOME_Evolution_Exchange_Connector_CalFactory:1.2 --oaf-ior-fd=38 han 7217 1 0 10:17 ? 00:00:03 /opt/evo/libexec/evolution-data-server-2.24 --oaf-activate-iid=OAFIID:GNOME_Evolution_DataServer_CalFactory:1.2 --oaf-ior-fd=39 han 8433 1 0 10:20 ? 00:01:24 /usr/bin/evolution --offline han 8478 1 0 10:20 ? 00:00:00 /opt/evo/libexec/evolution/2.24/evolution-alarm-notify --oaf-activate-iid=OAFIID:GNOME_Evolution_Calendar_AlarmNotify_Factory:2.24 --oaf-ior-fd=35 han 31351 7408 0 16:29 pts/4 00:00:00 grep evol 500 pilux:~$ lsof -p 8433 | wc -l 438 501 pilux:~$ lsof -p 8433 | wc -l 480 502 pilux:~$ lsof -p 8433 | wc -l 515 503 pilux:~$ date Wed Mar 25 20:06:04 CET 2009 504 pilux:~$ date Wed Mar 25 20:33:25 CET 2009 505 pilux:~$ lsof -p 8433 | wc -l 515 506 pilux:~$ lsof -p 8433 | wc -l 515 507 pilux:~$ lsof -p 8433 | wc -l 531 508 pilux:~$ lsof -p 8433 | wc -l 531 509 pilux:~$ lsof -p 8433 | wc -l 531 510 pilux:~$ lsof -p 8433 | wc -l 531 511 pilux:~$ lsof -p 8433 | wc -l 568 512 pilux:~$ lsof -p 8433 | wc -l 568 513 pilux:~$ lsof -p 8433 | wc -l 566 514 pilux:~$ 514 pilux:~$ lsof -p 8433 | wc -l 566 515 pilux:~$ lsof -p 8433 | wc -l 643 516 pilux:~$ lsof -p 8433 | wc -l 700 517 pilux:~$ lsof -p 8433 | wc -l 735 518 pilux:~$ lsof -p 8433 | wc -l 769 519 pilux:~$ lsof -p 8433 | wc -l 787 520 pilux:~$ lsof -p 8433 | wc -l 807 521 pilux:~$ lsof -p 8433 | wc -l 877 522 pilux:~$ lsof -p 8433 | wc -l 881 523 pilux:~$ lsof -p 8433 | wc -l 906 524 pilux:~$ lsof -p 8433 | wc -l 903 525 pilux:~$ lsof -p 8433 | wc -l 905 526 pilux:~$ lsof -p 8433 | wc -l 924 527 pilux:~$ lsof -p 8433 | wc -l 942 528 pilux:~$ lsof -p 8433 | wc -l 940 529 pilux:~$ lsof -p 8433 | grep ev-journal | wc -l 594 530 pilux:~$ lsof -p 8433 | wc -l 991 531 pilux:~$ lsof -p 8433 | grep ev-journal | wc -l 645 532 pilux:~$ lsof -p 8433 | wc -l 1030 533 pilux:~$ lsof -p 8433 | grep ev-journal | wc -l 682 534 pilux:~$ lsof -p 8433 | wc -l 1045 535 pilux:~$ lsof -p 8433 | wc -l 1046
Thanks for the bug report. There was committed this change two weeks ago: http://svn.gnome.org/viewvc/evolution-data-server?view=revision&revision=10162 though it was about leaking threads, whereas yours is about ev-journal files. It seems to me this is related to your nntp account(s). Which files and after which operation are they increasing, I do not see this when trying to reproduce on 2.26.0.
I do not know exactly which account causes this. I've suspected the NNTP account as well. I'll try running with that account disabled for a few days to see what happens. I think it happens on incoming mail, but I haven't been able to determine which account or event a direct relationship with the number of mails received.
Sounds like this is a Camel issue. Changing component and clarifying bug summary.
(In reply to comment #2) NNTP accounts in Evolution are known to consume large quantities of RAM, so it's certainly possible it could be leaking file descriptors as well. If you could isolate the account type that's leaking that would be very helpful.
As the first step, what about trying to have accounts enabled as usually, and when you'll see more than say 20 ev-journal files opened calling your command without the count command? It'll show path to those files and from that is recognizable which account uses it. (I'm not sure whether it's understandable, I meant just running $ "lsof -p 8433 | grep ev-journal" without that "| wc -l" at the end.) Thanks in advance. Note: I see the disco->diary is not already unreffed in nntp, but that only when closing the store, which is not supposed to happen often. I think.
Created attachment 131521 [details] test patch for evolution-data-server; hmm, one option can be when you go offline. Do you go offline often/intentionally/unintentionally/at all?
> hmm, one option can be when you go offline. Do you go offline > often/intentionally/unintentionally/at all? > I might be (unintentionally). The NNTP server is behind a firewall and I connect to it using a stunnel through a proxy server. The reason why I didn't display which files were open is because it showed the name of the NNTP server. I'll add the requested information later.
(In reply to comment #7) > The reason why I didn't display which files were open is because it showed the > name of the NNTP server. I'll add the requested information later. That's fine, I didn't want the exact files if you can see there from which account it is. And if it's only from NNTP, then fine, we know where to look.
(In reply to comment #8) > That's fine, I didn't want the exact files if you can see there from which > account it is. And if it's only from NNTP, then fine, we know where to look. > I ran with the NNTP account disabled since Friday afternoon (3 days now). The number of file descriptors has not increased during that period. Therefor I believe that it is save to assume that problem is limited to the NNTP account. The leaked file descriptors all looked like: /home/han/.evolution/mail/nntp/@news.xyz.com/.ev-journal Thanks for your help.
I will build a test package for Ubuntu 8.10.
I have it queued for build: evolution (2.24.3-0ubuntu1ppa1) intrepid; urgency=low * debian/patches/50_evo-fd-leak-on-nntp.patch test upstream fix for http://bugzilla.gnome.org/show_bug.cgi?id=577002 -- C de-Avillez < hggdh2@gmail.com> Mon, 30 Mar 2009 13:19:14 -0500 It should be available in a few hours (currently expected to begin build in 2 hours). It can be installed, for Ubuntu 8.10 users, from https://launchpad.net/~hggdh2/+archive/ppa.
Han, were you able to test the above package, please?
an update: the test package is ready for amd64 and lpia, but the build process is (currently) stuck on the i386. I have asked for admin intervention, and I am waiting on it to (perhaps) resubmit the i386 build.
(In reply to comment #12) > Han, were you able to test the above package, please? > No, not yet due to the issue with the build. I'm in contact with hggdh offline about that. I will test the build as soon as it becomes available.
All: the Ubuntu builds have successfully completed, so please feel free to test.
I don't think the patch is working. When I started Evolution and enabled the NNTP account, there was one file descriptor, i.e.: lsof -p 29083 | grep journal evolution 29083 han 66u REG 8,2 0 1710279 /home/han/.evolution/mail/nntp/@news.xyz.com/.ev-journal Now, a few hours later, there are 37 file descriptors in use for this: lsof -p 29083 | grep ev-journal | wc -l 37 This is while running: 2.24.3-0ubuntu1ppa1
strange, that works fine for me. Before the patch, intentionally going offline and online will increase file descriptors by one. After the patch, it works as expected, only one file descriptor per nntp account. (In reply to comment #14) > No, not yet due to the issue with the build. I'm in contact with hggdh offline > about that. I will test the build as soon as it becomes available. Oops, I didn't know, I'm sorry.
Sigh. I messed up when preparing the test package... @Han: a new version of the package is being build now, with version 2.24.3-0ubuntuppa2. Sorry.
(In reply to comment #18) > Sigh. I messed up when preparing the test package... > > @Han: a new version of the package is being build now, with version > 2.24.3-0ubuntuppa2. Sorry. > I installed the new package, but it doesn't seem to work. $ lsof -p 23369 | grep ev-journal | wc -l 126 I think I do have the correct packages installed (I checked with hggdh and ppa1 is the correct one after all): $ dpkg -l | grep ppa1 | awk '{ print $2 "\t\t" $3}' evolution 2.24.3-0ubuntu1ppa1 evolution-common 2.24.3-0ubuntu1ppa1 evolution-data-server 2.24.3-0ubuntu1ppa1 evolution-data-server-common 2.24.3-0ubuntu1ppa1 evolution-dev 2.24.3-0ubuntu1ppa1 evolution-plugins 2.24.3-0ubuntu1ppa1 libcamel1.2-14 2.24.3-0ubuntu1ppa1 libcamel1.2-dev 2.24.3-0ubuntu1ppa1 libebackend1.2-0 2.24.3-0ubuntu1ppa1 libebook1.2-9 2.24.3-0ubuntu1ppa1 libecal1.2-7 2.24.3-0ubuntu1ppa1 libedata-book1.2-2 2.24.3-0ubuntu1ppa1 libedata-cal1.2-6 2.24.3-0ubuntu1ppa1 libedataserver1.2-11 2.24.3-0ubuntu1ppa1 libedataserver1.2-dev 2.24.3-0ubuntu1ppa1 libedataserverui1.2-8 2.24.3-0ubuntu1ppa1 libegroupwise1.2-13 2.24.3-0ubuntu1ppa1 libexchange-storage1.2-3 2.24.3-0ubuntu1ppa1 libgdata-google1.2-1 2.24.3-0ubuntu1ppa1 libgdata1.2-1 2.24.3-0ubuntu1ppa1 And assuming that the fix has gone into the camel library: $ ldd /usr/bin/evolution|grep camel libcamel-provider-1.2.so.14 => /usr/lib/libcamel-provider-1.2.so.14 (0xb7e51000) libcamel-1.2.so.14 => /usr/lib/libcamel-1.2.so.14 (0xb7bcb000) $ ls -lL /usr/lib/libcamel-1.2.so.14 -rw-r--r-- 1 root root 422928 2009-04-02 16:12 /usr/lib/libcamel-1.2.so.14 $ ls -lL /usr/lib/libcamel-provider-1.2.so.14 -rw-r--r-- 1 root root 377248 2009-04-02 16:12 /usr/lib/libcamel-provider-1.2.so.14 So what would be the next step in debugging this? I'm willing to try instrumentation or whatever suggestions you may have.
This exact patch touched $PREFIX/lib/evolution-data-server-1.2/camel-providers/libcamelnntp.* it should be enough to just close and open evolution again, though you can try to close all evolution related processes (by evolution --force-shutdown) and start it again. Other possibility, I can think of, is to have installed also packages which allow you to list source files and then try to put a breakpoint on the line where the change landed and look whether your actual package really has the change there. (I've no idea what it'll do when you would have different source package and binary package, maybe some warning would be shown.) As I expect that all those files are opened on some unexpected disconnection, then I would suggest running evolution on console as this: $ gdb --ex "b nntp_connect_offline" --ex "b nntp_store_finalize" --ex r evolution It'll ask you twice whether you want to put breakpoint as pending, both answers should be 'y'. After that it'll stop everytime those two functions will be called (which should be for you quite often). Nonetheless, this is easier to catch on some live session, than through the bugzilla, so if you can, try to catch me on evolution's IRC channel.
Created attachment 132025 [details] test patch ][ for evolution-data-server; This will stop leaking file descriptors even for you, as we found where it was done finally, but, more interestingly, it'll show some information on evolution's console what it is doing, which is information I'm really interested in. With some luck it'll show what it does for you, as we were unable to track online the exact cause. Instead of leaking the file descriptor, it'll show on console a line containing "had diary already". I would like to ask you paste here everything above this line (and this line as well). Feel free to strip from there any information you might consider private. Thanks in advance.
A new build -- ppa2 -- is currently in process for Ubuntu.
I've only run ppa2 for a couple of minutes, but it does seem to work. There is only one open file descriptor now for the ev-journal file on the NNTP account. This is the debug information: evolution-shell-Message: Killing old version of evolution-data-server... ** (evolution:16500): DEBUG: mailto URL command: evolution %s ** (evolution:16500): DEBUG: mailto URL program: evolution 0x8ffc800 connect_to_server:207: old stream: (nil) new stream:0x9b9d700 0x8ffc800 connect_to_server:264: created new diary 0x9bab850 0x8ffc800 nntp_disconnect_online:388 0x8ffc800 connect_to_server:207: old stream: (nil) new stream:0x9b9d700 0x8ffc800 connect_to_server:258: had diary already 0x9bab850 0x8ffc800 nntp_disconnect_online:388 0x8ffc800 connect_to_server:207: old stream: (nil) new stream:0x9b9d700 0x8ffc800 connect_to_server:258: had diary already 0x9bab850 0x8ffc800 nntp_disconnect_online:388 (evolution:16500): evolution-mail-WARNING **: Failed to refresh folders: no such table: xyz.coe.ux 0x8ffc800 connect_to_server:207: old stream: (nil) new stream:0x9b9d700 0x8ffc800 connect_to_server:258: had diary already 0x9bab850 0x8ffc800 nntp_disconnect_online:388 0x8ffc800 connect_to_server:207: old stream: (nil) new stream:0x9b9d700 0x8ffc800 connect_to_server:258: had diary already 0x9bab850 0x8ffc800 nntp_disconnect_online:388 0x8ffc800 connect_to_server:207: old stream: (nil) new stream:0x9b9d700 0x8ffc800 connect_to_server:258: had diary already 0x9bab850 0x8ffc800 nntp_disconnect_online:388 (evolution:16500): evolution-mail-WARNING **: Failed to refresh folders: no such table: xyz.databases 0x8ffc800 connect_to_server:207: old stream: (nil) new stream:0x9b9d700 0x8ffc800 connect_to_server:258: had diary already 0x9bab850 0x8ffc800 nntp_disconnect_online:388 0x8ffc800 connect_to_server:207: old stream: (nil) new stream:0x9b9d700 0x8ffc800 connect_to_server:258: had diary already 0x9bab850 0x8ffc800 nntp_disconnect_online:388 0x8ffc800 connect_to_server:207: old stream: (nil) new stream:0x9b9d700 0x8ffc800 connect_to_server:258: had diary already 0x9bab850 0x8ffc800 nntp_disconnect_online:388 (evolution:16500): evolution-mail-WARNING **: Failed to refresh folders: no such table: xyz.emacs 0x8ffc800 connect_to_server:207: old stream: (nil) new stream:0x9b9d700 0x8ffc800 connect_to_server:258: had diary already 0x9bab850 0x8ffc800 nntp_disconnect_online:388 0x8ffc800 connect_to_server:207: old stream: (nil) new stream:0x9b9d700 0x8ffc800 connect_to_server:258: had diary already 0x9bab850 0x8ffc800 nntp_disconnect_online:388 0x8ffc800 connect_to_server:207: old stream: (nil) new stream:0x9b9d700 0x8ffc800 connect_to_server:258: had diary already 0x9bab850 0x8ffc800 nntp_disconnect_online:388 (evolution:16500): evolution-mail-WARNING **: Failed to refresh folders: no such table: xyz.emc 0x8ffc800 connect_to_server:207: old stream: (nil) new stream:0x9b9d700 0x8ffc800 connect_to_server:258: had diary already 0x9bab850 0x8ffc800 nntp_disconnect_online:388 0x8ffc800 connect_to_server:207: old stream: (nil) new stream:0x9b9d700 0x8ffc800 connect_to_server:258: had diary already 0x9bab850 0x8ffc800 nntp_disconnect_online:388 0x8ffc800 connect_to_server:207: old stream: (nil) new stream:0x9b9d700 0x8ffc800 connect_to_server:258: had diary already 0x9bab850 0x8ffc800 nntp_disconnect_online:388 (evolution:16500): evolution-mail-WARNING **: Failed to refresh folders: no such table: xyz.infocenters 0x8ffc800 connect_to_server:207: old stream: (nil) new stream:0x9b9d700 0x8ffc800 connect_to_server:258: had diary already 0x9bab850 0x8ffc800 nntp_disconnect_online:388 0x8ffc800 connect_to_server:207: old stream: (nil) new stream:0x999ee10 0x8ffc800 connect_to_server:258: had diary already 0x9bab850 0x8ffc800 nntp_disconnect_online:388 0x8ffc800 connect_to_server:207: old stream: (nil) new stream:0x999ee50 0x8ffc800 connect_to_server:258: had diary already 0x9bab850 0x8ffc800 nntp_disconnect_online:388 (evolution:16500): evolution-mail-WARNING **: Failed to refresh folders: no such table: xyz.unix.admin 0x8ffc800 connect_to_server:207: old stream: (nil) new stream:0x999ee90 0x8ffc800 connect_to_server:258: had diary already 0x9bab850
Han, could you start evolution in gdb like this, please: $ gdb evolution --ex "b camel-nntp-store.c:388" --ex r and paste here "t a a bt" command result when it stops for the first and the second time, please? I would like to know why it disconnects so often for you, because it doesn't do that for me. Thanks in advance.
As requested: han@pilux:~$ gdb evolution --ex "b camel-nntp-store.c:388" --ex r GNU gdb 6.8-debian Copyright (C) 2008 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "i486-linux-gnu"... No source file named camel-nntp-store.c. Make breakpoint pending on future shared library load? (y or [n]) y Breakpoint 1 (camel-nntp-store.c:388) pending. Starting program: /usr/bin/evolution [Thread debugging using libthread_db enabled] [New Thread 0xb62de940 (LWP 18911)] evolution-shell-Message: Killing old version of evolution-data-server... ** (evolution:18911): DEBUG: mailto URL command: evolution %s ** (evolution:18911): DEBUG: mailto URL program: evolution [New Thread 0xb48d7b90 (LWP 18954)] [Thread 0xb48d7b90 (LWP 18954) exited] [New Thread 0xb40d6b90 (LWP 18955)] [Thread 0xb40d6b90 (LWP 18955) exited] [New Thread 0xb40d6b90 (LWP 18956)] [New Thread 0xb48d7b90 (LWP 18957)] [Thread 0xb40d6b90 (LWP 18956) exited] [Thread 0xb48d7b90 (LWP 18957) exited] [New Thread 0xb48d7b90 (LWP 18962)] [New Thread 0xb40d6b90 (LWP 18963)] [New Thread 0xb36d4b90 (LWP 18964)] [New Thread 0xb2ed3b90 (LWP 18965)] [New Thread 0xb26d2b90 (LWP 18966)] [Thread 0xb2ed3b90 (LWP 18965) exited] [Thread 0xb36d4b90 (LWP 18964) exited] [Thread 0xb48d7b90 (LWP 18962) exited] [Thread 0xb40d6b90 (LWP 18963) exited] [Thread 0xb26d2b90 (LWP 18966) exited] [New Thread 0xb26d2b90 (LWP 18987)] [New Thread 0xb40d6b90 (LWP 18988)] [New Thread 0xb48d7b90 (LWP 18992)] [New Thread 0xb36d4b90 (LWP 18993)] [Thread 0xb40d6b90 (LWP 18988) exited] [Thread 0xb26d2b90 (LWP 18987) exited] [Thread 0xb36d4b90 (LWP 18993) exited] [New Thread 0xb36d4b90 (LWP 18997)] [Thread 0xb48d7b90 (LWP 18992) exited] [New Thread 0xb48d7b90 (LWP 18998)] [Thread 0xb36d4b90 (LWP 18997) exited] [New Thread 0xb36d4b90 (LWP 18999)] [New Thread 0xb26d2b90 (LWP 19000)] [New Thread 0xb2e58b90 (LWP 19001)] [Thread 0xb36d4b90 (LWP 18999) exited] [Thread 0xb48d7b90 (LWP 18998) exited] [New Thread 0xb48d7b90 (LWP 19040)] [New Thread 0xb36d4b90 (LWP 19041)] [New Thread 0xb40d6b90 (LWP 19042)] [New Thread 0xb18ffb90 (LWP 19043)] [New Thread 0xb10feb90 (LWP 19044)] [New Thread 0xb08fdb90 (LWP 19045)] [Thread 0xb08fdb90 (LWP 19045) exited] [New Thread 0xb00fcb90 (LWP 19046)] [Thread 0xb00fcb90 (LWP 19046) exited] 0x8baf800 connect_to_server:207: old stream: (nil) new stream:0x95b0710 [Thread 0xb40d6b90 (LWP 19042) exited] 0x8baf800 connect_to_server:264: created new diary 0x959c7b8 0x8baf800 nntp_disconnect_online:388 [Switching to Thread 0xb10feb90 (LWP 19044)] Breakpoint 1, nntp_disconnect_online (service=0x8baf800, clean=0, ex=0x0) at camel-nntp-store.c:389 389 camel-nntp-store.c: No such file or directory. in camel-nntp-store.c (gdb) t a a bt
+ Trace 214227
Thread 1 (Thread 0xb62de940 (LWP 18911))
(gdb) c Continuing. [New Thread 0xb40d6b90 (LWP 19102)] [Thread 0xb40d6b90 (LWP 19102) exited] 0x8baf800 connect_to_server:207: old stream: (nil) new stream:0x95b0710 0x8baf800 connect_to_server:258: had diary already 0x959c7b8 0x8baf800 nntp_disconnect_online:388
OK, both are similar
+ Trace 214229
Created attachment 132205 [details] [review] proposed eds patch for evolution-data-server; Thank you both for all the investigation. I guess it's fine to use the same ".ev-journal" file even after disconnect, thus the final patch looks like this. If you can make the final test and report back it'll be great.
EDS-PPA3 should be available in a few with the proposed fix.
On a review, also it looks fine to me. Once fine after the test, please commit.
(In reply to comment #29) > On a review, also it looks fine to me. Once fine after the test, please commit. > I've worked with PPA3 all morning and there is only one open file descriptor for ev-journal. Thanks all for getting this fixed so quickly.
Committed to trunk. Committed revision 10200. Thanks again for all the testing.