GNOME Bugzilla – Bug 348888
evolution won't start and prints "too many open files error"
Last modified: 2013-09-10 13:42:22 UTC
(taken from downstream report: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=198935)

Sometimes when I start it I get this error:

[rstrode@halflap] (~) <01:05 PM> $ evolution
CalDAV Eplugin starting up ...
evolution-shell-Message: Killing old version of evolution-data-server...
(evolution-2.8:4577): camel-WARNING **: camel_exception_get_id called with NULL parameter.
(evolution-2.8:4577): Pango-WARNING **: shape engine failure, expect ugly output. the offending font is 'Bitstream Vera Sans Bold 9'
(evolution-2.8:4577): Pango-WARNING **: pango_font_get_glyph_extents called with bad font, expect ugly output
I/O error : Too many open files
I/O warning : failed to load external entity "/usr/share/evolution/2.8/ui/evolution-mail-global.xml"
(evolution-2.8:4577): Bonobo-CRITICAL **: bonobo_ui_node_to_string: assertion `node != NULL' failed
I/O error : Too many open files
I/O warning : failed to load external entity "/usr/share/evolution/2.8/ui/evolution-mail-list.xml"
(evolution-2.8:4577): Bonobo-CRITICAL **: bonobo_ui_node_to_string: assertion `node != NULL' failed
I/O error : Too many open files
I/O warning : failed to load external entity "/usr/share/evolution/2.8/ui/evolution-mail-message.xml"
(evolution-2.8:4577): Bonobo-CRITICAL **: bonobo_ui_node_to_string: assertion `node != NULL' failed
I/O error : Too many open files
I/O warning : failed to load external entity "/usr/lib/evolution/2.8/plugins/org-gnome-mailing-list-actions.xml"
(evolution-2.8:4577): Bonobo-CRITICAL **: bonobo_ui_node_to_string: assertion `node != NULL' failed
I/O error : Too many open files
I/O warning : failed to load external entity "/usr/lib/evolution/2.8/plugins/org-gnome-mail-to-task.xml"
(evolution-2.8:4577): Bonobo-CRITICAL **: bonobo_ui_node_to_string: assertion `node != NULL' failed
(evolution-2.8:4577): GLib-WARNING **: GError set over the top of a previous GError or uninitialized memory.
This indicates a bug in someone's code. You must ensure an error is NULL before it's set.
The overwriting error message was: Error opening directory '/local-users/rstrode/.evolution/mail/imap/rstrode@mail.boston.redhat.com/folders/INBOX': Too many open files

Note: if I try enough times, evolution eventually starts up, although sometimes it looks rather broken, like no icons showing up (just big red X's).
Alex and I had a look at this today. We think the problem is that I just have too many vfolders. Each vfolder spawns several asynchronous operations, each creating a pipe (and 2 file descriptors per pipe). I have over a hundred vfolders, so I hit the default limit of 1024 file descriptors pretty fast. Here's an example stack trace showing one of the hundreds of pipes it creates:
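The arithmetic is easy to verify: every pipe() call consumes two descriptors. A minimal standalone sketch demonstrating this (Linux-specific, counting via /proc/self/fd; the helper names are made up for illustration and are not Evolution code):

```c
#include <dirent.h>
#include <unistd.h>

/* Count this process's open file descriptors by scanning /proc/self/fd.
 * The DIR handle itself is open during the scan, but it appears in both
 * a "before" and an "after" count, so deltas between counts are exact. */
static int count_open_fds (void)
{
    DIR *dir = opendir ("/proc/self/fd");
    struct dirent *ent;
    int n = 0;

    if (dir == NULL)
        return -1;
    while ((ent = readdir (dir)) != NULL)
        if (ent->d_name[0] != '.')
            n++;
    closedir (dir);
    return n;
}

/* Open n pipes, as each vfolder's async operation does, and return how
 * many new descriptors that consumed (expected: 2 * n). */
static int fds_consumed_by_pipes (int n, int fds[][2])
{
    int before = count_open_fds ();
    for (int i = 0; i < n; i++)
        pipe (fds[i]);
    return count_open_fds () - before;
}
```

With over a hundred vfolders each spawning several such operations, a few hundred pipes puts the process well past a 1024-descriptor limit.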
+ Trace 69673
I guess you just need to up your max fd count
Yea, that's how I'm working around the problem right now. Ultimately, of course, evolution should cap how many simultaneous asynchronous operations it starts and queue the rest for later.
I'm getting a similar error and I have only one vfolder -- the default "Unmatched" one. The messages are:

=== warnings ===
(evolution-2.6:24636): camel-local-provider-WARNING **: Could not open/create index file: Too many open files: indexing not performed
(evolution-2.6:24636): camel-WARNING **: Could not save object state file to '/home/edhill/.evolution/mail/local/MAIL_folders.sbd/F--G--H--I.sbd/.#ieee_cs.cmeta': Too many open files
=== warnings ===

and it results in complete crashes every time evolution is started. I've been using evolution as my only email reader since the late '90s, and if I can't figure out how to fix this problem I will probably have to switch to a different MUA. Does anyone know of any fixes, workarounds, etc.? The only thing I've been able to do is move my whole ~/.evolution folder away, but then of course I can no longer access all my old emails -- which is highly inconvenient. I'll gladly install the -debug packages and provide backtraces if it helps...
I installed the evolution-debuginfo package and here is the back-trace: Program received signal SIGABRT, Aborted.
+ Trace 71040
Thread NaN (LWP 27113)
As fejj pointed out in comment 2, you can work around the problem by raising your max open fd count. You can do this by modifying /etc/security/limits.conf, or by doing something like:

su -c "ulimit -n 1000; su $LOGNAME -c 'evolution'"

(you'll have to enter the root password, or put sudo in front of it, if you go that route)
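For a persistent fix, the limits.conf route looks something like this (the username and value here are illustrative, and pam_limits must be enabled for the file to take effect):

```
# /etc/security/limits.conf
# <domain>  <type>  <item>   <value>
rstrode     soft    nofile   10000
rstrode     hard    nofile   10000
```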
I meant to have 10000 in that line; the default is already ~1000.
Hi Ray, thank you for the prompt reply and the fd limit workaround, but even at 10000 evolution is still crashing on startup. Is it possible for evolution to crash from (somewhat?) corrupted mail dir data or other local ~/.evolution/* files?
(In reply to comment #8)
> Hi Ray, thank you for the prompt reply and the fd limit workaround but
> even at 10000 evolution is still crashing on startup. Is it possible
> to have evolution crash from (somewhat?) corrupted mail dir data or
> other local ~/.evolution/* files?

Do you have debuginfo packages installed? Can you attach stack traces for your crash? TIA
(In reply to comment #4)
> I'm getting a similiar error and I have only one vfolder -- the default
> "Unmatched" one. The messages are:
>
> === warnings ===
> (evolution-2.6:24636): camel-local-provider-WARNING **: Could not open/create
> index file: Too many open files: indexing not performed
> [snip..]
> Does anyone know of any fixes, work-arounds, etc.? The only thing I've
> been able to do is to move all of my ~/.evolution folder away but then
> of course I can no longer access all my old emails -- which is highly
> inconvenient.
>
> I'll gladly install the -debug packages and provide back-traces if it
> helps...

Ray, Ed: can you try the fix attached to bug #350907 and update the bug? TIA.
Hi, the patch in bug 350907 seems to make evolution start up consistently without crashing. Thanks!

*** This bug has been marked as a duplicate of 350907 ***
I think I was a bit premature in my testing. Dave Malcolm reports still seeing a problem even with the patched e-d-s.
(In reply to comment #12)
> I think I was a bit premature in my testing.
>
> Dave Malcolm reports still seeing a problem even with the patched e-d-s.

Ray, try the latest patch attached to bug #350907. That patch fixes a couple of fd leaks, but there is another set of fds being leaked in the same code, which I am currently digging into.
I believe I have found the root cause of this problem.

The EMsgPort data structure in evolution-data-server is an asynchronous queue for passing messages between threads. But this data structure also defines two pipes per instance (or only one if NSS is disabled). These pipes have two use cases:

1) To register a GSource that polls the "read" end of the pipe and calls a function when the pipe indicates that a new message has arrived on the asynchronous queue. (It seems like this could just as easily be done with a thread and a simple wait-for-message-and-dispatch loop.) This use case is seen in several places in Evolution.

2) Cancelling synchronous I/O operations. The "read" end of the pipe is added to the file descriptor set when calling poll() or select(). That way, when a message arrives on an asynchronous queue, a byte is written to the EMsgPort pipe. This interrupts the blocking I/O operation and allows the pipe's descriptor to be tested for a cancellation request. It's kind of a poor man's asynchronous I/O. This use case is a tricky one, and is seen in several places in Camel.

Each EMsgPort instance is consuming as many as four file descriptors. Obviously this approach does not scale well, as this bug report shows. But the second use case makes it difficult to simply stop using pipes.

So the question becomes: how many of these EMsgPort pipes are really needed? I compiled some simple statistical analysis into the EMsgPort code and ran a typical Evolution session for me (two IMAP accounts and about 100 search folders). During my session, 1592 EMsgPort instances were created. Of the pipes created for each of those instances, less than 3% were actually being used for one of the use cases described above. (Also, 1585 of those instances were properly destroyed, so the file descriptor leaks are not as severe as some may have suspected.)
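Use case 2 is the classic self-pipe trick. A minimal standalone sketch of the idea (not the actual Camel code; the function name and return convention are made up for illustration):

```c
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for data on io_fd, but return early if a byte
 * arrives on cancel_fd (the "read" end of a cancellation pipe).
 * Returns 1 if cancelled, 2 if io_fd became readable, 0 on timeout. */
static int wait_cancellable (int io_fd, int cancel_fd, int timeout_ms)
{
    struct pollfd fds[2];

    fds[0].fd = cancel_fd;
    fds[0].events = POLLIN;
    fds[1].fd = io_fd;
    fds[1].events = POLLIN;

    if (poll (fds, 2, timeout_ms) <= 0)
        return 0;                        /* timeout or poll error */

    if (fds[0].revents & POLLIN) {
        char byte;
        read (cancel_fd, &byte, 1);      /* drain the cancellation byte */
        return 1;
    }
    return 2;                            /* real data is ready */
}
```

Another thread requests cancellation simply by writing one byte to the pipe's "write" end, which wakes the poll() even though no real I/O has arrived.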
Since we can't easily get away from using pipes for intra-process communication, I think the solution is to create them on demand rather than up front. If we wait to create the pipes until the first time e_msgport_fd() or e_msgport_prfd() is called for a particular EMsgPort instance, rather than creating them in e_msgport_new(), I think that will dramatically reduce the number of file descriptors that Evolution consumes. I'll cook up a patch for this.
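The on-demand idea can be sketched in a few lines (hypothetical names; the real structure lives in e-msgport.c and carries the message queue as well):

```c
#include <unistd.h>

/* Simplified stand-in for EMsgPort: the pipe fds start out unallocated
 * and are only created the first time a caller asks for the fd. */
typedef struct {
    int pipe_fds[2];   /* both -1 until first use */
} MsgPortSketch;

static void msg_port_init (MsgPortSketch *port)
{
    port->pipe_fds[0] = -1;
    port->pipe_fds[1] = -1;
}

/* Analogue of e_msgport_fd(): create the pipe lazily, so ports that
 * never hand out an fd never consume descriptors. */
static int msg_port_fd (MsgPortSketch *port)
{
    if (port->pipe_fds[0] == -1 && pipe (port->pipe_fds) == -1)
        return -1;
    return port->pipe_fds[0];
}

static void msg_port_destroy (MsgPortSketch *port)
{
    if (port->pipe_fds[0] != -1) {
        close (port->pipe_fds[0]);
        close (port->pipe_fds[1]);
    }
}
```

If, as measured above, under 3% of instances ever call the fd accessor, this turns thousands of descriptors into a few dozen.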
Reopening, because I think this is a different issue than bug #350907.
Created attachment 72602 [details] [review] Proposed patch Keeps file descriptor usage under control by only creating pipes when they are actually needed, rather than for every EMsgPort instance.
(In reply to comment #14)
> I believe I have found the root cause of this problem.
>
> The EMsgPort data structure in evolution-data-server is an asynchronous queue
> for passing messages between threads. But this data structure also defines two
> pipes per instance (or only one if NSS is disabled). These pipes have two use
> cases:

Nice observation, Matthew. Can you attach the code that you used to generate this analysis? TIA. :-)

The current implementation of not creating pipes on demand was a fix for one of the Novell bugs. Have a look at the commit log at http://cvs.gnome.org/viewcvs/evolution-data-server/libedataserver/e-msgport.c?r1=1.6&r2=1.7

CC'ing fejj as he can shed some light on this bug/fix.
matthew: that's how the original code worked, but it caused hangs. the pipes NEED to exist from start to finish or problems arise.
https://bugzilla.novell.com/show_bug.cgi?id=176277
Then perhaps we should add an e_msgport_new_with_pipes(). According to my measurements, pipes are only needed for a small percentage of the total EMsgPorts created during a typical Evolution session. And because they're now being created for ALL EMsgPorts, we're effectively using pipes instead of condition variables to synchronize message passing. Pipes are of course much slower, which makes Evolution feel more sluggish overall. All I know is that with my current mail and search folder configuration, which is fairly modest, the current EDS code tries to open over 6000 file descriptors on startup. Obviously that's not acceptable. (BTW, I don't seem to have access to Novell's Bugzilla.)
perhaps that would be a workable solution... need to first find out if it's possible to know ahead of time what EMsgPorts are going to need them and which aren't. another solution might be to change all code to use PRFileDesc rather than a mix of both fds - I believe then you'd be able to create them on-the-fly without problems arising such as the one described in the novell bugzilla.
actually, come to think of it I think I'm wrong on my last post. even reducing it to 1 type I think might still cause problems as there was a race between msgs being queued and pulled.
Since I can't figure out how to make that Novell bug public, I'll post the info:

Comment #7 from Michael Meeks 2006-05-26 08:48:42 MST

Having said that - I just got another camel deadlock like this:
+ Trace 72350
So - 2 deadlocks in 2 weeks ...

Comment #8 from Jeff Stedfast 2006-05-30 14:13:00 MST

threads 1 & 6 are waiting on mail_msg_lock which thread 3 seems to be holding. thread 3 is probably hanging in a read() call? *shrug*

Comment #9 from Jeff Stedfast 2006-05-31 13:52:19 MST

Created an attachment (id=86349): bnc-176277-2.patch

creates both pipes in e_msgport_new() so there's no race between EMsgs being queued and pulled.
(In reply to comment #21)
> perhaps that would be a workable solution... need to first find out if it's
> possible to know ahead of time what EMsgPorts are going to need them and which
> aren't.

Right, that bit me pretty quickly after I started down that path.

Let me run another idea by you, Jeff: I think the race condition is that there may already be messages in the queue when the pipe is created, so the pipe and the queue are thrown out of sync. If that's true, then it would hang on the next e_msgport_wait(), because the logic uses the *current* state of the pipe to decide whether to read from it.

Instead, what if we tag the message itself to indicate the state of the pipe when the message was put on the queue? The tag can be applied to the message in e_msgport_put(). e_msgport_wait() will have to be restructured to *always* wait on the condition variable first in order to obtain the message from the queue, and then wait on the appropriate pipe if the message is tagged. This should solve the race condition, since messages already on the queue when the pipe was created will not be tagged.
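A single-threaded sketch of the tagging idea (hypothetical names; the real e_msgport_wait() also blocks on the condition variable, which is omitted here to isolate the pipe bookkeeping):

```c
#include <stddef.h>
#include <unistd.h>

typedef struct Msg {
    struct Msg *next;
    int         tagged;   /* did the pipe exist when this was queued? */
} Msg;

typedef struct {
    Msg *head, *tail;     /* FIFO message queue */
    int  pipe_fds[2];     /* both -1 until created on demand */
} TaggedPort;

/* Tag each message with the pipe's existence at enqueue time; only
 * tagged messages have a matching wake-up byte in the pipe. */
static void port_put (TaggedPort *p, Msg *m)
{
    m->next = NULL;
    m->tagged = (p->pipe_fds[0] != -1);
    if (m->tagged)
        write (p->pipe_fds[1], "E", 1);   /* wake any poll()er */
    if (p->tail) p->tail->next = m; else p->head = m;
    p->tail = m;
}

/* Pop from the queue first; drain a pipe byte only if this message was
 * tagged, so messages queued before the pipe existed can't desync it. */
static Msg *port_get (TaggedPort *p)
{
    Msg *m = p->head;
    if (!m)
        return NULL;
    p->head = m->next;
    if (!p->head) p->tail = NULL;
    if (m->tagged) {
        char byte;
        read (p->pipe_fds[0], &byte, 1);  /* never blocks: byte exists */
    }
    return m;
}
```

Because a tagged message always has exactly one byte in the pipe, and an untagged one never does, the queue and the pipe stay in sync no matter when the pipe is created.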
I considered that when I first implemented a fix for that race... I can't remember why I didn't go with that. mighta been "too hard" or something :) oh, wait... it was probably ABI breakage that I was trying to avoid. anyways, sure, I think that solution would be ok with me (other than ABI breakage, that's something the gnome/evo team has to be ok with - if they're ok with it, I'm ok with it)
*** Bug 346443 has been marked as a duplicate of this bug. ***
Created attachment 72866 [details] [review] Proposed patch This patch restores on-demand pipe creation to keep file descriptor usage under control, but without the race condition discussed above. Unfortunately it does change libedataserver's ABI. I don't see a way around that. I was a little more ambitious this time in that I wound up rewriting most of the EMsgPort implementation using a GAsyncQueue. I've been running Evo 2.8 with the patch applied for a few hours now and it seems stable. I think the logic is easier to understand too.
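The GAsyncQueue approach replaces the always-on pipes with a mutex/condition-variable queue, with pipes attached only when a caller asks for an fd. The core push/pop semantics can be sketched with raw pthreads (purely for illustration; the actual patch uses GLib's GAsyncQueue, not this code):

```c
#include <pthread.h>
#include <stdlib.h>

/* Minimal blocking FIFO with GAsyncQueue-like push/pop semantics. */
typedef struct Node { struct Node *next; void *data; } Node;

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    Node *head, *tail;
} AsyncQueue;

static void queue_init (AsyncQueue *q)
{
    pthread_mutex_init (&q->lock, NULL);
    pthread_cond_init (&q->cond, NULL);
    q->head = q->tail = NULL;
}

static void queue_push (AsyncQueue *q, void *data)
{
    Node *n = malloc (sizeof *n);
    n->data = data;
    n->next = NULL;
    pthread_mutex_lock (&q->lock);
    if (q->tail) q->tail->next = n; else q->head = n;
    q->tail = n;
    pthread_cond_signal (&q->cond);      /* wake one waiting pop */
    pthread_mutex_unlock (&q->lock);
}

/* Blocks until a message is available, like g_async_queue_pop(). */
static void *queue_pop (AsyncQueue *q)
{
    pthread_mutex_lock (&q->lock);
    while (q->head == NULL)
        pthread_cond_wait (&q->cond, &q->lock);
    Node *n = q->head;
    q->head = n->next;
    if (!q->head) q->tail = NULL;
    pthread_mutex_unlock (&q->lock);
    void *data = n->data;
    free (n);
    return data;
}
```

Waking waiters through a condition variable instead of a pipe write/read round-trip is both cheaper per message and free of file descriptors, which is what makes the on-demand pipe design viable.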
Created attachment 73106 [details] [review] Proposed patch (revised) There was a small typo in my previous patch.
Created attachment 73155 [details] [review] Proposed patch (revised, again) I missed a couple places in e_thread_put() where it peeks at the length of the EMsgPort queue using e_dlist_length(). Now it uses g_async_queue_length(). I also added warnings when pipe I/O fails.
Attachment #74144 [details] of bug #359979 shows the full EMsgPort implementation that I'm proposing here. It's easier to read than a patch, since the changes to the existing implementation are extensive.
Patch reviewed and committed to HEAD http://cvs.gnome.org/viewcvs/evolution-data-server/ChangeLog?r1=1.424&r2=1.425
Excellent, thanks!
Sanka, could you also commit it to the 2.16 branch?
Jeff: No, it cannot be committed to the stable branch; it breaks ABI.
Any reason why this bug is kept open? Can it be closed?
No, please close it. I don't have permissions to do it myself.
Closing as per the last comment.
*** Bug 356913 has been marked as a duplicate of this bug. ***