GNOME Bugzilla – Bug 348888
evolution won't start and prints "too many open files error"
Last modified: 2013-09-10 13:42:22 UTC
(taken from downstream report: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=198935)

Sometimes when I start it I get this error:

[rstrode@halflap] (~) <01:05 PM> $ evolution
CalDAV Eplugin starting up ...
evolution-shell-Message: Killing old version of evolution-data-server...
(evolution-2.8:4577): camel-WARNING **: camel_exception_get_id called with NULL parameter.
(evolution-2.8:4577): Pango-WARNING **: shape engine failure, expect ugly output. the offending font is 'Bitstream Vera Sans Bold 9'
(evolution-2.8:4577): Pango-WARNING **: pango_font_get_glyph_extents called with bad font, expect ugly output
I/O error : Too many open files
I/O warning : failed to load external entity "/usr/share/evolution/2.8/ui/evolution-mail-global.xml"
(evolution-2.8:4577): Bonobo-CRITICAL **: bonobo_ui_node_to_string: assertion `node != NULL' failed
I/O error : Too many open files
I/O warning : failed to load external entity "/usr/share/evolution/2.8/ui/evolution-mail-list.xml"
(evolution-2.8:4577): Bonobo-CRITICAL **: bonobo_ui_node_to_string: assertion `node != NULL' failed
I/O error : Too many open files
I/O warning : failed to load external entity "/usr/share/evolution/2.8/ui/evolution-mail-message.xml"
(evolution-2.8:4577): Bonobo-CRITICAL **: bonobo_ui_node_to_string: assertion `node != NULL' failed
I/O error : Too many open files
I/O warning : failed to load external entity "/usr/lib/evolution/2.8/plugins/org-gnome-mailing-list-actions.xml"
(evolution-2.8:4577): Bonobo-CRITICAL **: bonobo_ui_node_to_string: assertion `node != NULL' failed
I/O error : Too many open files
I/O warning : failed to load external entity "/usr/lib/evolution/2.8/plugins/org-gnome-mail-to-task.xml"
(evolution-2.8:4577): Bonobo-CRITICAL **: bonobo_ui_node_to_string: assertion `node != NULL' failed
(evolution-2.8:4577): GLib-WARNING **: GError set over the top of a previous GError or uninitialized memory.
This indicates a bug in someone's code. You must ensure an error is NULL before it's set.
The overwriting error message was: Error opening directory '/local-users/rstrode/.evolution/mail/imap/rstrode@mail.boston.redhat.com/folders/INBOX': Too many open files

Note: if I try enough times, evolution eventually starts up, although sometimes it looks rather broken, like no icons showing up (just big red X's).
Alex and I had a look at this today. We think the problem is that I just have too many vfolders. Each vfolder spawns several asynchronous operations, each creating a pipe (and 2 file descriptors per pipe). I have over a hundred vfolders, so I hit the default limit of 1024 file descriptors pretty fast. Here's an example stack trace showing one of the hundreds of pipes it creates:
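The arithmetic is easy to verify: every pipe() call consumes two descriptors. A minimal standalone sketch demonstrating this (Linux-specific, counting via /proc/self/fd; the helper names are made up for illustration and are not Evolution code):

```c
#include <dirent.h>
#include <unistd.h>

/* Count this process's open file descriptors by scanning /proc/self/fd.
 * The DIR handle itself is open during the scan, but it appears in both
 * a "before" and an "after" count, so deltas between counts are exact. */
static int count_open_fds (void)
{
    DIR *dir = opendir ("/proc/self/fd");
    struct dirent *ent;
    int n = 0;

    if (dir == NULL)
        return -1;
    while ((ent = readdir (dir)) != NULL)
        if (ent->d_name[0] != '.')
            n++;
    closedir (dir);
    return n;
}

/* Open n pipes, as each vfolder's async operation does, and return how
 * many new descriptors that consumed (expected: 2 * n). */
static int fds_consumed_by_pipes (int n, int fds[][2])
{
    int before = count_open_fds ();
    for (int i = 0; i < n; i++)
        pipe (fds[i]);
    return count_open_fds () - before;
}
```

With over a hundred vfolders each spawning several such operations, a few hundred pipes puts the process well past a 1024-descriptor limit.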
+ Trace 69673
I guess you just need to up your max fd count
Yea, that's how I'm working around the problem right now. Ultimately, of course, evolution should cap how many simultaneous asynchronous operations it starts and queue the rest for later.
I'm getting a similar error and I have only one vfolder -- the default "Unmatched" one. The messages are:

=== warnings ===
(evolution-2.6:24636): camel-local-provider-WARNING **: Could not open/create index file: Too many open files: indexing not performed
(evolution-2.6:24636): camel-WARNING **: Could not save object state file to '/home/edhill/.evolution/mail/local/MAIL_folders.sbd/F--G--H--I.sbd/.#ieee_cs.cmeta': Too many open files
=== warnings ===

and it results in complete crashes every time evolution is started. I've been using evolution as my only email reader since the late '90s, and if I can't figure out how to fix this problem I will probably have to switch to a different MUA. Does anyone know of any fixes, workarounds, etc.? The only thing I've been able to do is move my whole ~/.evolution folder away, but then of course I can no longer access all my old emails -- which is highly inconvenient. I'll gladly install the -debug packages and provide backtraces if it helps...
I installed the evolution-debuginfo package and here is the back-trace: Program received signal SIGABRT, Aborted.
+ Trace 71040
Thread NaN (LWP 27113)
As fejj pointed out in comment 2, you can work around the problem by raising your max open fd count. You can do this by modifying /etc/security/limits.conf, or by doing something like:

su -c "ulimit -n 1000; su $LOGNAME -c 'evolution'"

(you'll have to enter the root password, or put sudo in front of it, if you go that route)
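For a persistent fix, the limits.conf route looks something like this (the username and value here are illustrative, and pam_limits must be enabled for the file to take effect):

```
# /etc/security/limits.conf
# <domain>  <type>  <item>   <value>
rstrode     soft    nofile   10000
rstrode     hard    nofile   10000
```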
I meant to have 10000 in that line; the default is already ~1000.
Hi Ray, thank you for the prompt reply and the fd limit workaround, but even at 10000 evolution is still crashing on startup. Is it possible for evolution to crash from (somewhat?) corrupted mail dir data or other local ~/.evolution/* files?
(In reply to comment #8)
> Hi Ray, thank you for the prompt reply and the fd limit workaround but
> even at 10000 evolution is still crashing on startup. Is it possible
> to have evolution crash from (somewhat?) corrupted mail dir data or
> other local ~/.evolution/* files?

Do you have debuginfo packages installed? Can you attach stack traces for your crash? TIA
(In reply to comment #4)
> I'm getting a similiar error and I have only one vfolder -- the default
> "Unmatched" one. The messages are:
>
> === warnings ===
> (evolution-2.6:24636): camel-local-provider-WARNING **: Could not open/create
> index file: Too many open files: indexing not performed
> [snip..]
> Does anyone know of any fixes, work-arounds, etc.? The only thing I've
> been able to do is to move all of my ~/.evolution folder away but then
> of course I can no longer access all my old emails -- which is highly
> inconvenient.
>
> I'll gladly install the -debug packages and provide back-traces if it
> helps...

Ray, Ed: can you try the fix attached to bug #350907 and update the bug? TIA.
Hi, the patch in bug 350907 seems to make evolution start up consistently without crashing. Thanks!

*** This bug has been marked as a duplicate of 350907 ***
I think I was a bit premature in my testing. Dave Malcolm reports still seeing a problem even with the patched e-d-s.
(In reply to comment #12)
> I think I was a bit premature in my testing.
>
> Dave Malcolm reports still seeing a problem even with the patched e-d-s.

Ray, try the latest patch attached to bug #350907. That patch fixes a couple of fd leaks, but there is another set of fds being leaked in the same code, which I am currently digging into.
I believe I have found the root cause of this problem.

The EMsgPort data structure in evolution-data-server is an asynchronous queue for passing messages between threads. But this data structure also defines two pipes per instance (or only one if NSS is disabled). These pipes have two use cases:

1) To register a GSource that polls the "read" end of the pipe and calls a function when the pipe indicates that a new message has arrived on the asynchronous queue. (It seems like this could just as easily be done with a thread and a simple wait-for-message-and-dispatch loop.) This use case is seen in several places in Evolution.

2) Cancelling synchronous I/O operations. The "read" end of the pipe is added to the file descriptor set when calling poll() or select(). That way, when a message arrives on an asynchronous queue, a byte is written to the EMsgPort pipe. This interrupts the blocking I/O operation and allows the pipe's descriptor to be tested for a cancellation request. It's kind of a poor man's asynchronous I/O. This use case is a tricky one, and is seen in several places in Camel.

Each EMsgPort instance is consuming as many as four file descriptors. Obviously this approach does not scale well, as this bug report shows. But the second use case makes it difficult to simply stop using pipes.

So the question becomes: how many of these EMsgPort pipes are really needed? I compiled some simple statistical analysis into the EMsgPort code and ran a typical Evolution session for me (two IMAP accounts and about 100 search folders). During my session, 1592 EMsgPort instances were created. Of the pipes created for each of those instances, less than 3% were actually being used for one of the use cases described above. (Also, 1585 of those instances were properly destroyed, so the file descriptor leaks are not as severe as some may have suspected.)
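Use case 2 is the classic self-pipe trick. A minimal standalone sketch of the idea (not the actual Camel code; the function name and return convention are made up for illustration):

```c
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for data on io_fd, but return early if a byte
 * arrives on cancel_fd (the "read" end of a cancellation pipe).
 * Returns 1 if cancelled, 2 if io_fd became readable, 0 on timeout. */
static int wait_cancellable (int io_fd, int cancel_fd, int timeout_ms)
{
    struct pollfd fds[2];

    fds[0].fd = cancel_fd;
    fds[0].events = POLLIN;
    fds[1].fd = io_fd;
    fds[1].events = POLLIN;

    if (poll (fds, 2, timeout_ms) <= 0)
        return 0;                        /* timeout or poll error */

    if (fds[0].revents & POLLIN) {
        char byte;
        read (cancel_fd, &byte, 1);      /* drain the cancellation byte */
        return 1;
    }
    return 2;                            /* real data is ready */
}
```

Another thread requests cancellation simply by writing one byte to the pipe's "write" end, which wakes the poll() even though no real I/O has arrived.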
Since we can't easily get away from using pipes for intra-process communication, I think the solution is to create them on demand rather than up front. If we wait to create the pipes until the first time e_msgport_fd() or e_msgport_prfd() is called for a particular EMsgPort instance, rather than creating them in e_msgport_new(), I think that will dramatically reduce the number of file descriptors that Evolution consumes. I'll cook up a patch for this.
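The on-demand idea can be sketched in a few lines (hypothetical names; the real structure lives in e-msgport.c and carries the message queue as well):

```c
#include <unistd.h>

/* Simplified stand-in for EMsgPort: the pipe fds start out unallocated
 * and are only created the first time a caller asks for the fd. */
typedef struct {
    int pipe_fds[2];   /* both -1 until first use */
} MsgPortSketch;

static void msg_port_init (MsgPortSketch *port)
{
    port->pipe_fds[0] = -1;
    port->pipe_fds[1] = -1;
}

/* Analogue of e_msgport_fd(): create the pipe lazily, so ports that
 * never hand out an fd never consume descriptors. */
static int msg_port_fd (MsgPortSketch *port)
{
    if (port->pipe_fds[0] == -1 && pipe (port->pipe_fds) == -1)
        return -1;
    return port->pipe_fds[0];
}

static void msg_port_destroy (MsgPortSketch *port)
{
    if (port->pipe_fds[0] != -1) {
        close (port->pipe_fds[0]);
        close (port->pipe_fds[1]);
    }
}
```

If, as measured above, under 3% of instances ever call the fd accessor, this turns thousands of descriptors into a few dozen.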
Reopening, because I think this is a different issue than bug #350907.
Created attachment 72602 [details] [review] Proposed patch Keeps file descriptor usage under control by only creating pipes when they are actually needed, rather than for every EMsgPort instance.
(In reply to comment #14)
> I believe I have found the root cause of this problem.
>
> The EMsgPort data structure in evolution-data-server is an asynchronous queue
> for passing messages between threads. But this data structure also defines two
> pipes per instance (or only one if NSS is disabled). These pipes have two use
> cases:

Nice observation, Matthew. Can you attach the code that you used to generate this analysis? TIA. :-)

The current implementation of not creating pipes on demand was a fix for one of the Novell bugs. Have a look at the commit log at http://cvs.gnome.org/viewcvs/evolution-data-server/libedataserver/e-msgport.c?r1=1.6&r2=1.7

CC'ing fejj as he can shed some light on this bug/fix.
matthew: that's how the original code worked, but it caused hangs. the pipes NEED to exist from start to finish or problems arise.
https://bugzilla.novell.com/show_bug.cgi?id=176277
Then perhaps we should add an e_msgport_new_with_pipes(). According to my measurements, pipes are only needed for a small percentage of the total EMsgPorts created during a typical Evolution session. And because they're now being created for ALL EMsgPorts, we're effectively using pipes instead of condition variables to synchronize message passing. Pipes are of course much slower, which makes Evolution feel more sluggish overall. All I know is that with my current mail and search folder configuration, which is fairly modest, the current EDS code tries to open over 6000 file descriptors on startup. Obviously that's not acceptable. (BTW, I don't seem to have access to Novell's Bugzilla.)
perhaps that would be a workable solution... need to first find out if it's possible to know ahead of time what EMsgPorts are going to need them and which aren't. another solution might be to change all code to use PRFileDesc rather than a mix of both fds - I believe then you'd be able to create them on-the-fly without problems arising such as the one described in the novell bugzilla.
actually, come to think of it I think I'm wrong on my last post. even reducing it to 1 type I think might still cause problems as there was a race between msgs being queued and pulled.
Since I can't figure out how to make that Novell bug public, I'll post the info:

Comment #7 from Michael Meeks 2006-05-26 08:48:42 MST

Having said that - I just got another camel deadlock like this:
+ Trace 72350
So - 2 deadlocks in 2 weeks ...

Comment #8 from Jeff Stedfast 2006-05-30 14:13:00 MST

threads 1 & 6 are waiting on mail_msg_lock which thread 3 seems to be holding. thread 3 is probably hanging in a read() call? *shrug*

Comment #9 from Jeff Stedfast 2006-05-31 13:52:19 MST

Created an attachment (id=86349): bnc-176277-2.patch

creates both pipes in e_msgport_new() so there's no race between EMsgs being queued and pulled.
(In reply to comment #21)
> perhaps that would be a workable solution... need to first find out if it's
> possible to know ahead of time what EMsgPorts are going to need them and which
> aren't.

Right, that bit me pretty quickly after I started down that path.

Let me run another idea by you, Jeff: I think the race condition is that there may already be messages in the queue when the pipe is created, so the pipe and the queue are thrown out of sync. If that's true, then it would hang on the next e_msgport_wait(), because the logic uses the *current* state of the pipe to decide whether to read from it.

Instead, what if we tag the message itself to indicate the state of the pipe when the message was put on the queue? The tag can be applied to the message in e_msgport_put(). e_msgport_wait() will have to be restructured to *always* wait on the condition variable first in order to obtain the message from the queue, and then wait on the appropriate pipe if the message is tagged. This should solve the race condition, since messages already on the queue when the pipe was created will not be tagged.
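A single-threaded sketch of the tagging idea (hypothetical names; the real e_msgport_wait() also blocks on the condition variable, which is omitted here to isolate the pipe bookkeeping):

```c
#include <stddef.h>
#include <unistd.h>

typedef struct Msg {
    struct Msg *next;
    int         tagged;   /* did the pipe exist when this was queued? */
} Msg;

typedef struct {
    Msg *head, *tail;     /* FIFO message queue */
    int  pipe_fds[2];     /* both -1 until created on demand */
} TaggedPort;

/* Tag each message with the pipe's existence at enqueue time; only
 * tagged messages have a matching wake-up byte in the pipe. */
static void port_put (TaggedPort *p, Msg *m)
{
    m->next = NULL;
    m->tagged = (p->pipe_fds[0] != -1);
    if (m->tagged)
        write (p->pipe_fds[1], "E", 1);   /* wake any poll()er */
    if (p->tail) p->tail->next = m; else p->head = m;
    p->tail = m;
}

/* Pop from the queue first; drain a pipe byte only if this message was
 * tagged, so messages queued before the pipe existed can't desync it. */
static Msg *port_get (TaggedPort *p)
{
    Msg *m = p->head;
    if (!m)
        return NULL;
    p->head = m->next;
    if (!p->head) p->tail = NULL;
    if (m->tagged) {
        char byte;
        read (p->pipe_fds[0], &byte, 1);  /* never blocks: byte exists */
    }
    return m;
}
```

Because a tagged message always has exactly one byte in the pipe, and an untagged one never does, the queue and the pipe stay in sync no matter when the pipe is created.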
I considered that when I first implemented a fix for that race... I can't remember why I didn't go with that. mighta been "too hard" or something :) oh, wait... it was probably ABI breakage that I was trying to avoid. anyways, sure, I think that solution would be ok with me (other than ABI breakage, that's something the gnome/evo team has to be ok with - if they're ok with it, I'm ok with it)
*** Bug 346443 has been marked as a duplicate of this bug. ***
Created attachment 72866 [details] [review] Proposed patch This patch restores on-demand pipe creation to keep file descriptor usage under control, but without the race condition discussed above. Unfortunately it does change libedataserver's ABI. I don't see a way around that. I was a little more ambitious this time in that I wound up rewriting most of the EMsgPort implementation using a GAsyncQueue. I've been running Evo 2.8 with the patch applied for a few hours now and it seems stable. I think the logic is easier to understand too.
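The GAsyncQueue approach replaces the always-on pipes with a mutex/condition-variable queue, with pipes attached only when a caller asks for an fd. The core push/pop semantics can be sketched with raw pthreads (purely for illustration; the actual patch uses GLib's GAsyncQueue, not this code):

```c
#include <pthread.h>
#include <stdlib.h>

/* Minimal blocking FIFO with GAsyncQueue-like push/pop semantics. */
typedef struct Node { struct Node *next; void *data; } Node;

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    Node *head, *tail;
} AsyncQueue;

static void queue_init (AsyncQueue *q)
{
    pthread_mutex_init (&q->lock, NULL);
    pthread_cond_init (&q->cond, NULL);
    q->head = q->tail = NULL;
}

static void queue_push (AsyncQueue *q, void *data)
{
    Node *n = malloc (sizeof *n);
    n->data = data;
    n->next = NULL;
    pthread_mutex_lock (&q->lock);
    if (q->tail) q->tail->next = n; else q->head = n;
    q->tail = n;
    pthread_cond_signal (&q->cond);      /* wake one waiting pop */
    pthread_mutex_unlock (&q->lock);
}

/* Blocks until a message is available, like g_async_queue_pop(). */
static void *queue_pop (AsyncQueue *q)
{
    pthread_mutex_lock (&q->lock);
    while (q->head == NULL)
        pthread_cond_wait (&q->cond, &q->lock);
    Node *n = q->head;
    q->head = n->next;
    if (!q->head) q->tail = NULL;
    pthread_mutex_unlock (&q->lock);
    void *data = n->data;
    free (n);
    return data;
}
```

Waking waiters through a condition variable instead of a pipe write/read round-trip is both cheaper per message and free of file descriptors, which is what makes the on-demand pipe design viable.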
Created attachment 73106 [details] [review] Proposed patch (revised) There was a small typo in my previous patch.
Created attachment 73155 [details] [review] Proposed patch (revised, again) I missed a couple places in e_thread_put() where it peeks at the length of the EMsgPort queue using e_dlist_length(). Now it uses g_async_queue_length(). I also added warnings when pipe I/O fails.
Attachment #74144 [details] of bug #359979 shows the full EMsgPort implementation that I'm proposing here. It's easier to read than a patch, since the changes to the existing implementation are extensive.
Patch reviewed and committed to HEAD http://cvs.gnome.org/viewcvs/evolution-data-server/ChangeLog?r1=1.424&r2=1.425
Excellent, thanks!
Sanka, could you also commit it to the 2.16 branch?
Jeff: No, it cannot be committed to the stable branch; it breaks ABI.
Any reason why this bug is kept open? Can it be closed?
No, please close it. I don't have permissions to do it myself.
Closing as per the last comment.
*** Bug 356913 has been marked as a duplicate of this bug. ***