Bug 466891 - Too many open files Exception with following Segmentation Fault
Status: RESOLVED FIXED
Product: beagle
Classification: Other
Component: General
Version: 0.2.14
Hardware: Other All
Importance: High critical
Target Milestone: ---
Assigned To: Beagle Bugs
Beagle Bugs
Depends on:
Blocks:
 
 
Reported: 2007-08-15 09:20 UTC by Enrico Minack
Modified: 2007-08-15 19:23 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
patch that fixes the bug (330 bytes, patch)
2007-08-15 09:24 UTC, Enrico Minack

Description Enrico Minack 2007-08-15 09:20:52 UTC
Steps to reproduce:
1. Start beagle-build-index on a directory containing a very large number of email files.
2. After a few hundred emails have been indexed, the first IOExceptions ("Too many open files") are thrown.
3. A few files later, a segmentation fault stops beagle-build-index.


Stack trace:
NOTE, the email path and file names had to be anonymized due to privacy issues
==============================================================================

Debug: +file:///PATH/EMAIL.eml
Warn: Unable to filter PATH/EMAIL.eml:
System.IO.IOException: Unable to read PATH/EMAIL.eml for parsing mail
  at Beagle.Filters.FilterMail.DoOpen (System.IO.FileInfo info) [0x00000] 
  at Beagle.Daemon.Filter.DoOpen (System.IO.FileSystemInfo info) [0x00000] 
  at Beagle.Daemon.Filter.Open (System.IO.FileSystemInfo info) [0x00000] 
Debug: First attempt to index file:///PATH/EMAIL.eml failed
System.IO.IOException: Too many open files
  at System.IO.FileStream..ctor (System.String name, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, Boolean anonymous, FileOptions options) [0x00000] 
  at System.IO.FileStream..ctor (System.String name, FileMode mode, FileAccess access, FileShare share) [0x00000] 
  at (wrapper remoting-invoke-with-check) System.IO.FileStream:.ctor (string,System.IO.FileMode,System.IO.FileAccess,System.IO.FileShare)
  at Beagle.Indexable.StreamFromUri (System.Uri uri) [0x00000] 
  at Beagle.Indexable.ReaderFromUri (System.Uri uri) [0x00000] 
  at Beagle.Indexable.GetTextReader () [0x00000] 
  at Beagle.Daemon.LuceneCommon.BuildDocuments (Beagle.Indexable indexable, Lucene.Net.Documents.Document& primary_doc, Lucene.Net.Documents.Document& secondary_doc) [0x00000] 
  at Beagle.Daemon.LuceneIndexingDriver.Flush_Unlocked (Beagle.Daemon.IndexerRequest request) [0x00000] 
Debug: +file:///PATH/EMAIL2.eml#0
Error: Unable to filter file:///PATH/EMAIL1.eml#0 (mimetype=text/plain)
System.IO.IOException: Too many open files
  at System.IO.FileStream..ctor (System.String name, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, Boolean anonymous, FileOptions options) [0x00000] 
  at System.IO.FileStream..ctor (System.String name, FileMode mode, FileAccess access, FileShare share) [0x00000] 
  at (wrapper remoting-invoke-with-check) System.IO.FileStream:.ctor (string,System.IO.FileMode,System.IO.FileAccess,System.IO.FileShare)
  at Beagle.Daemon.Filter.Open (System.IO.FileSystemInfo info) [0x00000] 
  at Beagle.Daemon.Filter.Open (System.String path) [0x00000] 
  at Beagle.Daemon.FilterFactory.FilterIndexable (Beagle.Indexable indexable, Beagle.Daemon.TextCache text_cache, Beagle.Daemon.Filter& filter) [0x00000] 
  at Beagle.Daemon.LuceneIndexingDriver.Flush_Unlocked (Beagle.Daemon.IndexerRequest request) [0x00000] 
Debug: +file:///PATH/EMAIL2.eml
Debug: No filter for file:///PATH/EMAIL2.eml (PATH/EMAIL2.eml) [application/octet-stream]
Debug: +file:///PATH/EMAIL2.eml#2
Error: Unable to filter file:///PATH/EMAIL2.eml#2 (mimetype=application/pdf)
System.IO.IOException: Too many open files
  at System.IO.FileStream..ctor (System.String name, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, Boolean anonymous, FileOptions options) [0x00000] 
  at System.IO.FileStream..ctor (System.String name, FileMode mode, FileAccess access, FileShare share) [0x00000] 
  at (wrapper remoting-invoke-with-check) System.IO.FileStream:.ctor (string,System.IO.FileMode,System.IO.FileAccess,System.IO.FileShare)
  at Beagle.Daemon.Filter.Open (System.IO.FileSystemInfo info) [0x00000] 
  at Beagle.Daemon.Filter.Open (System.String path) [0x00000] 
  at Beagle.Daemon.FilterFactory.FilterIndexable (Beagle.Indexable indexable, Beagle.Daemon.TextCache text_cache, Beagle.Daemon.Filter& filter) [0x00000] 
  at Beagle.Daemon.LuceneIndexingDriver.Flush_Unlocked (Beagle.Daemon.IndexerRequest request) [0x00000] 
Debug: +file:///PATH/EMAIL3.eml
Debug: No filter for file:///PATH/EMAIL3.eml (PATH/EMAIL3.eml) [application/octet-stream]
Debug: +file:///PATH/EMAIL4.eml
Warn: Unable to filter PATH/EMAIL4.eml:
System.IO.IOException: Unable to read PATH/EMAIL4.eml for parsing mail
  at Beagle.Filters.FilterMail.DoOpen (System.IO.FileInfo info) [0x00000] 
  at Beagle.Daemon.Filter.DoOpen (System.IO.FileSystemInfo info) [0x00000] 
  at Beagle.Daemon.Filter.Open (System.IO.FileSystemInfo info) [0x00000] 
Debug: First attempt to index file:///PATH/EMAIL4.eml failed
System.IO.IOException: Too many open files
  at System.IO.FileStream..ctor (System.String name, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, Boolean anonymous, FileOptions options) [0x00000] 
  at System.IO.FileStream..ctor (System.String name, FileMode mode, FileAccess access, FileShare share) [0x00000] 
  at (wrapper remoting-invoke-with-check) System.IO.FileStream:.ctor (string,System.IO.FileMode,System.IO.FileAccess,System.IO.FileShare)
  at Beagle.Indexable.StreamFromUri (System.Uri uri) [0x00000] 
  at Beagle.Indexable.ReaderFromUri (System.Uri uri) [0x00000] 
  at Beagle.Indexable.GetTextReader () [0x00000] 
  at Beagle.Daemon.LuceneCommon.BuildDocuments (Beagle.Indexable indexable, Lucene.Net.Documents.Document& primary_doc, Lucene.Net.Documents.Document& secondary_doc) [0x00000] 
  at Beagle.Daemon.LuceneIndexingDriver.Flush_Unlocked (Beagle.Daemon.IndexerRequest request) [0x00000] 
Debug: Second attempt to index file:///PATH/EMAIL4.eml failed, giving up...
System.IO.IOException: Too many open files
  at System.IO.FileStream..ctor (System.String name, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, Boolean anonymous, FileOptions options) [0x00000] 
  at System.IO.FileStream..ctor (System.String name, FileMode mode, FileAccess access, FileShare share) [0x00000] 
  at (wrapper remoting-invoke-with-check) System.IO.FileStream:.ctor (string,System.IO.FileMode,System.IO.FileAccess,System.IO.FileShare)
  at Lucene.Net.Store.FSIndexOutput..ctor (System.IO.FileInfo path) [0x00000] 
  at Lucene.Net.Store.FSDirectory.CreateOutput (System.String name) [0x00000] 
  at Lucene.Net.Index.FieldInfos.Write (Lucene.Net.Store.Directory d, System.String name) [0x00000] 
  at Lucene.Net.Index.SegmentMerger.MergeFields () [0x00000] 
  at Lucene.Net.Index.SegmentMerger.Merge () [0x00000] 
  at Lucene.Net.Index.IndexWriter.MergeSegments (Int32 minSegment, Int32 end) [0x00000] 
  at Lucene.Net.Index.IndexWriter.MergeSegments (Int32 minSegment) [0x00000] 
  at Lucene.Net.Index.IndexWriter.MaybeMergeSegments () [0x00000] 
  at Lucene.Net.Index.IndexWriter.AddDocument (Lucene.Net.Documents.Document doc, Lucene.Net.Analysis.Analyzer analyzer) [0x00000] 
  at Lucene.Net.Index.IndexWriter.AddDocument (Lucene.Net.Documents.Document doc) [0x00000] 
  at Beagle.Daemon.LuceneIndexingDriver.Flush_Unlocked (Beagle.Daemon.IndexerRequest request) [0x00000] 
Debug: Encountered exception while indexing: System.IO.IOException: Too many open files
  at System.IO.FileStream..ctor (System.String name, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, Boolean anonymous, FileOptions options) [0x00000] 
  at System.IO.FileStream..ctor (System.String name, FileMode mode, FileAccess access, FileShare share) [0x00000] 
  at (wrapper remoting-invoke-with-check) System.IO.FileStream:.ctor (string,System.IO.FileMode,System.IO.FileAccess,System.IO.FileShare)
  at Lucene.Net.Store.FSIndexOutput..ctor (System.IO.FileInfo path) [0x00000] 
  at Lucene.Net.Store.FSDirectory.CreateOutput (System.String name) [0x00000] 
  at Lucene.Net.Index.FieldInfos.Write (Lucene.Net.Store.Directory d, System.String name) [0x00000] 
  at Lucene.Net.Index.SegmentMerger.MergeFields () [0x00000] 
  at Lucene.Net.Index.SegmentMerger.Merge () [0x00000] 
  at Lucene.Net.Index.IndexWriter.MergeSegments (Int32 minSegment, Int32 end) [0x00000] 
  at Lucene.Net.Index.IndexWriter.MergeSegments (Int32 minSegment) [0x00000] 
  at Lucene.Net.Index.IndexWriter.FlushRamSegments () [0x00000] 
  at Lucene.Net.Index.IndexWriter.Close () [0x00000] 
  at Beagle.Daemon.LuceneIndexingDriver.Flush_Unlocked (Beagle.Daemon.IndexerRequest request) [0x00000] 
  at Beagle.Daemon.LuceneIndexingDriver.Flush (Beagle.Daemon.IndexerRequest request) [0x00000] 
  at Beagle.Daemon.BuildIndex.FlushIndexer (IIndexer indexer, Beagle.Daemon.IndexerRequest request) [0x00000] 
  at Beagle.Daemon.BuildIndex.AddToRequest (Beagle.Daemon.IndexerRequest request, Beagle.Indexable indexable) [0x00000] 
  at Beagle.Daemon.BuildIndex.DoIndexing () [0x00000] 
  at Beagle.Daemon.BuildIndex.IndexWorker () [0x00000] 
Debug: IndexWorker Done
libgcc_s.so.1 must be installed for pthread_cancel to work

=================================================================
Got a SIGABRT while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries 
used by your application.
=================================================================

Stacktrace:



Other information:
The problem occurs because the filter opens the content file via the open system call but never calls the corresponding close system call.
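
For readers unfamiliar with the failure mode, here is a minimal sketch in Python (not Beagle's C#) of what happens when descriptors are opened and never closed. The function name open_until_exhausted and the limit of 64 are made up for illustration. It assumes a POSIX system where the resource module is available, and it temporarily lowers the soft descriptor limit so the error appears quickly.

```python
import errno
import os
import resource


def open_until_exhausted(limit=64):
    """Open os.devnull repeatedly without closing it.

    Returns (number of descriptors opened, errno of the failure).
    Hypothetical helper for illustration; not part of Beagle.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Lower the soft limit only if it is currently higher than `limit`.
    lowered = soft == resource.RLIM_INFINITY or soft > limit
    if lowered:
        resource.setrlimit(resource.RLIMIT_NOFILE, (limit, hard))
    leaked = []
    error = None
    try:
        while True:
            # Every call consumes one descriptor; nothing gives it back.
            leaked.append(os.open(os.devnull, os.O_RDONLY))
    except OSError as exc:
        error = exc.errno
    finally:
        # Clean up so the demonstration does not leak for real.
        for fd in leaked:
            os.close(fd)
        if lowered:
            resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
    return len(leaked), error


if __name__ == "__main__":
    count, error = open_until_exhausted()
    print(count, errno.errorcode.get(error))
```

The errno reported once the limit is reached is EMFILE, which is what surfaces as "Too many open files" in the log above.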
Comment 1 Enrico Minack 2007-08-15 09:24:03 UTC
Created attachment 93709
patch that fixes the bug

This fix simply closes the file using the corresponding close system call. With it, the exception and the segmentation fault disappear.
Comment 2 Enrico Minack 2007-08-15 09:25:02 UTC
By the way, the code in FilterMail.cs did not change between 0.2.14 and 0.2.17 with respect to this bug, so the bug presumably still exists in 0.2.17.
Comment 3 Lukas Lipka 2007-08-15 10:43:47 UTC
Nice catch! ;-) Fixed in r3855.
Comment 4 Joe Shaw 2007-08-15 12:41:56 UTC
I don't think this patch is correct; stream.Dispose() should take care of closing the file descriptor.  Something else is going on.
Comment 5 Joe Shaw 2007-08-15 12:42:59 UTC
Are there any exceptions prior to the "too many open files" error?
Comment 6 Enrico Minack 2007-08-15 13:01:12 UTC
Nope. Before the first IOException is thrown by FilterMail.cs, indexing is just fine. There are some arbitrary exceptions from various filters complaining about wrong gzip compression or images in the wrong format, but I don't think those are related. However, without the fix, beagle-build-index crashes at the very same file; with it, indexing completes without problems (except for the parsing issues mentioned).
Comment 7 Max Wiehle 2007-08-15 13:16:34 UTC
Took a look at this. Dispose does not seem to be overridden for StreamFS or for Stream, so from what I can tell this will call GObject.Dispose(). I don't know whether that calls anything that would close the file, though; I am not familiar with the GLib object destruction process.
Comment 8 Joe Shaw 2007-08-15 14:32:51 UTC
I backed out the change in SVN, r3856.  I should describe how things are supposed to work:

Beagle uses GMime for mail parsing, which is written in C.  It uses objects and uses reference counting for memory management.

When we create a GMime.StreamFS and pass in the file descriptor, we're passing ownership of that file descriptor to the stream.  We no longer have ownership of that fd, so we can't close it.  That's why the patch isn't right, and it can have serious side effects.  The created GMime.StreamFS has a ref count of 1.

When we create a GMime.Parser and pass in the GMime.StreamFS, the parser has a ref count of 1, and the stream's count is increased to 2.

When we construct a GMime.Message from the parser, the message gets a ref count of 1, and the parser's count is increased to 2.

So if you're keeping score at home, GMime.StreamFS has 2, GMime.Parser has 2, GMime.Message has 1.

Calling Dispose() on the stream decreases its ref count to 1, and calling it on the parser decreases its ref count to 1.

This is where the indexing process takes place.  When it's finished, DoClose() is supposed to be called.  That disposes of the GMime.Message, which decreases its ref count to 0.  That causes a chain reaction.  When GMime.Message's refcount reaches 0, it releases its reference on GMime.Parser which drops to 0, which releases its reference on GMime.StreamFS.  When GMime.StreamFS's refcount reaches 0, it closes the file descriptor.
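
The bookkeeping above can be followed in a toy model. This is a sketch in Python, not GMime or Beagle code: the RefCounted and Stream classes, the simulate function, and the fd_state dictionary standing in for a real file descriptor are all invented for illustration.

```python
class RefCounted:
    """Toy reference-counted object that keeps other objects alive."""

    def __init__(self, *held):
        self.refs = 1            # the creator holds the first reference
        self.held = list(held)   # objects this one references
        for obj in self.held:
            obj.ref()

    def ref(self):
        self.refs += 1

    def unref(self):
        self.refs -= 1
        if self.refs == 0:
            self.finalize()

    def finalize(self):
        # Dropping to zero releases everything this object was holding.
        for obj in self.held:
            obj.unref()


class Stream(RefCounted):
    """Stands in for the stream that owns the file descriptor."""

    def __init__(self, fd_state):
        super().__init__()
        self.fd_state = fd_state

    def finalize(self):
        self.fd_state["open"] = False   # stands in for close(fd)
        super().finalize()


def simulate():
    fd_state = {"open": True}
    stream = Stream(fd_state)        # stream: 1
    parser = RefCounted(stream)      # parser: 1, stream: 2
    message = RefCounted(parser)     # message: 1, parser: 2
    setup = (stream.refs, parser.refs, message.refs)

    stream.unref()                   # Dispose() on the stream -> 1
    parser.unref()                   # Dispose() on the parser -> 1
    disposed = (stream.refs, parser.refs, message.refs, fd_state["open"])

    message.unref()                  # DoClose(): message -> 0, chain reaction
    closed = (stream.refs, parser.refs, message.refs, fd_state["open"])
    return setup, disposed, closed
```

In this model the descriptor stays open until the last reference on the message is released, which is the point of the walkthrough: if that final release never happens, nothing further down the chain is freed either.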

So, if file descriptors are being leaked, there's also a very good chance that tons and tons of memory is being leaked as well.

By closing the file descriptor early like that, you're probably not getting any mail data into the index or, possibly worse, you're indexing random data.  Likewise, if the close process actually were working, you would be closing a random file (since file descriptors are reused)... you could be closing an important index file.  The patch in essence treats the symptom but not the disease.  It'd be like giving aspirin to someone with encephalitis... It might make their headache go away, but they're still going to die from a swollen brain. :)
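
The descriptor-reuse hazard can be reproduced in a few lines. This is a Python sketch under stated assumptions, not Beagle code: the file names mail.eml and index.dat and the function demonstrate_fd_reuse are hypothetical, and it relies on the POSIX rule that open returns the lowest unused descriptor number, in a process where no other thread is opening files at the same time.

```python
import os
import tempfile


def demonstrate_fd_reuse():
    """Return (reused, index_still_open) after a double close."""
    with tempfile.TemporaryDirectory() as tmp:
        mail = os.path.join(tmp, "mail.eml")
        index = os.path.join(tmp, "index.dat")
        for path in (mail, index):
            with open(path, "w") as handle:
                handle.write("x")

        mail_fd = os.open(mail, os.O_RDONLY)
        os.close(mail_fd)                        # early close by a non-owner

        index_fd = os.open(index, os.O_RDONLY)   # gets the lowest free number
        reused = index_fd == mail_fd
        if not reused:
            os.close(index_fd)
            return reused, True

        # The real owner of the mail descriptor now closes "its" number,
        # which by this point refers to the index file.
        os.close(mail_fd)
        try:
            os.fstat(index_fd)
            index_still_open = True
        except OSError:
            index_still_open = False
        return reused, index_still_open
```

When the number is reused, the second close silently takes the index file away from whoever opened it, which is the side effect warned about above.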

I didn't notice originally that you were talking about beagle-build-index; my guess is that it simply isn't doing the Close() process correctly.  I'll see if I can duplicate the issue locally.
Comment 9 Debajyoti Bera 2007-08-15 15:24:35 UTC
After putting a Console.ReadLine() before beagle-extract-content returns and adding print statements in DoClose(), it looks like:
- DoClose() is called
- message.Dispose() is called
- lsof shows the file descriptor is not closed

Something is wrong in GMime.
Comment 10 Debajyoti Bera 2007-08-15 15:25:23 UTC
Forgot to add: this happens even for a valid email file.
Comment 11 Joe Shaw 2007-08-15 15:38:37 UTC
I see this too, although this might be a red herring: it's possible the GC or something didn't run and close the file.

I regularly run Beagle over several thousand emails without incident, so I think it's probably more than just something wrong with gmime.
Comment 12 Joe Shaw 2007-08-15 15:44:13 UTC
Looks like GLib.Object.Dispose() queues up object unrefs using GLib.Timeout, which means that if a main loop isn't running they'll never get triggered.  This seems to be the issue.
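
A toy model of that behaviour, sketched in Python. None of this is gtk-sharp code: MainLoop, Handle, timeout_add, iterate and run are invented names that only mimic the idea of a release being queued for a main loop instead of being performed immediately.

```python
from collections import deque


class MainLoop:
    """Toy stand-in for a main loop: queued callbacks run only on iterate()."""

    def __init__(self):
        self.pending = deque()

    def timeout_add(self, callback):
        self.pending.append(callback)

    def iterate(self):
        ran = 0
        while self.pending:
            self.pending.popleft()()
            ran += 1
        return ran


class Handle:
    """Toy resource whose release is deferred to the main loop."""

    def __init__(self, loop, open_handles):
        self.loop = loop
        self.open_handles = open_handles
        open_handles.add(self)

    def dispose(self):
        # Nothing is released here; the release is only queued.
        self.loop.timeout_add(lambda: self.open_handles.discard(self))


def run(count, pump_loop):
    """Create and dispose `count` handles; return how many remain open."""
    loop = MainLoop()
    open_handles = set()
    for _ in range(count):
        Handle(loop, open_handles).dispose()
        if pump_loop:
            loop.iterate()
    return len(open_handles)
```

With pump_loop set to False every disposed handle stays open, because the queued callbacks never run; with it set to True they are released as the loop iterates.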
Comment 13 Joe Shaw 2007-08-15 17:28:33 UTC
This appears to have been fixed in gtk-sharp svn for some time, but I don't think there is a release which incorporates it.
Comment 14 Enrico Minack 2007-08-15 17:49:28 UTC
When I index my files with beagled using the "--backend Files --indexing-test-mode" options, it does the same work, but this time it succeeds.
Comment 15 Joe Shaw 2007-08-15 18:47:35 UTC
Yeah, beagled uses a main loop, so the objects are getting disposed of properly.

dBera mentioned that the file was staying open with beagle-extract-content.  There was a small inefficiency in GMime that would keep the file open longer than needed, and I just checked in a fix for that, but that's not the crux of the issue for you.

We'll probably need to add some GLib main loop action to beagle-build-index.  I think we can just do that in a separate thread.  I need to play around with that some more.
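
The separate-thread idea can be sketched as follows. This is an illustrative Python model only, not the change that was eventually checked in and not GLib code: LoopThread, timeout_add, quit and index_files are hypothetical names, and a queue.Queue stands in for the main loop's pending callbacks.

```python
import queue
import threading


class LoopThread:
    """Toy main loop running in its own thread, draining queued callbacks."""

    def __init__(self):
        self._tasks = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            task = self._tasks.get()
            if task is None:        # sentinel: stop the loop
                break
            task()

    def timeout_add(self, callback):
        self._tasks.put(callback)

    def quit(self):
        # Callbacks queued before the sentinel still run (FIFO order).
        self._tasks.put(None)
        self._thread.join()


def index_files(count):
    """Simulate indexing `count` files; return how many handles stay open."""
    loop = LoopThread()
    lock = threading.Lock()
    open_handles = set()

    def release(handle):
        with lock:
            open_handles.discard(handle)

    for handle in range(count):
        with lock:
            open_handles.add(handle)
        # The indexing thread only queues the release; the loop thread does it.
        loop.timeout_add(lambda h=handle: release(h))

    loop.quit()
    return len(open_handles)
```

The indexing code stays unchanged apart from starting and stopping the loop; the deferred releases are carried out by the background thread while indexing continues.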
Comment 16 Joe Shaw 2007-08-15 19:23:07 UTC
I was able to duplicate the problem, and I've checked in a fix (a workaround, really) to both beagle-build-index and beagle-extract-content.  r3857.