GNOME Bugzilla – Bug 501803
IndexHelper freaking out on MS Office email
Last modified: 2018-07-03 09:50:42 UTC
The big bogey evaded me when the helper used 66% of my memory, 1600M according to top. I killed it, and on the next run it ramped up its CPU/memory usage again and I caught this:

20071205 17:17:36.7207 14212 IndexH WARN: Filtering status (2m3s ago): determining filter and extracting properties for email://1161637339.14559.1@sams/INBOX;uid=1335 (file:///home/<censored>/folders/INBOX/1335.)
Too many heap sections: Increase MAXHINCR or MAX_HEAP_SECTS
Stacktrace:
  at (wrapper managed-to-native) System.Object.__icall_wrapper_mono_object_new_fast (intptr) <0x00004>
  at (wrapper managed-to-native) System.Object.__icall_wrapper_mono_object_new_fast (intptr) <0xffffffff>
  at HtmlAgilityPack.HtmlDocument.AddError (HtmlAgilityPack.HtmlParseErrorCode,int,int,int,string,string) <0x00017>
  at HtmlAgilityPack.HtmlDocument.CloseCurrentNode () <0x0040d>
  at HtmlAgilityPack.HtmlDocument.PushNodeEnd (int,bool) <0x00222>
  at HtmlAgilityPack.HtmlDocument.DoParse (int) <0x0017b>
  at HtmlAgilityPack.HtmlDocument.Parse () <0x00121>
  at HtmlAgilityPack.HtmlDocument.Load (System.IO.TextReader) <0x00219>
  at HtmlAgilityPack.HtmlDocument.Load (System.IO.Stream,System.Text.Encoding) <0x00033>
  at Beagle.Filters.FilterHtml.DoPullProperties () <0x003ff>
  at Beagle.Daemon.Filter.Open (System.IO.Stream,bool) <0x000dd>
  at Beagle.Filters.FilterHtml.GetHtmlReader (System.IO.Stream,string) <0x000a7>
  at PartHandler.OnEachPart (GMime.Object) <0x00644>
  at Beagle.Filters.FilterMail.DoPullSetup () <0x0005d>
  at Beagle.Daemon.Filter.Open (System.IO.Stream,bool) <0x0012a>
  at Beagle.Daemon.Filter.Open (System.IO.FileSystemInfo) <0x002e0>
  at Beagle.Daemon.Filter.Open (string) <0x0004a>
  at Beagle.Daemon.FilterFactory.FilterIndexable (Beagle.Indexable,Beagle.Daemon.TextCache,Beagle.Daemon.Filter&) <0x006c1>
  at Beagle.Daemon.LuceneIndexingDriver.AddIndexableToIndex (Beagle.Indexable,Lucene.Net.Index.IndexWriter,Lucene.Net.Index.IndexWriter&,System.Collections.Hashtable) <0x0009b>
  at Beagle.Daemon.LuceneIndexingDriver.Flush_Unlocked (Beagle.Daemon.IndexerRequest) <0x00f24>
  at Beagle.Daemon.LuceneIndexingDriver.Flush (Beagle.Daemon.IndexerRequest) <0x0004c>
  at Beagle.IndexHelper.RemoteIndexerExecutor.Execute (Beagle.RequestMessage) <0x001dd>
  at Beagle.Daemon.ConnectionHandler.HandleConnection (System.IO.Stream) <0x0030a>
  at Beagle.Daemon.UnixConnectionHandler.HandleConnection () <0x0037c>
  at (wrapper delegate-invoke) System.MulticastDelegate.invoke_void () <0xffffffff>
  at Beagle.Util.ExceptionHandlingThread.ThreadStarted () <0x002d5>
  at (wrapper delegate-invoke) System.MulticastDelegate.invoke_void () <0xffffffff>
  at (wrapper runtime-invoke) System.Object.runtime_invoke_void (object,intptr,intptr,intptr) <0xffffffff>
Native stacktrace:
  beagled-helper [0x8194ca6]
  beagled-helper [0x8177154]
  [0xffffe440]
  /lib/tls/i686/cmov/libc.so.6(abort+0x101) [0xb7d0f201]
  beagled-helper [0x8145cf5]
  beagled-helper [0x813de2d]
  beagled-helper [0x813e21b]
  beagled-helper [0x813e3c3]
  beagled-helper [0x813e57b]
  beagled-helper [0x814130b]
  beagled-helper [0x81421cc]
  beagled-helper [0x8149de0]
  beagled-helper(mono_object_new_fast+0x1a) [0x80b1bdd]
  [0xb73b299f]
  [0xb4ec4090]
  [0xb4ec2bfe]
  [0xb4ebc93b]
  [0xb4ebb8b4]
  [0xb4ebb4da]
  [0xb4ebacda]
  [0xb4eba8a4]
  [0xb4f2fce8]
  [0xb4f21a4e]
  [0xb4f2f720]
  [0xb4f25595]
  [0xb4f24d1e]
  [0xb4f21a9b]
  [0xb4f3cf11]
  [0xb4f3cbc3]
  [0xb4ffb9ea]
  [0xb4ffad3c]
  [0xb500c78d]
  [0xb500b4a5]
  [0xb64250e6]
  [0xb641d9c3]
  [0xb641b70d]
  [0xb67488ca]
  [0xb6748be6]
  [0xb67488ca]
  [0xb6f9901b]
  beagled-helper [0x8176f50]
  beagled-helper(mono_runtime_invoke+0x27) [0x80b0b2f]
  beagled-helper(mono_runtime_delegate_invoke+0x62) [0x80b0db7]
  beagled-helper [0x80e9354]
  beagled-helper [0x81316ad]
  beagled-helper [0x814ab16]
  /lib/tls/i686/cmov/libpthread.so.0 [0xb7e5746b]
  /lib/tls/i686/cmov/libc.so.6(clone+0x5e) [0xb7db66de]
=================================================================
Got a SIGABRT while executing native code. This usually indicates a fatal error in the mono runtime or one of the native libraries used by your application.
=================================================================
This is of course from .evolution. The content of the mail sans personal headers is: (crazy how much clutter they add for such a simple email!) Subject: Simon effect - Wikipedia, the free encyclopedia Date: Wed, 28 Feb 2007 21:56:36 +0100 Message-ID: <003901c75b7a$f06001f0$c5d7fea9@LHMobile> MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_NextPart_000_003A_01C75B83.522469F0" X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.2627 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3028 Importance: Normal This is a multi-part message in MIME format. ------=_NextPart_000_003A_01C75B83.522469F0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit <http://en.wikipedia.org/wiki/Simon_effect> http://en.wikipedia.org/wiki/Simon_effect ------=_NextPart_000_003A_01C75B83.522469F0 Content-Type: text/html; charset="us-ascii" Content-Transfer-Encoding: quoted-printable <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN"> <html xmlns:o=3D"urn:schemas-microsoft-com:office:office" = xmlns:w=3D"urn:schemas-microsoft-com:office:word" = xmlns=3D"http://www.w3.org/TR/REC-html40"> <head> <META HTTP-EQUIV=3D"Content-Type" CONTENT=3D"text/html; = charset=3Dus-ascii"> <meta name=3DProgId content=3DWord.Document> <meta name=3DGenerator content=3D"Microsoft Word 10"> <meta name=3DOriginator content=3D"Microsoft Word 10"> <link rel=3DFile-List href=3D"cid:filelist.xml@01C75B83.507E5B10"> <!--[if gte mso 9]><xml> <o:OfficeDocumentSettings> <o:DoNotRelyOnCSS/> </o:OfficeDocumentSettings> </xml><![endif]--><!--[if gte mso 9]><xml> <w:WordDocument> <w:SpellingState>Clean</w:SpellingState> <w:GrammarState>Clean</w:GrammarState> <w:DocumentKind>DocumentEmail</w:DocumentKind> <w:EnvelopeVis/> <w:Compatibility> <w:ApplyBreakingRules/> </w:Compatibility> </w:WordDocument> </xml><![endif]--> <style> <!-- /* Style Definitions */ p.MsoNormal, li.MsoNormal, div.MsoNormal {mso-style-parent:""; margin:0cm; 
margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:12.0pt; font-family:"Times New Roman"; mso-fareast-font-family:"Times New Roman";} a:link, span.MsoHyperlink {color:blue; text-decoration:underline; text-underline:single;} a:visited, span.MsoHyperlinkFollowed {color:blue; text-decoration:underline; text-underline:single;} p {mso-margin-top-alt:auto; margin-right:0cm; mso-margin-bottom-alt:auto; margin-left:0cm; mso-pagination:widow-orphan; font-size:12.0pt; font-family:"Times New Roman"; mso-fareast-font-family:"Times New Roman";} span.EstiloDeEmail18 {mso-style-type:personal-compose; mso-style-noshow:yes; mso-ansi-font-size:10.0pt; mso-bidi-font-size:10.0pt; font-family:Arial; mso-ascii-font-family:Arial; mso-hansi-font-family:Arial; mso-bidi-font-family:Arial;} @page Section1 {size:612.0pt 792.0pt; margin:72.0pt 90.0pt 72.0pt 90.0pt; mso-header-margin:35.4pt; mso-footer-margin:35.4pt; mso-paper-source:0;} div.Section1 {page:Section1;} --> </style> <!--[if gte mso 10]> <style> /* Style Definitions */=20 table.MsoNormalTable {mso-style-name:"Tabela normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-parent:""; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin:0cm; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Times New Roman";} </style> <![endif]--> </head> <body lang=3DEN-US link=3Dblue vlink=3Dblue = style=3D'tab-interval:36.0pt'> <div class=3DSection1> <p><font size=3D2 face=3DArial><span = style=3D'font-size:10.0pt;font-family:Arial'><a href=3D"http://en.wikipedia.org/wiki/Simon_effect"><!-- Converted from = text/rtf format = -->http://en.wikipedia.org/wiki/Simon_effect</a></span></font> <o:p></o:p></p> </div> </body> </html> ------=_NextPart_000_003A_01C75B83.522469F0--
It will be hard to recreate an exact copy of the email from the bug text. Could you attach the email (or email it to me)? I don't mind if you replace the personal headers (replace, not remove, so that the email is still a valid message/rfc822 file). Thanks.
Five minutes later it's chewing on the next MS Outlook mail for minutes. Tell me if you want that too.
Ok, sending you personally, both mails.
Could you test one more thing? Save the email to disk and run
$ beagle-extract-content --mimetype=message/rfc822 --show-generated /path/to/email
and see whether it runs correctly or has the same problem.
Heavy CPU usage. 30 secs of CPU time, then exitus.

$ beagle-extract-content --mimetype=message/rfc822 --show-generated /tmp/1335.
Filename: file:///tmp/1335.
Debug: Done reading conf from ~/.beagle/config/Daemon.xml
Debug: Done reading conf from /etc/beagle/config-files/Daemon.xml
Debug: Loaded 58 filters from /usr/lib/beagle/Filters/Filters.dll
Debug: Verifying filter_cache at ~/.beagle/filterver.dat ... cache is dirty ? False
Too many heap sections: Increase MAXHINCR or MAX_HEAP_SECTS
Stacktrace:
  at (wrapper managed-to-native) System.Object.__icall_wrapper_mono_object_new_fast (intptr) <0x00004>
  at (wrapper managed-to-native) System.Object.__icall_wrapper_mono_object_new_fast (intptr) <0xffffffff>
  at HtmlAgilityPack.HtmlEntity.DeEntitize (string) <0x00084>
  at Beagle.Filters.FilterHtml.HandleTextNode (HtmlAgilityPack.HtmlNode) <0x00084>
  at Beagle.Filters.FilterHtml.HandleNodeEventHead (HtmlAgilityPack.HtmlNode) <0x00118>
  at (wrapper delegate-invoke) System.MulticastDelegate.invoke_bool_HtmlNode (HtmlAgilityPack.HtmlNode) <0xffffffff>
  at HtmlAgilityPack.HtmlDocument.PushNodeEnd (int,bool) <0x00039>
  at HtmlAgilityPack.HtmlDocument.NewCheck () <0x000f8>
  at HtmlAgilityPack.HtmlDocument.DoParse (int) <0x00071>
  at HtmlAgilityPack.HtmlDocument.Parse () <0x00121>
  at HtmlAgilityPack.HtmlDocument.Load (System.IO.TextReader) <0x00219>
  at HtmlAgilityPack.HtmlDocument.Load (System.IO.Stream,System.Text.Encoding) <0x00033>
  at Beagle.Filters.FilterHtml.DoPullProperties () <0x003ff>
  at Beagle.Daemon.Filter.Open (System.IO.Stream,bool) <0x000dd>
  at Beagle.Filters.FilterHtml.GetHtmlReader (System.IO.Stream,string) <0x000a7>
  at PartHandler.OnEachPart (GMime.Object) <0x00658>
  at Beagle.Filters.FilterMail.DoPullSetup () <0x0005d>
  at Beagle.Daemon.Filter.Open (System.IO.Stream,bool) <0x0012a>
  at Beagle.Daemon.Filter.Open (System.IO.FileSystemInfo) <0x002e0>
  at Beagle.Daemon.Filter.Open (string) <0x0004a>
  at Beagle.Daemon.FilterFactory.FilterIndexable (Beagle.Indexable,Beagle.Daemon.TextCache,Beagle.Daemon.Filter&) <0x006ea>
  at Beagle.Daemon.FilterFactory.FilterIndexable (Beagle.Indexable,Beagle.Daemon.Filter&) <0x0000f>
  at ExtractContentTool.Display (Beagle.Indexable) <0x00154>
  at ExtractContentTool.Main (string[]) <0x002ee>
  at (wrapper runtime-invoke) System.Object.runtime_invoke_int_string[] (object,intptr,intptr,intptr) <0xffffffff>
Native stacktrace:
  beagle-extract-content [0x8194ca6]
  beagle-extract-content [0x8177154]
  [0xffffe440]
  /lib/tls/i686/cmov/libc.so.6(abort+0x101) [0xb7d58201]
  beagle-extract-content [0x8145cf5]
  beagle-extract-content [0x813de2d]
  beagle-extract-content [0x813e21b]
  beagle-extract-content [0x813e3c3]
  beagle-extract-content [0x813e57b]
  beagle-extract-content [0x814130b]
  beagle-extract-content [0x81421cc]
  beagle-extract-content [0x8149de0]
  beagle-extract-content(mono_object_new_fast+0x1a) [0x80b1bdd]
  [0xb73fc9a7]
  [0xb65e4aed]
  [0xb65e4855]
  [0xb65e4729]
  [0xb65e45c0]
  [0xb65e430a]
  [0xb65e4161]
  [0xb65e3362]
  [0xb65e3092]
  [0xb65e248a]
  [0xb65e201c]
  [0xb65e15e8]
  [0xb6dbf5a6]
  [0xb65e1018]
  [0xb65df8b1]
  [0xb65df036]
  [0xb6dbf5f3]
  [0xb6db7c01]
  [0xb6db78b3]
  [0xb6db52f3]
  [0xb6eaf398]
  [0xb6ec7b6d]
  [0xb73faacf]
  [0xb73fa086]
  beagle-extract-content [0x8176f50]
  beagle-extract-content(mono_runtime_invoke+0x27) [0x80b0b2f]
  beagle-extract-content(mono_runtime_exec_main+0x109) [0x80b534a]
  beagle-extract-content(mono_runtime_run_main+0x27e) [0x80b5631]
  beagle-extract-content(mono_jit_exec+0xbd) [0x805a4cb]
  beagle-extract-content [0x805a5a8]
  beagle-extract-content(mono_main+0x1683) [0x805bdc9]
  beagle-extract-content [0x8059636]
  /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe0) [0xb7d42050]
  beagle-extract-content [0x80595b1]
=================================================================
Got a SIGABRT while executing native code. This usually indicates a fatal error in the mono runtime or one of the native libraries used by your application.
=================================================================
Aborted (core dumped)
Hey, are you running from source or from a package? That is, if I attach a patch, can you test it? I figured out the problem: it lies somewhere in beagle's interaction with gmime. Previously, the HTML MIME part of an email was extracted, stored in a temporary file, and indexed as an attachment; in the new code, the whole temporary-file-and-attachment route is skipped. It turns out that the HtmlFilter needs random access to the gmime stream of the HTML part, but the stream that FilterMail gets back does not allow random access. That causes all kinds of trouble. I have a terrible hack that can be used temporarily; I am unsure what the right solution is. It could be done in gmime, but I am not quite sure.
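For readers who want to see the general idea behind the older temporary-file route, here is a minimal sketch in Python (not Beagle's C# code). It assumes a source that can only be read front to back; `ForwardOnlyStream` and `make_seekable` are hypothetical names invented for this illustration. The stream is copied into a spooled temporary file, which a parser can then seek around in freely.

```python
import shutil
import tempfile


class ForwardOnlyStream:
    """Hypothetical stand-in for a stream that can only be read front to back."""

    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0

    def read(self, size: int = -1) -> bytes:
        # A negative or missing size means "read everything that is left".
        if size is None or size < 0:
            size = len(self._data) - self._pos
        chunk = self._data[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk


def make_seekable(stream, max_in_memory: int = 1 << 20):
    """Copy a forward-only stream into a spooled temporary file and rewind it.

    The copy stays in memory up to max_in_memory bytes and spills to disk
    beyond that, so the caller gets random access either way.
    """
    buffered = tempfile.SpooledTemporaryFile(max_size=max_in_memory)
    shutil.copyfileobj(stream, buffered)
    buffered.seek(0)
    return buffered
```

Calling `make_seekable(ForwardOnlyStream(data))` returns a file object that supports `seek()` and `tell()`. The cost is one full copy of the part, which is presumably what skipping the temporary file was meant to avoid.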
I got the package today from Kevin’s Ubuntu “Personal Package Archive” repo. If he can patch there I’d prefer that, otherwise, yes.
Created attachment 100364
Workaround a gmime bug or feature

Try this patch. It removes the existing StreamWrapper and creates a new StreamWrapper that assumes Position=0 means stream.Reset().
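The idea of treating "set position to 0" as a reset can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the attached patch: `ForwardOnlyStream` and `ResetOnRewind` are hypothetical names, and the underlying stream is assumed to offer only `read()` and `reset()`.

```python
import io


class ForwardOnlyStream:
    """Hypothetical stream that can be read forward or reset to its start."""

    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0

    def read(self, size: int = -1) -> bytes:
        if size is None or size < 0:
            size = len(self._data) - self._pos
        chunk = self._data[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk

    def reset(self) -> None:
        self._pos = 0


class ResetOnRewind(io.RawIOBase):
    """Wrapper that maps seek(0) onto reset() of the wrapped stream."""

    def __init__(self, inner):
        super().__init__()
        self._inner = inner
        self._pos = 0

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        # Only rewinding to offset 0 is supported, see seek() below.
        return True

    def readinto(self, buffer) -> int:
        chunk = self._inner.read(len(buffer))
        buffer[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)

    def tell(self) -> int:
        return self._pos

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        if whence == io.SEEK_SET and offset == 0:
            self._inner.reset()
            self._pos = 0
            return 0
        raise io.UnsupportedOperation("only rewinding to the start is supported")
```

A consumer that reads a little, rewinds with `seek(0)`, and then reads the whole stream works with this wrapper. Any other seek raises `io.UnsupportedOperation` instead of silently returning wrong data, which is why this is a workaround rather than real random access.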
dBera_gone: it seems your gmime patch works so far.
dBera_gone: i haven't seen capricious behaviour so far
dBera_gone: and the problematic emails seem to be scanned regularly
dBera_gone: i only see a line "DEBUG: +email ..." in the logs
Beagle is not under active development anymore and had its last code changes in early 2011. Its codebase has been archived (see bug 796735): https://gitlab.gnome.org/Archive/beagle/commits/master "tracker" is an available alternative. Closing this report as WONTFIX as part of Bugzilla Housekeeping to reflect reality. Please feel free to reopen this ticket (or rather transfer the project to GNOME Gitlab, as GNOME Bugzilla is deprecated) if anyone takes the responsibility for active development again.