Bug 485005
gmime causes endless 100% cpu utilization for some emails
Last modified: 2007-12-02 17:44:55 UTC
The file was a mail in a kmail folder. Its status was:
Filtering status (2h35m26s ago): determining filter for file:///home/tamas/.kde/share/apps/kmail/mail/webmaster/cur/1191841883.13274.RPjw4
beagle-extract-content was also unable to parse the file. The file is as follows (base64 encoded):
begin-base64 644 1191841883.13274.RPjw4
UmV0dXJuLVBhdGg6IDx3d3dydW5AYmlzbWFyY2subnltZS5odT4KWC1Pcmln
aW5hbC1Ubzogd2VibWFzdGVyQG55bWUuaHUKRGVsaXZlcmVkLVRvOiBuaWNl
QHRpdGFuaWMubnltZS5odQpSZWNlaXZlZDogZnJvbSB0aXRhbmljLm55bWUu
aHUgKGxvY2FsaG9zdCBbMTI3LjAuMC4xXSkKCWJ5IHBvc3Rwcm9jZXNzLnNl
ay5ueW1lLmh1IChQb3N0Zml4KSB3aXRoIFNNVFAgaWQgRTY5RTU1QTg4QQoJ
Zm9yIDx3ZWJtYXN0ZXJAbnltZS5odT47IFR1ZSwgIDQgSnVsIDIwMDYgMTU6
Mjc6NTMgKzAyMDAgKENFU1QpClJlY2VpdmVkOiBmcm9tIGJpc21hcmNrLm55
bWUuaHUgKGJpc21hcmNrLm55bWUuaHUgWzE5My4yMjUuOTMuNzddKQoJYnkg
dGl0YW5pYy5ueW1lLmh1IChQb3N0Zml4KSB3aXRoIEVTTVRQIGlkIDQ2NUJG
NTZFMjkKCWZvciA8d2VibWFzdGVyQG55bWUuaHU+OyBUdWUsICA0IEp1bCAy
MDA2IDE1OjI3OjQ5ICswMjAwIChDRVNUKQpSZWNlaXZlZDogYnkgYmlzbWFy
Y2subnltZS5odSAoUG9zdGZpeCwgZnJvbSB1c2VyaWQgMzApCglpZCAwODRB
MENCMzsgVHVlLCAgNCBKdWwgMjAwNiAxNToyNzoxMSArMDIwMCAoQ0VTVCkK
VG86IHdlYm1hc3RlckBueW1lLmh1ClN1YmplY3Q6ID0/P1E/Sm9iX05vOl8x
X0pvYl9iZWdpbj89CkZyb206IFdlYm1hc3RlciBo7XJsZXbpbCB0ZXN6dCA8
d2VibWFzdGVyQG55bWUuaHU+ClJlcGx5LVRvOgpNaW1lLVZlcnNpb246IDEu
MApDb250ZW50LVR5cGU6IHRleHQvcGxhaW47CiAgY2hhcnNldD0iIgpDb250
ZW50LVRyYW5zZmVyLUVuY29kaW5nOiBxdW90ZWQtcHJpbnRhYmxlCk1lc3Nh
Z2UtSWQ6IDwyMDA2MDcwNDEzMjcxMS4wODRBMENCM0BiaXNtYXJjay5ueW1l
Lmh1PgpEYXRlOiBUdWUsICA0IEp1bCAyMDA2IDE1OjI3OjExICswMjAwIChD
RVNUKQpTdGF0dXM6IFIKWC1TdGF0dXM6IE5DClgtS01haWwtRW5jcnlwdGlv
blN0YXRlOiAgClgtS01haWwtU2lnbmF0dXJlU3RhdGU6ICAKWC1LTWFpbC1N
RE4tU2VudDogIAoKSm9iIGJlZ2luOiAwNC0wNy0wNiAwMzoyNzoxMQo=
====
Can you try "beagle-extract-content --mimetype=message/rfc822 /path/to/file"? It worked for me here.
I tried the following:
tamas@milleniumfalcon:~/beagle> beagle-extract-content --mimetype=message/rfc822 1191841883.13274.RPjw4
The output was:
Filename: file:///home/tamas/beagle/1191841883.13274.RPjw4
Debug: Loaded 53 filters from /usr/lib/beagle/Filters/Filters.dll
And it hung forever with 100% CPU utilization. By the way, I use openSUSE 10.3.
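For anyone else trying to reproduce the hang without leaving a process spinning at 100% CPU, a small wrapper with a timeout may help. This is only a sketch: the helper name `run_with_timeout` and the 30 second limit are my own choices, and the only thing taken from this report is the beagle-extract-content command line shown above.

```python
import subprocess
import sys

def run_with_timeout(argv, seconds):
    """Run argv and return (finished, returncode).

    finished is False when the command was still running after
    `seconds`; subprocess.run kills the child in that case.
    """
    try:
        proc = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=seconds,
        )
    except subprocess.TimeoutExpired:
        return False, None
    return True, proc.returncode

# Hypothetical usage against the file from this report:
# finished, code = run_with_timeout(
#     ["beagle-extract-content", "--mimetype=message/rfc822",
#      "1191841883.13274.RPjw4"],
#     30,
# )
# if not finished:
#     print("extractor did not finish within 30s (possible hang)")
```

A run that returns `(False, None)` only says the extractor did not finish in time; it does not by itself prove an endless loop.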
Ok, I tried using the branch for 0.2.18 and it still runs fine. This could be some problem with the OpenSUSE gmime-sharp or some other local patch in OpenSUSE 10.3. Joe, do you know anything offhand?
----output of extract-content (0.2.18 is made out of 0.2.16 svn branch)-----
Filename: file:///home/debajyoti/1191841883.13274.RPjw4
Debug: Loaded 53 filters from /usr/share/devel/beagle-gnome/branches/beagle-0.2.16/Filters/Filters.dll
Filter: Beagle.Filters.FilterMail (determined in .60s)
MimeType: message/rfc822
Properties:
Timestamp = 2007-10-09 11:26:07 (Utc)
dc:title = Job No: 1 Job begin
fixme:date = 20060704132711
fixme:from = Webmaster hrlevl teszt <>
fixme:from_address =
fixme:from_name = Webmaster hrlevl teszt
fixme:msgid =
fixme:to =
fixme:to_address =
Content: Job begin: 04-07-06 03:27:11
HotContent: (no hot content)
Text extracted in .05s
*** Bug 490074 has been marked as a duplicate of this bug. ***
I figured out that if I change the subject line from:
Subject: =??Q?Job_No:_1_Job_begin?=
to something like:
Subject: Job_No:_1_Job_begin
then beagle is able to index the file! I forgot to mention that I run openSUSE 10.3 on i386 (32 bit).
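The Subject above has the general shape of an RFC 2047 encoded-word (=?charset?encoding?encoded-text?=), but with nothing between the first two question marks, i.e. an empty charset. The following is a minimal sketch for spotting such words in a header value; the regular expression and the name `check_encoded_words` are illustrative assumptions and are not taken from GMime or beagle.

```python
import re

# Shape of an RFC 2047 encoded-word: =?charset?encoding?encoded-text?=
# Each field is allowed to be empty here on purpose, so that malformed
# words are matched and reported instead of being silently skipped.
ENCODED_WORD = re.compile(r"=\?([^?\s]*)\?([^?\s]*)\?([^?\s]*)\?=")

def check_encoded_words(header_value):
    """Return a list of (encoded_word, problem) pairs found in header_value."""
    problems = []
    for match in ENCODED_WORD.finditer(header_value):
        charset, encoding, _text = match.groups()
        if not charset:
            problems.append((match.group(0), "empty charset"))
        if encoding.upper() not in ("Q", "B"):
            problems.append((match.group(0), "unknown encoding"))
    return problems

bad = check_encoded_words("=??Q?Job_No:_1_Job_begin?=")
good = check_encoded_words("Job_No:_1_Job_begin")
```

Here `bad` holds one entry flagging the empty charset, while `good` is empty because the edited Subject contains no encoded-word at all.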
Markus, Nemeth, do you use a non-English language or encoding? Nemeth, is the "?" in the email really an English "?", or did the base64 change it to an English "?"? If there is no funky encoding involved, then I don't see any way beagle could be using gmime incorrectly. Then this bug should really be investigated by the gmime people. gmime contains several test programs, e.g. there is gmime/tests/test-parser.c. If you want, you can download that test program and run it with the path to the bad file as argument. Maybe that will help in figuring out why gmime is taking forever on these emails.
Created attachment 98660 [details] problematic file (mail)
I attached the file to this bug report, but you can also extract it byte for byte from the base64 encoded version above. As you can see this way, the question marks in the Subject line are really question marks (0x3f). The From line itself is actually coded in 8 bit iso8859-2, because it is in Hungarian. It does not declare the encoding however, and I guess it may be incorrect this way, may it not? I'm not an RFC822 expert at all, but I feel that this mail is malformed at many points. Maybe the program which generated the mail was not capable of handling the iso8859-2 encoding.
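To check the bytes without downloading the attachment, two consecutive 60-character lines of the base64 dump from the first comment can be decoded on their own (each such line decodes to exactly 45 bytes). A small sketch; the variable names are mine, and decoding the From line as iso8859-2 follows the guess stated above rather than anything the mail itself declares.

```python
import base64

# Two consecutive lines copied verbatim from the base64 dump in this report.
CHUNK = (
    "VG86IHdlYm1hc3RlckBueW1lLmh1ClN1YmplY3Q6ID0/P1E/Sm9iX05vOl8x"
    "X0pvYl9iZWdpbj89CkZyb206IFdlYm1hc3RlciBo7XJsZXbpbCB0ZXN6dCA8"
)

raw = base64.b64decode(CHUNK)
lines = raw.split(b"\n")
subject_line = lines[1]   # the complete Subject header
from_line = lines[2]      # the From header, cut off where the chunk ends

# Offsets of literal 0x3f bytes in the Subject line.
question_mark_offsets = [i for i, b in enumerate(subject_line) if b == 0x3F]

# Bytes above 0x7f in the From line, i.e. undeclared 8 bit content.
high_bytes = [hex(b) for b in from_line if b >= 0x80]

# Assumption: the 8 bit bytes are iso8859-2, as guessed in this comment.
from_text = from_line.decode("iso8859_2")

print(subject_line)
print(question_mark_offsets, high_bytes)
print(from_text)
```

The Subject line contains four literal 0x3f bytes, and the From line carries two bytes above 0x7f with no charset declared anywhere in the headers.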
Created attachment 98675 [details] [review] possible fix I wasn't able to reproduce the problem on my OpenSuSE 1.3 machine w/ gmime-2.2.10, so I'm not sure that this patch actually fixes the problem... but it's the only thing I can see in the code which /may/ be related.
that should read OpenSuSE 10.3
Not sure, but that could be the problem. If any of the reporters could try beagle with gmime plus the above patch (gmime is extremely easy to build), we can try to fix this for good.
Dear Jeff! I did the following. I downloaded and installed in order to be able to reproduce the openSUSE gmime rpms. I did so, but I didn't use the build script by SuSE, just the raw rpmbuild program, like this:
cd /usr/src/packages/SPECS
rpmbuild -ba gmime.spec
After building the rpms I replaced the already installed original gmime-sharp, libgmime-2_0-2 and gmime packages with the respective (and hopefully totally identical) ones generated by me just now. I tested beagle by copying my mail file and Markus Ehrnsperger's file from Bug 490074 to my home directory, and the symptoms were the same as with the original openSUSE 10.3 version of gmime. After doing so, I modified /usr/src/packages/SPECS/gmime.spec the following way:
-I increased the release number to 36.
-I downloaded your patch as /usr/src/packages/SOURCES/gnome-bugzilla-id-485005.patch and added the following line to gmime.spec: Patch2: gnome-bugzilla-id-485005.patch
-At the %prep section I added: %patch2 -p0
After doing so I built the version 36 counterparts of my gmime rpm packages and replaced the old ones with the brand new ones (hopefully) including your patch. And now the symptoms are the same for both problematic files :((((
PS: To tell the truth, I hadn't even heard about rpmbuild so far, so it's possible that I made a mistake. Whether or not this is the case, the building of the version 36 rpms went without problem, and I attach the src rpm to this bug report for anyone to be able to reproduce the rpms with your patch.
Created attachment 98731 [details] [review] Patched rpm, but no positive results for me
Created attachment 98732 [details] strace beagle-extract-content /beagle/1191841883.13274.RPjw4
Created attachment 98733 [details] strace beagle-extract-content /beagle/1108158121.18206.VP5fq\:2\,S
Dear Jeff! I attached the outputs of stracing beagle-extract-content on both problematic files. By the way, how is it possible that you cannot reproduce the symptom on your openSUSE 10.3 machine??? Don't you run the 32 bit intel version? Or do you have some piece of code compiled by hand, replacing some binaries of the original rpms? Or have you installed one of the RC versions instead of the final openSUSE 10.3?
aha, found the problem... it was a bug with the --enable-rfc2047-workarounds code path.
Created attachment 98977 [details] [review] 485005.2.patch this really fixes it
OK, I added both patches to the openSUSE src.rpm, built the actual rpms and installed them. Now beagled is able to index both problematic files, I can find them via the SuSE style KDE menu and via kerry beagle too. beagle-extract-content also easily parses the files. Great job! I say thank you in the name of the community. You may possibly change this bug's status to fixed. When will these patches find their way to openSUSE (and official gmime) packages?
Changing product and summary. Jeffrey, mark it as fixed if the patch has been checked in.
the fix is in the 2.2.11 release of GMime