GNOME Bugzilla – Bug 335819
x86_64 version of beagle crashes when parsing i386 evolution summary files
Last modified: 2006-07-14 19:04:44 UTC
Steps to reproduce: Run beagled. Sooner or later, it crashes. (Sometimes it runs okay for a while, but it always crashes within 10-15 minutes, sometimes much sooner.) Stack trace: (gdb) thread apply all bt
+ Trace 67206
Thread 6 (Thread 1075988816 (LWP 27414))
Other information: I upgraded three machines from Fedora Core 4 to Fedora Core 5 today and installed beagle on all three. It seems to run fine on one of them, but on the other two, beagled crashes every time it runs. The above stack trace is from an x86_64 machine, but I'm also getting crashes on an i386 machine.
I should have mentioned this sooner, but this only seems to be a problem when I try to index my evolution mail. If I execute the command "beagled --deny-backend EvolutionMail," the daemon runs normally.
Do you have gmime installed? If so, what version? Can you attach the stack trace that gets spit out to the console when beagled crashes? There might be a managed stack trace in there which could help.
gmime is installed; it's version 2.1.19-3. When I run beagle in debug mode, this is the output it returns before it crashes: Debug: +email://0/INBOX;uid=46991 Debug: +email://0/INBOX;uid=46992 Debug: +email://1143234013.30048.21@localhost.localdomain/INBOX;uid=12760579#0 Debug: +email://1143234013.30048.21@localhost.localdomain/INBOX;uid=12780223#0 Debug: Helper Size: VmRSS=48.3 MB, size=2.54, 38.5% Debug: The daemon appears to have gone away. Debug: Shutting down helper. Debug: (1) Waiting for 1 worker... Debug: waiting for server '/home/ebair/.beagle/socket-helper' Debug: Exiting Debug: Server '/home/ebair/.beagle/socket-helper' shut down Segmentation fault (It doesn't seg fault every time; sometimes it does and sometimes it doesn't.) As far as I can tell, it doesn't spit out any other sort of stack trace when it dies. Do I need to do something special to capture this? Forgive my ignorance, but when I tried to run the command "gdb beagled," I got the following message: [ebair@localhost ~]$ gdb beagled GNU gdb Red Hat Linux (6.3.0.0-1.122rh) Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "x86_64-redhat-linux-gnu"..."/usr/bin/beagled": not in executable format: File format not recognized To get the stack trace I posted earlier, I had to start beagled and then use the "gdb program [id]" command, but that seemed to capture a stack trace of the process while it was still running. (After I executed gdb, beagled seemed to stop running, and wouldn't crash until I exited gdb.)
Just run beagled in the foreground like so: beagled --fg --debug And when it crashes, it should print out a bunch of stuff to the console.
That's exactly what I did to get the output in my earlier comment. When I ran "beagled --fg --debug," I got the following output before the crash: Debug: +email://0/INBOX;uid=46991 Debug: +email://0/INBOX;uid=46992 Debug: +email://1143234013.30048.21@localhost.localdomain/INBOX;uid=12760579#0 Debug: +email://1143234013.30048.21@localhost.localdomain/INBOX;uid=12780223#0 Debug: Helper Size: VmRSS=48.3 MB, size=2.54, 38.5% Debug: The daemon appears to have gone away. Debug: Shutting down helper. Debug: (1) Waiting for 1 worker... Debug: waiting for server '/home/ebair/.beagle/socket-helper' Debug: Exiting Debug: Server '/home/ebair/.beagle/socket-helper' shut down Segmentation fault It doesn't print anything else to the console, and there is no additional information in the ~/.beagle/Log files.
Ah, ok. What version of mono are you using?
It's version 1.1.13.4. Here is a list of all the mono packages installed on the two machines that aren't working: mono-web-1.1.13.4-2 mono-core-1.1.13.4-2 mono-data-1.1.13.4-2 mono-data-sqlite-1.1.13.4-2 For what it's worth, on the machine where beagle is working, I have the following mono packages installed: mono-extras-1.1.13.4-2 mono-data-1.1.13.4-2 mono-data-sqlite-1.1.13.4-2 mono-core-1.1.13.4-2 mono-data-sybase-1.1.13.4-2 mono-web-1.1.13.4-2 mono-data-oracle-1.1.13.4-2 mono-winforms-1.1.13.4-2 mono-locale-extras-1.1.13.4-2 mono-data-postgresql-1.1.13.4-2 mono-basic-1.1.13.4-2 mono-data-firebird-1.1.13.4-2 I don't know if that's significant, but I thought I'd mention it. (Is it possible that the version of beagle that I installed is missing a dependency on one of these other mono packages?)
Is this a 64-bit machine?
I'm having this problem on two machines. One is x86_64, the other is i386. In other words, this problem is not specific to the x86_64 architecture.
Ok, let's try gdb again. Can you install a debuginfo package for mono? It might be mono-core-debuginfo, I'm not sure what it is on FC5. Start beagled, and after it's running, do "gdb --pid=`pidof beagled`" and it should attach. You'll need to run these commands: handle SIGPWR nostop noprint handle SIGXCPU nostop noprint handle SIG33 nostop noprint and then "continue". It should run until it crashes.
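For anyone repeating the attach steps above, here is a minimal sketch that assembles the same gdb invocation as one command line. It assumes gdb's standard `--pid` and `-ex` options; the function names are hypothetical helpers, not part of beagle or mono. The three signals are the ones listed in the comment above, presumably because the Mono runtime uses them internally and gdb would otherwise stop on each one.

```python
import shlex

# Signals named in the comment above; gdb should pass them through silently.
PASS_THROUGH_SIGNALS = ["SIGPWR", "SIGXCPU", "SIG33"]


def build_gdb_attach_argv(pid):
    """Return an argv list that attaches gdb to `pid` and resumes the process.

    Hypothetical helper: it only builds the command, it does not run gdb.
    """
    argv = ["gdb", f"--pid={pid}"]
    for sig in PASS_THROUGH_SIGNALS:
        # Each -ex runs one gdb command after attaching.
        argv += ["-ex", f"handle {sig} nostop noprint"]
    # Let the daemon run until it actually crashes.
    argv += ["-ex", "continue"]
    return argv


def as_shell_command(pid):
    """Render the argv as a single copy-and-paste shell command."""
    return shlex.join(build_gdb_attach_argv(pid))
```

Calling `as_shell_command(pid)` with the pid reported by `pidof beagled` gives a line that can be pasted into a terminal; once the SIGSEGV arrives, `bt` or `thread apply all bt` can be typed at the gdb prompt as before.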
I tried what you suggested; the output that I got is below. Is this helpful at all? (Incidentally, this is the output that my x86_64 machine produced. I can also post the output from my i386 machine if you like.) Program received signal SIGSEGV, Segmentation fault.
+ Trace 67283
Thread 6 (Thread 1075988816 (LWP 13322))
Thread 2 (Thread 1085442384 (LWP 13326))
Yeah, it looks helpful. It appears to be crashing inside some mono function, so this might be a bug in Mono. It would be good if you could move your ~/.wapi directory out of the way and try restarting. Some more info that would be helpful: when it crashes in gdb, and you see the backtrace for just your thread, i.e.: Program received signal SIGSEGV, Segmentation fault.
+ Trace 67306
Thread 1081174352 (LWP 13324)
If you could do "call mono_pmip (0x000000004046b8f8)" (substituting the address of the stack frame you see) that would be helpful. It'll print out on the console where you're running beagle, not the gdb console. Another thing you might want to try, if you can, is running against a different version of mono.
I tried removing my .wapi directory and starting beagled again, but it still crashed. And I'm already running the most current version of mono (I believe), and I couldn't downgrade without breaking some dependencies. I tried the "call mono_pmip" command in gdb as you suggested, but I don't think that it worked correctly. Nothing appeared on the console where I was running beagle. The output that I got in gdb is below. Let me know if I should try something different. Program received signal SIGSEGV, Segmentation fault.
+ Trace 67308
Thread 1081239888 (LWP 22896)
Program received signal SIGSEGV, Segmentation fault. mono_pmip (ip=0x4046b9e0) at mini.c:190 190 { The program being debugged was signaled while in a function called from GDB. GDB remains in the frame where the signal was received. To change this behavior use "set unwindonsignal on" Evaluation of the expression containing the function (mono_pmip) will be abandoned. (gdb) set unwindonsignal on (gdb) call mono_pmip (0x000000004046b9e0) Program received signal SIGSEGV, Segmentation fault. mono_pmip (ip=0x4046b9e0) at mini.c:190 190 { The program being debugged was signaled while in a function called from GDB. GDB has restored the context to what it was before the call. To change this behavior use "set unwindonsignal off" Evaluation of the expression containing the function (mono_pmip) will be abandoned.
Eric, sorry I haven't gotten back to you sooner about this. Someone in our office has been seeing the same crash, and I am pretty sure it is a mono bug. The mono guys have recently released mono 1.1.14, you may want to try that and see if the problem persists. I've filed this in the Novell bugzilla here: https://bugzilla.novell.com/show_bug.cgi?id=162354 Unfortunately I think the access restrictions will prevent you from being allowed to see the bug. If you get a Novell bugzilla account I can add you to the CC for it. Right now, though, there's been no progress on it.
Upgrading to mono 1.1.14 did not resolve the issue, unfortunately. I created a Novell bugzilla account; if someone can add me to the cc list, that would be great. Thanks!
The mono bug mentioned above has been closed; here is the final comment from the Novell bugzilla:
>We concluded that the beagle indexer is crashing on a particular data pattern.
>The beagle team will be trying to figure out what exactly triggers the bug.
>
>This is what Cameron told me:
>
>CMeadors.Novell: I recently nuked my evolution cache
>CMeadors.Novell: It seems that bug you were tracking down on my computer is gone now
>...
>Dick Porter: or maybe someone from the beagle team could inspect the cache file for reasons why it might be suspect?
>CMeadors.Novell: I will work on that
>
>This bug has been reassigned to beagle for now.
Is there anything I can do to help resolve this bug? (In particular, is my machine the only machine where this bug can be reproduced right now?)
Ok, if Dick's right, we need to find out what the data is that's causing the crash. Can you confirm that it always crashes on the same data? In comment 3 above, you reference these: Debug: +email://1143234013.30048.21@localhost.localdomain/INBOX;uid=12760579#0 Debug: +email://1143234013.30048.21@localhost.localdomain/INBOX;uid=12780223# Can you see if this is the message that causes problems?
It definitely doesn't crash on the same data every time. (Or at the very least, the final debug message isn't the same every time.) However, I did notice that it seems to have crashed on the following messages several times in a row: Debug: +email://local@local/Kelly Buhler;uid=36 Debug: +email://local@local/Mitch;uid=42 Debug: +email://local@local/Genome Biology;uid=49 Debug: +email://local@local/Reilein Paper;uid=12 Debug: +email://local@local/Ptacek Lab;uid=184 Debug: +email://local@local/Information;uid=305 Debug: +email://local@local/Receipts;uid=591 Debug: Helper Size: VmRSS=44.5 MB, size=3.61, 65.4% Debug: The daemon appears to have gone away. Debug: Shutting down helper. Debug: (1) Waiting for 1 worker... Debug: waiting for server '/home/ebair/.beagle/socket-helper' Segmentation fault Debug: Exiting Debug: Server '/home/ebair/.beagle/socket-helper' shut down Do you want me to attach some files related to these messages above? If so, can you let me know what you need?
Ah, well, in this case it appears that the daemon is crashing, and not the helper. So take a look at the ~/.beagle/Log/current-Beagle log rather than at the IndexHelper logs. That might give some more insight. Another thing to try is to start narrowing it down to specific backends using the --allow-backend and/or --deny-backend flags. It's a little tedious, but that would help a lot.
The current-Beagle log is below: 060614 1115244525 03996 Beagle ERROR: Caught exception while trying to parse Kopete contact list: Document element did not appear. file:///home/ebair/.kde/share/apps/kopete/contactlist.xml Line 1, position 1. 060614 1115245007 03996 Beagle ERROR: Unable to start EvolutionDataServer backend: Unable to find or open libecal-1.2.so.3 When I run beagled with the --debug flag, it displays an identical message each time that it starts. I doubt that's related to the bug, however, since the program continues to run normally after displaying the above message. And as I noted in an earlier comment, if I execute the command "beagled --deny-backend EvolutionMail," the daemon runs normally. The problem only seems to occur when I try to index my evolution mail.
Ok, if you don't mind, can you run with "--allow-backend EvolutionMail" and attach the full beagled and helper log to the bug?
Created attachment 67364 [details] helper log file I ran beagled with the "--allow-backend EvolutionMail" flag as you requested, and I am attaching the helper log file. (The beagle log file was blank, so I didn't bother to attach it.)
The beagle log is the important one, as that's what's crashing. Can you make sure to run beagle with --debug?
When I produced the above attachment, I used the following command at the command line: $ beagled --fg --debug --allow-backend EvolutionMail As I said earlier, the resulting log file (entitled "2006-06-14-14-56-09-Beagle") was blank. If I don't include the "--allow-backend EvolutionMail" flag, the beagle log file looks like this: 060614 1309358147 06080 Beagle ERROR: Caught exception while trying to parse Kopete contact list: Document element did not appear. file:///home/ebair/.kde/share/apps/kopete/contactlist.xml Line 1, position 1. 060614 1309358594 06080 Beagle ERROR: Unable to start EvolutionDataServer backend: Unable to find or open libecal-1.2.so.3 As I said, this is the same error message that appears on the console when I run "beagled --fg --debug," and the program doesn't crash until long after this message appears. Moreover, I get the same message when I run the program with the "--deny-backend EvolutionMail" flag, in which case it doesn't crash at all. It's looking like the beagle log file isn't giving us any useful information.
Are you running packages from Mandriva by any chance? I think they (brokenly) patch their RPMs such that beagled doesn't print out any info. Otherwise, try running beagled in the foreground: beagled --fg <other args>
I am using the beagle packages distributed by Fedora Core (beagle-0.2.6-1.fc5.1). And once again, the command line that I used was $ beagled --fg --debug --allow-backend EvolutionMail Note that I used the "--fg" flag.
Can you pipe the output to a file then? Without the beagle logs, we've got nothing to work from.
Created attachment 67447 [details] piped beagle output file I'm attaching a piped output file, as you requested. As far as I can tell, it's merely a condensed version of the helper log file, so I doubt it's going to tell you anything that you couldn't learn from the helper log file that I already attached. But I'm attaching it anyway just in case I'm wrong.
Yeah, you're right. There must be a patch or something that Fedora is applying to turn off debug output from beagled. How irritating.
Yes, I just confirmed they do patch this and they do it in a very broken way. I have filed a fedora bug here: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=195621 One of the Fedora developers is very helpfully rebuilding the package, I'll post a URL for it when he's done.
No package, unfortunately. The Fedora guy is having a hard time with their new build system. In the meantime, if you have a devel environment, you can download the SRPM and remove the beagle-0.2.1-spew.patch and rebuild it.
Created attachment 68138 [details] beagle log file Sorry I didn't get back to you sooner; I'm not an expert at rebuilding packages, so I decided to wait until Fedora released a patched version. At any rate, I installed the patched version today, and now the beagle log is no longer empty. I'm attaching it. I hope it contains some useful information. Let me know if you need me to do anything else.
It looks like there haven't been any further comments since I posted the new log file above. Was this log what you needed? Do you need me to do anything else? Please let me know if I can do anything to help resolve this issue.
I am sure Joe will attend to you once he gets his backlog cleared :). Just had a look at the last attachment; it isn't complete. In particular, the last part, where it crashes, isn't present. It does contain some exceptions while parsing an evolution email message summary file; maybe that will suffice. But if you could attach a complete log (start to crash) it might be helpful. Just tee the output or use "script" to record everything.
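If tee or "script" isn't convenient, the same start-to-crash capture can be sketched in a few lines of Python. This is only an illustration: `run_and_tee` and the log file name are hypothetical, and the beagled flags in the usage note are the ones already used in this thread.

```python
import subprocess
import sys


def run_and_tee(argv, log_path):
    """Run `argv`, copying its combined stdout/stderr to the console and to
    `log_path`. Returns the exit code; on POSIX a negative value -N means the
    process was killed by signal N (for example -11 for SIGSEGV).
    """
    with open(log_path, "w", encoding="utf-8", errors="replace") as log:
        proc = subprocess.Popen(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr so nothing is lost
            text=True,
            errors="replace",
        )
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
            log.flush()  # keep the file current right up to the crash
        return proc.wait()
```

For example, `run_and_tee(["beagled", "--fg", "--debug", "--allow-backend", "EvolutionMail"], "beagled-full.log")` would record everything from startup to the crash, and the return value shows whether the daemon died from a signal.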
Yeah, I just hadn't gotten to it yet. Bera's right, the log file is incomplete; the crash isn't in the file at all.
Created attachment 68842 [details] piped beagle output file Well, rather than attaching the file from the ~/.beagle/Log directory, I piped the output to a file, which is attached. It doesn't seem to be much different from the log file that I attached previously, however. If this new log file isn't helpful, please let me know what I should do differently next time.
Aha! At long last, I managed to fix the problem. Based on the log file, beagle seemed to be crashing when it encountered the /home/ebair/.evolution/mail/imap/ebair@ebair.pobox.stanford.edu/folders/INBOX/subfolders/In/summary file. Well, I immediately thought that this path looked odd, since the INBOX folder of that e-mail account doesn't have a subfolder named "In." So I went to that directory and looked at its contents. I noticed that the "In" directory had not been accessed since 2004, suggesting to me that it was no longer being used by evolution. So I deleted that directory. Bingo! Now beagle no longer crashes. I'm almost certain that the file that was causing beagle to crash was an old 1.0 evolution data file that had never been deleted for some reason. I don't know why it was causing a crash, but I would speculate that beagle simply didn't know how to read the old data file. At any rate, since the problem is fixed, I'm going to close this bug. I'll attach a backup copy of the evolution summary file that's triggering the crash just in case someone wants to take a look at it and figure out what's causing it.
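The manual check described above (looking for a summary file that hasn't been touched in years) can be roughed out as a script. This is a sketch under assumptions: it treats any file literally named "summary" as a candidate, as in the path quoted above, and `find_stale_summaries` and the one-year threshold are hypothetical choices. Access times are unreliable on filesystems mounted with noatime, so the results are hints, not proof that a file is unused.

```python
import os
import time


def find_stale_summaries(root, max_age_days=365, now=None):
    """Return sorted (path, last_used) pairs for files named 'summary' under
    `root` whose access and modification times are both older than
    `max_age_days`.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "summary" not in filenames:
            continue
        path = os.path.join(dirpath, "summary")
        st = os.stat(path)
        # Use the more recent of atime and mtime as "last used".
        last_used = max(st.st_atime, st.st_mtime)
        if last_used < cutoff:
            stale.append((path, last_used))
    return sorted(stale)
```

Running it against `os.path.expanduser("~/.evolution/mail")` lists candidates to inspect by hand; it deliberately deletes nothing.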
Created attachment 68843 [details] evolution summary file that causes beagle to crash
Excellent, thanks for tracking this down. I will take a look at the summary file and see if I can harden the mail backend against this type of thing.
I just tried this summary file and it worked okay for me. Is it possible you moved this data between a 32-bit and 64-bit box?
Yeah, it's definitely a 32- to 64-bit problem. Your summary is in 32 bits, but your logs indicate you are running on a 64-bit machine. This is a bug in the camel mail library that Evolution uses for its data. There is no way to detect whether or not the summary file was created on a 32-bit or 64-bit machine, and it stores its data differently depending. I've filed this against evolution-data-server: http://bugzilla.gnome.org/show_bug.cgi?id=347433 Unfortunately there is no easy fix for this.
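To illustrate why a width mismatch is undetectable from the bytes alone, here is a toy example. It does not reproduce the real camel summary layout; it assumes a hypothetical record consisting of a timestamp stored at the writer's native width (4 or 8 bytes) followed by a 32-bit count, with no marker recording which width was used.

```python
import struct

# Hypothetical encodings: the timestamp width depends on the writing machine.
TIME_FORMATS = {4: ">I", 8: ">Q"}


def encode_record(timestamp, count, time_width):
    """Pack one toy record: width-dependent timestamp, then a 32-bit count."""
    return struct.pack(TIME_FORMATS[time_width], timestamp) + struct.pack(">I", count)


def decode_record(data, time_width):
    """Unpack the first toy record from `data`, assuming `time_width` bytes
    for the timestamp. Nothing in `data` says whether that assumption holds.
    """
    fmt = TIME_FORMATS[time_width]
    (timestamp,) = struct.unpack_from(fmt, data, 0)
    (count,) = struct.unpack_from(">I", data, struct.calcsize(fmt))
    return timestamp, count


# Two records written with 4-byte timestamps, as a 32-bit writer might.
data = encode_record(1143234013, 3, 4) + encode_record(1143234014, 5, 4)

print(decode_record(data, 4))  # the values that were written
print(decode_record(data, 8))  # same bytes, read with 8-byte timestamps
```

Read with the wrong width, the first record's count is swallowed into the timestamp and the second record's timestamp is returned as the count. A reader that trusts such a count can walk off the end of the data, which is consistent with the kind of crash seen here, although the real format may fail differently.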
Yeah, you're definitely right. I rsync my .evolution folder between several different machines, one of which is an x86_64 machine. When I copied my .evolution folder from my x86_64 machine to an i386 machine and ran beagled, it crashed with the same error as before. When I opened up evolution and let it replace the x86_64 summary files with i386 files, then beagled ran normally. I guess I may have closed this bug too quickly. If I rsync my .evolution folder to a different machine, beagled will crash if it tries to index that folder before I get a chance to run evolution on that machine. This isn't the end of the world, since I can always restart beagled after running evolution, but it is kind of annoying. I'm reopening the bug with a more descriptive title. I guess it probably can't be fixed until the bug you filed against evo-data-server is patched, but I suppose we should keep this bug open in the meantime.
I'm going to close it again, for a couple of reasons: (a) With the summary format as it currently is, there's no way we can fix this. Only a change to the Evo summary format will fix this, and even after that, this summary format has been around for years and is likely to be widely deployed for a long time to come. We still need to support it. The negative side-effect of this bug is outweighed by the fact that it works in almost all cases. (b) Basically, it's blocking on an e-d-s bug and I don't really want to clutter up the beagle bugzilla with it. What kinda surprises me about this is that Evo doesn't crash with it. Looking at the code, it clearly doesn't handle this situation correctly.