GNOME Bugzilla – Bug 512605
crash in Evolution: I pressed the "apply fil...
Last modified: 2008-10-01 16:54:56 UTC
What were you doing when the application crashed?
I pressed the "apply filters" option under the "Message" menu.

Distribution: Debian lenny/sid
Gnome Release: 2.20.3 2008-01-12 (Debian)
BugBuddy Version: 2.20.1
System: Linux 2.6.16-2-686-smp #1 SMP Fri Aug 18 19:25:21 UTC 2006 i686
X Vendor: The X.Org Foundation
X Vendor Release: 10300000
Selinux: No
Accessibility: Disabled
GTK+ Theme: Clearlooks
Icon Theme: gnome

Memory status: size: 50966528 vsize: 50966528 resident: 13893632 share: 8933376 rss: 13893632 rss_rlim: 4294967295
CPU usage: start_time: 1201535677 rtime: 60 utime: 52 stime: 8 cutime:0 cstime: 0 timeout: 0 it_real_value: 0 frequency: 100

Backtrace was generated from '/usr/lib/bug-buddy/evolution-exchange-storage'

Using host libthread_db library "/lib/libthread_db.so.1".
[Thread debugging using libthread_db enabled]
[New Thread 0xa69076b0 (LWP 20916)]
[New Thread 0xa646cb90 (LWP 20918)]
0xa6fb9321 in waitpid () from /lib/libpthread.so.0
Trace 187081
Thread 1 (Thread 0xa69076b0 (LWP 20916))
----------- .xsession-errors (14910 sec old) ---------------------
** (nm-applet:4477): WARNING **: <WARN> nma_dbus_init(): could not acquire its service. dbus_bus_acquire_service() says: 'Connection ":1.16" is not allowed to own the service "org.freedesktop.Networ
** (nm-applet:4477): WARNING **: <WARN> nma_dbus_init(): could not acquire its service. dbus_bus_acquire_service() says: 'Connection ":1.16" is not allowed to own the service "org.freedesktop.Networ
** (nm-applet:4477): WARNING **: <WARN> nma_dbus_init(): could not acquire its service. dbus_bus_acquire_service() says: 'Connection ":1.16" is not allowed to own the service "org.freedesktop.Networ
** (nm-applet:4477): WARNING **: <WARN> nma_dbus_init(): could not acquire its service. dbus_bus_acquire_service() says: 'Connection ":1.16" is not allowed to own the service "org.freedesktop.Networ
** (nm-applet:4477): WARNING **: <WARN> nma_dbus_init(): could not acquire its service. dbus_bus_acquire_service() says: 'Connection ":1.16" is not allowed to own the service "org.freedesktop.Networ
...Too much output, ignoring rest...
--------------------------------------------------
*** Bug 514234 has been marked as a duplicate of this bug. ***
*** Bug 513487 has been marked as a duplicate of this bug. ***
*** Bug 513180 has been marked as a duplicate of this bug. ***
*** Bug 512588 has been marked as a duplicate of this bug. ***
*** Bug 514328 has been marked as a duplicate of this bug. ***
*** Bug 516304 has been marked as a duplicate of this bug. ***
*** Bug 516313 has been marked as a duplicate of this bug. ***
*** Bug 517175 has been marked as a duplicate of this bug. ***
*** Bug 518036 has been marked as a duplicate of this bug. ***
*** Bug 518609 has been marked as a duplicate of this bug. ***
*** Bug 519008 has been marked as a duplicate of this bug. ***
*** Bug 520341 has been marked as a duplicate of this bug. ***
*** Bug 520338 has been marked as a duplicate of this bug. ***
*** Bug 520048 has been marked as a duplicate of this bug. ***
*** Bug 520931 has been marked as a duplicate of this bug. ***
*** Bug 521588 has been marked as a duplicate of this bug. ***
Dear Bug reporters, Thanks for taking the time to report this bug. Unfortunately, that stack trace is missing some elements that will help a lot to solve the problem, so it will be hard for the developers to fix that crash. Can you get us a stack trace with debugging symbols? Please see http://live.gnome.org/GettingTraces for more information on how to do so and reopen this bug or report a new one. Thanks in advance!
Is this a better stack trace? Or do I need to compile libc6-i686 and liborbit2.0 with debugging symbols and try again? ;-]

Note that the bug appears to be triggered when running filters on messages, but not always: immediately after restarting it tends to be OK, but if I leave Evolution open for a couple of rounds of checking for new messages, it then triggers this fault. Mostly. I'm not sure whether that helps identify it.

-----
Distribution: Debian lenny/sid
Gnome Release: 2.20.3 2008-01-12 (Debian)
BugBuddy Version: 2.20.1
System: Linux 2.6.22-3-686-bigmem #1 SMP Sun Feb 10 21:47:55 UTC 2008 i686
X Vendor: The X.Org Foundation
X Vendor Release: 10300000
Selinux: No
Accessibility: Disabled
GTK+ Theme: Simple
Icon Theme: gnome

Memory status: size: 73728000 vsize: 73728000 resident: 19537920 share: 9293824 rss: 19537920 rss_rlim: 4294967295
CPU usage: start_time: 1205166763 rtime: 152 utime: 132 stime: 20 cutime:0 cstime: 0 timeout: 0 it_real_value: 0 frequency: 100

Backtrace was generated from '/usr/lib/bug-buddy/evolution-exchange-storage'

Using host libthread_db library "/lib/i686/cmov/libthread_db.so.1".
[Thread debugging using libthread_db enabled]
[New Thread 0xb65e76d0 (LWP 9846)]
[New Thread 0xb6411b90 (LWP 9847)]
0xffffe410 in __kernel_vsyscall ()
Trace 191992
Thread 1 (Thread 0xb65e76d0 (LWP 9846))
----------- .xsession-errors ---------------------
(gnome-panel:3309): Wnck-WARNING **: Unhandled action type (nil)
(gnome-panel:3309): Wnck-WARNING **: Unhandled action type (nil)
(gnome-panel:3309): Wnck-WARNING **: Unhandled action type (nil)
(gnome-panel:3309): Wnck-WARNING **: Unhandled action type (nil)
(gnome-panel:3309): Wnck-WARNING **: Unhandled action type (nil)
(gnome-panel:3309): Wnck-WARNING **: Unhandled action type (nil)
(gnome-panel:3309): Wnck-WARNING **: Unhandled action type (nil)
(gnome-panel:3309): Wnck-WARNING **: Unhandled action type (nil)
--------------------------------------------------
*** Bug 523036 has been marked as a duplicate of this bug. ***
*** Bug 523422 has been marked as a duplicate of this bug. ***
*** Bug 522063 has been marked as a duplicate of this bug. ***
*** Bug 521661 has been marked as a duplicate of this bug. ***
It has good traces... thanks, Rob.
*** Bug 525382 has been marked as a duplicate of this bug. ***
*** Bug 525905 has been marked as a duplicate of this bug. ***
*** Bug 525808 has been marked as a duplicate of this bug. ***
*** Bug 525780 has been marked as a duplicate of this bug. ***
*** Bug 525619 has been marked as a duplicate of this bug. ***
*** Bug 525641 has been marked as a duplicate of this bug. ***
*** Bug 525611 has been marked as a duplicate of this bug. ***
*** Bug 525387 has been marked as a duplicate of this bug. ***
*** Bug 525389 has been marked as a duplicate of this bug. ***
*** Bug 525385 has been marked as a duplicate of this bug. ***
*** Bug 525383 has been marked as a duplicate of this bug. ***
It would be nice to see the output of the gdb command "thread apply all bt" for the evolution-data-server and evolution-exchange processes too, along with the complete console output of evolution, evolution-data-server and evolution-exchange. I'm not sure whether it will help, but it may give a better view of the problem.
*** Bug 526897 has been marked as a duplicate of this bug. ***
*** Bug 526222 has been marked as a duplicate of this bug. ***
*** Bug 527165 has been marked as a duplicate of this bug. ***
*** Bug 527490 has been marked as a duplicate of this bug. ***
*** Bug 528544 has been marked as a duplicate of this bug. ***
*** Bug 529526 has been marked as a duplicate of this bug. ***
*** Bug 529912 has been marked as a duplicate of this bug. ***
*** Bug 529934 has been marked as a duplicate of this bug. ***
*** Bug 529948 has been marked as a duplicate of this bug. ***
*** Bug 530406 has been marked as a duplicate of this bug. ***
*** Bug 530499 has been marked as a duplicate of this bug. ***
*** Bug 530518 has been marked as a duplicate of this bug. ***
Is this a Debian lenny/sid-specific bug? Which version of LDAP does it ship? I see it's patched with the NTLM patch, but for some reason it crashes. Can someone point me to the sources of that patched LDAP?
Regarding Comment #48: The version of ldap in Debian sid appears to be '2.4.7' (plus Debian patches). Strangely, the package name is still 'openldap2.3'. Check out http://packages.debian.org/source/sid/openldap2.3 That page seems to have both a link to the upstream source tar file and a link to the diff that Debian applies to the upstream source.
Thanks for the pointer. I tried to reproduce it with those packages compiled on my F8 box, with all the Debian patches applied in the proper order, and it works fine. I'm sure I'm using this compiled version. Either I'm doing something wrong or the problem is somewhere else.
*** Bug 530934 has been marked as a duplicate of this bug. ***
*** Bug 531049 has been marked as a duplicate of this bug. ***
*** Bug 531855 has been marked as a duplicate of this bug. ***
*** Bug 531962 has been marked as a duplicate of this bug. ***
*** Bug 532032 has been marked as a duplicate of this bug. ***
*** Bug 532133 has been marked as a duplicate of this bug. ***
*** Bug 533070 has been marked as a duplicate of this bug. ***
I think bug #533411 and bug #533463 are also duplicates of this bug. This bug doesn't happen because of filters: it happens because of LDAP lookups (Global Address List lookups on Exchange servers, in particular).

I'm running an Ubuntu 8.04 (Hardy) system. However, as noted below, I'm running my own locally built versions of Evolution and friends. Here's a stack trace from a crash I got when trying to type an address into the To: field of the composer. The first address I typed worked; the second one caused Evo Exchange to crash as follows:

Attaching to program: /opt/evo/libexec/evolution/2.22/evolution-exchange-storage, process 18082
...
Program received signal SIGABRT, Aborted.
[Switching to Thread 0xb649f720 (LWP 18082)]
0xb7fbc410 in __kernel_vsyscall ()
(gdb) thread apply all bt
Trace 198136
A few notes: first, I'm running the very latest gnome-2.22 branch of evolution, e-d-s and evolution-exchange from SVN. I have them compiled locally with full debugging enabled, no optimization, and all logging enabled. Second, this bug is not directly reproducible: if I retry the same action, it works. However, I can make it happen once every day or two under moderate use, and more often under heavy use.
I installed the libldap-2.4.2-dbg package on my Ubuntu box to get the LDAP libraries built with debugging. I just got another core (this time when I was clicking on an email in my summary window and saw "Retrieving message ....." at the bottom of the Evo window). Even though it happened in a totally different place, one where it's not obvious that we're doing an LDAP lookup at all, the stack trace is similar. Here it is with a bit more debugging information (although still not as useful as it could be, because the library was built with -O):

Program received signal SIGABRT, Aborted.
Trace 198149
Thread 140227984406432 (LWP 6228)
I'm beginning to think that ALL of my Evo Exchange core dumps I've been seeing constantly with Evo 2.22 in Hardy are due to this GAL/LDAP lookup bug!! I'll keep running it in the debugger. I'd appreciate any pointers to how I can help fix this problem.
*** Bug 533463 has been marked as a duplicate of this bug. ***
*** Bug 533411 has been marked as a duplicate of this bug. ***
Paul, this happens while looking up contacts. I'm still analysing this bug; I have heard many complaints about it. I suspect a lock/race issue here. If you can give me a valgrind --tool=memcheck report as well, it will be really helpful. Thanks.
OK, I've renamed my evolution-exchange-server to evolution-exchange-server.bin and created a shell script to invoke the real one under valgrind. So far nothing very interesting, but I haven't had the crash yet either. I have to say that the crash seemed somewhat timing-related: for example, it seemed to happen less frequently when I ran evolution-exchange-server with the debugger attached. But it might just be my imagination. If it's true, though, it might be next to impossible to reproduce under valgrind. I'll let you know what I find. I'm also looking at how to build libldap from scratch so I have the source code to debug as well; I'm not very familiar with building packages from DEB source.
I'm not sure I've found anything real; evolution-exchange-server hasn't died on me yet. But I've seen some odd output from valgrind which might point to some kind of bug. Here's the output (I've also built myself a debug version of the LDAP libraries from the Ubuntu source DEB):

==00:00:26:48.006 18201==
==00:00:26:48.006 18201== Invalid read of size 4
==00:00:26:48.006 18201==    at 0x4048ABF: ldap_send_server_request (request.c:213)
==00:00:26:48.006 18201==    by 0x4048FA3: ldap_send_initial_request (request.c:136)
==00:00:26:48.006 18201==    by 0x403DE77: ldap_sasl_bind (sasl.c:148)
==00:00:26:48.006 18201==    by 0x403E159: ldap_sasl_bind_s (sasl.c:182)
==00:00:26:48.006 18201==    by 0x403E32C: ldap_simple_bind_s (sbind.c:113)
==00:00:26:48.006 18201==    by 0x4E1F567: connect_ldap (e2k-global-catalog.c:327)
==00:00:26:48.006 18201==    by 0x4E1F796: get_gc_connection (e2k-global-catalog.c:396)
==00:00:26:48.006 18201==    by 0x4E1F026: gc_search (e2k-global-catalog.c:205)
==00:00:26:48.006 18201==    by 0x4E20875: e2k_global_catalog_lookup (e2k-global-catalog.c:798)
==00:00:26:48.006 18201==    by 0x8061397: unmangle_sender_field (mail-stub-exchange.c:2344)
==00:00:26:48.006 18201==    by 0x80618F7: get_message (mail-stub-exchange.c:2493)
==00:00:26:48.006 18201==    by 0x8065046: connection_handler (mail-stub.c:271)
==00:00:26:48.007 18201==  Address 0x6329210 is 32 bytes inside a block of size 48 free'd
==00:00:26:48.007 18201==    at 0x402265C: free (vg_replace_malloc.c:323)
==00:00:26:48.007 18201==    by 0x406DB99: ber_memfree_x (memory.c:152)
==00:00:26:48.036 18201==    by 0x4048221: ldap_free_connection (request.c:688)
==00:00:26:48.036 18201==    by 0x403460A: try_read1msg (result.c:564)
==00:00:26:48.036 18201==    by 0x40361DB: ldap_result (result.c:400)
==00:00:26:48.036 18201==    by 0x4E1EF7C: gc_ldap_result (e2k-global-catalog.c:182)
==00:00:26:48.037 18201==    by 0x4E1F0D3: gc_search (e2k-global-catalog.c:217)
==00:00:26:48.037 18201==    by 0x4E20875: e2k_global_catalog_lookup (e2k-global-catalog.c:798)
==00:00:26:48.037 18201==    by 0x8061397: unmangle_sender_field (mail-stub-exchange.c:2344)
==00:00:26:48.037 18201==    by 0x80618F7: get_message (mail-stub-exchange.c:2493)
==00:00:26:48.037 18201==    by 0x8065046: connection_handler (mail-stub.c:271)
==00:00:26:48.037 18201==    by 0x4F82C5C: (within /usr/lib/libglib-2.0.so.0.1600.3)

It seems to be something odd about the retry loop in gc_search() (e2k-global-catalog.c:217), which looks like this:

    for (try = 0; try < 2; try++) {
        ldap_error = get_gc_connection (gc, op);
        if (ldap_error != LDAP_SUCCESS)
            return ldap_error;

        ldap_error = ldap_search_ext (gc->priv->ldap, base, scope, filter,
                                      (char **)attrs, FALSE, NULL, NULL,
                                      NULL, 0, &msgid);
        if (ldap_error == LDAP_SERVER_DOWN)
            continue;
        else if (ldap_error != LDAP_SUCCESS)
            return ldap_error;

        ldap_error = gc_ldap_result (gc->priv->ldap, op, msgid, msg);
        if (ldap_error == LDAP_SERVER_DOWN)
            continue;
        else if (ldap_error != LDAP_SUCCESS)
            return ldap_error;

        return LDAP_SUCCESS;
    }

What's apparently happening is that the first LDAP request fails, and when it fails, libldap frees some memory in the LDAP structure (that's the free operation in the trace). The ldap_free_connection() call in the trace only happens if we eventually return an error of LDAP_SERVER_DOWN.
It might be a connection timeout, or something else. Anyway, we go back around the loop and invoke get_gc_connection(), which eventually calls ldap_send_server_request(), where it tries to access the lconn_status member of the LDAPConn structure; that access produces the invalid read above. As I said, I'm not convinced this is the actual problem, because it didn't dump core or anything for me when this happened.
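The failure mode described above (an error path frees a connection object, then the retry reads through a pointer that still refers to it) can be illustrated with a small, self-contained C sketch. The types and functions here (conn_t, handle_t, read_result, send_request) are hypothetical stand-ins, not OpenLDAP's LDAP/LDAPConn internals, and the sketch is not the actual OpenLDAP fix. It shows the safe variant, in which the error path clears the cached pointer so a later request opens a fresh connection instead of touching freed memory.

```c
#include <stdlib.h>

/* Hypothetical types: illustrative stand-ins, NOT OpenLDAP's LDAP/LDAPConn. */
typedef struct conn {
    int status;                /* plays the role of a field like lconn_status */
} conn_t;

typedef struct handle {
    conn_t *defconn;           /* cached connection that an error path may tear down */
} handle_t;

/* Simulated result read. On "server down" it frees the connection and
 * clears the cached pointer. The pattern discussed above is the variant
 * that frees but does not clear, leaving defconn dangling. */
static int read_result(handle_t *h, int server_down)
{
    if (server_down) {
        free(h->defconn);
        h->defconn = NULL;     /* dropping this line creates the use-after-free */
        return -1;             /* same value as LDAP_SERVER_DOWN in OpenLDAP */
    }
    return 0;
}

/* A retry must notice the missing connection and open a fresh one,
 * instead of reading through a stale pointer. */
static int send_request(handle_t *h)
{
    if (h->defconn == NULL) {
        h->defconn = calloc(1, sizeof *h->defconn);
        if (h->defconn == NULL)
            return -1;         /* allocation failure */
    }
    h->defconn->status = 1;    /* always touches live memory */
    return 0;
}
```

For example, read_result(&h, 1) returns -1 and leaves h.defconn NULL; a following send_request(&h) allocates a new connection and returns 0. Removing the `h->defconn = NULL` line gives the stale-pointer variant, which valgrind reports as an invalid read even when the program happens not to crash, consistent with the observation above that nothing dumped core.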
*** Bug 534026 has been marked as a duplicate of this bug. ***
*** Bug 534053 has been marked as a duplicate of this bug. ***
Paul, I think you're getting closer. Can you add some debug output, such as the try count and ldap_error, so that we can find out in which iteration it fails and what the previous error was? Note: as you said, avoid gdb/valgrind while debugging this, since it crashes more readily without them. [Sorry for not doing much: I have no way to reproduce this, and it hardly ever happens with our test setups here. Thanks a lot for your support.]
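One way to add the debug output requested here is to log the try count and each error code from inside the retry loop, and to remember the error that caused the last retry. The sketch below is hypothetical: fake_connect(), fake_search() and fake_result() stand in for get_gc_connection(), ldap_search_ext() and gc_ldap_result(); the SKETCH_* constants mirror LDAP_SUCCESS and LDAP_SERVER_DOWN (0 and -1 in OpenLDAP); and the value returned after both tries fail is an assumption, because the quoted loop does not show it.

```c
#include <stdio.h>

/* Hypothetical constants mirroring LDAP_SUCCESS (0) and LDAP_SERVER_DOWN (-1). */
#define SKETCH_SUCCESS       0
#define SKETCH_SERVER_DOWN (-1)

/* Hypothetical stand-ins for get_gc_connection(), ldap_search_ext() and
 * gc_ldap_result(). The result step reports "server down" while
 * fail_results_left is positive, then succeeds. */
static int fail_results_left = 1;

static int fake_connect(void) { return SKETCH_SUCCESS; }
static int fake_search(void)  { return SKETCH_SUCCESS; }

static int fake_result(void)
{
    if (fail_results_left > 0) {
        fail_results_left--;
        return SKETCH_SERVER_DOWN;
    }
    return SKETCH_SUCCESS;
}

/* Same shape as the gc_search() retry loop, with one stderr line per step
 * so the log shows which iteration failed and with which error.
 * *last_error keeps the most recent error that triggered a retry. */
static int search_with_logging(int *last_error)
{
    int ldap_error;

    *last_error = SKETCH_SUCCESS;
    for (int try = 0; try < 2; try++) {
        ldap_error = fake_connect();
        fprintf(stderr, "gc_search: try=%d connect=%d last_error=%d\n",
                try, ldap_error, *last_error);
        if (ldap_error != SKETCH_SUCCESS)
            return ldap_error;

        ldap_error = fake_search();
        fprintf(stderr, "gc_search: try=%d search=%d\n", try, ldap_error);
        if (ldap_error == SKETCH_SERVER_DOWN) {
            *last_error = ldap_error;
            continue;
        } else if (ldap_error != SKETCH_SUCCESS) {
            return ldap_error;
        }

        ldap_error = fake_result();
        fprintf(stderr, "gc_search: try=%d result=%d\n", try, ldap_error);
        if (ldap_error == SKETCH_SERVER_DOWN) {
            *last_error = ldap_error;
            continue;
        } else if (ldap_error != SKETCH_SUCCESS) {
            return ldap_error;
        }

        return SKETCH_SUCCESS;
    }

    /* Assumption: report "server down" once both tries are used up. */
    return SKETCH_SERVER_DOWN;
}
```

With fail_results_left set to 1, the first iteration's result step fails with -1 and the second iteration succeeds, so the log shows a line with try=1 and last_error=-1, and the function returns 0.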
*** Bug 534158 has been marked as a duplicate of this bug. ***
*** Bug 534214 has been marked as a duplicate of this bug. ***
*** Bug 534263 has been marked as a duplicate of this bug. ***
Just caught it again. Just as suspected, "try" is 1 and "last_error" is -1 (the value of LDAP_SERVER_DOWN in OpenLDAP). If you look at the LDAP code where the free happens (as reported by valgrind), you'll see that the only possible return code on that code path is LDAP_SERVER_DOWN, so this is expected. It seems that when the server connection is lost for whatever reason, the LDAP library frees some data, but when we then try to reconnect, we try to reuse that data. Unfortunately my company is moving offices this weekend, so we're packing up today and most systems will be down over the long weekend (Monday is a U.S. holiday). I'll try building a debug version of libldap from source and see if I can get a core from that, which might help.
Aha. I just got it to dump core in my version with a locally-built, debuggable libldap. Here's the trace: (gdb) bt
Trace 198414
$1 = (Sockbuf *) 0x812f728

SOCKBUF_VALID is defined as:

    ( (sb)->sb_valid == LBER_VALID_SOCKBUF )

(gdb) p *sb
$2 = {sb_opts = {lbo_valid = 0, lbo_options = 0, lbo_debug = 65}, sb_iod = 0x811ceb8, sb_fd = 135526392, sb_max_incoming = 135460864, sb_trans_needs_read = 0, sb_trans_needs_write = 0}

I assume that sb_valid is mapped to sb_opts.lbo_valid or similar. It's pretty clear this structure is not as expected. I'm still looking at this, but FYI.
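SOCKBUF_VALID is an instance of the validity-tag idiom: a structure carries a magic value that is set on initialization and cleared on teardown, and callers check it before use. The C sketch below shows the idiom in isolation; the struct, field names and magic value are hypothetical and are not liblber's definitions. Such a check is only a debugging aid: reading a tag through a pointer to freed memory is itself undefined behavior, so a zero or garbage tag like the lbo_valid = 0 in the dump above signals that something already went wrong rather than preventing it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical magic value; NOT liblber's LBER_VALID_SOCKBUF. */
#define SKETCH_VALID_SOCKBUF 0x5B5Bu

typedef struct sketch_sockbuf {
    unsigned valid;     /* plays the role of sb_opts.lbo_valid */
    int fd;
} sketch_sockbuf;

/* Stamp the tag only after every other field is set up. */
static void sb_init(sketch_sockbuf *sb, int fd)
{
    memset(sb, 0, sizeof *sb);
    sb->fd = fd;
    sb->valid = SKETCH_VALID_SOCKBUF;
}

/* Clear the tag before the structure is released, so code that still
 * holds a pointer to this memory fails the check instead of using it. */
static void sb_teardown(sketch_sockbuf *sb)
{
    sb->valid = 0;
    sb->fd = -1;
}

/* The equivalent of a SOCKBUF_VALID-style assertion. */
static bool sb_is_valid(const sketch_sockbuf *sb)
{
    return sb != NULL && sb->valid == SKETCH_VALID_SOCKBUF;
}
```

After sb_init(&sb, 7), sb_is_valid(&sb) is true; after sb_teardown(&sb) it is false, which is the state the gdb dump shows for the real Sockbuf.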
OK, found it. As far as I can see, this is a bug in OpenLDAP. See the report I filed, which describes in detail what's going wrong: http://www.openldap.org/its/index.cgi/Incoming?id=5525;page=1;statetype=1
Awesome Paul.
Chuck Short created a PPA package of libldap for Ubuntu Hardy that contains the fix Howard Chu added to the OpenLDAP source base: http://launchpad.net/~zulcss/+archive I've installed it on my system, and I've added a debugging message to the loop in gc_search() so I can see how many times it goes around the retry loop. I'll keep working with Evo and see if I can reproduce the situation (where try goes above 0) and make sure we don't get a crash. However, because my company is moving to a new building this weekend, the Exchange server is offline at the moment; I'll let you know how it goes.
*** Bug 534452 has been marked as a duplicate of this bug. ***
*** Bug 535130 has been marked as a duplicate of this bug. ***
*** Bug 535219 has been marked as a duplicate of this bug. ***
*** Bug 536204 has been marked as a duplicate of this bug. ***
(In reply to comment #75)
> Chuck Short created a PPA package of libldap for Ubuntu Hardy that contains the
> fix Howard Chu added to the OpenLDAP source base:
> http://launchpad.net/~zulcss/+archive

I've hit this bug 3 or 4 times today, so I have now installed libldap from <http://launchpad.net/~zulcss/+archive>. So if it stops crashing, that should be a good indication that it fixed it. (=
*** Bug 536392 has been marked as a duplicate of this bug. ***
*** Bug 536326 has been marked as a duplicate of this bug. ***
*** Bug 536867 has been marked as a duplicate of this bug. ***
*** Bug 537000 has been marked as a duplicate of this bug. ***
*** Bug 537451 has been marked as a duplicate of this bug. ***
*** Bug 537485 has been marked as a duplicate of this bug. ***
*** Bug 538834 has been marked as a duplicate of this bug. ***
*** Bug 538796 has been marked as a duplicate of this bug. ***
*** Bug 539133 has been marked as a duplicate of this bug. ***
I think this bug should be marked closed: this is not a bug in Evolution, or even in Gnome: it's a bug in OpenLDAP. Since the fix to OpenLDAP I've never seen this happen again, and there's nothing Evo can do about it anyway (this bug also causes failures in lots of other tools including ones as diverse as Apache, xscreensaver, etc.) The fix package has been promoted to Hardy main; see the Ubuntu bug report here: https://bugs.launchpad.net/ubuntu/+source/openldap2.3/+bug/215904 (I know it doesn't look like an Evo bug but it IS the same bug). Here's the relevant bug from Debian's BTS: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=484802 --it doesn't seem that it's been fixed in Debian yet but Debian folks can track that bug instead of this one. If you're using a distribution other than these please search for bugs including a reference to ber_flush2 in that distro's ldap packages and it will probably get you to the right place. Maybe users of other distros can add notes pointing to the relevant bugs for their distros so they can be tracked. Cheers!
Paul, you rock. Thanks a lot for your efforts. I'm closing this as NOTGNOME.
*** Bug 539698 has been marked as a duplicate of this bug. ***
*** Bug 540863 has been marked as a duplicate of this bug. ***
*** Bug 540868 has been marked as a duplicate of this bug. ***
*** Bug 541005 has been marked as a duplicate of this bug. ***
*** Bug 540888 has been marked as a duplicate of this bug. ***
*** Bug 541194 has been marked as a duplicate of this bug. ***
*** Bug 541353 has been marked as a duplicate of this bug. ***
*** Bug 542291 has been marked as a duplicate of this bug. ***
*** Bug 542320 has been marked as a duplicate of this bug. ***
*** Bug 542831 has been marked as a duplicate of this bug. ***
*** Bug 543043 has been marked as a duplicate of this bug. ***
*** Bug 544573 has been marked as a duplicate of this bug. ***
*** Bug 550702 has been marked as a duplicate of this bug. ***
*** Bug 541116 has been marked as a duplicate of this bug. ***
*** Bug 553352 has been marked as a duplicate of this bug. ***
*** Bug 554570 has been marked as a duplicate of this bug. ***