GNOME Bugzilla – Bug 600449
segfault in camel_msgport_destroy
Last modified: 2009-11-27 12:17:07 UTC
This one appears similar to bug 600322 as the crash emanates from only a few lines below that one, but this one is indeed different. Maybe it's a result of my NOOPing the earlier "g_assert(reply == msg)". However, in the crash below, here's a small analysis of frame 6:
+ Trace 218751
Thread 24 (Thread 3859)
Thread 23 (Thread 10668)
BTW, the message that's printed when this triggers is: GThread-ERROR **: file /build/buildd/glib2.0-2.22.2/gthread/gthread-posix.c: line 171 (g_mutex_free_posix_impl): error 'Device or resource busy' during 'pthread_mutex_destroy ((pthread_mutex_t *) mutex)'
looks related to bug 578827
I seem to have hit this again:
+ Trace 219197
Thread 2 (Thread 18978)
And yet again. Notice that it's a mere 20 minutes since the last one. And evolution was not even up fully since the last time, still setting up vfolders and whatnot. You can imagine how frustrating it is to have evolution fall down again before it's even up. How is a person to get any real work done when the tools are soooooo unreliable? In this case, I was composing a reply to a message sent to me.
+ Trace 219198
Thread 20 (Thread 11538)
Grrrrrr! And again! I am not hitting the cancel button for any of these. Just tell me what is happening so that I can avoid doing it and I can get evolution to stay up for more than 20 minutes. Long enough to even finish setting up the fricken' vfolders already! Man I soooooo want 2.24 back. Nothing, not a thing, that has been done since that release has been at all near worth all of this instability.
+ Trace 219199
~sigh~ And yet again...
+ Trace 219206
Thread 25 (Thread 18239)
I should mention, that prior to one of these crashes, evo always prints out: GThread-ERROR **: file /build/buildd/glib2.0-2.22.2/gthread/gthread-posix.c: line 171 (g_mutex_free_posix_impl): error 'Device or resource busy' during 'pthread_mutex_destroy ((pthread_mutex_t *) mutex)' aborting...
And another. These happen while I am away from the computer even, so most definitely not driven by me canceling anything.
+ Trace 219218
All the traces are the same in one way. One thread shows:
> #0 0x00c34422 in __kernel_vsyscall ()
> #1 0x002cdc0b in write ()
> #2 0x00bdc75c in camel_msgport_push (msgport=0xab92498, msg=0xa05045e0)
>     at camel-msgport.c:340
> #3 0x00bdc8eb in camel_msgport_reply (msg=0x1) at camel-msgport.c:424
> #4 0x00bdf325 in cs_getaddrinfo (data=0x1) at camel-net-utils.c:662
> #5 0x002c680e in start_thread (arg=0x863fdb70) at pthread_create.c:300
> #6 0x015f57ee in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130
and then the other thread (which has number 1 in the latest trace, though thread number 3 is the main thread :( ) shows:
> #0 0x00c34422 in __kernel_vsyscall ()
> #1 0x015534d1 in *__GI_raise (sig=6)
> #2 0x01556932 in *__GI_abort () at abort.c:92
> #3 0x05540036 in IA__g_logv (...
> #4 0x05540066 in IA__g_log (log_domain=0x3d5888 "GThread", ...)
> #5 0x003d52c4 in g_mutex_free_posix_impl (mutex=0x99dd58b0)
> #6 0x055113cf in IA__g_async_queue_unref (queue=0x96c437f0)
> #7 0x00bdcad8 in camel_msgport_destroy (msgport=0xab92498)
>     at camel-msgport.c:285
> #8 0x00bded53 in cs_waitinfo (worker=<value optimized out>,
>     msg=<value optimized out>, error=0xbf7ebb "Host lookup failed",
>     ex=0x8e1fe1e8) at camel-net-utils.c:530
> #9 0x00bdf198 in camel_getaddrinfo (name=0x9aa3de0 "mail",
>     service=0x3fa5758 "imap", hints=0x8e1fde90, ex=0x8e1fe1e8)
>     at camel-net-utils.c:706
> #10 0x03f9ddbb in connect_to_server_wrapper (service=<value optimized out>,
>     ex=<value optimized out>) at camel-imap-store.c:975
> #11 0x03f9e4d7 in imap_connect (service=0x9a93698, ex=0x8e1fe1e8)
>     at camel-imap-store.c:1412
> #12 0x008f6c47 in camel_service_connect (service=0x9a93698, ex=0x8e1fe1e8)
>     at camel-service.c:364
> #13 0x03f9a084 in camel_imap_store_connected (store=0x9a93698,
>     ex=0x8e1fe1e8) at camel-imap-store.c:3009
> #14 0x03f8f7bf in replay_offline_journal (imap_store=0x9a93698,
>     imap_folder=0xad2bb08, ex=0x8e1fe1e8) at camel-imap-folder.c:249
> #15 0x03f914b6 in imap_sync (folder=0xad2bb08, expunge=0, ex=0x8e1fe1e8)
>     at camel-imap-folder.c:1410
> #16 0x008e3ab6 in camel_folder_sync (folder=0xad2bb08, expunge=0,
>     ex=0x8e1fe1e8) at camel-folder.c:321
> #17 0x00909235 in vee_sync (folder=0x994aaf0, expunge=0, ex=0x8e1fe1e8)
>     at camel-vee-folder.c:576
> #18 0x008e3ab6 in camel_folder_sync (folder=0x994aaf0, expunge=0,
>     ex=0x8e1fe1e8) at camel-folder.c:321
> #19 0x0120b36f in refresh_folders_exec (m=0x9a814b20)
>     at mail-send-recv.c:829
> #20 0x012095d0 in mail_msg_proxy (msg=0x9a814b20) at mail-mt.c:522
> #21 0x0556199f in g_thread_pool_thread_proxy (data=0x974efa0)
> #22 0x0556036f in g_thread_create_proxy (data=0x1217cad0)
> #24 0x015f57ee in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130
See the same msgport=0xab92498 in both threads, so there failed something with a waiting on the thread finish. What everything do you have changed exactly in camel-net-utils.c from your evolution-data-server sources of 2.28.1?
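The hazard the two traces point at (one thread still replying on a port while another thread destroys that port) can be illustrated with plain pthreads. This is only a minimal sketch, not Camel's code and not the patch attached later in this bug: `struct reply_port`, `worker_main` and `lookup_with_worker` are hypothetical names, and the reply value 42 is arbitrary. The waiter joins the worker before destroying the mutex, because the worker may still be inside its unlock call when the waiter has already read the reply.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a reply port: a mutex-protected reply slot. */
struct reply_port {
    pthread_mutex_t lock;
    pthread_cond_t  ready;
    int             has_reply;
    int             reply;
};

/* Worker: posts its reply on the port, then exits. */
static void *worker_main(void *data)
{
    struct reply_port *port = data;

    pthread_mutex_lock(&port->lock);
    port->reply = 42;               /* arbitrary illustrative result */
    port->has_reply = 1;
    pthread_cond_signal(&port->ready);
    pthread_mutex_unlock(&port->lock);
    return NULL;
}

/* Waiter: returns the worker's reply, or -1 if the worker could not start. */
int lookup_with_worker(void)
{
    struct reply_port port;
    pthread_t id;
    int reply;

    pthread_mutex_init(&port.lock, NULL);
    pthread_cond_init(&port.ready, NULL);
    port.has_reply = 0;
    port.reply = 0;

    if (pthread_create(&id, NULL, worker_main, &port) != 0) {
        pthread_cond_destroy(&port.ready);
        pthread_mutex_destroy(&port.lock);
        return -1;
    }

    /* Wait until the worker has posted its reply. */
    pthread_mutex_lock(&port.lock);
    while (!port.has_reply)
        pthread_cond_wait(&port.ready, &port.lock);
    assert(port.has_reply);
    reply = port.reply;
    pthread_mutex_unlock(&port.lock);

    /*
     * The worker may still be touching port.lock at this point.
     * Join it first, and only then destroy the port's primitives.
     */
    pthread_join(id, NULL);

    pthread_cond_destroy(&port.ready);
    pthread_mutex_destroy(&port.lock);
    return reply;
}
```

Swapping the `pthread_join` call and the two destroy calls would reintroduce the window in which the mutex is destroyed while another thread is still using it, which is what the 'Device or resource busy' message from `pthread_mutex_destroy` indicates.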
(In reply to comment #9)
> See the same msgport=0xab92498 in both threads,
Yeah.
> so there failed something with
> a waiting on the thread finish. What everything do you have changed exactly in
> camel-net-utils.c from your evolution-data-server sources of 2.28.1?
Other than bug 574940, I have a patch which is probably related to 600322:

--- ./camel/camel-net-utils.c 2009-11-16 19:13:27.000000000 -0500
+++ /tmp/camel-net-utils.c 2009-11-16 19:13:11.000000000 -0500
@@ -513,7 +513,13 @@
 		} else {
 			struct _addrinfo_msg *reply = (struct _addrinfo_msg *)camel_msgport_try_pop(reply_port);
+#if 0
 			g_assert(reply == msg);
+#else
+			if (reply != msg)
+				fprintf (stderr, "oh no! the world is going to end! "
+					 "reply(%p) != msg(%p)\n", reply, msg);
+#endif
 			d(printf("waiting for child to exit\n"));
 			pthread_join(id, NULL);
 			d(printf("child done\n"));

Of course, it's not lost on me that that g_assert is there for a reason and that simply commenting it out is not a real solution, but it was the difference between an evolution which stays up for a while and one that crashes pretty soon after starting it. A better solution would be very warmly welcomed. :-)
FWIW, I don't seem to have failed this "reply != msg" test since my last restart of evolution.
Created attachment 148381: proposed eds patch
for evolution-data-server; OK, could you revert all your changes in camel-net-utils.c and apply only this patch, please? I think I found the issue, but it would be better if you could test it too. Thanks in advance. Note: I believe it will fix bug #600322 as well.
it would be nice to have this on Monday's release of 2.29.3 for wider testing. So if you'll be able to test it before it, then it'll be great.
(In reply to comment #12)
> it would be nice to have this on Monday's release of 2.29.3 for wider testing.
Indeed.
> So if you'll be able to test it before it, then it'll be great.
It's running right now. I installed it earlier this afternoon and (touch wood) Evolution's been up all day. After going through Evolution's stdout from the last day or two, though, I'm not convinced that I hit this all that often without some other, unknown external funkiness. This one might boil down to yet another situation where, if everything on the network isn't just perfect, Evolution throws a fit, as I was hitting it every few minutes when it was happening. Unfortunately, I don't know what it was that Evolution was so upset about at the time. Everything else looked normal.
In any case, this patch doesn't seem to introduce any new and bad behavior as per the last few hours of using it.
(In reply to comment #13)
> In any case, this patch doesn't seem to introduce any new and bad behavior as
> per the last few hours of using it.
Good, thanks for the update. Let it run for today, and if everything goes well I'll commit this to master and gnome-2-28 tomorrow.
Created commit 7961e6c in eds master (2.29.3+)
Created commit e6a6360 in eds gnome-2-28 (2.28.2+)