GNOME Bugzilla – Bug 597648
Crash due to DBus used from more threads in one time
Last modified: 2013-09-13 01:06:01 UTC
Evolution 2.29.1 ** ERROR **: Out of memory aborting... Program received signal SIGTRAP, Trace/breakpoint trap. 0xb67e2836 in IA__g_logv (log_domain=0x0, log_level=G_LOG_LEVEL_ERROR, format=0xb6c34548 "Out of memory", args1=0xbfffda1c "����@") at gmessages.c:502 502 G_BREAKPOINT (); (gdb) t a a bt
+ Trace 218121
Thread 1 (Thread 0xb63aa720 (LWP 29561))
So after a bit of discussion it seems there are a number of possible duplicates of this which are ultimately caused by threaded usage of dbus-glib/dbus. libdbus only has incomplete support for threading. dbus-glib is not really known to be safe here.
Options here: - make update_query_async use asynchronous dbus calls instead of sync This would entail some rewriting of e-cal.[ch]; I'm not sure of the ultimate scope here - make update_query_async synchronous Obviously lessens interactivity; but how long do we expect to be blocking here? Is this just reading local files, or will we do e.g. http requests? - Further investigate supporting threading in dbus-glib/libdbus Some people are using private connections for this, which is sort of suboptimal but relatively straightforward.
Ross, any opinions on this?
Just for a record, why we are talking about update_query_async: At the top are prints from DBus, to see which function failed before the "Out of memory" abort. Then is shown part of the valgrind output of the abort. protected_change_timeout: no timeouts _dbus_connection_attach_pending_call_unlocked: _dbus_connection_add_timeout_unlocked failed dbus_connection_send_with_reply: _dbus_connection_attach_pending_call_unlocked failed dbus_g_proxy_begin_call_internal: send with reply failed ** ERROR **: Out of memory aborting... 127 m_dispatch/dispatch-amd64-linux.S: No such file or directory. ==2767== ==2767== Invalid read of size 8 ==2767== at 0x8437D60: _dbus_pending_call_get_connection_and_lock (dbus-pending-call.c:309) ==2767== by 0x8424188: reply_handler_timeout (dbus-connection.c:3125) ==2767== by 0x843D592: dbus_timeout_handle (dbus-timeout.c:476) ==2767== by 0x81F48E4: timeout_handler_dispatch (dbus-gmain.c:343) ==2767== by 0x30A82381CA: (within /lib64/libglib-2.0.so.0.2000.4) ==2767== by 0x30A8237ABD: g_main_context_dispatch (in /lib64/libglib-2.0.so.0.2000.4) ==2767== by 0x30A823B277: (within /lib64/libglib-2.0.so.0.2000.4) ==2767== by 0x30A823B6D4: g_main_loop_run (in /lib64/libglib-2.0.so.0.2000.4) ==2767== by 0x344E3422B6: gtk_main (in /usr/lib64/libgtk-x11-2.0.so.0.1600.6) ==2767== by 0x403AFF: main (main.c:623) ==2767== Address 0x1d8a1498 is 32 bytes inside a block of size 72 free'd ==2767== at 0x4A0633D: free (vg_replace_malloc.c:323) ==2767== by 0x8446132: dbus_free (dbus-memory.c:644) ==2767== by 0x8437EE8: _dbus_pending_call_last_unref (dbus-pending-call.c:413) ==2767== by 0x8438118: dbus_pending_call_unref (dbus-pending-call.c:579) ==2767== by 0x84246FF: dbus_connection_send_with_reply_and_block (dbus-connection.c:3334) ==2767== by 0x81FB879: get_name_owner (dbus-gproxy.c:882) ==2767== by 0x81FE13B: dbus_g_proxy_new_for_name_owner (dbus-gproxy.c:1995) ==2767== by 0x71806F0: e_cal_get_query (e-cal.c:3591) ==2767== by 0xF6361F9: update_query_async (gnome-cal.c:1046) ==2767== by 0xF633CA0: message_proxy (gnome-cal.c:185) ==2767== by 0x30A8261F31: (within /lib64/libglib-2.0.so.0.2000.4) ==2767== by 0x30A8260933: (within /lib64/libglib-2.0.so.0.2000.4)
(In reply to comment #2) > - make update_query_async synchronous > Obviously lessens interactivity; but how long do we expect to be blocking > here? Is this just reading local files, or will we do e.g. http requests? The backend can make remote calls too, in order the events are not stored locally yet. Or it can wait for a local cache filling before returning. Thus yes, it can take some time.
Damn. We probably do need to solve the entire threaded apps using libecal/libebook sync APIs problem. I'd done some basic testing and it had been working for me, but obviously it's not working entirely correctly. I'd like to see a move to less threads, so evolution should make async calls instead of sync calls in a thread. Does anyone know where there are threading problems in dbus/dbus-glib? I thought that libdbus itself was threadsafe these days, so any problems are limited to dbus-glib.
*** Bug 597517 has been marked as a duplicate of this bug. ***
*** Bug 596293 has been marked as a duplicate of this bug. ***
(In reply to comment #6) > I'd like to see a move to less threads, so evolution should make async calls > instead of sync calls in a thread. +100. I'm pushing for the same thing on the Camel/mailer side.
(In reply to comment #6) > Damn. > > We probably do need to solve the entire threaded apps using libecal/libebook > sync APIs problem. I'd done some basic testing and it had been working for me, > but obviously it's not working entirely correctly. > > I'd like to see a move to less threads, so evolution should make async calls > instead of sync calls in a thread. > > Does anyone know where there are threading problems in dbus/dbus-glib? I > thought that libdbus itself was threadsafe these days, so any problems are > limited to dbus-glib. libdbus has a simple lock around the connection and won't obviously explode if used from multiple threads, and in fact could be termed "threadsafe". But that doesn't mean it has the semantics that one might want. This is http://bugs.freedesktop.org/show_bug.cgi?id=857 I haven't done much auditing of dbus-glib in this respect.
(In reply to comment #4) > Just for a record, why we are talking about update_query_async: I know for sure there are problems with the dbus-glib/libdbus stack and threads here; it's not going to be immediately useful to trace down exactly what the cause of the double free is here because it will likely come down to some sort of race condition, and to fix that kind of thing requires overall design work.
Created attachment 145414 [details] [review] test eds patch for evolution-data-server; Akhil, could you try to run with this test patch, please? It does nothing special, only adds locking on the connection/proxy on evo side, something similar should be done properly in a lower level, in dbus/dbus-glib, I believe, in a very less invasive manner. Anyway, what you can see is also a freeze of UI sometimes, from which it recovers after a while, so I believe it'll be easier to fix evo for "more threading" (I'm sorry Matt), than waiting for thread-safety on dbus side or rewriting all of the EBook/ECal stuff to be async. The async-ness is too complicated, anyway. Evo backtrace of one such freeze: Thread 2 (Thread 0x7fd0a3fff910 (LWP 780)): #0 in poll () #1 in socket_do_iteration (...) at dbus-transport-socket.c:1046 #2 in _dbus_transport_do_iteration (...) at dbus-transport.c:956 #3 in _dbus_connection_do_iteration_unlocked (...) at dbus-connection.c:1150 #4 in _dbus_connection_block_pending_call (...) at dbus-connection.c:2381 #5 in dbus_g_proxy_end_call_internal (...) at dbus-gproxy.c:2264 #6 in dbus_g_proxy_call (...) at dbus-gproxy.c:2593 #7 in org_gnome_evo...ar_CalView_start (...) at e-data-cal-view-bindings.h:32 #8 in e_cal_view_start (...) at e-cal-view.c:418 #9 in update_query_async (...) at gnome-cal.c:1074 #10 in message_proxy (...) at gnome-cal.c:185 #11 in ?? () #12 in ?? () #13 in start_thread () #14 in clone () #15 in ?? ()
+ Trace 218296
Thread 1 (Thread 0x7fd0fa767800 (LWP 663))
This patch puts locks around the DBus connection, so any threads making concurrent calls via DBus will be serialised. This is very invasive. :/
Yup, I know, but I have nothing better than this :( But I'm still opened to suggestions, even not much willing to rewrite all the evo from a ground.
Patch attached in comment#12 reduces crash frequency a lot.
Ross, what do u suggest, Go with the patch at comment #12 until we get a fix from libdbus/dbus-glib ?
*** Bug 601436 has been marked as a duplicate of this bug. ***
*** Bug 601434 has been marked as a duplicate of this bug. ***
For what it's worth, the patch doesn't apply cleanly to the current git master.
milan, please commit the patch.
Created commit 2f75310 in eds master (2.29.2+)
*** Bug 601888 has been marked as a duplicate of this bug. ***