GNOME Bugzilla – Bug 671988
Quickly registering / unregistering objects on bus causes crash
Last modified: 2012-03-19 21:15:39 UTC
When quickly registering and un-registering objects on the bus using gdbus, it possible to get crashes due to the assert in call_in_idle_cb. Basically that function is called even when there's an invalid vtable, and the assert makes everything crash. I guess that the idle that calls it should be removed when the vtable is unregistered. I'm attaching a stacktrace taken from launchpad bug http:://pad.lv/926208 The problem has been noticed in bamfdaemon but also in other packages.
Created attachment 209654 [details] BAMF crash on call_in_idle_cb I forgot to attach the stacktrace. Here you are.
Fixing up component (I only watch bugs assigned to gdbus).
I will need to look at the source code to see what's going on, please paste a link to an online source viewer for the version this is happening with.
The problem it's happening on the current BAMF version (but not only, just google for that callback function), you can get that at: https://code.launchpad.net/~unity-team/bamf/trunk The most relevant gdbus code is generated by gdbus-codegen, so I'd prefer not introduce workarounds in BAMF itself, but keep using that higher level code.
David: could it be possible that the problem is that the message gets dispatched by the worker into the main thread before the object is unregistered and doesn't get checked again before actually being dispatched (ie: similar to the cancellable stuff we solved recently)?
(In reply to comment #5) > David: could it be possible that the problem is that the message gets > dispatched by the worker into the main thread before the object is unregistered > and doesn't get checked again before actually being dispatched (ie: similar to > the cancellable stuff we solved recently)? No, we do check whether the object is still registered _just_ before doing the callback to the user, see http://git.gnome.org/browse/glib/tree/gio/gdbusconnection.c?id=2.31.20#n4668 so this whole thing looks like memory corruption to me. I asked the reported on IRC to check for valgrind, haven't heard back...
(also note that the vtable is copied at object registration time.. so there is no chance that we're accessing something the user has now freed...)
I've ran bamfdaemon in valgrind, there were some read errors (some of them fixed now) reported in this launchpad bug: http://pad.lv/929468 I've fixed the errors related to bamf itself, but there you can also see one more related to gdbus: ==21512== Invalid read of size 4 ==21512== at 0x47F92C5: call_in_idle_cb (gdbusconnection.c:4678) ==21512== by 0x495B4AF: g_idle_dispatch (gmain.c:4629) ==21512== by 0x495DA49: g_main_context_dispatch (gmain.c:2510) ==21512== by 0x495DE54: g_main_context_iterate.isra.21 (gmain.c:3118) ==21512== by 0x495E29A: g_main_loop_run (gmain.c:3312) ==21512== by 0x4A434D2: (below main) (libc-start.c:226) ==21512== Address 0x7a45c70 is 0 bytes inside a block of size 12 free'd ==21512== at 0x402B06C: free (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so) ==21512== by 0x4963B7A: standard_free (gmem.c:98) ==21512== by 0x4963CEF: g_free (gmem.c:252) ==21512== by 0x47F40EB: exported_interface_free (gdbusconnection.c:3881) ==21512== by 0x494C297: g_hash_table_remove_node (ghash.c:484) ==21512== by 0x494CA19: g_hash_table_remove_internal (ghash.c:1274) ==21512== by 0x47FC93A: g_dbus_connection_unregister_object (gdbusconnection.c:5059) ==21512== by 0x481319A: remove_connection_locked (gdbusinterfaceskeleton.c:739) ==21512== by 0x48148D2: g_dbus_interface_skeleton_unexport (gdbusinterfaceskeleton.c:969) ==21512== by 0x806F89B: bamf_view_dispose (bamf-view.c:681) ==21512== by 0x48C6921: g_object_unref (gobject.c:2971) ==21512== by 0x48C453B: g_cclosure_marshal_VOID__VOID (gmarshal.c:85) ==21512== by 0x48C2F8B: g_closure_invoke (gclosure.c:774) ==21512== by 0x48D4844: signal_emit_unlocked_R (gsignal.c:3302) ==21512== by 0x48DC0C1: g_signal_emit_valist (gsignal.c:3033) So, maybe the problem happens during g_dbus_interface_skeleton_unexport that is called in the dispose function of the skeleton's extended class. Maybe is not even really needed to call that, since it should happen automagically when destroying that class (that extens the skeleton, as said)... But the read error seems to stay there (?). If you can, please check if bamf-view.c [1] is doing something wrong. [1] http://is.gd/0j9QT3
(In reply to comment #6) > No, we do check whether the object is still registered _just_ before doing the > callback to the user, see > > http://git.gnome.org/browse/glib/tree/gio/gdbusconnection.c?id=2.31.20#n4668 The first thing the code you linked to does is to pull the vtable out of the invocation's qdata. The invocation doesn't have its own copy of the vtable -- it's set here: http://git.gnome.org/browse/glib/tree/gio/gdbusconnection.c?id=2.31.20#n4798 And presumably it's getting GDBus's internal copy of the vtable (the one that is freed when the object is unregistered). This seems to explain the problem rather conclusively...
Interesting. Presumably we can just move vtable = g_object_get_data (G_OBJECT (invocation), "g-dbus-interface-vtable"); g_assert (vtable != NULL && vtable->method_call != NULL); down after the check. Marco: any chance you can try this and see if the problem goes away?
I look, even if for me is quite hard to reproduce the issue, but I'll try.
Created attachment 210108 [details] [review] gdbusconnection: get the interface vtable after getting the connection This patch does what you asked, I had no crash so far, but for me it was hard to reproduce them even without it...
OK, I've committed this with a slightly more informative commit message and a reference to this bug http://git.gnome.org/browse/glib/commit/?id=322c6e93444e74ae99b73f1ae7e3b55563470431 Thanks, David