GNOME Bugzilla – Bug 547568
gvfsd-trash crashed with SIGSEGV in g_main_context_dispatch()
Last modified: 2008-09-23 19:17:14 UTC
The bug was originally opened at https://bugs.launchpad.net/ubuntu/+source/gvfs/+bug/254479: "I think this is similar to #252174 but that one was marked as invalid. Maybe there will be something useful in this core dump. I had just logged into a Gnome session after using a KDE4 session."
+ Trace 205087
Could be the same issue as bug #547567, but the stack traces are different.
valgrind lists such errors too:
==28416== Jump to the invalid address stated on the next line
==28416==    at 0x519F0D9: ???
==28416==    by 0x41996E3: g_main_context_prepare (gmain.c:2392)
==28416==    by 0x4199B69: g_main_context_iterate (gmain.c:2685)
==28416==    by 0x419A3A1: g_main_loop_run (gmain.c:2928)
==28416==    by 0x8051772: daemon_main (daemon-main.c:270)
==28416==    by 0x8051A44: main (daemon-main-generic.c:39)
==28416==  Address 0x519f0d9 is not stack'd, malloc'd or (recently) free'd
Curious stacktrace, given that _g_dbus_bus_list_names_with_prefix doesn't seem to be called at all anywhere inside gvfs...
*** Bug 547567 has been marked as a duplicate of this bug. ***
*** Bug 547726 has been marked as a duplicate of this bug. ***
gicmo thinks this is because dbus is used from a different thread. It seems to be a crasher that a lot of users are running into, judging by the number of duplicates Ubuntu is getting. GNOME probably doesn't get those reports since bug-buddy is not running on 2.23.
Created attachment 118615 [details]
Test Program
I can get a crash with that program 1 out of maybe 7 times (or something). The stack trace then looks like the following:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f1c88265770 (LWP 16604)]
0x00007f1c85a15960 in ?? () from /lib/libdbus-1.so.3
(gdb) thread apply all bt full
+ Trace 206700
Thread 2 (Thread 0x41eb9950 (LWP 16607))
And another one:
Starting program: /home/gicmo/Devel/temp/crash_test
[Thread debugging using libthread_db enabled]
[New Thread 0x7fdeb22b3770 (LWP 16861)]
[New Thread 0x41275950 (LWP 16864)]
XX
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fdeb22b3770 (LWP 16861)]
0x00007fdeafa63960 in ?? () from /lib/libdbus-1.so.3
(gdb) thread apply all bt full
+ Trace 206701
Thread 2 (Thread 0x41275950 (LWP 16864))
Just for the record: removing either the libgioremote-volume-monitor.so module or both .monitor files (gphoto2 and hal) from /usr/share/gvfs/remote-volume-monitors/ fixes the crash here. Having just one of them leads to crashes again. This one is with only the gphoto2 monitor:
[Thread debugging using libthread_db enabled]
[New Thread 0x7f955fb1d770 (LWP 17327)]
[New Thread 0x41c2b950 (LWP 17330)]
XX
Program exited with code 01.
(gdb) r
Starting program: /home/gicmo/Devel/temp/crash_test
[Thread debugging using libthread_db enabled]
[New Thread 0x7f9dee695770 (LWP 17331)]
[New Thread 0x425bd950 (LWP 17332)]
XX
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f9dee695770 (LWP 17331)]
0x00007f9debe45960 in ?? () from /lib/libdbus-1.so.3
(gdb) thread apply all bt full
+ Trace 206702
Thread 2 (Thread 0x425bd950 (LWP 17332))
(gdb)
Adding dbus_threads_init_default() to your test program makes the crashes go away.
Indeed, couldn't get it to crash yet. Wrong direction ... :-/
Are we sure that gvfsd-trash calls dbus_threads_init?
Never mind, it does. So it seems we are back to the earlier hypothesis of what happens: GProxyVolumeMonitor queues an idle to emit a signal, and before that idle runs, libgio-remote-volume-monitor.so is unloaded. When the idle is then run it segfaults, because the callback's code is no longer mapped. If that is the case, here is a patch that might fix it. Unfortunately, I cannot reproduce the crash, so I can't test the patch.
Created attachment 118637 [details] [review] don't leave dangling idles
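For readers following along, the general idea of not leaving a dangling idle can be sketched as below. This is a toy model only, not the attached patch and not GLib: the idle queue, the `monitor` struct and all function names here are hypothetical stand-ins. In real GLib code the corresponding calls would be g_idle_add(), which returns a source id, and g_source_remove() on that id during teardown.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a main loop's idle queue (hypothetical, NOT GLib). */
typedef void (*idle_fn)(void *data);

typedef struct {
    idle_fn fn;
    void *data;
    int active;
} idle_slot;

#define MAX_IDLES 8
static idle_slot idle_queue[MAX_IDLES];

/* Queue a callback; returns an id > 0, or 0 if the queue is full. */
static unsigned idle_add(idle_fn fn, void *data) {
    for (unsigned i = 0; i < MAX_IDLES; i++) {
        if (!idle_queue[i].active) {
            idle_queue[i] = (idle_slot){ fn, data, 1 };
            return i + 1;
        }
    }
    return 0;
}

/* Cancel a queued callback so that it can never run. */
static void idle_remove(unsigned id) {
    if (id >= 1 && id <= MAX_IDLES)
        idle_queue[id - 1].active = 0;
}

/* Run and clear every callback still queued; returns how many ran. */
static int idle_dispatch_all(void) {
    int ran = 0;
    for (unsigned i = 0; i < MAX_IDLES; i++) {
        if (idle_queue[i].active) {
            idle_queue[i].active = 0;
            idle_queue[i].fn(idle_queue[i].data);
            ran++;
        }
    }
    return ran;
}

/* Hypothetical monitor object that remembers its pending idle. */
typedef struct {
    unsigned pending_idle;   /* 0 means nothing is queued */
    int signals_emitted;
} monitor;

static void emit_signal_cb(void *data) {
    monitor *m = data;
    m->pending_idle = 0;
    m->signals_emitted++;
}

static void monitor_queue_signal(monitor *m) {
    if (m->pending_idle == 0)
        m->pending_idle = idle_add(emit_signal_cb, m);
}

/* Teardown cancels the pending idle before the object (or the module
 * containing emit_signal_cb) goes away. */
static void monitor_teardown(monitor *m) {
    if (m->pending_idle != 0) {
        idle_remove(m->pending_idle);
        m->pending_idle = 0;
    }
}
```

The point of the sketch is only that the object keeps the id of whatever it queued, so teardown can cancel it; without that bookkeeping the loop would later jump to a callback that may no longer exist.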
Additionally, if the problem is related to the unloading of modules (as it seems to be), putting G_DEBUG=resident-modules in the environment should work around the problem.
(In reply to comment #12)
> libgio-remote-volume-monitor.so is unloaded,
Under what circumstances would libgio-remote-volume-monitor.so ever be unloaded?
All giomodules get loaded in _g_io_modules_ensure_loaded to register their types, and then they get unloaded again until the registered types are actually instantiated. But from what I've seen so far, g_proxy_volume_monitor_setup_session_bus_connection and g_proxy_volume_monitor_teardown_session_bus_connection _seem_ to be doing the right thing with all the timeouts that are attached to the mainloop.
I have a suspicion about this. The helper library "common/" subdirectory in gvfs is (statically) linked into both the daemon and the client module. This is a bit tricky in the case of the trash backend, as that loads the client module into a daemon and thus has two copies of the common code. Since the linking is static there should be no problems with calls to the wrong version of the code, but there might be some global data that points to the other copy and then that somehow gets unmapped when that module is no longer used.
One difference the proxy volume monitoring has on the trash is that it integrates a mainloop (using the common/ code) during the module loading, so this is the main suspect.
I was able to repeat the crash by doing "killall -9 gvfsd-trash; gvfs-ls trash://" until the gvfs-ls failed with a timeout. That means the automatically spawned gvfsd-trash has crashed. I committed some code to the 2.24 branch that uses a single copy of the common code. This means we get a more readable crash, since we're not calling an unloaded address. The output for the crash is now:
process 13106: dbus_watch_handle: Watch is invalid, it should have been removed
This makes sense in that previously we called a dbus idle, causing a crash due to the code being unloaded. Now it's not unloaded, but it's still getting an idle called when it shouldn't.
Ok. I see what is happening. We run _g_dbus_connection_integrate_with_main() and then soon thereafter _g_dbus_connection_remove_from_main() on a thread. However, sometimes the mainloop runs and handles the dbus fd while the other thread calls _g_dbus_connection_remove_from_main(), which frees the IOHandler while it's running.
_g_dbus_connection_integrate_with_main () was my main suspect the other day as well, but then I probably ran out of time or was blind ;-): "<gicmo> hmm might it be that its _g_dbus_connection_integrate_with_main () being called from the thread or something"
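To make the failure mode concrete, here is a minimal sketch of one common defence against "handler freed while it is running": the dispatcher holds its own reference for the duration of the callback, so a concurrent removal only drops the owner's reference. This is an illustration under stated assumptions, not gvfs code and not necessarily how this bug should be fixed; `io_handler` and every function name below are hypothetical. The removal is simulated from inside the callback to keep the example deterministic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical handler type; the names are illustrative, not from gvfs. */
typedef struct io_handler io_handler;
typedef void (*io_fn)(io_handler *h);

struct io_handler {
    atomic_int refs;      /* starts at 1: the owner's reference */
    atomic_int removed;   /* set once the owner has removed the handler */
    io_fn fn;
};

static int handlers_freed;   /* counts frees, for the demo only */

static io_handler *io_handler_new(io_fn fn) {
    io_handler *h = malloc(sizeof *h);
    if (h == NULL)
        return NULL;
    atomic_init(&h->refs, 1);
    atomic_init(&h->removed, 0);
    h->fn = fn;
    return h;
}

static void io_handler_ref(io_handler *h) {
    atomic_fetch_add(&h->refs, 1);
}

/* Free only when the last reference is dropped. */
static void io_handler_unref(io_handler *h) {
    if (atomic_fetch_sub(&h->refs, 1) == 1) {
        handlers_freed++;
        free(h);
    }
}

/* Called by the thread that wants the handler gone. */
static void io_handler_remove(io_handler *h) {
    atomic_store(&h->removed, 1);
    io_handler_unref(h);          /* drop the owner's reference */
}

/* Called by the main loop. In real code the ref must be taken while
 * holding whatever lock protects the list the handler was found in. */
static int io_handler_dispatch(io_handler *h) {
    int ran = 0;
    io_handler_ref(h);
    if (!atomic_load(&h->removed)) {
        h->fn(h);
        ran = 1;
    }
    io_handler_unref(h);
    return ran;
}

/* Demo callback: the removal arrives in the middle of the dispatch. */
static int freed_mid_callback = -1;

static void cb_that_gets_removed(io_handler *h) {
    io_handler_remove(h);                 /* stands in for the other thread */
    freed_mid_callback = handlers_freed;  /* handler must still be alive here */
}
```

With the extra reference, the memory is released only after the callback has returned; without it, the removal would free the handler underneath the running callback, which matches the crash described above.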
This should be fixed with the latest commit:
2008-09-23  Alexander Larsson  <alexl@redhat.com>
    * monitor/proxy/gproxyvolumemonitor.[ch]:
    * monitor/proxy/gproxyvolumemonitor.h:
    * monitor/proxy/remote-volume-monitor-module.c:
    Only call the IsSupported dbus call when the class is actually needed instead of on gio init.
    Don't integrate internal session bus with mainloop during is_support code, as that is not necessary yet, and it caused problem if done in a thread.
    This fixes the trash crash issue in bug #547568.