GNOME Bugzilla – Bug 787992
gvfsd-mtp segfault in unregister_mount_got_proxy_cb when disconnecting android device
Last modified: 2018-03-05 13:39:55 UTC
The issue is hit by quite some users according to fedora/ubuntu reports and is still there in GNOME 3.26 https://bugzilla.redhat.com/show_bug.cgi?id=1442777 https://bugs.launchpad.net/ubuntu/+source/gvfs/+bug/1706097 Backtrace from 1.32.1
+ Trace 237992
This is in generic code; there's nothing mtp specific in the backtrace.
one user having a segfault on file copies trying to under valgrind and got that error, unsure if that could trigger a segfault though ==3653== Conditional jump or move depends on uninitialised value(s) ==3653== at 0x65D48A7: __wmemchr_avx2 (memchr-avx2.S:250) ==3653== by 0x6533E70: internal_fnwmatch (fnmatch_loop.c:171) ==3653== by 0x6537268: fnmatch@@GLIBC_2.2.5 (fnmatch.c:440) ==3653== by 0x53B3F3D: cache_glob_lookup_fnmatch (xdgmimecache.c:464) ==3653== by 0x53B3F3D: cache_glob_lookup_file_name (xdgmimecache.c:689) ==3653== by 0x53B3F3D: __gio_xdg_cache_get_mime_types_from_file_name (xdgmimecache.c:870) ==3653== by 0x539193F: g_content_type_guess (gcontenttype.c:671) ==3653== by 0x10E3B4: get_file_info.isra.1 (gvfsbackendmtp.c:1276) ==3653== by 0x112679: do_enumerate (gvfsbackendmtp.c:1398) ==3653== by 0x4E4C7C9: g_vfs_job_run (in /usr/lib/x86_64-linux-gnu/gvfs/libgvfsdaemon.so) ==3653== by 0x4E46B7E: ??? (in /usr/lib/x86_64-linux-gnu/gvfs/libgvfsdaemon.so) ==3653== by 0x5F9500F: g_thread_pool_thread_proxy (gthreadpool.c:307) ==3653== by 0x5F94644: g_thread_proxy (gthread.c:784) ==3653== by 0x623D7FB: start_thread (pthread_create.c:465)b
I am the user Seb mentions in comment #2 and have reported it as bug 789150.
(gdb) info locals all data = 0x556785a52b40 proxy = 0x556785a49160 error = 0x0 result = <optimized out> backend = 0x0 (gdb) p *data $5 = { backend = 0x0, callback = 0x7fa16edc2260 <forced_unregister_mount_callback>, callback_data = 0x0 }
+ Trace 238082
Thread 1 (Thread 0x7fa16f1b4880 (LWP 10492))
Relevant kernel log during crash: Oct 19 21:15:45 test gvfsd[1965]: PTP_ERROR_IO: failed to open session, trying again after resetting USB interface Oct 19 21:15:45 test gvfsd[1965]: LIBMTP libusb: Attempt to reset device Oct 19 21:15:46 test gvfsd[1965]: inep: usb_get_endpoint_status(): No such device Oct 19 21:15:46 test gvfsd[1965]: outep: usb_get_endpoint_status(): No such device Oct 19 21:15:46 test gvfsd[1965]: libusb_open() failed!: No such device Oct 19 21:15:46 test gvfsd[1965]: LIBMTP PANIC: Could not init USB on second attempt Oct 19 21:15:46 test kernel: [105249.654069] usb 2-1.3.3: USB disconnect, device number 44 Oct 19 21:15:46 test gvfsd-mtp[10492]: invalid unclassed pointer in cast to 'GVfsBackendMtp' Oct 19 21:15:46 test gvfsd-mtp[10492]: g_object_ref: assertion 'G_IS_OBJECT (object)' failed Oct 19 21:15:46 test gvfsd-mtp[10492]: g_object_ref: assertion 'G_IS_OBJECT (object)' failed Oct 19 21:15:46 test kernel: [105249.670312] gvfsd-mtp[10492]: segfault at 18 ip 00007fa16edc1ae9 sp 00007ffcc127e920 error 4 in libgvfsdaemon.so[7fa16edb4000+25000]
ok, I managed to get an invalid read in a valgrind, it's missing one debug symbol but might be useful still Invalid read of size 8 at 0x10DBA8: mtp_heartbeat (gvfsbackendmtp.c:824) by 0x5F6D8A2: g_timeout_dispatch (gmain.c:4633) by 0x5F6CE24: g_main_dispatch (gmain.c:3148) by 0x5F6CE24: g_main_context_dispatch (gmain.c:3813) by 0x5F6D1EF: g_main_context_iterate.isra.30 (gmain.c:3886) by 0x5F6D501: g_main_loop_run (gmain.c:4082) by 0x113FDC: daemon_main (daemon-main.c:398) by 0x10CFC2: main (daemon-main-generic.c:45) Address 0x91e74b0 is 176 bytes inside a block of size 288 free'd at 0x4C30D3B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) by 0x5D02AE2: g_type_free_instance (gtype.c:1937) by 0x4E48DE3: ??? (in /usr/lib/x86_64-linux-gnu/gvfs/libgvfsdaemon.so) by 0x532A202: g_task_return_now (gtask.c:1145) by 0x532AC25: g_task_return (gtask.c:1203) by 0x536906A: reply_cb (gdbusproxy.c:2589) by 0x532A202: g_task_return_now (gtask.c:1145) by 0x532AC25: g_task_return (gtask.c:1203) by 0x535DCD1: g_dbus_connection_call_done (gdbusconnection.c:5722) by 0x532A202: g_task_return_now (gtask.c:1145) by 0x532A238: complete_in_idle_cb (gtask.c:1159) by 0x5F6CE24: g_main_dispatch (gmain.c:3148) by 0x5F6CE24: g_main_context_dispatch (gmain.c:3813) Block was alloc'd at at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) by 0x5F72578: g_malloc (gmem.c:94) by 0x5F8A0F5: g_slice_alloc (gslice.c:1025) by 0x5F8A588: g_slice_alloc0 (gslice.c:1051) by 0x5D027E5: g_type_create_instance (gtype.c:1839) by 0x5CE3097: g_object_constructor (gobject.c:2146) by 0x4E48B97: ??? (in /usr/lib/x86_64-linux-gnu/gvfs/libgvfsdaemon.so) by 0x5CE341D: g_object_new_with_custom_constructor (gobject.c:1715) by 0x5CE341D: g_object_new_internal (gobject.c:1795) by 0x5CE545F: g_object_new_valist (gobject.c:2120) by 0x5CE57D8: g_object_new (gobject.c:1640) by 0x4E479F8: g_vfs_daemon_initiate_mount (in /usr/lib/x86_64-linux-gnu/gvfs/libgvfsdaemon.so) by 0x4E47AD4: ??? (in /usr/lib/x86_64-linux-gnu/gvfs/libgvfsdaemon.so)
connecting/disconnecting a phone a few time in a few seconds was enough to get the error in valgrind, this one it matches the gdb segfault and has debug symbols Invalid read of size 8 at 0x4E47D0D: g_vfs_daemon_close_active_channels (gvfsdaemon.c:1130) by 0x4E48DC3: forced_unregister_mount_callback (gvfsbackend.c:1053) by 0x532A202: g_task_return_now (gtask.c:1145) by 0x532AC25: g_task_return (gtask.c:1203) by 0x536906A: reply_cb (gdbusproxy.c:2589) by 0x532A202: g_task_return_now (gtask.c:1145) by 0x532AC25: g_task_return (gtask.c:1203) by 0x535DCD1: g_dbus_connection_call_done (gdbusconnection.c:5722) by 0x532A202: g_task_return_now (gtask.c:1145) by 0x532A238: complete_in_idle_cb (gtask.c:1159) by 0x5F6CE24: g_main_dispatch (gmain.c:3148) by 0x5F6CE24: g_main_context_dispatch (gmain.c:3813) by 0x5F6D1EF: g_main_context_iterate.isra.30 (gmain.c:3886) Address 0xdbadb88 is 8 bytes inside a block of size 24 free'd at 0x4C30D3B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) by 0x5F68A27: g_list_remove (glist.c:521) by 0x4E46E01: job_source_closed_callback (gvfsdaemon.c:410) by 0x5CDDF9C: g_closure_invoke (gclosure.c:804) by 0x5CF0CDD: signal_emit_unlocked_R (gsignal.c:3635) by 0x5CF94B4: g_signal_emit_valist (gsignal.c:3391) by 0x5CF9ECE: g_signal_emit (gsignal.c:3447) by 0x4E47D3A: g_vfs_daemon_close_active_channels (gvfsdaemon.c:1133) by 0x4E48DC3: forced_unregister_mount_callback (gvfsbackend.c:1053) by 0x532A202: g_task_return_now (gtask.c:1145) by 0x532AC25: g_task_return (gtask.c:1203) by 0x536906A: reply_cb (gdbusproxy.c:2589) Block was alloc'd at at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) by 0x5F72578: g_malloc (gmem.c:94) by 0x5F8A0F5: g_slice_alloc (gslice.c:1025) by 0x5F68F93: g_list_append (glist.c:261) by 0x4E4771F: g_vfs_daemon_add_job_source (gvfsdaemon.c:476) by 0x5CDDF9C: g_closure_invoke (gclosure.c:804) by 0x5CF0CDD: signal_emit_unlocked_R (gsignal.c:3635) by 0x5CF94B4: g_signal_emit_valist (gsignal.c:3391) by 0x5CFA3A7: g_signal_emit_by_name (gsignal.c:3487) by 0x4E4F7C0: create_reply (gvfsjobopenforread.c:201) by 0x4E4CE0B: send_reply (gvfsjobdbus.c:161) by 0x5CDDF9C: g_closure_invoke (gclosure.c:804)
(the gvfs version is 1.34.1)
(In reply to Philip Langdale from comment #1) > This is in generic code; there's nothing mtp specific in the backtrace. It is true that crashes in unregister_mount_got_proxy_cb don't happen exclusively in mtp backend, but at least for Fedora, the number of gvfsd-mtp crashes increased really heavily with the first gvfs release after libmtp >= 1.1.12 bump... So I am pretty sure that this relates with Bug 761278 which should lead to fix the unmount races in mtp backend, but it makes it probably worse than before... See: https://retrace.fedoraproject.org/faf/reports/1712080/
So, the crash happens on an attempt to dereference backend->priv to get the object_path. The obvious guess is that the backend was already finalised before the callback was called. But I have never seen this crash and certainly can't reproduce it right now - which I'd need to do to understand what the overall state is and what happened when. Ondrej, can you trigger this?
I can't reproduce it also even if I try force-unmounting the backend in various ways, unplugging the cable randomly during different operations etc. :-(
It might be device specific. We mostly don't know what devices are involved with these errors, and we both have a bunch of Nexus/Pixel devices. It's possible they are generating different wire behaviour than we see.
(In reply to Philip Langdale from comment #12) > It might be device specific. We mostly don't know what devices are involved > with these errors, and we both have a bunch of Nexus/Pixel devices. > > It's possible they are generating different wire behaviour than we see. You only have to take a look at the Fedora bug report to get a list: Samsung Galaxy S6 Samsung Galaxy S7 Google Nexus 7 2012 (2 reports) Google Pixel (2 reports) Sony M4 Aqua and that's just from https://bugzilla.redhat.com/show_bug.cgi?id=1495288 . There are many duplicates which might contain more devices mentioned.
You are right. I am just debugging it right now. I am still not able to reproduce the issue from Description, but I am able to reproduce another segfault. It may happen when device is unplugged in the middle of mount operation. Which is also probably the case from Comment 7. See debug log: mtp: (I) g_vfs_backend_mtp_init mtp: (I) g_vfs_backend_mtp_init done. mtp: Added new job source 0x7f0130 (GVfsBackendMtp) mtp: Queued new job 0x7f1020 (GVfsJobMount) mtp: (I) do_mount mtp: (I) do_mount: host=[usb:002,052] mtp: (II) get_dev_path_from_host: Parsed '[usb:002,052]' into device name /dev/bus/usb/002/052 mtp: (II) get_device: [usb:002,052] Device 0 (VID=18d1 and PID=4ee1) is a Google Inc Nexus/Pixel (MTP). mtp: (I) on_uevent (action bind, device (null)) LIBMTP PANIC: Unable to read device information on device 52 on bus 2, trying to continuemtp: send_reply(0x7f1020), failed=1 (Unable to open MTP device “[usb:002,052]”) mtp: (II) get_device done. mtp: (I) do_mount done. mtp: (I) g_vfs_backend_mtp_finalize mtp: (I) g_vfs_backend_mtp_finalize done. mtp: (I) on_uevent (action unbind, device (null)) mtp: (I) on_uevent (action remove, device (null)) mtp: (I) on_uevent (action unbind, device /dev/bus/usb/002/052) on_uevent is called after finalize. So, it might cause also the issue from Description in theory. I am going to attach a patch, however, I doubt that it will fix the issue from Description, because this bug used to be there since ever probably and I don't see any reason why it should cause such big amount of crashes in the last months...
Created attachment 368406 [details] [review] mtp: Handle uevents only in case of mount success Uevents handler can be connected even if mount operation failed (e.g. device is unplugged in the middle of mount operation). Backend is consequently finalized, however, g_vfs_backend_force_unmount can be still called from on_uevent in the meantime. This might cause segfaults in certain cases.
Created attachment 368407 [details] [review] daemon: Add missing guards for job_sources Use daemon->lock when accessing daemon->job_sources as on other places. This might potentially help to fix some segfaults.
Backtrace isn't much helpful in this case, because we don't see what happened before the async call... Can somebody, who faces this regularly, provide debug log until the crash to see what happening in there: https://wiki.gnome.org/Projects/gvfs/debugging
Review of attachment 368406 [details] [review]: Change looks fine.
Review of attachment 368407 [details] [review]: Change looks fine.
Thanks for looking! Let's push them and see what happens... Attachment 368406 [details] pushed as dec2f32 - mtp: Handle uevents only in case of mount success Attachment 368407 [details] pushed as e147e48 - daemon: Add missing guards for job_sources
I tried to create a backtrace as requested in comment 17. But when I follow the instructions to start gvfs in debug mode I get this: Failed to connect to the D-BUS daemon: Error spawning command line “dbus-launch --autolaunch=6651fe407c50475096aef306a1eda08e --binary-syntax --close-stderr”: Child process exited with code 1 (g-spawn-exit-error-quark, 1)
I've never seen this error, but it sounds like that the daemon doesn't see dbus session and trying to spawn new one and failed. Did you run the commands under normal user in graphical session? Is DBUS_SESSION_BUS_ADDRESS environment variable correctly set to something like DBUS_SESSION_BUS_ADDRESS="unix:path=/run/user/$UID/bus"?
(In reply to Ondrej Holy from comment #20) > Thanks for looking! Let's push them and see what happens... > > Attachment 368406 [details] pushed as dec2f32 - mtp: Handle uevents only in > case of mount success > Attachment 368407 [details] pushed as e147e48 - daemon: Add missing guards > for job_sources In which stable version of gvfs are those fixed? I filed https://bugzilla.gnome.org/show_bug.cgi?id=793885 to avoid losing my backtraces.
Not yet in any stable release, but I've just pushed it in gnome-3-26, however, it is already part of 1.35.91. Any change you can test those patches?
*** Bug 793885 has been marked as a duplicate of this bug. ***
(In reply to Ondrej Holy from comment #24) > Not yet in any stable release, but I've just pushed it in gnome-3-26, > however, it is already part of 1.35.91. Any change you can test those > patches? As mentioned in the dupe, 1.35.91 fixes the problem, so backporting those fixes to stable and doing a release would certainly help.
Great news, thanks for testing! Not sure about the new stable release since I did it recently, but will definitely backport in Fedora and suggest to do the same in other distros...
I've finally made 1.32.2 and 1.34.2.1 stable releases given the frequency of this bug. So I hope we can close this now...
(In reply to Ondrej Holy from comment #28) > I've finally made 1.32.2 and 1.34.2.1 stable releases given the frequency of > this bug. So I hope we can close this now... The frequency should also lower with the android-tools fix.