Bug 787992 – gvfsd-mtp segfault in unregister_mount_got_proxy_cb when disconnecting android device

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 787992 - gvfsd-mtp segfault in unregister_mount_got_proxy_cb when disconnecting android device


Summary:	gvfsd-mtp segfault in unregister_mount_got_proxy_cb when disconnecting androi...


Status:	RESOLVED FIXED

Product:	gvfs
Classification:	Core
Component:	general
Version:	1.34.x
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	gvfs-maint
QA Contact:	gvfs-maint

URL:
Whiteboard:

Duplicates:	793885 (view as bug list)
Depends on:
Blocks:

Reported:	2017-09-21 14:37 UTC by Sebastien Bacher
Modified:	2018-03-05 13:39 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
mtp: Handle uevents only in case of mount success (1.61 KB, patch) 2018-02-16 09:43 UTC, Ondrej Holy	committed	Details \| Review
daemon: Add missing guards for job_sources (1.66 KB, patch) 2018-02-16 09:43 UTC, Ondrej Holy	committed	Details \| Review

Description Sebastien Bacher 2017-09-21 14:37:08 UTC

The issue is hit by quite some users according to fedora/ubuntu reports and is still there in GNOME 3.26
https://bugzilla.redhat.com/show_bug.cgi?id=1442777
https://bugs.launchpad.net/ubuntu/+source/gvfs/+bug/1706097

Backtrace from 1.32.1

+ Trace 237992

#0 unregister_mount_got_proxy_cb
at gvfsbackend.c line 773
#1 g_task_return_now
at gtask.c line 1145
#2 g_task_return
at gtask.c line 1203
#3 init_second_async_cb
at gdbusproxy.c line 1808
#4 g_task_return_now
at gtask.c line 1145
#5 g_task_return
at gtask.c line 1203
#6 async_init_data_set_name_owner
at gdbusproxy.c line 1499
#7 async_init_get_name_owner_cb
at gdbusproxy.c line 1537
#8 g_task_return_now
at gtask.c line 1145

Comment 1 Philip Langdale 2017-09-22 03:19:14 UTC

This is in generic code; there's nothing mtp specific in the backtrace.

Comment 2 Sebastien Bacher 2017-10-17 19:14:21 UTC

one user having a segfault on file copies trying to under valgrind and got that error, unsure if that could trigger a segfault though

==3653== Conditional jump or move depends on uninitialised value(s)
==3653==    at 0x65D48A7: __wmemchr_avx2 (memchr-avx2.S:250)
==3653==    by 0x6533E70: internal_fnwmatch (fnmatch_loop.c:171)
==3653==    by 0x6537268: fnmatch@@GLIBC_2.2.5 (fnmatch.c:440)
==3653==    by 0x53B3F3D: cache_glob_lookup_fnmatch (xdgmimecache.c:464)
==3653==    by 0x53B3F3D: cache_glob_lookup_file_name (xdgmimecache.c:689)
==3653==    by 0x53B3F3D: __gio_xdg_cache_get_mime_types_from_file_name (xdgmimecache.c:870)
==3653==    by 0x539193F: g_content_type_guess (gcontenttype.c:671)
==3653==    by 0x10E3B4: get_file_info.isra.1 (gvfsbackendmtp.c:1276)
==3653==    by 0x112679: do_enumerate (gvfsbackendmtp.c:1398)
==3653==    by 0x4E4C7C9: g_vfs_job_run (in /usr/lib/x86_64-linux-gnu/gvfs/libgvfsdaemon.so)
==3653==    by 0x4E46B7E: ??? (in /usr/lib/x86_64-linux-gnu/gvfs/libgvfsdaemon.so)
==3653==    by 0x5F9500F: g_thread_pool_thread_proxy (gthreadpool.c:307)
==3653==    by 0x5F94644: g_thread_proxy (gthread.c:784)
==3653==    by 0x623D7FB: start_thread (pthread_create.c:465)b

Comment 3 Till Kamppeter 2017-10-18 21:21:14 UTC

I am the user Seb mentions in comment #2 and have reported it as bug 789150.

Comment 4 gkrithi8 2017-10-20 09:09:56 UTC

(gdb) info locals all
data = 0x556785a52b40
proxy = 0x556785a49160
error = 0x0
result = <optimized out>
backend = 0x0

(gdb) p *data
$5 = {
  backend = 0x0, 
  callback = 0x7fa16edc2260 <forced_unregister_mount_callback>, 
  callback_data = 0x0
}

+ Trace 238082

Thread 1 (Thread 0x7fa16f1b4880 (LWP 10492))

#0 unregister_mount_got_proxy_cb
at gvfsbackend.c line 773
#1 g_task_return_now
at ../../../../gio/gtask.c line 1145
#2 g_task_return
at ../../../../gio/gtask.c line 1203
#3 init_second_async_cb
at ../../../../gio/gdbusproxy.c line 1808
#4 g_task_return_now
at ../../../../gio/gtask.c line 1145
#5 g_task_return
at ../../../../gio/gtask.c line 1203
#6 async_init_data_set_name_owner
at ../../../../gio/gdbusproxy.c line 1499
#7 async_init_get_name_owner_cb
at ../../../../gio/gdbusproxy.c line 1537
#8 g_task_return_now
at ../../../../gio/gtask.c line 1145
#9 g_task_return
at ../../../../gio/gtask.c line 1203
#10 g_dbus_connection_call_done
at ../../../../gio/gdbusconnection.c line 5722
#11 g_task_return_now
at ../../../../gio/gtask.c line 1145
#12 complete_in_idle_cb
at ../../../../gio/gtask.c line 1159
#13 g_main_dispatch
at ../../../../glib/gmain.c line 3148
#14 g_main_context_dispatch
at ../../../../glib/gmain.c line 3813
#15 g_main_context_iterate
at ../../../../glib/gmain.c line 3886
#16 g_main_loop_run
at ../../../../glib/gmain.c line 4082
#17 daemon_main
at daemon-main.c line 398
#18 main
at daemon-main-generic.c line 45

Comment 5 gkrithi8 2017-10-20 09:13:47 UTC

Relevant kernel log during crash:

Oct 19 21:15:45 test gvfsd[1965]: PTP_ERROR_IO: failed to open session, trying again after resetting USB interface
Oct 19 21:15:45 test gvfsd[1965]: LIBMTP libusb: Attempt to reset device
Oct 19 21:15:46 test gvfsd[1965]: inep: usb_get_endpoint_status(): No such device
Oct 19 21:15:46 test gvfsd[1965]: outep: usb_get_endpoint_status(): No such device
Oct 19 21:15:46 test gvfsd[1965]: libusb_open() failed!: No such device
Oct 19 21:15:46 test gvfsd[1965]: LIBMTP PANIC: Could not init USB on second attempt
Oct 19 21:15:46 test kernel: [105249.654069] usb 2-1.3.3: USB disconnect, device number 44
Oct 19 21:15:46 test gvfsd-mtp[10492]: invalid unclassed pointer in cast to 'GVfsBackendMtp'
Oct 19 21:15:46 test gvfsd-mtp[10492]: g_object_ref: assertion 'G_IS_OBJECT (object)' failed
Oct 19 21:15:46 test gvfsd-mtp[10492]: g_object_ref: assertion 'G_IS_OBJECT (object)' failed
Oct 19 21:15:46 test kernel: [105249.670312] gvfsd-mtp[10492]: segfault at 18 ip 00007fa16edc1ae9 sp 00007ffcc127e920 error 4 in libgvfsdaemon.so[7fa16edb4000+25000]

Comment 6 Sebastien Bacher 2017-11-03 09:40:49 UTC

ok, I managed to get an invalid read in a valgrind, it's missing one debug symbol but might be useful still

 Invalid read of size 8
    at 0x10DBA8: mtp_heartbeat (gvfsbackendmtp.c:824)
    by 0x5F6D8A2: g_timeout_dispatch (gmain.c:4633)
    by 0x5F6CE24: g_main_dispatch (gmain.c:3148)
    by 0x5F6CE24: g_main_context_dispatch (gmain.c:3813)
    by 0x5F6D1EF: g_main_context_iterate.isra.30 (gmain.c:3886)
    by 0x5F6D501: g_main_loop_run (gmain.c:4082)
    by 0x113FDC: daemon_main (daemon-main.c:398)
    by 0x10CFC2: main (daemon-main-generic.c:45)
  Address 0x91e74b0 is 176 bytes inside a block of size 288 free'd
    at 0x4C30D3B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
    by 0x5D02AE2: g_type_free_instance (gtype.c:1937)
    by 0x4E48DE3: ??? (in /usr/lib/x86_64-linux-gnu/gvfs/libgvfsdaemon.so)
    by 0x532A202: g_task_return_now (gtask.c:1145)
    by 0x532AC25: g_task_return (gtask.c:1203)
    by 0x536906A: reply_cb (gdbusproxy.c:2589)
    by 0x532A202: g_task_return_now (gtask.c:1145)
    by 0x532AC25: g_task_return (gtask.c:1203)
    by 0x535DCD1: g_dbus_connection_call_done (gdbusconnection.c:5722)
    by 0x532A202: g_task_return_now (gtask.c:1145)
    by 0x532A238: complete_in_idle_cb (gtask.c:1159)
    by 0x5F6CE24: g_main_dispatch (gmain.c:3148)
    by 0x5F6CE24: g_main_context_dispatch (gmain.c:3813)
  Block was alloc'd at
    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
    by 0x5F72578: g_malloc (gmem.c:94)
    by 0x5F8A0F5: g_slice_alloc (gslice.c:1025)
    by 0x5F8A588: g_slice_alloc0 (gslice.c:1051)
    by 0x5D027E5: g_type_create_instance (gtype.c:1839)
    by 0x5CE3097: g_object_constructor (gobject.c:2146)
    by 0x4E48B97: ??? (in /usr/lib/x86_64-linux-gnu/gvfs/libgvfsdaemon.so)
    by 0x5CE341D: g_object_new_with_custom_constructor (gobject.c:1715)
    by 0x5CE341D: g_object_new_internal (gobject.c:1795)
    by 0x5CE545F: g_object_new_valist (gobject.c:2120)
    by 0x5CE57D8: g_object_new (gobject.c:1640)
    by 0x4E479F8: g_vfs_daemon_initiate_mount (in /usr/lib/x86_64-linux-gnu/gvfs/libgvfsdaemon.so)
    by 0x4E47AD4: ??? (in /usr/lib/x86_64-linux-gnu/gvfs/libgvfsdaemon.so)

Comment 7 Sebastien Bacher 2017-11-03 11:25:36 UTC

connecting/disconnecting a phone a few time in a few seconds was enough to get the error in valgrind, this one it matches the gdb segfault and has debug symbols

 Invalid read of size 8
    at 0x4E47D0D: g_vfs_daemon_close_active_channels (gvfsdaemon.c:1130)
    by 0x4E48DC3: forced_unregister_mount_callback (gvfsbackend.c:1053)
    by 0x532A202: g_task_return_now (gtask.c:1145)
    by 0x532AC25: g_task_return (gtask.c:1203)
    by 0x536906A: reply_cb (gdbusproxy.c:2589)
    by 0x532A202: g_task_return_now (gtask.c:1145)
    by 0x532AC25: g_task_return (gtask.c:1203)
    by 0x535DCD1: g_dbus_connection_call_done (gdbusconnection.c:5722)
    by 0x532A202: g_task_return_now (gtask.c:1145)
    by 0x532A238: complete_in_idle_cb (gtask.c:1159)
    by 0x5F6CE24: g_main_dispatch (gmain.c:3148)
    by 0x5F6CE24: g_main_context_dispatch (gmain.c:3813)
    by 0x5F6D1EF: g_main_context_iterate.isra.30 (gmain.c:3886)
  Address 0xdbadb88 is 8 bytes inside a block of size 24 free'd
    at 0x4C30D3B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
    by 0x5F68A27: g_list_remove (glist.c:521)
    by 0x4E46E01: job_source_closed_callback (gvfsdaemon.c:410)
    by 0x5CDDF9C: g_closure_invoke (gclosure.c:804)
    by 0x5CF0CDD: signal_emit_unlocked_R (gsignal.c:3635)
    by 0x5CF94B4: g_signal_emit_valist (gsignal.c:3391)
    by 0x5CF9ECE: g_signal_emit (gsignal.c:3447)
    by 0x4E47D3A: g_vfs_daemon_close_active_channels (gvfsdaemon.c:1133)
    by 0x4E48DC3: forced_unregister_mount_callback (gvfsbackend.c:1053)
    by 0x532A202: g_task_return_now (gtask.c:1145)
    by 0x532AC25: g_task_return (gtask.c:1203)
    by 0x536906A: reply_cb (gdbusproxy.c:2589)
  Block was alloc'd at
    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
    by 0x5F72578: g_malloc (gmem.c:94)
    by 0x5F8A0F5: g_slice_alloc (gslice.c:1025)
    by 0x5F68F93: g_list_append (glist.c:261)
    by 0x4E4771F: g_vfs_daemon_add_job_source (gvfsdaemon.c:476)
    by 0x5CDDF9C: g_closure_invoke (gclosure.c:804)
    by 0x5CF0CDD: signal_emit_unlocked_R (gsignal.c:3635)
    by 0x5CF94B4: g_signal_emit_valist (gsignal.c:3391)
    by 0x5CFA3A7: g_signal_emit_by_name (gsignal.c:3487)
    by 0x4E4F7C0: create_reply (gvfsjobopenforread.c:201)
    by 0x4E4CE0B: send_reply (gvfsjobdbus.c:161)
    by 0x5CDDF9C: g_closure_invoke (gclosure.c:804)

Comment 8 Sebastien Bacher 2017-11-03 11:32:05 UTC

(the gvfs version is 1.34.1)

Comment 9 Ondrej Holy 2018-01-02 14:33:02 UTC

(In reply to Philip Langdale from comment #1)
> This is in generic code; there's nothing mtp specific in the backtrace.

It is true that crashes in unregister_mount_got_proxy_cb don't happen exclusively in mtp backend, but at least for Fedora, the number of gvfsd-mtp crashes increased really heavily with the first gvfs release after libmtp >= 1.1.12 bump...

So I am pretty sure that this relates with Bug 761278 which should lead to fix the unmount races in mtp backend, but it makes it probably worse than before...

See:
https://retrace.fedoraproject.org/faf/reports/1712080/

Comment 10 Philip Langdale 2018-01-26 04:42:37 UTC

So, the crash happens on an attempt to dereference backend->priv to get the object_path. The obvious guess is that the backend was already finalised before the callback was called. But I have never seen this crash and certainly can't reproduce it right now - which I'd need to do to understand what the overall state is and what happened when.

Ondrej, can you trigger this?

Comment 11 Ondrej Holy 2018-02-05 14:45:36 UTC

I can't reproduce it also even if I try force-unmounting the backend in various ways, unplugging the cable randomly during different operations etc. :-(

Comment 12 Philip Langdale 2018-02-05 19:04:58 UTC

It might be device specific. We mostly don't know what devices are involved with these errors, and we both have a bunch of Nexus/Pixel devices.

It's possible they are generating different wire behaviour than we see.

Comment 13 Dominik 'Rathann' Mierzejewski 2018-02-16 08:32:07 UTC

(In reply to Philip Langdale from comment #12)
> It might be device specific. We mostly don't know what devices are involved
> with these errors, and we both have a bunch of Nexus/Pixel devices.
> 
> It's possible they are generating different wire behaviour than we see.

You only have to take a look at the Fedora bug report to get a list:
Samsung Galaxy S6
Samsung Galaxy S7
Google Nexus 7 2012 (2 reports)
Google Pixel (2 reports)
Sony M4 Aqua

and that's just from https://bugzilla.redhat.com/show_bug.cgi?id=1495288 . There are many duplicates which might contain more devices mentioned.

Comment 14 Ondrej Holy 2018-02-16 09:43:04 UTC

You are right. I am just debugging it right now. I am still not able to reproduce the issue from Description, but I am able to reproduce another segfault. It may happen when device is unplugged in the middle of mount operation. Which is also probably the case from Comment 7. See debug log:

mtp: (I) g_vfs_backend_mtp_init
mtp: (I) g_vfs_backend_mtp_init done.
mtp: Added new job source 0x7f0130 (GVfsBackendMtp)
mtp: Queued new job 0x7f1020 (GVfsJobMount)
mtp: (I) do_mount
mtp: (I) do_mount: host=[usb:002,052]
mtp: (II) get_dev_path_from_host: Parsed '[usb:002,052]' into device name /dev/bus/usb/002/052
mtp: (II) get_device: [usb:002,052]
Device 0 (VID=18d1 and PID=4ee1) is a Google Inc Nexus/Pixel (MTP).
mtp: (I) on_uevent (action bind, device (null))
LIBMTP PANIC: Unable to read device information on device 52 on bus 2, trying to continuemtp: send_reply(0x7f1020), failed=1 (Unable to open MTP device “[usb:002,052]”)
mtp: (II) get_device done.
mtp: (I) do_mount done.
mtp: (I) g_vfs_backend_mtp_finalize
mtp: (I) g_vfs_backend_mtp_finalize done.
mtp: (I) on_uevent (action unbind, device (null))
mtp: (I) on_uevent (action remove, device (null))
mtp: (I) on_uevent (action unbind, device /dev/bus/usb/002/052)

on_uevent is called after finalize. So, it might cause also the issue from Description in theory. I am going to attach a patch, however, I doubt that it will fix the issue from Description, because this bug used to be there since ever probably and I don't see any reason why it should cause such big amount of crashes in the last months...

Comment 15 Ondrej Holy 2018-02-16 09:43:44 UTC

Created attachment 368406 [details] [review]
mtp: Handle uevents only in case of mount success

Uevents handler can be connected even if mount operation failed (e.g.
device is unplugged in the middle of mount operation). Backend is
consequently finalized, however, g_vfs_backend_force_unmount can be
still called from on_uevent in the meantime. This might cause segfaults
in certain cases.

Comment 16 Ondrej Holy 2018-02-16 09:43:49 UTC

Created attachment 368407 [details] [review]
daemon: Add missing guards for job_sources

Use daemon->lock when accessing daemon->job_sources as on other places.
This might potentially help to fix some segfaults.

Comment 17 Ondrej Holy 2018-02-16 09:49:26 UTC

Backtrace isn't much helpful in this case, because we don't see what happened before the async call... Can somebody, who faces this regularly, provide debug log until the crash to see what happening in there:
https://wiki.gnome.org/Projects/gvfs/debugging

Comment 18 Philip Langdale 2018-02-16 20:11:16 UTC

Review of attachment 368406 [details] [review]:

Change looks fine.

Comment 19 Philip Langdale 2018-02-16 20:11:24 UTC

Review of attachment 368407 [details] [review]:

Change looks fine.

Comment 20 Ondrej Holy 2018-02-19 12:26:41 UTC

Thanks for looking! Let's push them and see what happens...

Attachment 368406 [details] pushed as dec2f32 - mtp: Handle uevents only in case of mount success
Attachment 368407 [details] pushed as e147e48 - daemon: Add missing guards for job_sources

Comment 21 Jan Vlug 2018-02-21 23:08:10 UTC

I tried to create a backtrace as requested in comment 17. But when I follow the instructions to start gvfs in debug mode I get this:

Failed to connect to the D-BUS daemon: Error spawning command line “dbus-launch --autolaunch=6651fe407c50475096aef306a1eda08e --binary-syntax --close-stderr”: Child process exited with code 1 (g-spawn-exit-error-quark, 1)

Comment 22 Ondrej Holy 2018-02-23 10:23:03 UTC

I've never seen this error, but it sounds like that the daemon doesn't see dbus session and trying to spawn new one and failed. Did you run the commands under normal user in graphical session? Is DBUS_SESSION_BUS_ADDRESS environment variable correctly set to something like DBUS_SESSION_BUS_ADDRESS="unix:path=/run/user/$UID/bus"?

Comment 23 Bastien Nocera 2018-02-27 16:09:55 UTC

(In reply to Ondrej Holy from comment #20)
> Thanks for looking! Let's push them and see what happens...
> 
> Attachment 368406 [details] pushed as dec2f32 - mtp: Handle uevents only in
> case of mount success
> Attachment 368407 [details] pushed as e147e48 - daemon: Add missing guards
> for job_sources

In which stable version of gvfs are those fixed? I filed https://bugzilla.gnome.org/show_bug.cgi?id=793885 to avoid losing my backtraces.

Comment 24 Ondrej Holy 2018-02-28 09:11:17 UTC

Not yet in any stable release, but I've just pushed it in gnome-3-26, however, it is already part of 1.35.91. Any change you can test those patches?

Comment 25 Bastien Nocera 2018-02-28 16:58:58 UTC

*** Bug 793885 has been marked as a duplicate of this bug. ***

Comment 26 Bastien Nocera 2018-02-28 17:00:29 UTC

(In reply to Ondrej Holy from comment #24)
> Not yet in any stable release, but I've just pushed it in gnome-3-26,
> however, it is already part of 1.35.91. Any change you can test those
> patches?

As mentioned in the dupe, 1.35.91 fixes the problem, so backporting those fixes to stable and doing a release would certainly help.

Comment 27 Ondrej Holy 2018-02-28 17:46:20 UTC

Great news, thanks for testing! Not sure about the new stable release since I did it recently, but will definitely backport in Fedora and suggest to do the same in other distros...

Comment 28 Ondrej Holy 2018-03-05 13:11:48 UTC

I've finally made 1.32.2 and 1.34.2.1 stable releases given the frequency of this bug. So I hope we can close this now...

Comment 29 Bastien Nocera 2018-03-05 13:39:55 UTC

(In reply to Ondrej Holy from comment #28)
> I've finally made 1.32.2 and 1.34.2.1 stable releases given the frequency of
> this bug. So I hope we can close this now...

The frequency should also lower with the android-tools fix.