GNOME Bugzilla – Bug 724916
gio unmount code makes XFCE's Thunar crash
Last modified: 2014-04-10 19:09:40 UTC
Created attachment 269955 [details] [review] patch which includes both fixes This bug manifests as a crash on OpenBSD with XFCE's Thunar file manager. A crash happens when attempting to unmount a disk device which was mounted with toad. toad is a helper program for auto-mounting devices in GNOME under OpenBSD, see http://undeadly.org/cgi?action=article&sid=20140219085851 toad uses gvfs to make the disk device appear in the file manager. In Thunar, unmounting such devices is possible by clicking on the 'eject' button next to the disk device, similar to how it can be done in GNOME. When I tried this, Thunar crashed: [[[ (gdb) bt
+ Trace 233205
The cause of the crash was puzzling at first, because the stack of the called function seemed corrupt. It turns out that this is due to a duplicate event source associated with a task in eject_unmount_do_cb() in gio's gunixmount. The code reads: [[[ data->error_channel_source = g_io_create_watch (data->error_channel, G_IO_IN); g_task_attach_source (task, data->error_channel_source, (GSourceFunc) eject_unmount_read_error); child_watch = g_child_watch_source_new (child_pid); g_task_attach_source (task, data->error_channel_source, (GSourceFunc) eject_unmount_cb); g_source_unref (child_watch); ]]] Note how the child_watch event source is never used. Rather, data->error_channel_source is used twice as event source. The second call to g_task_attach_source() above results in the following message: (thunar:6655): GLib-CRITICAL **: g_source_attach: assertion 'source->context == NULL' failed Also, note how the prototype of eject_unmount_cb() doesn't match the GSourceFunc type, but rather matches the GChildWatchFunc type. When the event source fires the first time, the code reaches thunar_device_operation_finish() and things go as planned. But then it fires the second time and the crash in thunar_device_operation_finish() occurs. The exact nature of the crash varies, but invariably looks like a corrupt stack (while stepping through gdb, local variabels don't have the values that were assigned to them). I could fix this by using an API that matches more closely what I believe eject_unmount_do_cb() is trying to do. g_child_watch_add() waits for a child process to finish (in this case, the child process is "unmount"), and expects a callback matching the type of eject_unmount_cb. This works: [[[ + g_child_watch_add (child_pid, (GChildWatchFunc) eject_unmount_cb, task); - child_watch = g_child_watch_source_new (child_pid); - g_task_attach_source (task, data->error_channel_source, - (GSourceFunc) eject_unmount_cb); - g_source_unref (child_watch); ]]] Now, another problem prevents Thunar from finishing the unmount procedure. The eject_unmount_read_error() function never completes its event loop if it only reads EOF, and no error string. The event which calls eject_unmount_read_error() keeps firing in an endless loop: [[[ Breakpoint 2, eject_unmount_read_error (channel=0x5c067243000, condition=G_IO_IN, user_data=0x5c06724d280) at gunixmount.c:302 302 { (gdb) n 303 GTask *task = user_data; (gdb) 304 UnmountEjectOp *data = g_task_get_task_data (task); (gdb) 310 if (g_task_return_error_if_cancelled (task)) (gdb) 313 error = NULL; (gdb) 315 status = g_io_channel_read_chars (channel, buf, sizeof (buf), &bytes_r ead, &error); (gdb) 316 if (status == G_IO_STATUS_NORMAL) (gdb) 322 else if (status == G_IO_STATUS_EOF) (gdb) p status $10 = G_IO_STATUS_EOF (gdb) n 323 g_string_append_len (data->error_string, buf, bytes_read); (gdb) c Continuing. Breakpoint 2, eject_unmount_read_error (channel=0x5c067243000, condition=G_IO_IN, user_data=0x5c06724d280) at gunixmount.c:302 302 { (gdb) n 303 GTask *task = user_data; (gdb) 304 UnmountEjectOp *data = g_task_get_task_data (task); (gdb) 310 if (g_task_return_error_if_cancelled (task)) (gdb) 313 error = NULL; (gdb) 315 status = g_io_channel_read_chars (channel, buf, sizeof (buf), &bytes_r ead, &error); (gdb) 316 if (status == G_IO_STATUS_NORMAL) (gdb) 322 else if (status == G_IO_STATUS_EOF) (gdb) p status $11 = G_IO_STATUS_EOF (gdb) c Continuing. Breakpoint 2, eject_unmount_read_error (channel=0x5c067243000, condition=G_IO_IN, user_data=0x5c06724d280) at gunixmount.c:302 302 { (gdb) ]]] I could fix this by tweaking the EOF case to abort the event loop. [[[ else if (status == G_IO_STATUS_EOF) - g_string_append_len (data->error_string, buf, bytes_read); + { + if (data->error_channel_source) + { + g_source_unref (data->error_channel_source); + data->error_channel_source = NULL; + } + return G_SOURCE_REMOVE; + } ]]] These fixes combined make Thunar perfectly happy with unmounting devices. Attached is a patch against git HEAD which contains these fixes.
Created attachment 269958 [details] [review] patch which includes both fixes Slightly better version of my patch which keeps the g_string_append_len() call at the top of the EOF case intact: [[[ else if (status == G_IO_STATUS_EOF) - g_string_append_len (data->error_string, buf, bytes_read); + { + g_string_append_len (data->error_string, buf, bytes_read); + if (data->error_channel_source) + { + g_source_unref (data->error_channel_source); + data->error_channel_source = NULL; + } + return G_SOURCE_REMOVE; + } ]]]
Tested under Nautilus and OpenBSD and it actually fixes some odd behavior I was seeing as well where the nautilus window could end up in a "broken" state (with the entire left pane gone). It would be awesome if this could make it in.
Review of attachment 269958 [details] [review]: This patch has at least one problem with it that prevents it from being merged. However, taking a step back, I think the approach is wrong. We should be using GSubprocess here now that we have it... That's obviously a much bigger patch, though. I'll take a look at it to see how feasible it would be. ::: gio/gunixmount.c @@ +377,3 @@ g_task_attach_source (task, data->error_channel_source, (GSourceFunc) eject_unmount_read_error); + g_child_watch_add (child_pid, (GChildWatchFunc) eject_unmount_cb, task); This will attach to the global default main context, which may not be the correct destination for this task. Why not just change this to use the child_watch source that we just created? That seems to be the obvious intent here.
Created attachment 270740 [details] [review] gsubprocess: fix communicate() with empty buffers On the splice for stdout or stderr completing, GSubprocess calls _slice_finish() to collect the result. We assume that a zero return value here means failure, but in fact this function returns a gssize -- the number of bytes transferred, or -1 for an error. This causes GSubprocess to mistakenly think that it has an error when it actually just has an empty buffer (as would be the case when collecting stderr from a successful command). Check for -1 instead of FALSE to detect the error.
Created attachment 270741 [details] [review] GUnixMount: port unmount to GSubprocess The existing code is buggy and now that we have GSubprocess, we should just use it instead, allowing for some substantial reduction in complexity.
Can you check if this pair of patches fixes it for you? Thanks
Created attachment 270742 [details] [review] GUnixVolume: port to GSubprocess This is completely untested -- I couldn't actually figure out a way to get a GUnixVolume handed to me on the API of GIO. Using the volume monitor gives me no results if gvfs is uninstalled and if gvfs is installed then I get GProxyVolumes...
Thanks Ryan for looking into this. Gsubprocess doesn't seem to exist in glib-2.38.2 which the OpenBSD ports tree is currently using. So your patches cannot be applied there until OpenBSD's glib port is updated or Gsubprocess is part of a released version of glib (I don't know which is the case). But I will try testing your patches with glib's git HEAD and get back to you.
Review of attachment 270740 [details] [review]: Looks right. Would be good to have a test...
Created attachment 270864 [details] [review] gsubprocess: test empty splices Make sure we handle the case that our splice returns no data properly. As requested -- fails before the patch, passes after.
Attachment 270740 [details] pushed as 9f71965 - gsubprocess: fix communicate() with empty buffers Attachment 270864 [details] pushed as 9da88a1 - gsubprocess: test empty splices
Review of attachment 270741 [details] [review]: All I can say is the new version looks much more straightforward. But without test cases, pretty hard to be 100% sure
Review of attachment 270742 [details] [review]: Same comment: new code looks much more straightforward
Created attachment 271533 [details] [review] GUnixVolume: implement _finish functions The _finish functions for GUnixVolume _mount and _eject functions were never implemented, having been simply stubbed out as 'return TRUE;'. Implement them.
I've now fully tested these changes under pass and fail cases (and, indeed, found one additional issue). Would still be nice to hear from the original reporter, though...
*** Bug 726122 has been marked as a duplicate of this bug. ***
Ryan, I would love to help test your patches. However, within several hours I couldn't manage to find a straightforward way of compiling glib from git and make use of it from Thunar without breaking existing glib-using applications on my system. I'd like to avoid stomping on the existing system installation of glib if possible but, long story short, glib's and thunar's build systems don't seem to be wired up to make this very easy. I'll need to set aside a separate machine for this (VM won't work easily since I need USB hotplug for testing, which I doubt would work on OpenBSD with qemu) and then try replacing the system glib installation. If you don't hear from me soon please don't wait for me. Just assume I cannot afford to let more of my time sink into this. In which case I will simply have to test your changes when they eventually get released in a version of glib which OpenBSD then updates to.
Attachment 270741 [details] pushed as 64ec757 - GUnixMount: port unmount to GSubprocess Attachment 270742 [details] pushed as 62fa9c5 - GUnixVolume: port to GSubprocess Attachment 271533 [details] pushed as ffe4e94 - GUnixVolume: implement _finish functions Okay. Pushing these and closing the bug, then, since this is a release blocker. Please let me know if you are able to test this, and reopen if needed.
Ryan, I have tested your changes regarding Bug 726122, and confirm that issue is fixed. New code runs much better. Thank you for fast response.
*** Bug 722692 has been marked as a duplicate of this bug. ***