After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 724916 - gio unmount code makes XFCE's Thunar crash
gio unmount code makes XFCE's Thunar crash
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: gio
unspecified
Other OpenBSD
: Normal normal
: ---
Assigned To: gtkdev
gtkdev
: 722692 726122 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2014-02-21 22:58 UTC by Stefan Sperling
Modified: 2014-04-10 19:09 UTC
See Also:
GNOME target: 3.12
GNOME version: ---


Attachments
patch which includes both fixes (1.66 KB, patch)
2014-02-21 22:58 UTC, Stefan Sperling
none Details | Review
patch which includes both fixes (1.72 KB, patch)
2014-02-21 23:22 UTC, Stefan Sperling
needs-work Details | Review
gsubprocess: fix communicate() with empty buffers (1.35 KB, patch)
2014-03-03 02:07 UTC, Allison Karlitskaya (desrt)
committed Details | Review
GUnixMount: port unmount to GSubprocess (6.85 KB, patch)
2014-03-03 02:07 UTC, Allison Karlitskaya (desrt)
committed Details | Review
GUnixVolume: port to GSubprocess (7.35 KB, patch)
2014-03-03 02:25 UTC, Allison Karlitskaya (desrt)
committed Details | Review
gsubprocess: test empty splices (2.02 KB, patch)
2014-03-04 02:27 UTC, Allison Karlitskaya (desrt)
committed Details | Review
GUnixVolume: implement _finish functions (1.29 KB, patch)
2014-03-11 16:21 UTC, Allison Karlitskaya (desrt)
committed Details | Review

Description Stefan Sperling 2014-02-21 22:58:31 UTC
Created attachment 269955 [details] [review]
patch which includes both fixes

This bug manifests as a crash on OpenBSD with XFCE's Thunar file manager.

A crash happens when attempting to unmount a disk device which was mounted with toad. toad is a helper program for auto-mounting devices in GNOME under OpenBSD, see http://undeadly.org/cgi?action=article&sid=20140219085851

toad uses gvfs to make the disk device appear in the file manager.
In Thunar, unmounting such devices is possible by clicking on the 'eject' button next to the disk device, similar to how it can be done in GNOME.

When I tried this, Thunar crashed:

[[[
(gdb) bt
  • #0 ??
  • #1 thunar_device_operation_finish
    at thunar-device.c line 258
  • #2 g_task_return_now
    at gtask.c line 1108
  • #3 g_task_return
    at gtask.c line 1161
  • #4 g_task_return_boolean
    at gtask.c line 1592
  • #5 eject_unmount_cb
    at gunixmount.c line 293
  • #6 g_io_unix_dispatch
    at giounix.c line 167
  • #7 g_main_dispatch
    at gmain.c line 3066
  • #8 g_main_context_dispatch
    at gmain.c line 3642
  • #9 g_main_context_iterate
    at gmain.c line 3713
  • #10 g_main_loop_run
    at gmain.c line 3907
  • #11 gtk_main
    from /usr/local/lib/libgtk-x11-2.0.so.2400.0
  • #12 main
    at main.c line 310

The cause of the crash was puzzling at first, because the stack of the called function seemed corrupt.

It turns out that this is due to a duplicate event source associated with a task in eject_unmount_do_cb() in gio's gunixmount. The code reads:

[[[
   data->error_channel_source = g_io_create_watch (data->error_channel, G_IO_IN);
   g_task_attach_source (task, data->error_channel_source,
                         (GSourceFunc) eject_unmount_read_error);

   child_watch = g_child_watch_source_new (child_pid);
   g_task_attach_source (task, data->error_channel_source,
                         (GSourceFunc) eject_unmount_cb);
   g_source_unref (child_watch);
]]]

Note how the child_watch event source is never used. Rather, data->error_channel_source is used twice as event source.
The second call to g_task_attach_source() above results in the following message:
(thunar:6655): GLib-CRITICAL **: g_source_attach: assertion 'source->context == NULL' failed

Also, note how the prototype of eject_unmount_cb() doesn't match the GSourceFunc type, but rather matches the GChildWatchFunc type.

When the event source fires the first time, the code reaches thunar_device_operation_finish() and things go as planned.
But then it fires the second time and the crash in thunar_device_operation_finish() occurs. The exact nature of the crash varies, but invariably looks like a corrupt stack (while stepping through gdb, local variabels don't have the values that were assigned to them).

I could fix this by using an API that matches more closely what I believe eject_unmount_do_cb() is trying to do. g_child_watch_add() waits for a child process to finish (in this case, the child process is "unmount"), and expects a callback matching the type of eject_unmount_cb. This works:

[[[
+  g_child_watch_add (child_pid, (GChildWatchFunc) eject_unmount_cb, task);
   
-  child_watch = g_child_watch_source_new (child_pid);
-  g_task_attach_source (task, data->error_channel_source,
-                        (GSourceFunc) eject_unmount_cb);
-  g_source_unref (child_watch);
]]]

Now, another problem prevents Thunar from finishing the unmount procedure.
The eject_unmount_read_error() function never completes its event loop if it only reads EOF, and no error string. The event which calls eject_unmount_read_error() keeps firing in an endless loop:

[[[
Breakpoint 2, eject_unmount_read_error (channel=0x5c067243000,
    condition=G_IO_IN, user_data=0x5c06724d280) at gunixmount.c:302
302     {
(gdb) n
303       GTask *task = user_data;
(gdb) 
304       UnmountEjectOp *data = g_task_get_task_data (task);                   
(gdb) 
310       if (g_task_return_error_if_cancelled (task))
(gdb) 
313       error = NULL;
(gdb) 
315       status = g_io_channel_read_chars (channel, buf, sizeof (buf), &bytes_r
ead, &error);
(gdb) 
316       if (status == G_IO_STATUS_NORMAL)
(gdb) 
322       else if (status == G_IO_STATUS_EOF)
(gdb) p status
$10 = G_IO_STATUS_EOF
(gdb) n
323         g_string_append_len (data->error_string, buf, bytes_read);
(gdb) c
Continuing.

Breakpoint 2, eject_unmount_read_error (channel=0x5c067243000, 
    condition=G_IO_IN, user_data=0x5c06724d280) at gunixmount.c:302
302     {
(gdb) n
303       GTask *task = user_data;
(gdb)
304       UnmountEjectOp *data = g_task_get_task_data (task);
(gdb)
310       if (g_task_return_error_if_cancelled (task))
(gdb)
313       error = NULL;
(gdb)
315       status = g_io_channel_read_chars (channel, buf, sizeof (buf), &bytes_r
ead, &error);
(gdb)
316       if (status == G_IO_STATUS_NORMAL)
(gdb)
322       else if (status == G_IO_STATUS_EOF)
(gdb) p status
$11 = G_IO_STATUS_EOF
(gdb) c
Continuing.

Breakpoint 2, eject_unmount_read_error (channel=0x5c067243000,
    condition=G_IO_IN, user_data=0x5c06724d280) at gunixmount.c:302
302     {
(gdb)
]]]

I could fix this by tweaking the EOF case to abort the event loop.
[[[
   else if (status == G_IO_STATUS_EOF)
-    g_string_append_len (data->error_string, buf, bytes_read);
+    {
+     if (data->error_channel_source)
+       {
+         g_source_unref (data->error_channel_source);
+         data->error_channel_source = NULL;
+       }
+      return G_SOURCE_REMOVE;
+    }
]]]

These fixes combined make Thunar perfectly happy with unmounting devices.
Attached is a patch against git HEAD which contains these fixes.
Comment 1 Stefan Sperling 2014-02-21 23:22:26 UTC
Created attachment 269958 [details] [review]
 patch which includes both fixes 

Slightly better version of my patch which keeps the g_string_append_len() call at the top of the EOF case intact:

[[[
   else if (status == G_IO_STATUS_EOF)
-    g_string_append_len (data->error_string, buf, bytes_read);
+    {
+      g_string_append_len (data->error_string, buf, bytes_read);
+      if (data->error_channel_source)
+        {
+          g_source_unref (data->error_channel_source);
+          data->error_channel_source = NULL;
+        }
+      return G_SOURCE_REMOVE;
+    }
]]]
Comment 2 Antoine Jacoutot 2014-02-22 08:37:38 UTC
Tested under Nautilus and OpenBSD and it actually fixes some odd behavior I was seeing as well where the nautilus window could end up in a "broken" state (with the entire left pane gone).

It would be awesome if this could make it in.
Comment 3 Allison Karlitskaya (desrt) 2014-03-02 19:40:46 UTC
Review of attachment 269958 [details] [review]:

This patch has at least one problem with it that prevents it from being merged.

However, taking a step back, I think the approach is wrong.  We should be using GSubprocess here now that we have it...  That's obviously a much bigger patch, though.

I'll take a look at it to see how feasible it would be.

::: gio/gunixmount.c
@@ +377,3 @@
   g_task_attach_source (task, data->error_channel_source,
                         (GSourceFunc) eject_unmount_read_error);
+  g_child_watch_add (child_pid, (GChildWatchFunc) eject_unmount_cb, task);

This will attach to the global default main context, which may not be the correct destination for this task.

Why not just change this to use the child_watch source that we just created?  That seems to be the obvious intent here.
Comment 4 Allison Karlitskaya (desrt) 2014-03-03 02:07:37 UTC
Created attachment 270740 [details] [review]
gsubprocess: fix communicate() with empty buffers

On the splice for stdout or stderr completing, GSubprocess calls
_slice_finish() to collect the result.

We assume that a zero return value here means failure, but in fact this
function returns a gssize -- the number of bytes transferred, or -1 for
an error.

This causes GSubprocess to mistakenly think that it has an error when it
actually just has an empty buffer (as would be the case when collecting
stderr from a successful command).

Check for -1 instead of FALSE to detect the error.
Comment 5 Allison Karlitskaya (desrt) 2014-03-03 02:07:41 UTC
Created attachment 270741 [details] [review]
GUnixMount: port unmount to GSubprocess

The existing code is buggy and now that we have GSubprocess, we should just use
it instead, allowing for some substantial reduction in complexity.
Comment 6 Allison Karlitskaya (desrt) 2014-03-03 02:08:14 UTC
Can you check if this pair of patches fixes it for you?

Thanks
Comment 7 Allison Karlitskaya (desrt) 2014-03-03 02:25:39 UTC
Created attachment 270742 [details] [review]
GUnixVolume: port to GSubprocess

This is completely untested -- I couldn't actually figure out a way to
get a GUnixVolume handed to me on the API of GIO.  Using the volume
monitor gives me no results if gvfs is uninstalled and if gvfs is
installed then I get GProxyVolumes...
Comment 8 Stefan Sperling 2014-03-03 11:24:36 UTC
Thanks Ryan for looking into this.

Gsubprocess doesn't seem to exist in glib-2.38.2 which the OpenBSD ports tree is currently using. So your patches cannot be applied there until OpenBSD's glib port is updated or Gsubprocess is part of a released version of glib (I don't know which is the case).

But I will try testing your patches with glib's git HEAD and get back to you.
Comment 9 Matthias Clasen 2014-03-04 02:16:38 UTC
Review of attachment 270740 [details] [review]:

Looks right. Would be good to have a test...
Comment 10 Matthias Clasen 2014-03-04 02:17:01 UTC
Review of attachment 270740 [details] [review]:

Looks right. Would be good to have a test...
Comment 11 Allison Karlitskaya (desrt) 2014-03-04 02:27:47 UTC
Created attachment 270864 [details] [review]
gsubprocess: test empty splices

Make sure we handle the case that our splice returns no data properly.


As requested -- fails before the patch, passes after.
Comment 12 Allison Karlitskaya (desrt) 2014-03-04 02:28:17 UTC
Attachment 270740 [details] pushed as 9f71965 - gsubprocess: fix communicate() with empty buffers
Attachment 270864 [details] pushed as 9da88a1 - gsubprocess: test empty splices
Comment 13 Matthias Clasen 2014-03-04 02:28:29 UTC
Review of attachment 270741 [details] [review]:

All I can say is the new version looks much more straightforward. But without test cases, pretty hard to be 100% sure
Comment 14 Matthias Clasen 2014-03-04 02:30:07 UTC
Review of attachment 270742 [details] [review]:

Same comment: new code looks much more straightforward
Comment 15 Allison Karlitskaya (desrt) 2014-03-11 16:21:12 UTC
Created attachment 271533 [details] [review]
GUnixVolume: implement _finish functions

The _finish functions for GUnixVolume _mount and _eject functions were
never implemented, having been simply stubbed out as 'return TRUE;'.

Implement them.
Comment 16 Allison Karlitskaya (desrt) 2014-03-11 16:22:08 UTC
I've now fully tested these changes under pass and fail cases (and, indeed, found one additional issue).  Would still be nice to hear from the original reporter, though...
Comment 17 Allison Karlitskaya (desrt) 2014-03-11 17:51:57 UTC
*** Bug 726122 has been marked as a duplicate of this bug. ***
Comment 18 Stefan Sperling 2014-03-11 22:09:53 UTC
Ryan, I would love to help test your patches. However, within several hours I couldn't manage to find a straightforward way of compiling glib from git and make use of it from Thunar without breaking existing glib-using applications on my system. I'd like to avoid stomping on the existing system installation of glib if possible but, long story short, glib's and thunar's build systems don't seem to be wired up to make this very easy. I'll need to set aside a separate machine for this (VM won't work easily since I need USB hotplug for testing, which I doubt would work on OpenBSD with qemu) and then try replacing the system glib installation.

If you don't hear from me soon please don't wait for me. Just assume I cannot afford to let more of my time sink into this. In which case I will simply have to test your changes when they eventually get released in a version of glib which OpenBSD then updates to.
Comment 19 Allison Karlitskaya (desrt) 2014-03-12 01:30:05 UTC
Attachment 270741 [details] pushed as 64ec757 - GUnixMount: port unmount to GSubprocess
Attachment 270742 [details] pushed as 62fa9c5 - GUnixVolume: port to GSubprocess
Attachment 271533 [details] pushed as ffe4e94 - GUnixVolume: implement _finish functions

Okay.  Pushing these and closing the bug, then, since this is a release blocker.

Please let me know if you are able to test this, and reopen if needed.
Comment 20 Vitaly Bogdanov 2014-03-12 03:16:46 UTC
Ryan, I have tested your changes regarding Bug 726122, and confirm that issue is fixed. New code runs much better. Thank you for fast response.
Comment 21 Ross Lagerwall 2014-04-10 19:09:40 UTC
*** Bug 722692 has been marked as a duplicate of this bug. ***