GNOME Bugzilla – Bug 710986
crashes when unmounting
Last modified: 2018-09-21 17:33:30 UTC
I recently upgraded to Ubuntu 13.10 and started experiencing this issue. (The original reporter on Launchpad appears to have been using gvfs 1.17 at the time, whereas I currently have 1.18 installed.)

I can reproduce this issue with the following steps:

1. Open a Nautilus window. (It defaults to my home directory.)
2. Open a new tab or window. (This also opens to my home directory.)
3. In the new tab/window, navigate to a Samba share. (I have one bookmarked, so I click on the bookmark in the left side panel.) A new entry for the share appears under Network in the left side panel.
4. Close the new tab/window.
5. In the left side panel, under Network, click the unmount icon next to the Samba share entry.

At this point, gvfsd-smb crashes. I also get two alert boxes (I believe from Nautilus):

1. Oops! Something went wrong. Unhandled error message: The connection is closed
2. Unable to unmount <share name>: The connection is closed

The Launchpad bug report is at https://bugs.launchpad.net/ubuntu/+source/gvfs/+bug/1205647

Please let me know if more information is needed. Thanks!
Thanks. I have seen this before, and it seems to be because an unmount and a query_info operation are issued at the same time by Nautilus, causing the unmount to fail with "The connection is closed" (which is the wrong error).
In some cases this also causes a SIGSEGV. There are several downstream bugs for it: https://bugzilla.redhat.com/show_bug.cgi?id=1049966
A possible solution is to block new jobs and wait until the other jobs are finished before executing the unmount. However, that could wait forever if some job is hanging...
commit 510a6dcfc74bd412ce478d3a458ddcbbd40f2c2c
Author: Ondrej Holy <oholy@redhat.com>
Date:   Thu Feb 13 16:22:39 2014 +0100

    smb: set context to NULL after it has been freed

    https://bugzilla.gnome.org/show_bug.cgi?id=710986

diff --git a/daemon/gvfsbackendsmb.c b/daemon/gvfsbackendsmb.c
index 6065912..a673c59 100644
--- a/daemon/gvfsbackendsmb.c
+++ b/daemon/gvfsbackendsmb.c
@@ -751,6 +751,7 @@ do_unmount (GVfsBackend *backend,
   /* shutdown_ctx = TRUE, "all connections and files will be closed even if they are busy" */
   res = smbc_free_context (op_backend->smb_context, TRUE);
+  op_backend->smb_context = NULL;

   if (res != 0)
     {
       g_vfs_job_failed_from_errno (G_VFS_JOB (job), errno);
Created attachment 269118 [details] [review]
consider all jobs excepting unmount as blocking processes

This makes the unmount wait for all jobs to finish (not only jobs with an open channel), to avoid potential crashes...
Review of attachment 269118 [details] [review]:

Seems good, as long as the pid = -1 doesn't cause trouble anywhere. This will, however, prevent a backend with stuck jobs from being unmounted. Think of a stuck QueryInfo on a lost network connection.
Works as intended; however, it blocks the session as in Bug 688471.
Created attachment 270570 [details] [review]
consider all jobs excepting unmount as blocking processes

Does the same thing as the previous one, but without faking pids.
This still doesn't solve the race conditions, but it is a good way to avoid them.
Review of attachment 270570 [details] [review]:

Apart from the (important) issue found, it looks good.

::: daemon/gvfsdaemon.c
@@ +1077,3 @@
+      if (!G_VFS_IS_JOB_UNMOUNT (l->data))
+        {
+          return TRUE;

Missed mutex unlock.
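For illustration, the shape of that loop with the unlock the review asks for, as a minimal sketch (the field names `lock` and `jobs` are assumptions here, not necessarily the real gvfsdaemon.c structure):

"""
static gboolean
has_blocking_jobs (GVfsDaemon *daemon)
{
  GList *l;
  gboolean blocking = FALSE;

  g_mutex_lock (&daemon->lock);
  for (l = daemon->jobs; l != NULL; l = l->next)
    {
      /* Every job except the unmount job itself counts as blocking. */
      if (!G_VFS_IS_JOB_UNMOUNT (l->data))
        {
          blocking = TRUE;
          break;
        }
    }
  g_mutex_unlock (&daemon->lock);   /* the unlock the review is about */

  return blocking;
}
"""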
I gave the patch a test and it didn't work for me (after fixing Tomas's issue). I was still able to cause a SIGSEGV in the backend daemon. I haven't yet had a chance to figure out why, because I got distracted by another unmount race: see bug 731077 for that one (it is on the client side). I'd appreciate it if Tomas, as master of the GDBus port, could take a look :-)
Created attachment 277685 [details]
test program

Attached is the test program I used to cause the SIGSEGV, along with some sample output:

"""
$ ./unmount2 smb://localhost/tmp
mount starting
mount finishing
mount finished
finding enclosing mount
finding enclosing mount finished
unmount starting
query info starting
query info finishing
query error: Backend currently unmounting   /// this worked correctly
query info finished
unmount finishing
unmount finished                            /// this worked correctly
mount starting
mount finishing
mount finished
finding enclosing mount
finding enclosing mount finished
unmount starting
query info starting
unmount finishing
unmount error: The connection is closed     /// this segfaulted
unmount finished
query info finishing
query error: The connection is closed       /// this segfaulted
query info finished
...
"""
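Not the attachment itself, but for illustration, a minimal GIO client along the same lines as the output above: it assumes the location is already mounted and simply races an asynchronous query_info against the unmount. The program itself is a sketch; only the GIO calls are standard API.

"""
/* race.c - build with: gcc race.c $(pkg-config --cflags --libs gio-2.0) */
#include <gio/gio.h>

static GMainLoop *loop;
static int pending = 2;

static void
maybe_quit (void)
{
  if (--pending == 0)
    g_main_loop_quit (loop);
}

static void
query_done (GObject *source, GAsyncResult *res, gpointer user_data)
{
  GError *error = NULL;
  GFileInfo *info = g_file_query_info_finish (G_FILE (source), res, &error);

  if (info != NULL)
    g_object_unref (info);
  else
    {
      g_print ("query error: %s\n", error->message);
      g_error_free (error);
    }
  g_print ("query info finished\n");
  maybe_quit ();
}

static void
unmount_done (GObject *source, GAsyncResult *res, gpointer user_data)
{
  GError *error = NULL;

  if (!g_mount_unmount_with_operation_finish (G_MOUNT (source), res, &error))
    {
      g_print ("unmount error: %s\n", error->message);
      g_error_free (error);
    }
  g_print ("unmount finished\n");
  maybe_quit ();
}

int
main (int argc, char *argv[])
{
  GFile *file;
  GMount *mount;
  GError *error = NULL;

  if (argc != 2)
    {
      g_printerr ("usage: %s <uri of an already-mounted location>\n", argv[0]);
      return 1;
    }

  file = g_file_new_for_commandline_arg (argv[1]);
  mount = g_file_find_enclosing_mount (file, NULL, &error);
  if (mount == NULL)
    {
      g_printerr ("no enclosing mount: %s\n", error->message);
      return 1;
    }

  loop = g_main_loop_new (NULL, FALSE);

  /* Fire the unmount and a query_info at (nearly) the same time. */
  g_mount_unmount_with_operation (mount, G_MOUNT_UNMOUNT_NONE, NULL, NULL,
                                  unmount_done, NULL);
  g_file_query_info_async (file, "standard::*", G_FILE_QUERY_INFO_NONE,
                           G_PRIORITY_DEFAULT, NULL, query_done, NULL);

  g_main_loop_run (loop);
  return 0;
}
"""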
Unfortunately it doesn't cover all cases. It probably doesn't work in your case because you simply ignore the GMountOperation; the GMountOperation asks you whether you want to cancel or unmount anyway. However, we can't wait by default, because some operation could be stuck...
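For illustration, a sketch of what handling the GMountOperation could look like on the client side, reusing the unmount call from the sketch above. Matching the "unmount anyway" choice by label text is an assumption here; the actual (localized) choice strings come from the daemon.

"""
/* Handler for GMountOperation::show-processes: either pick the choice
 * that forces the unmount, or abort the unmount. */
static void
show_processes_cb (GMountOperation *op,
                   gchar           *message,
                   GArray          *processes,
                   GStrv            choices,
                   gpointer         user_data)
{
  guint i;

  for (i = 0; choices != NULL && choices[i] != NULL; i++)
    {
      if (g_str_has_prefix (choices[i], "Unmount"))   /* assumption: label text */
        {
          g_mount_operation_set_choice (op, i);
          g_mount_operation_reply (op, G_MOUNT_OPERATION_HANDLED);
          return;
        }
    }

  /* No suitable choice found: cancel the unmount instead. */
  g_mount_operation_reply (op, G_MOUNT_OPERATION_ABORTED);
}

/* Usage, instead of passing NULL as the GMountOperation: */
GMountOperation *op = g_mount_operation_new ();
g_signal_connect (op, "show-processes", G_CALLBACK (show_processes_cb), NULL);
g_mount_unmount_with_operation (mount, G_MOUNT_UNMOUNT_NONE, op, NULL,
                                unmount_done, NULL);
"""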
Created attachment 279609 [details] [review]
consider all jobs excepting unmount as blocking processes

Thanks for the review; here it is again, with the mutex unlock :-)
Review of attachment 279609 [details] [review]:

Looks good, but you also need the following:

diff --git a/daemon/gvfsjobunmount.c b/daemon/gvfsjobunmount.c
index 416fb35..205407c 100644
--- a/daemon/gvfsjobunmount.c
+++ b/daemon/gvfsjobunmount.c
@@ -188,7 +188,7 @@ job_finish_immediately_if_possible (GVfsJobUnmount *op_job)
   if (class->try_unmount != NULL || class->unmount != NULL)
     return FALSE;

-  is_busy = g_vfs_backend_has_blocking_processes (backend);
+  is_busy = g_vfs_daemon_has_blocking_processes (g_vfs_backend_get_daemon (backend));
   force_unmount = op_job->flags & G_MOUNT_UNMOUNT_FORCE;

   if (is_busy && ! force_unmount)

Please commit with this change.
Thank you for the review, but I don't see what I should fix. Did you forget to include the change, or have I overlooked something?
(In reply to comment #16)
> Thank you for the review, but I don't see what I should fix. Did you forget
> to include the change, or have I overlooked something?

In job_finish_immediately_if_possible() in gvfsjobunmount.c, you need to change the g_vfs_backend_has_blocking_processes() call to g_vfs_daemon_has_blocking_processes() for it to compile correctly.
Sorry, I'm blind... will fix it and commit, thanks :-)
Comment on attachment 279609 [details] [review] consider all jobs excepting unmount as blocking processes commit be4d7fa3b44a8c195bf6e19cf2a44ff9440ffc07
I closed this by mistake; we need to make the unmount procedure more robust to cover all cases, so reopening...
Can you say what the remaining cases are that still need to be fixed? (There's also bug 731077, partially fixed, which is an unmount race on the client side.) The title of this bug and the component should probably be updated, since I don't think these races are specific to the smb backend.
I see several problems with the unmount procedure:

1) There is a check for blocking processes before the unmount operation is executed. A blocking process is now any type of job (see the attached patch). However, in the case of a force unmount or an unhandled show-processes signal, the unmount can be executed in parallel with other jobs. We should avoid this situation. It can be improved by cancelling the running jobs and waiting some time before executing the unmount operation.

2) New jobs are blocked and the mount is unregistered after a successful unmount. However, new jobs can be executed after the check for blocking processes and before the successful unmount. We should block new jobs before the unmount operation.

So I am changing the bug title and will attach patches to improve the unmount procedure later.
Created attachment 286219 [details] [review] gvfsjobunmount: set error as per user interaction If the dialog about blocking processes was cancelled, G_IO_ERROR_FAILED_HANDLED will be returned. If it wasn't interacted and there were outstanding jobs, G_IO_ERROR_BUSY will be returned.
Created attachment 286220 [details] [review] gvfs-mount: allow specifying G_MOUNT_UNMOUNT_FORCE
Created attachment 286223 [details] [review]
gvfsjobunmount: block new requests before unmount

New jobs are currently blocked only after a successful unmount. However, new jobs can be executed after the check for blocking processes and before the successful unmount. We should avoid running other jobs concurrently with the unmount operation. This patch blocks new requests before the unmount (it does not block existing channel requests). This should solve the second issue from Comment 22...
Created attachment 286285 [details] [review]
client: increase dbus proxy timeout for unmount

The default timeout is too short, and the error "Timeout was reached" can be shown if the user doesn't interact with the unmount dialog. The patch sets the timeout to 30 minutes, as it is set on the server side.
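For illustration, the generic GDBus mechanism this relies on; whether the gvfs patch applies it to the generated proxy or to the individual Unmount call is not shown here, so treat the exact call site as an assumption:

"""
/* Raise the default D-Bus call timeout on the proxy used for the
 * unmount, so a slow user interaction doesn't trip "Timeout was
 * reached"; 30 minutes, matching the server side per the commit
 * message. */
g_dbus_proxy_set_default_timeout (G_DBUS_PROXY (proxy), 30 * 60 * 1000);
"""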
Review of attachment 286220 [details] [review]:

Looks good, with minor comments.

::: man/gvfs-mount.xml
@@ +115,3 @@
+          <term><option>-f</option>, <option>--force</option></term>
+
+          <listitem><para>Ignore outstanding file operations.</para></listitem>

Probably good to indicate that it applies to unmounting: "Ignore outstanding file operations when unmounting or ejecting."

::: programs/gvfs-mount.c
@@ +63,3 @@
   { "eject", 'e', 0, G_OPTION_ARG_NONE, &mount_eject, N_("Eject"), NULL},
   { "unmount-scheme", 's', 0, G_OPTION_ARG_STRING, &unmount_scheme, N_("Unmount all mounts with the given scheme"), N_("SCHEME") },
+  { "force", 'f', 0, G_OPTION_ARG_NONE, &force, N_("Ignore outstanding file operations"), NULL },

Again, probably good to indicate that it applies to unmounting: "Ignore outstanding file operations when unmounting or ejecting"
Review of attachment 286219 [details] [review]:

The overall approach looks OK. I'm not sure what you mean by "it wasn't interacted" and when the G_IO_ERROR_BUSY error is returned.

Additionally, I think that if the client program does not pass in a GMountOperation, instead of just force unmounting, it should return G_IO_ERROR_BUSY unless the force_unmount flag is given. Something like:

diff --git a/daemon/gvfsjobunmount.c b/daemon/gvfsjobunmount.c
index bf3cc42..9d60751 100644
--- a/daemon/gvfsjobunmount.c
+++ b/daemon/gvfsjobunmount.c
@@ -253,13 +254,17 @@ try (GVfsJob *job)
   is_busy = g_vfs_daemon_has_blocking_processes (g_vfs_backend_get_daemon (backend));
   force_unmount = op_job->flags & G_MOUNT_UNMOUNT_FORCE;

-  if (is_busy && ! force_unmount
-      && ! g_mount_source_is_dummy (op_job->mount_source))
+  if (is_busy && ! force_unmount)
     {
-      g_vfs_backend_unmount_with_operation (backend,
-                                            op_job->mount_source,
-                                            (GAsyncReadyCallback) unmount_cb,
-                                            op_job);
+      if (g_mount_source_is_dummy (op_job->mount_source))
+        g_vfs_job_failed_literal (G_VFS_JOB (op_job),
+                                  G_IO_ERROR, G_IO_ERROR_BUSY,
+                                  _("File system is busy"));
+      else
+        g_vfs_backend_unmount_with_operation (backend,
+                                              op_job->mount_source,
+                                              (GAsyncReadyCallback) unmount_cb,
+                                              op_job);
       return TRUE;
     }

::: daemon/gvfsbackend.c
@@ +825,3 @@
+      g_simple_async_result_set_error (simple, G_IO_ERROR, G_IO_ERROR_BUSY,
+                                       _("File system is busy"));
+      ret = FALSE;

In what situations is this branch taken? The only way I could find was if the dbus operation got an error (e.g. the client program disappeared).

@@ +910,3 @@
+ * If the operation was cancelled, G_IO_ERROR_FAILED_HANDLED will be returned.
+ * If the operation wasn't interacted and there were outstanding jobs,
+ * G_IO_ERROR_BUSY will be returned.

How can I test the G_IO_ERROR_BUSY error?
Review of attachment 286285 [details] [review]: Looks good!
Created attachment 286799 [details] [review]
gvfs-mount: allow specifying G_MOUNT_UNMOUNT_FORCE

(In reply to comment #27)
> Review of attachment 286220 [details] [review]:
> Probably good to indicate that it applies to unmounting: "Ignore outstanding
> file operations when unmounting or ejecting."

Good point, fixed :-)
Created attachment 286802 [details] [review]
gvfsjobunmount: set error as per client interaction

(In reply to comment #28)
> Review of attachment 286219 [details] [review]:
>
> The overall approach looks OK. I'm not sure what you mean by "it wasn't
> interacted" and when the G_IO_ERROR_BUSY error is returned.

It means the case when a GMountOperation is passed in, but the "show-processes" signal isn't handled. I reworded it in the attached patch.

These are the cases for unmounting when g_vfs_daemon_has_blocking_processes is TRUE:

force = TRUE                                                => do/try
force = FALSE & is_dummy = TRUE                             => G_IO_ERROR_BUSY
force = FALSE & is_dummy = FALSE & UNHANDLED                => G_IO_ERROR_BUSY
force = FALSE & is_dummy = FALSE & ABORTED                  => G_IO_ERROR_FAILED_HANDLED
force = FALSE & is_dummy = FALSE & HANDLED (Cancel)         => G_IO_ERROR_FAILED_HANDLED
force = FALSE & is_dummy = FALSE & HANDLED (Force unmount)  => do/try

See: https://developer.gnome.org/gio/stable/GMountOperation.html#GMountOperationResult

You can test it e.g. this way:

$ gvfs-mount localtest://
$ gvfs-cat localtest:///dev/random > /dev/null
$ gvfs-mount -u localtest://
Error unmounting mount: File system is busy

> Additionally, I think that if the client program does not pass in a
> GMountOperation, instead of just force unmounting, it should return
> G_IO_ERROR_BUSY unless the force_unmount flag is given. Something like:

That's right, I missed that use case at first...
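For illustration, how a client could tell those outcomes apart when the unmount finishes (a sketch only; the message text mirrors the test output above):

"""
static void
unmount_finished_cb (GObject *source, GAsyncResult *res, gpointer user_data)
{
  GError *error = NULL;

  if (!g_mount_unmount_with_operation_finish (G_MOUNT (source), res, &error))
    {
      if (g_error_matches (error, G_IO_ERROR, G_IO_ERROR_FAILED_HANDLED))
        {
          /* The user already dealt with (or cancelled) the
           * blocking-processes dialog; stay silent. */
        }
      else if (g_error_matches (error, G_IO_ERROR, G_IO_ERROR_BUSY))
        g_printerr ("Error unmounting mount: File system is busy\n");
      else
        g_printerr ("Error unmounting mount: %s\n", error->message);
      g_error_free (error);
    }
}
"""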
Review of attachment 286799 [details] [review]: Looks good!
Review of attachment 286802 [details] [review]: Yes, that looks good. I like your enumeration of all the different cases :-)
Review of attachment 286223 [details] [review]:

Why shouldn't new channel jobs be blocked as well? Otherwise you could have the following sequence:

Begin unmount
Check for blocking processes -> false
Start executing backend unmount code in a thread
--> client does a read on an existing channel

This sequence could easily cause a segfault because the backend is unmounted while a read is in progress.

Secondly, I think we should only block requests when we're actually going to unmount the backend. Otherwise, requests may be blocked for a long time (while waiting for the user to make a decision) or unnecessarily (if force is needed but not specified).

In the case of backends that implement try_unmount(), this is easy since all the job handling code is executed in the main thread, so it already works by just blocking requests in send_reply. Only when the backend implements unmount() in a thread do we need to block requests before calling the method.
Created attachment 287304 [details] [review] gvfsjobunmount: Block new requests before calling unmount() on a thread Block new requests before calling unmount() on a separate thread to prevent a race where new jobs are received and processed while the unmount() is being executed. This is not necessary for try_unmount() because all the job handling is done on the same thread as the try_unmount() method and requests are blocked when the try_unmount() method completes.
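For illustration, a self-contained, simplified sketch of the pattern this patch uses (not the gvfs code; the struct and function names here are made up): flip a "block new requests" flag on the main thread before handing the unmount work to a worker thread, so any request that arrives afterwards is rejected instead of racing with the unmount.

"""
/* block_demo.c - build with: gcc block_demo.c $(pkg-config --cflags --libs glib-2.0) */
#include <glib.h>

typedef struct {
  gboolean block_requests;   /* only touched on the main thread */
  GThreadPool *pool;
} Daemon;

static void
unmount_worker (gpointer data, gpointer user_data)
{
  g_print ("unmounting backend...\n");
  g_usleep (G_USEC_PER_SEC);   /* pretend this takes a while */
  g_print ("unmount done\n");
}

static void
handle_request (Daemon *daemon, const char *name)
{
  if (daemon->block_requests)
    {
      g_print ("%s: rejected, backend currently unmounting\n", name);
      return;
    }
  g_print ("%s: accepted\n", name);
}

static void
start_unmount (Daemon *daemon)
{
  /* Block first, then dispatch; both happen on the main thread, so no
   * request can slip in between. */
  daemon->block_requests = TRUE;
  g_thread_pool_push (daemon->pool, GINT_TO_POINTER (1), NULL);
}

int
main (void)
{
  Daemon daemon = { FALSE, NULL };
  daemon.pool = g_thread_pool_new (unmount_worker, NULL, 1, FALSE, NULL);

  handle_request (&daemon, "query_info #1");  /* accepted */
  start_unmount (&daemon);
  handle_request (&daemon, "query_info #2");  /* rejected */

  g_thread_pool_free (daemon.pool, FALSE, TRUE);
  return 0;
}
"""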
Created attachment 287305 [details] [review] mtp: Remove fail-during-unmount workaround Since jobs should now fail properly when an unmount is in progress, remove the mtp backend-specific implementation.
*** Bug 719327 has been marked as a duplicate of this bug. ***
Review of attachment 287304 [details] [review]: I think this supersedes attachment 286223 [details] [review].
Created attachment 287309 [details] [review] mtp: Remove fail-during-unmount workaround Since jobs should now fail properly when an unmount is in progress, remove the mtp backend-specific implementation.
Created attachment 287310 [details] [review] daemon: Don't abort if jobs != NULL when finalizing Don't abort if there are outstanding jobs when finalizing. This may happen if the backend is force unmounted. Use a warning instead.
Comment on attachment 286285 [details] [review] client: increase dbus proxy timeout for unmount commit 22c7d972914fb62c60e260ad0da86bcd595745c3
(In reply to comment #34)
> Review of attachment 286223 [details] [review]:
>
> Why shouldn't new channel jobs be blocked as well? Otherwise you could have
> the following sequence:
>
> Begin unmount
> Check for blocking processes -> false
> Start executing backend unmount code in a thread
> --> client does a read on an existing channel
>
> This sequence could easily cause a segfault because the backend is unmounted
> while a read is in progress.

That's true, my fault (I don't know why I thought an open channel counted as a blocking process)...

> Secondly, I think we should only block requests when we're actually going to
> unmount the backend. Otherwise, requests may be blocked for a long time (while
> waiting for the user to make a decision)

I don't think that is a problem while the unmount dialog is system modal (at least for Nautilus). It's better to block jobs as soon as possible to avoid a concurrent run in the case of a force unmount.

> or unnecessarily (if force is needed but not specified).

But this could probably be a problem for some applications (allowing channel requests probably isn't enough).
Review of attachment 287304 [details] [review]:

Hmm, it is probably the best we can do if we can't block new jobs before g_vfs_backend_unmount_with_operation, but I'm afraid this doesn't solve the second problem I mentioned in Comment #22...
Review of attachment 287310 [details] [review]:

It would be good to add a comment that this can happen in the case of a force unmount; otherwise looks good.
Review of attachment 287309 [details] [review]:

Looks good to me; however, it should be committed after the problems from Comment #22 are fixed.
(In reply to comment #43)
> Review of attachment 287304 [details] [review]:
>
> Hmm, it is probably the best we can do if we can't block new jobs before
> g_vfs_backend_unmount_with_operation, but I'm afraid this doesn't solve the
> second problem I mentioned in Comment #22...

It does solve the problem because the check for blocking processes is always done before the unmount() method is called, and it is always done on the daemon's main thread, the thread that accepts new jobs. So there is no way that a concurrent race could happen, because it is all done in a single thread.

The only way that a new job could be processed during an unmount is when the unmount() method is executed in a thread pool. This patch fixes that case by blocking new jobs just before running the unmount() method on a thread pool.
Even if this happens via complete_unmount_with_op(), it is still executed on the main thread, and so the same logic applies.
You can test this out by putting sleep(10); in a backend's unmount() method or in various places in the unmount code (e.g. just after checking for blocking processes in gvfsjobunmount.c). Any concurrent job issued should fail with "Backend currently unmounting".
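For illustration, that kind of throwaway testing change against the smb backend's do_unmount() shown earlier (a local debugging hack only, assuming unistd.h is available in that file; not a proposed patch):

"""
--- a/daemon/gvfsbackendsmb.c
+++ b/daemon/gvfsbackendsmb.c
@@ do_unmount (GVfsBackend *backend,
+  /* Debugging aid: widen the race window so concurrent jobs are easy
+   * to provoke; they should fail with "Backend currently unmounting". */
+  sleep (10);
+
   /* shutdown_ctx = TRUE, "all connections and files will be closed even if they are busy" */
   res = smbc_free_context (op_backend->smb_context, TRUE);
"""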
Comment on attachment 287304 [details] [review]
gvfsjobunmount: Block new requests before calling unmount() on a thread

I was thinking of new jobs being executed while the unmount dialog is shown, but we can't block them during that time, as you wrote before...
Created attachment 287667 [details] [review]
daemon: add has-blocking-processes property with notify support

Currently only running jobs are used to decide whether there are blocking processes; this also considers open channels as blocking processes. It adds a has-blocking-processes property so that changes can be tracked via notify signals.
Created attachment 287669 [details] [review]
gvfsjobunmount: wait for blocking jobs

The backend may crash if an unmount is executed while other jobs are running. Wait some time for the jobs to finish before unmounting when a force unmount is requested, to avoid potential crashes.
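For illustration, a minimal self-contained sketch of the "wait a bounded time for outstanding jobs before unmounting" idea; the struct, field, and function names are assumptions, not the gvfs API.

"""
/* wait_demo.c - build with: gcc wait_demo.c $(pkg-config --cflags --libs glib-2.0) */
#include <glib.h>

typedef struct {
  GMutex lock;
  GCond  cond;
  guint  blocking_jobs;
} Daemon;

/* Returns TRUE if all jobs finished in time, FALSE if we timed out and
 * the unmount should proceed (or fail) anyway. */
static gboolean
wait_for_blocking_jobs (Daemon *daemon, guint timeout_seconds)
{
  gint64 end_time = g_get_monotonic_time () +
                    timeout_seconds * G_TIME_SPAN_SECOND;
  gboolean all_done = TRUE;

  g_mutex_lock (&daemon->lock);
  while (daemon->blocking_jobs > 0)
    {
      if (!g_cond_wait_until (&daemon->cond, &daemon->lock, end_time))
        {
          all_done = FALSE;   /* timed out with jobs still running */
          break;
        }
    }
  g_mutex_unlock (&daemon->lock);

  return all_done;
}

/* Called by whatever finishes a job. */
static void
job_finished (Daemon *daemon)
{
  g_mutex_lock (&daemon->lock);
  daemon->blocking_jobs--;
  g_cond_broadcast (&daemon->cond);
  g_mutex_unlock (&daemon->lock);
}

static gpointer
fake_job (gpointer data)
{
  g_usleep (G_USEC_PER_SEC);   /* pretend the job takes 1 second */
  job_finished (data);
  return NULL;
}

int
main (void)
{
  Daemon daemon;
  GThread *t;

  g_mutex_init (&daemon.lock);
  g_cond_init (&daemon.cond);
  daemon.blocking_jobs = 1;

  t = g_thread_new ("job", fake_job, &daemon);
  g_print ("all jobs finished in time: %d\n",
           wait_for_blocking_jobs (&daemon, 5));
  g_thread_join (t);
  return 0;
}
"""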
Those patches are a draft for fixing the first issue from Comment 22. It is just a partial fix, because I am afraid we can't fix it fully in the case of a force unmount. It has to be tested more. I have already found Bug 737842 during the testing; I filed it as a separate report because there are quite a lot of patches and comments here already. What do you think?
Actually, I have another idea how to fix it. We can just call g_vfs_job_succeeded() without calling the backend's unmount operation, something like g_vfs_backend_force_unmount(). But I'm afraid it could cause other failures if the device isn't unmounted properly (especially for mtp, ptp, etc.).
Review of attachment 279609 [details] [review]:

If all jobs are blocking, shouldn't g_vfs_daemon_get_blocking_processes() return the pids of those too?

Also, is this right for the case where one daemon process handles multiple backends? Don't we need to pass the backend we're unmounting to has_blocking_processes()?
Review of attachment 287667 [details] [review]:

I wonder if we should not in some sense treat open channels (fds) differently than executing jobs. Yes, they are a source of possible incoming jobs, but we can easily and safely block that, and we can completely safely terminate the channels in the force-unmount case (by just closing the fd if nothing else). That is not the case for outstanding I/O operations: we can't really safely terminate those, but instead have to wait a bit and, if nothing happens, force-terminate the process, hoping things are "ok".

::: daemon/gvfsdaemon.c
@@ +422,3 @@
+  daemon->blocking_sources_count--;
+  if (daemon->blocking_sources_count == 0)
+    g_object_notify (G_OBJECT (daemon), "has-blocking-processes");

It's pretty risky to do callouts like this with the lock held. It can very easily lead to deadlocks.
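For illustration, the usual way around that: decide under the lock, but do the callout after dropping it. This is only a sketch against the hunk above; the `lock` field name is an assumption.

"""
  gboolean emit;

  g_mutex_lock (&daemon->lock);
  daemon->blocking_sources_count--;
  emit = (daemon->blocking_sources_count == 0);
  g_mutex_unlock (&daemon->lock);

  /* The notify callout now runs without the lock held, so a handler
   * that calls back into the daemon cannot deadlock. */
  if (emit)
    g_object_notify (G_OBJECT (daemon), "has-blocking-processes");
"""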
Review of attachment 287669 [details] [review]:

::: daemon/gvfsbackend.c
@@ +59,3 @@
 #include <gvfsdbus.h>

+#define WAIT_FOR_PROCESSES_TIMEOUT_IN_SECONDS 15

This seems a bit long for a *forced* unmount. I'd say 5 seconds or so at the most.

@@ +1096,3 @@
+  data->user_data = user_data;
+
+  if (g_vfs_daemon_has_blocking_processes (daemon))

I don't think this is quite right. If we're actively in a force-unmount there is no need to wait for open channels to be closed. We can safely close those. We should only wait for open i/o ops.
Review of attachment 287669 [details] [review]: Also, we should probably send an error to the client for all outstanding operations, and then when we get the real operation result we just ignore that.
Review of attachment 287669 [details] [review]:

::: daemon/gvfsbackend.c
@@ +1096,3 @@
+  data->user_data = user_data;
+
+  if (g_vfs_daemon_has_blocking_processes (daemon))

Especially considering the fact that there is a high probability that an open file will not be closed immediately, so just waiting will not get you anywhere.
(In reply to comment #54)
> Review of attachment 279609 [details] [review]:
>
> If all jobs are blocking, shouldn't g_vfs_daemon_get_blocking_processes()
> return the pids of those too?

Definitely, it would be better; we have to extend the D-Bus API for it. I filed a separate bug for it, Bug 738218.

> Also, is this right for the case where one daemon process handles multiple
> backends?

This isn't right if that is possible. When could this happen?
Comment on attachment 287310 [details] [review] daemon: Don't abort if jobs != NULL when finalizing Pushed to master as 9f2651129d86c4d926e7e186813922d2ed5d2e23 (with a comment). Thanks for the review.
(In reply to comment #59)
> (In reply to comment #54)
> > Review of attachment 279609 [details] [review] [details]:
> >
> > If all jobs are blocking, shouldn't g_vfs_daemon_get_blocking_processes()
> > return the pids of those too?
>
> Definitely, it would be better; we have to extend the D-Bus API for it. I
> filed a separate bug for it, Bug 738218.
>
> > Also, is this right for the case where one daemon process handles multiple
> > backends?
>
> This isn't right if that is possible. When could this happen?

This happens with http backends. A single process handles all the http backends. From the developer docs:

"When talking about backends, let's see how objects are organized. A daemon instance represents the process itself. Daemon can handle one or more mounts (this is actually determined by the master gvfs daemon, the backend daemon infrastructure is generally always able to handle multiple mounts) and maintains list of job sources. Default job source is created for every single mount (GVfsBackend instance), additional job sources are created for each opened file (both for reading and writing, GVfsChannel). Job source (GVfsJobSource) is an interface implemented by GVfsBackend and GVfsChannel."
Comment on attachment 286799 [details] [review] gvfs-mount: allow specifying G_MOUNT_UNMOUNT_FORCE commit 038f186fa471c1da1b5c72aaae7dca89aa098aa6
Comment on attachment 286802 [details] [review] gvfsjobunmount: set error as per client interaction commit db89e2908ec3da88472fd80aea20382692d48217
In this case it is wrong, so I've filed a separate bug for it, Bug 738563.
Comment on attachment 287304 [details] [review] gvfsjobunmount: Block new requests before calling unmount() on a thread Pushed to master as 5904c6d9614b48311217faac66b4e2f57174ba9d. Thanks for the review!
Comment on attachment 287309 [details] [review] mtp: Remove fail-during-unmount workaround Pushed to master as 0dedf40fec26e793692d8e4d2911a157ba7203a4. Thanks for the review!
Created attachment 288950 [details] [review]
daemon: add blocking-processes-count property with notify support

- added a blocking_processes_count property (using the word JOBS instead of PROCESSES would be more accurate, because one process can handle multiple jobs, but I kept this name to correspond with the g_vfs_daemon_*_blocking_processes functions)
- the notify signal isn't sent under the lock
- it counts only jobs (not channels)
Created attachment 288951 [details] [review]
gvfsjobunmount: wait for blocking jobs

- timeout decreased to 5 seconds
- we don't wait for the channels to be closed (they are force-closed on send_reply)
Created attachment 288952 [details] [review] gvfsjobunmount: send error for unfinished jobs before unmount
Review of attachment 288951 [details] [review]:

I have found some issues during testing, so marking as needs-work until I upload a fixed version...
Created attachment 289272 [details] [review]
gvfsjobunmount: wait for blocking jobs

Fixed version, rebased on master, to wait before unmount...

However, I've just realized (because I tested it before with localtest) that smb and a lot of other backends have MAX_JOB_THREADS=1 set. So case 1 from comment #22 (multiple jobs running concurrently) isn't possible for them, and the unmount job is executed after the previous jobs have finished. So for most backends the crashing should be fixed by attachment 287304 [details] [review], and the patch for waiting before unmount doesn't make much sense. But this could probably still be an issue for the rest of the backends, e.g. afc, afp, and backends using try_ methods, although we don't have bug reports for them yet...
Comment on attachment 288952 [details] [review]
gvfsjobunmount: send error for unfinished jobs before unmount

Sending an error for unfinished jobs isn't a very good idea, since there is only one thread in most cases and we are already waiting for the previous jobs to finish... (also, this version of the patch causes a deadlock)
Comment on attachment 287304 [details] [review]
gvfsjobunmount: Block new requests before calling unmount() on a thread

It would be good to backport this patch to gnome-3-14 as well...
Comment on attachment 287304 [details] [review] gvfsjobunmount: Block new requests before calling unmount() on a thread pushed for gnome-3-14 as commit 87c9f6d80ac612caedb7b39b22b3d4ed2804bb0b
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/gvfs/issues/212.