Bug 720860 – Hang when peer connection closes while enumerating

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 720860 - Hang when peer connection closes while enumerating


Summary:	Hang when peer connection closes while enumerating


Status:	RESOLVED OBSOLETE

Product:	gvfs
Classification:	Core
Component:	daemon
Version:	git master
Hardware:	Other Linux

Importance:	Normal major
Target Milestone:	---
Assigned To:	gvfs-maint
QA Contact:	gvfs-maint

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2013-12-21 08:25 UTC by Ross Lagerwall
Modified:	2018-09-21 17:37 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Allow cancelling jobs that have sent a reply but not finished (1.15 KB, patch) 2014-09-13 11:06 UTC, Ross Lagerwall	committed	Details \| Review

Description Ross Lagerwall 2013-12-21 08:25:39 UTC

The is an easily reproducible hang when searching in nautilus with the sftp backend.

If the peer connection closes between the time that the enumerate job calls send_reply() and the time that the enumerate job actually finishes, then it gets stuck in peer_connection_closed() trying to cancel a job that cannot be cancelled.

It cannot be cancelled because g_vfs_job_cancel() does nothing if send_reply has been called.

Removing the "|| job->sent_reply" clause in g_vfs_job_cancel() fixes the problem, but I'm not sure if it is the correct fix.

Comment 1 Alexander Larsson 2014-01-13 10:36:32 UTC

Maybe it should look at job->finished instead. This would mean that the cancel signal can get into the enumerate job inbetween the time it has sent the reply and not finished the enumeration. For many operations there is no real difference, because GVfsJobDbus and GVfs channel emits it in send_reply, 
The only difference is in mount/unmount/enumerate that override dbus_send_reply().

Comment 2 Ross Lagerwall 2014-09-13 11:06:19 UTC

Created attachment 286128 [details] [review]
Allow cancelling jobs that have sent a reply but not finished

A few job types (mount, unmount and enumerate) can be in a state where
sent_reply is TRUE but finished is FALSE because they override
send_reply.  If the peer connection closes during this period, the
daemon hangs in peer_connection_closed because the job still exists but
cannot be cancelled.  To fix this, allow cancelling jobs that have sent
a reply but not yet finished.

This can be reproduced fairly easily by doing a search on an sftp mount
in Nautilus.

Comment 3 Ross Lagerwall 2014-09-13 11:06:50 UTC

(In reply to comment #1)
> Maybe it should look at job->finished instead. This would mean that the cancel
> signal can get into the enumerate job inbetween the time it has sent the reply
> and not finished the enumeration. For many operations there is no real
> difference, because GVfsJobDbus and GVfs channel emits it in send_reply, 
> The only difference is in mount/unmount/enumerate that override
> dbus_send_reply().

That makes sense, and the fix seems to work!

Comment 4 Alexander Larsson 2014-09-16 07:27:12 UTC

This is a slight change in semantics. We're now allowed to cancel the operation after we have *started* but not finished sending a reply. This makes sense, as the sending of a reply is an i/o operation that could block, especially in the case of enumerate as above where the reply is actually a streaming op. 

I worry a bit that we're not correctly handing this in every case though. For instance, say we force unmount the mount causing a gvfs_job_cancel() at the exact time that a GVfsReadChannel read call finished. That means we're after g_vfs_job_send_reply() which sents sent_reply=1 but we've only queued an async write of the reply on the channel via g_vfs_channel_send_reply(). However, atm GVfsChannel doesn't seems to look at the job cancellable for the async operation so it will never recieve it and may block forever.

However, this is clearly a much tinier race than the huge race the above finishes, so i think we should apply this for now and then do further work ensuring that no job can block forever if they get a gvfs_job_cancel() between gvfs_job_send_reply() and gvfs_job_emit_finished(). Of course, as i said above, most standard job types calls emit_finished() unconditionally in send_reply, so this is a limited search.

Comment 5 Ross Lagerwall 2014-09-27 09:50:29 UTC

Pushed to master as fcffe33ba57f968e2cb0073cb201007808155702. Thanks for the review.

Leaving open to fix the smaller race in comment 4.

Comment 6 GNOME Infrastructure Team 2018-09-21 17:37:51 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/gvfs/issues/220.