Bug 724780 – gvfsd-sftp beging consume 100% CPU when I try connect to already shutdown server.

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 724780 - gvfsd-sftp beging consume 100% CPU when I try connect to already shutdown server.


Summary:	gvfsd-sftp beging consume 100% CPU when I try connect to already shutdown ser...


Status:	RESOLVED FIXED

Product:	gvfs
Classification:	Core
Component:	sftp backend
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	gvfs-maint
QA Contact:	gvfs-maint

URL:
Whiteboard:

Duplicates:	671127 731439 (view as bug list)
Depends on:
Blocks:

Reported:	2014-02-20 08:14 UTC by Mikhail
Modified:	2014-10-17 19:51 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
backtraces of gvfsd-sftp (1.13 KB, application/gzip) 2014-02-20 08:14 UTC, Mikhail		Details
htop screenshot (213.08 KB, image/png) 2014-02-20 08:14 UTC, Mikhail		Details
sftp: Don't spin before the SSH process opens the slave pty (2.13 KB, patch) 2014-06-04 20:51 UTC, Ross Lagerwall	needs-work	Details \| Review
sftp: Don't spin before the SSH process opens the slave pty (7.37 KB, patch) 2014-06-29 14:16 UTC, Ross Lagerwall	needs-work	Details \| Review
pty_open: Fix warnings with BSD codepath (1.71 KB, patch) 2014-06-29 14:16 UTC, Ross Lagerwall	committed	Details \| Review
pty_open: Remove unused function prototype (777 bytes, patch) 2014-06-29 14:16 UTC, Ross Lagerwall	committed	Details \| Review
sftp: Don't spin before the SSH process opens the slave pty (7.64 KB, patch) 2014-09-16 18:48 UTC, Ross Lagerwall	committed	Details \| Review

Description Mikhail 2014-02-20 08:14:28 UTC

Created attachment 269773 [details]
backtraces of gvfsd-sftp

gvfsd-sftp beging consume 100% CPU when I try connect to already shutdown server.

Comment 1 Mikhail 2014-02-20 08:14:49 UTC

Created attachment 269774 [details]
htop screenshot

Comment 2 Ondrej Holy 2014-02-20 12:50:51 UTC

*** Bug 671127 has been marked as a duplicate of this bug. ***

Comment 3 Ondrej Holy 2014-02-20 12:51:41 UTC

I can reproduce it by:
gvfs-mount sftp://8.8.8.8/

Comment 4 Ondrej Holy 2014-02-28 15:11:26 UTC

+ Trace 233246

Thread 3 (Thread 0x7fffebfff700 (LWP 8199))

#0 poll
at ../sysdeps/unix/syscall-template.S line 81
#1 handle_login
at gvfsbackendsftp.c line 884
#2 do_mount
at gvfsbackendsftp.c line 1624
#3 real_do_mount
at gvfsbackendsftp.c line 1740
#4 run
at gvfsjobmount.c line 116
#5 g_vfs_job_run
at gvfsjob.c line 197
#6 job_handler_callback
at gvfsdaemon.c line 201
#7 g_thread_pool_thread_proxy
at gthreadpool.c line 309
#8 g_thread_proxy
at gthread.c line 798
#9 start_thread
at pthread_create.c line 309
#10 clone
at ../sysdeps/unix/sysv/linux/x86_64/clone.S line 111

Comment 5 Ross Lagerwall 2014-03-03 08:17:42 UTC

I have debugged this as follows:

gvfs runs the ssh process with pipes connected to stdin, stdout and stderr. It also sets up a pseudo-terminal and connects an arbitrary file descriptor to the slave end in the ssh process.

It then polls on this file descriptor, waiting for it to be readable (in handle_login). The problem is that one of the first things that the ssh process does is close any arbitrary file descriptors, including the tty fd. This causes the poll call to immediately return with POLLHUP and causes the 100% CPU usage.

This behavior actually happens when connecting to any host, but usually the situation is resolved quickly because when ssh connects, it opens /dev/tty which causes the poll call to return POLLIN instead of POLLHUP and then things proceed as normal.

Unfortunately, I don't really know enough about this stuff to propose a fix. Perhaps Alex has a solution?

Comment 6 Ross Lagerwall 2014-03-03 10:00:05 UTC

This comment from the sshpass project may shed some light on the matter with a possible solution:

 /*
Comment no. 3.14159
This comment documents the history of code.
We need to open the slavept inside the child process, after "setsid", so that it becomes the controlling
TTY for the process. We do not, otherwise, need the file descriptor open. The original approach was to
close the fd immediately after, as it is no longer needed.
It turns out that (at least) the Linux kernel considers a master ptty fd that has no open slave fds
to be unused, and causes "select" to return with "error on fd". The subsequent read would fail, causing us
to go into an infinite loop. This is a bug in the kernel, as the fact that a master ptty fd has no slaves
is not a permenant problem. As long as processes exist that have the slave end as their controlling TTYs,
new slave fds can be created by opening /dev/tty, which is exactly what ssh is, in fact, doing.
Our attempt at solving this problem, then, was to have the child process not close its end of the slave
ptty fd. We do, essentially, leak this fd, but this was a small price to pay. This worked great up until
openssh version 5.6.
Openssh version 5.6 looks at all of its open file descriptors, and closes any that it does not know what
they are for. While entirely within its prerogative, this breaks our fix, causing sshpass to either
hang, or do the infinite loop again.
Our solution is to keep the slave end open in both parent AND child, at least until the handshake is
complete, at which point we no longer need to monitor the TTY anyways.
*/

Comment 7 victorz 2014-06-04 16:10:24 UTC

Hi, I don't know if this is related to this issue, but under Fedora 20, when I'm connected via sftp using 'files' gnome app, when I start typing in order to search gvfsd-sftp process takes over ~100% CPU. And all apps that have remote files opened freezes.

This issue started about a week ago with latest update.

top - 11:05:37 up 2 days, 45 min,  3 users,  load average: 1.06, 1.21, 1.13
Tasks: 296 total,   2 running, 294 sleeping,   0 stopped,   0 zombie
%Cpu(s): 13.0 us,  0.1 sy,  0.0 ni, 86.2 id,  0.6 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:   8045632 total,  7239468 used,   806164 free,   298448 buffers
KiB Swap:  2047996 total,     5384 used,  2042612 free,  3735808 cached

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
24271 victor    20   0  460048   5360   3540 R 100.0  0.1  38:21.02 gvfsd-sftp

[victor@localhost ~]$ uname -a
Linux localhost 3.14.4-200.fc20.x86_64 #1 SMP Tue May 13 13:51:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

[victor@localhost ~]$ rpm -qa | grep gvfs
gvfs-gphoto2-1.18.3-2.fc20.x86_64
gvfs-mtp-1.18.3-2.fc20.x86_64
gvfs-1.18.3-2.fc20.x86_64
gvfs-archive-1.18.3-2.fc20.x86_64
gvfs-fuse-1.18.3-2.fc20.x86_64
gvfs-goa-1.18.3-2.fc20.x86_64
gvfs-afp-1.18.3-2.fc20.x86_64
gvfs-afc-1.18.3-2.fc20.x86_64
gvfs-smb-1.18.3-2.fc20.x86_64

Comment 8 Ross Lagerwall 2014-06-04 16:55:24 UTC

That sounds like it could be bug 720860.

Next time it happens, try attach to the process (after installing the debuginfo packages) and grab a stacktrace with:
thread apply all bt full

http://fedoraproject.org/wiki/StackTraces

Thanks

Comment 9 Ross Lagerwall 2014-06-04 20:51:55 UTC

Created attachment 277906 [details] [review]
sftp: Don't spin before the SSH process opens the slave pty

When OpenSSH starts, it closes all its open file descriptors, including
the slave pty which is used for handling the login.  This would cause
gvfsd-sftp to spin because the poll() call in handle_login would return
immediately with POLLHUP.  Open the slave pty in the parent process so
that poll() doesn't return POLLHUP and spin needlessly.

This change is based on what sshfs does, and is also similar to how the
BSD openpty codepath behaves.

Although the problem occurs briefly for any login, it is most noticeable
when trying to connect to a server that drops SSH packets.
E.g. gvfs-mount sftp://8.8.8.8

Comment 10 Ross Lagerwall 2014-06-04 20:52:55 UTC

bug 629184 is related.

Comment 11 Ross Lagerwall 2014-06-11 16:54:23 UTC

Comment on attachment 277906 [details] [review]
sftp: Don't spin before the SSH process opens the slave pty

Marking as needs-work. I'll redo this patch so that it doesn't leak the fd and also fix the leak in the openpty codepath.

Comment 12 Ross Lagerwall 2014-06-29 14:16:45 UTC

Created attachment 279530 [details] [review]
sftp: Don't spin before the SSH process opens the slave pty

When OpenSSH starts, it closes all its open file descriptors, including
the slave pty which is used for handling the login.  This would cause
gvfsd-sftp to spin because the poll() call in handle_login would return
immediately with POLLHUP.  Open the slave pty in the parent process so
that poll() doesn't return POLLHUP and spin needlessly.

Once the login has been completed, the slave fd can be closed in the
parent process (this fixes an fd leak in the BSD openpty codepath).

This change is based on what sshfs does, and is also similar to how the
BSD openpty codepath behaves.

Although the problem occurs briefly for any login, it is most noticeable
when trying to connect to a server that drops SSH packets.
E.g. gvfs-mount sftp://8.8.8.8

Comment 13 Ross Lagerwall 2014-06-29 14:16:50 UTC

Created attachment 279531 [details] [review]
pty_open: Fix warnings with BSD codepath

Fix warnings by conditionally compiling functions needed for the
UNIX98_PTY codepath.  Also, make a function static.

Comment 14 Ross Lagerwall 2014-06-29 14:16:53 UTC

Created attachment 279532 [details] [review]
pty_open: Remove unused function prototype

Comment 15 Ross Lagerwall 2014-08-25 14:39:09 UTC

*** Bug 731439 has been marked as a duplicate of this bug. ***

Comment 16 Alexander Larsson 2014-09-16 07:44:11 UTC

Review of attachment 279530 [details] [review]:

In general this looks good, some error handling issues though.

::: daemon/pty_open.c
@@ +459,3 @@
+	*slave_fd = open(path, O_RDWR | O_NOCTTY);
+	if (*slave_fd == -1)
+		goto bail_stderr;

This needs to call what bail_fork does, as we need to close the stderr pipe too when this fails.
Also, the actual fork failure needs to call a new bail that closes the slave fd too.

@@ +530,3 @@
 			return -1;
 		}
+		close (*slave_fd);

Do we really need this? Isn't it closed in the "Close most descriptors" loop?
Although i would prefer if this was more explicitly done early on in the "case 0:" code.

Comment 17 Ross Lagerwall 2014-09-16 18:48:46 UTC

Created attachment 286320 [details] [review]
sftp: Don't spin before the SSH process opens the slave pty

When OpenSSH starts, it closes all its open file descriptors, including
the slave pty which is used for handling the login.  This would cause
gvfsd-sftp to spin because the poll() call in handle_login would return
immediately with POLLHUP.  Open the slave pty in the parent process so
that poll() doesn't return POLLHUP and spin needlessly.

Once the login has been completed, the slave fd can be closed in the
parent process (this fixes an fd leak in the BSD openpty codepath).

This change is based on what sshfs does, and is also similar to how the
BSD openpty codepath behaves.

Although the problem occurs briefly for any login, it is most noticeable
when trying to connect to a server that drops SSH packets.
E.g. gvfs-mount sftp://8.8.8.8

Comment 18 Ross Lagerwall 2014-09-16 18:50:46 UTC

(In reply to comment #16)
> Review of attachment 279530 [details] [review]:
> @@ +530,3 @@
>              return -1;
>          }
> +        close (*slave_fd);
> 
> Do we really need this? Isn't it closed in the "Close most descriptors" loop?
> Although i would prefer if this was more explicitly done early on in the "case
> 0:" code.

Thanks for the review.  I suppose the close is not needed; anyway, I changed it to be done earlier.

Comment 19 Ross Lagerwall 2014-10-17 19:51:05 UTC

Pushed to master as 56402c74c9b48943ef0688b01d6da08807fe405c.