GNOME Bugzilla – Bug 724780
gvfsd-sftp beging consume 100% CPU when I try connect to already shutdown server.
Last modified: 2014-10-17 19:51:31 UTC
Created attachment 269773 [details] backtraces of gvfsd-sftp gvfsd-sftp beging consume 100% CPU when I try connect to already shutdown server.
Created attachment 269774 [details] htop screenshot
*** Bug 671127 has been marked as a duplicate of this bug. ***
I can reproduce it by: gvfs-mount sftp://8.8.8.8/
+ Trace 233246
Thread 3 (Thread 0x7fffebfff700 (LWP 8199))
I have debugged this as follows: gvfs runs the ssh process with pipes connected to stdin, stdout and stderr. It also sets up a pseudo-terminal and connects an arbitrary file descriptor to the slave end in the ssh process. It then polls on this file descriptor, waiting for it to be readable (in handle_login). The problem is that one of the first things that the ssh process does is close any arbitrary file descriptors, including the tty fd. This causes the poll call to immediately return with POLLHUP and causes the 100% CPU usage. This behavior actually happens when connecting to any host, but usually the situation is resolved quickly because when ssh connects, it opens /dev/tty which causes the poll call to return POLLIN instead of POLLHUP and then things proceed as normal. Unfortunately, I don't really know enough about this stuff to propose a fix. Perhaps Alex has a solution?
This comment from the sshpass project may shed some light on the matter with a possible solution: /* Comment no. 3.14159 This comment documents the history of code. We need to open the slavept inside the child process, after "setsid", so that it becomes the controlling TTY for the process. We do not, otherwise, need the file descriptor open. The original approach was to close the fd immediately after, as it is no longer needed. It turns out that (at least) the Linux kernel considers a master ptty fd that has no open slave fds to be unused, and causes "select" to return with "error on fd". The subsequent read would fail, causing us to go into an infinite loop. This is a bug in the kernel, as the fact that a master ptty fd has no slaves is not a permenant problem. As long as processes exist that have the slave end as their controlling TTYs, new slave fds can be created by opening /dev/tty, which is exactly what ssh is, in fact, doing. Our attempt at solving this problem, then, was to have the child process not close its end of the slave ptty fd. We do, essentially, leak this fd, but this was a small price to pay. This worked great up until openssh version 5.6. Openssh version 5.6 looks at all of its open file descriptors, and closes any that it does not know what they are for. While entirely within its prerogative, this breaks our fix, causing sshpass to either hang, or do the infinite loop again. Our solution is to keep the slave end open in both parent AND child, at least until the handshake is complete, at which point we no longer need to monitor the TTY anyways. */
Hi, I don't know if this is related to this issue, but under Fedora 20, when I'm connected via sftp using 'files' gnome app, when I start typing in order to search gvfsd-sftp process takes over ~100% CPU. And all apps that have remote files opened freezes. This issue started about a week ago with latest update. top - 11:05:37 up 2 days, 45 min, 3 users, load average: 1.06, 1.21, 1.13 Tasks: 296 total, 2 running, 294 sleeping, 0 stopped, 0 zombie %Cpu(s): 13.0 us, 0.1 sy, 0.0 ni, 86.2 id, 0.6 wa, 0.0 hi, 0.0 si, 0.0 st KiB Mem: 8045632 total, 7239468 used, 806164 free, 298448 buffers KiB Swap: 2047996 total, 5384 used, 2042612 free, 3735808 cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 24271 victor 20 0 460048 5360 3540 R 100.0 0.1 38:21.02 gvfsd-sftp [victor@localhost ~]$ uname -a Linux localhost 3.14.4-200.fc20.x86_64 #1 SMP Tue May 13 13:51:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux [victor@localhost ~]$ rpm -qa | grep gvfs gvfs-gphoto2-1.18.3-2.fc20.x86_64 gvfs-mtp-1.18.3-2.fc20.x86_64 gvfs-1.18.3-2.fc20.x86_64 gvfs-archive-1.18.3-2.fc20.x86_64 gvfs-fuse-1.18.3-2.fc20.x86_64 gvfs-goa-1.18.3-2.fc20.x86_64 gvfs-afp-1.18.3-2.fc20.x86_64 gvfs-afc-1.18.3-2.fc20.x86_64 gvfs-smb-1.18.3-2.fc20.x86_64
That sounds like it could be bug 720860. Next time it happens, try attach to the process (after installing the debuginfo packages) and grab a stacktrace with: thread apply all bt full http://fedoraproject.org/wiki/StackTraces Thanks
Created attachment 277906 [details] [review] sftp: Don't spin before the SSH process opens the slave pty When OpenSSH starts, it closes all its open file descriptors, including the slave pty which is used for handling the login. This would cause gvfsd-sftp to spin because the poll() call in handle_login would return immediately with POLLHUP. Open the slave pty in the parent process so that poll() doesn't return POLLHUP and spin needlessly. This change is based on what sshfs does, and is also similar to how the BSD openpty codepath behaves. Although the problem occurs briefly for any login, it is most noticeable when trying to connect to a server that drops SSH packets. E.g. gvfs-mount sftp://8.8.8.8
bug 629184 is related.
Comment on attachment 277906 [details] [review] sftp: Don't spin before the SSH process opens the slave pty Marking as needs-work. I'll redo this patch so that it doesn't leak the fd and also fix the leak in the openpty codepath.
Created attachment 279530 [details] [review] sftp: Don't spin before the SSH process opens the slave pty When OpenSSH starts, it closes all its open file descriptors, including the slave pty which is used for handling the login. This would cause gvfsd-sftp to spin because the poll() call in handle_login would return immediately with POLLHUP. Open the slave pty in the parent process so that poll() doesn't return POLLHUP and spin needlessly. Once the login has been completed, the slave fd can be closed in the parent process (this fixes an fd leak in the BSD openpty codepath). This change is based on what sshfs does, and is also similar to how the BSD openpty codepath behaves. Although the problem occurs briefly for any login, it is most noticeable when trying to connect to a server that drops SSH packets. E.g. gvfs-mount sftp://8.8.8.8
Created attachment 279531 [details] [review] pty_open: Fix warnings with BSD codepath Fix warnings by conditionally compiling functions needed for the UNIX98_PTY codepath. Also, make a function static.
Created attachment 279532 [details] [review] pty_open: Remove unused function prototype
*** Bug 731439 has been marked as a duplicate of this bug. ***
Review of attachment 279530 [details] [review]: In general this looks good, some error handling issues though. ::: daemon/pty_open.c @@ +459,3 @@ + *slave_fd = open(path, O_RDWR | O_NOCTTY); + if (*slave_fd == -1) + goto bail_stderr; This needs to call what bail_fork does, as we need to close the stderr pipe too when this fails. Also, the actual fork failure needs to call a new bail that closes the slave fd too. @@ +530,3 @@ return -1; } + close (*slave_fd); Do we really need this? Isn't it closed in the "Close most descriptors" loop? Although i would prefer if this was more explicitly done early on in the "case 0:" code.
Created attachment 286320 [details] [review] sftp: Don't spin before the SSH process opens the slave pty When OpenSSH starts, it closes all its open file descriptors, including the slave pty which is used for handling the login. This would cause gvfsd-sftp to spin because the poll() call in handle_login would return immediately with POLLHUP. Open the slave pty in the parent process so that poll() doesn't return POLLHUP and spin needlessly. Once the login has been completed, the slave fd can be closed in the parent process (this fixes an fd leak in the BSD openpty codepath). This change is based on what sshfs does, and is also similar to how the BSD openpty codepath behaves. Although the problem occurs briefly for any login, it is most noticeable when trying to connect to a server that drops SSH packets. E.g. gvfs-mount sftp://8.8.8.8
(In reply to comment #16) > Review of attachment 279530 [details] [review]: > @@ +530,3 @@ > return -1; > } > + close (*slave_fd); > > Do we really need this? Isn't it closed in the "Close most descriptors" loop? > Although i would prefer if this was more explicitly done early on in the "case > 0:" code. Thanks for the review. I suppose the close is not needed; anyway, I changed it to be done earlier.
Pushed to master as 56402c74c9b48943ef0688b01d6da08807fe405c.