GNOME Bugzilla – Bug 465036
gnome-pty-helper locks /var/run/utmp
Last modified: 2015-05-09 17:29:15 UTC
Please describe the problem:
After logging out from the console under SUSE Enterprise Server 10, a gnome-pty-helper process stays behind and keeps utmp locked, so that no further records can be written to it. For instance, the Unix command "who" starts to give wrong results, because utmp can no longer be updated. This bug is known on the web; see also http://readlist.com/lists/lists.debian.org/debian-user/16/84731.html

After killing the gnome-pty-helper process with kill -9, utmp can be updated again and "who" gives the right results again.

I use SUSE Enterprise Server 10; the "About the GNOME Desktop" window shows Version 2.12.2, Distributor: SUSE, Build Date: 06/27/06. Is there a solution?

Steps to reproduce:
1. Log in at the desktop
2. Log out
3. Check whether a (zombie) gnome-pty-helper process is left
4. Log in via Telnet, ssh or something else
5. Run the who command; the new connection is not shown

Actual results:
who gives wrong results, because utmp can no longer be updated

Expected results:
who gives the right results

Does this happen every time?
Mostly

Other information:
See above
Created attachment 123624
Modify update_utmp to call endutent()/endutxent()

Part of the problem appears to be in the definition of update_utmp() for the HAVE_GETUTENT/HAVE_GETTTYENT cases. The defined routine does not properly close the utmp file with endutent()/endutxent(). This can leave gnome-pty-helper holding a locked reference to the utmp file if the pututline()/pututxline() routines are interrupted at a bad time (we have noticed this happening when users simply close their Exceed sessions without logging out first). The provided patch is a potential simple change to make sure the utmp file is properly closed before update_utmp() returns.
The patch looks good to me; committed to svn trunk. You say 'part of the problem...'; does that mean that even with this patch the original bug still exists?
It appears so. Adding the missing endutxent() call is mostly a robustness change. The fact that the gnome-pty-helper process was not properly dying bugged me, so I did a little more digging today. I need to see if I can replicate this, but there appears to be deadlock potential in the exit_handler() for SIGHUP and SIGTERM. If the code catches a signal while in the pututxline() portion of the update_utmp() routine, it will self-deadlock when the exit_handler() calls shutdown_helper(), because it will end up in update_utmp() again, waiting for the advisory lock on the utmp file to be released.
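One common way to avoid this kind of re-entry is to keep the signal handler trivial and defer the cleanup to normal program flow. The following is only a minimal sketch of that pattern, not the gnome-pty-helper code: request_exit(), cleanup_sessions() and service_once() are hypothetical names, and cleanup_sessions() merely stands in for whatever shutdown_helper() does.

```c
#include <assert.h>
#include <signal.h>

/* Written by the handler, read by the main loop. */
volatile sig_atomic_t exit_requested = 0;

/* Counts cleanup runs so the behaviour can be observed (illustration only). */
int cleanup_runs = 0;

/* Handler for SIGHUP/SIGTERM: it only records the signal number.
 * It takes no locks and calls no utmp routines. */
void request_exit(int signo)
{
    exit_requested = signo;
}

/* Hypothetical stand-in for shutdown_helper(): runs in normal context,
 * so it cannot interrupt a utmp update that is still in progress. */
void cleanup_sessions(void)
{
    assert(exit_requested != 0);
    cleanup_runs++;
}

/* One iteration of a hypothetical main loop.
 * Returns 1 when the process should stop, 0 otherwise. */
int service_once(void)
{
    /* ... normal request handling, including utmp updates, goes here ... */
    if (exit_requested) {
        cleanup_sessions();
        return 1;
    }
    return 0;
}
```

With this shape, a signal that arrives in the middle of an update only sets the flag; the cleanup runs on the next loop iteration, after the update has returned.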
I was able to confirm the self-deadlock. I had /opt/gnome/lib/vte/gnome-pty-helper running under strace, and was able to catch it with a SIGHUP while it was in the middle of updating the utmp file:

    17440 14:29:38 fcntl64(5, F_SETLKW, {type=F_RDLCK, whence=SEEK_SET, start=0, len=0}) = 0
    17440 14:29:38 read(5, "\2\0\0\0\0\0\0\0~\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 384) = 384
    17440 14:29:38 read(5, "\10\0\0\0h\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 384) = 384
    (...)
    17440 14:29:52 read(5, "\10\0\0\0\0\0\0\0pts/3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 384) = 384
    17440 14:29:52 --- SIGHUP (Hangup) @ 0 (0) ---
    17440 14:29:52 gettimeofday({1228170592, 284404}, NULL) = 0
    17440 14:29:52 futex(0xb7e8d5b8, FUTEX_WAIT, 2, NULL

In this particular scenario, my proposed endutent()/endutxent() modification won't help. Because there is file locking involved, the update_utmp() routine should have protection in place to keep pututline()/pututxline() from being re-entered while the file is locked. Maybe something as simple as the following example:

    static int update_pending = 0;
    (...)
    update_utmp (UTMP *ut)
    {
        if ( update_pending )
            endutent();
        setutent();
        update_pending = 1;
        pututline (ut);
        endutent();
        update_pending = 0;
    }
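A different way to keep the update from being re-entered would be to block SIGHUP and SIGTERM for the duration of the update, so the handler cannot run until the utmp file has been closed again. The sketch below is an assumption about how that could look, not a tested fix for gnome-pty-helper: guarded_update(), note_exit_signal() and fake_utmp_update() are hypothetical names, and fake_utmp_update() only simulates a signal arriving mid-update instead of touching utmp.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Set by the handler so the delivery time can be observed. */
volatile sig_atomic_t got_exit_signal = 0;

void note_exit_signal(int signo)
{
    (void)signo;
    got_exit_signal = 1;
}

/* Hypothetical stand-in for the setutent()/pututline()/endutent() sequence.
 * It raises SIGHUP against itself to simulate a signal arriving mid-update
 * and returns 1 if the handler ran inside the update, 0 if it did not. */
int fake_utmp_update(void)
{
    raise(SIGHUP);
    return got_exit_signal;
}

/* Run update() with SIGHUP and SIGTERM blocked.
 * Returns update()'s result, or -1 if the signal mask could not be changed. */
int guarded_update(int (*update)(void))
{
    sigset_t block, saved;
    int result;

    assert(update != NULL);

    sigemptyset(&block);
    sigaddset(&block, SIGHUP);
    sigaddset(&block, SIGTERM);

    if (sigprocmask(SIG_BLOCK, &block, &saved) != 0)
        return -1;

    result = update();

    /* Restoring the old mask lets any signal that became pending
     * during the update be delivered now, after the update is done. */
    if (sigprocmask(SIG_SETMASK, &saved, NULL) != 0)
        return -1;

    return result;
}
```

In this sketch the signal is not lost: it stays pending while blocked and the handler runs once the previous mask is restored.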
The rest of this may actually be bug 488960
The self-deadlock doesn't quite match the problem outlined in bug 488960. The deadlock issue I noticed had to do with a signal coming in while the utmp file is locked. The signal handler ends up invoking the routine that tries to lock the file again, and the process deadlocks until it is killed. No other lock-honoring processes (like sshd) will be able to update the utmp file until the deadlocked gnome-pty-helper process is killed.
Still reproducible? In any case, g-p-h will hopefully go away soon.
SUSE rolled an update for this problem, so we haven't seen the issue for some time now. g-p-h did appear to go away in SLES11, so you would get no complaints from me for calling this a done/dead issue. Thanks.
Obsolete now that g-p-h has been removed.