Bug 465036 - gnome-pty-helper locks /var/run/utmp
Status: RESOLVED OBSOLETE
Product: vte
Classification: Core
Component: general
Version: 0.12.x
Hardware: Other All
Importance: Normal major
Target Milestone: ---
Assigned To: VTE Maintainers
QA Contact: VTE Maintainers
Depends on:
Blocks:
 
 
Reported: 2007-08-09 13:17 UTC by Detlef Gösmann
Modified: 2015-05-09 17:29 UTC
See Also:
GNOME target: ---
GNOME version: 2.11/2.12


Attachments
Modify update_utmp to call endutent()/endutxent() (534 bytes, patch)
2008-11-28 18:57 UTC, Kevin W. Rudd
committed

Description Detlef Gösmann 2007-08-09 13:17:56 UTC
Please describe the problem:
After logging out from the console on SUSE Enterprise Server 10, a gnome-pty-helper process remains and keeps utmp locked, so that no further records can be written to it. For instance, the Unix command "who" starts to give wrong results, because utmp can no longer be updated. This bug is known on the web; see also
http://readlist.com/lists/lists.debian.org/debian-user/16/84731.html

After killing the gnome-pty-helper process with kill -9, utmp can be updated again and "who" gives correct results again.

I use SUSE Enterprise Server 10, the window "About the GNOME Desktop" showed me Version 2.12.2, Distributor: SUSE, Build Date: 06/27/06. Is there a solution?

Steps to reproduce:
1. Log in at the desktop
2. Log out
3. Check whether there is a (zombie) gnome-pty-helper process
4. Log in via Telnet, ssh, or something else
5. Run the who command; your new connection does not appear


Actual results:
who gives wrong results, because utmp can no longer be updated

Expected results:
who gives the right results

Does this happen every time?
mostly

Other information:
see above
Comment 1 Kevin W. Rudd 2008-11-28 18:57:31 UTC
Created attachment 123624
Modify update_utmp to call endutent()/endutxent()

Part of the problem appears to be in the definition of update_utmp() for the HAVE_GETUTENT/HAVE_GETTTYENT cases.  The defined routine does not properly close the utmp file with endutent()/endutxent().  This can lead to the gnome-pty-helper routine holding a locked reference to the utmp file if the pututline()/pututxline() routines are interrupted at a bad time (we have noticed this happening when users simply close their Exceed sessions without logging out first).  The provided patch is a potential simple change to make sure the utmp file is properly closed before update_utmp() returns.
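
The idea of always closing the utmp file before returning can be sketched as follows. This is a minimal illustration assuming glibc's <utmp.h> interface, not the committed patch (which is not reproduced on this page); the function name update_utmp_and_close is hypothetical.

```c
#include <stddef.h>
#include <utmp.h>

/* Hypothetical wrapper (not the committed patch): write one record and
 * always close the utmp database before returning, so the helper never
 * keeps the file open between updates. */
static int update_utmp_and_close(const struct utmp *ut)
{
    int ok;

    setutent();                    /* open (or rewind) the utmp file */
    ok = (pututline(ut) != NULL);  /* write or replace the matching record */
    endutent();                    /* close the file on success and on failure */
    return ok ? 0 : -1;
}
```

Calling endutent() on both paths is the point of the sketch: the descriptor is released whether or not the write succeeded.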
Comment 2 Christian Persch 2008-11-29 23:52:31 UTC
The patch looks good to me; committed to svn trunk.

You say 'part of the problem...'; does that mean that even with this patch the original bug still exists?
Comment 3 Kevin W. Rudd 2008-12-01 21:07:10 UTC
It appears so.  The lack of endutxent() is mostly a robustness change.  The fact that the gnome-pty-helper process was not properly dying bugged me, so I did a little more digging today.  I need to see if I can replicate this, but there appears to be deadlock potential in the exit_handler() for SIGHUP and SIGTERM.  If the code catches a signal while in the pututxline() portion of the update_utmp() routine, it will self deadlock when the exit_handler() calls shutdown_helper() (as it will end up stuck in update_utmp() again waiting for the advisory lock on the utmp file to be released).
Comment 4 Kevin W. Rudd 2008-12-01 23:23:27 UTC
I was able to confirm the self-deadlock.  I had /opt/gnome/lib/vte/gnome-pty-helper running under strace, and was able to catch it with a SIGHUP while it was in the middle of updating the utmp file:

17440 14:29:38 fcntl64(5, F_SETLKW, {type=F_RDLCK, whence=SEEK_SET, start=0, len=0}) = 0
17440 14:29:38 read(5, "\2\0\0\0\0\0\0\0~\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 384) = 384
17440 14:29:38 read(5, "\10\0\0\0h\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 384) = 384
 (...)
17440 14:29:52 read(5, "\10\0\0\0\0\0\0\0pts/3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 384) = 384
17440 14:29:52 --- SIGHUP (Hangup) @ 0 (0) ---
17440 14:29:52 gettimeofday({1228170592, 284404}, NULL) = 0
17440 14:29:52 futex(0xb7e8d5b8, FUTEX_WAIT, 2, NULL

In this particular scenario, my proposed endutent()/endutxent() modification won't help.  Because there is file locking involved, the update_utmp() routine should have protection in place to keep pututline()/pututxline() from being re-entered while the file is locked.  Maybe something as simple as the following example:

static int update_pending = 0;
(...)
update_utmp (UTMP *ut)
{
    if ( update_pending )
        endutent();
    setutent();
    update_pending = 1;
    pututline (ut);
    endutent();
    update_pending = 0;
}
Comment 5 JP Rosevear 2009-01-19 16:45:40 UTC
The rest of this may actually be bug 488960
Comment 6 Kevin W. Rudd 2009-01-19 18:14:14 UTC
The self-deadlock doesn't quite match the problem outlined in bug 488960.
The deadlock issue I noticed had to do with an interrupt coming in while the utmp file is currently locked.  The interrupt handler will end up invoking the routine that tries to lock it again, and the process will deadlock until killed.  No other lock-honoring processes (like sshd) will be able to update the utmp file until the deadlocked gnome-pty-helper process is killed.
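
The effect on other lock-honoring processes can be illustrated with a small sketch of fcntl() advisory locking. It assumes a POSIX system and uses the non-blocking F_SETLK so the outcome is observable; the strace output above shows the blocking F_SETLKW variant, which would simply wait. The function names are hypothetical.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set or clear a whole-file advisory lock without blocking.
 * type is F_RDLCK, F_WRLCK or F_UNLCK. Returns 0 on success. */
static int set_whole_file_lock(int fd, short type)
{
    struct flock fl;

    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;  /* 0 means "to end of file" */
    return fcntl(fd, F_SETLK, &fl);
}

/* Fork a child that tries to take a write lock on fd.
 * Returns 1 if the child got the lock, 0 if it was refused, -1 on error. */
static int other_process_can_lock(int fd)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(set_whole_file_lock(fd, F_WRLCK) == 0 ? 0 : 1);
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

While the parent holds a write lock, other_process_can_lock() returns 0; after the parent unlocks (or exits, or is killed), it returns 1, which matches the observation that utmp becomes writable again once the stuck process is killed.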
Comment 7 Christian Persch 2014-04-27 07:44:00 UTC
Still reproducible? In any case, g-p-h will hopefully go away soon.
Comment 8 Kevin W. Rudd 2014-04-28 23:04:08 UTC
SUSE rolled an update for this problem, so we haven't seen the issue for some time now.  g-p-h did appear to go away in SLES11, so you would get no complaints from me for calling this a done/dead issue.  Thanks.
Comment 9 Christian Persch 2015-05-09 17:29:15 UTC
Obsolete now that g-p-h has been removed.