Bug 315846 - GDM hangs after entering password almost every time
Status: RESOLVED FIXED
Product: gdm
Classification: Core
Component: general
Version: 2.8.x
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: GDM maintainers
Depends on:
Blocks:
Reported: 2005-09-09 17:19 UTC by Sebastien Bacher
Modified: 2006-05-03 19:03 UTC
See Also:
GNOME target: ---
GNOME version: 2.11/2.12


Attachments
output from "strace /usr/sbin/gdm -nodaemon" (161.31 KB, text/plain)
2005-09-18 15:08 UTC, Stian Jordet
debug output from gdm (26.93 KB, text/plain)
2005-09-18 15:13 UTC, Stian Jordet
strace from gdm without debug enabled (135.81 KB, text/plain)
2005-09-18 19:59 UTC, Stian Jordet

Description Sebastien Bacher 2005-09-09 17:19:43 UTC
This bug has been opened here: https://bugzilla.ubuntu.com/show_bug.cgi?id=14763

"GDM in Breezy is at least 9 out of 10 times hanging after I enter the password.
I know this is a lousy bug report, but I really don't know what to write. It
worked fine with Hoary. I'm using LDAP authentication, but I have removed every
trace of it, and it's still the same. I have purged gdm and deleted every
conf-file, after reinstallation the problem is the same.

The (weirdest) thing is that when gdm hangs, if I switch to a console, I can't
login, it does the same, just hangs after I write the password. If I'm already
logged in to a console, that console works just fine. This _never_ happens if I
shut down gdm first, and then log in from console.

Please tell me what info I should give you to diagnose this problem.

Best regards,
Stian"
Comment 1 Brian Cameron 2005-09-09 18:07:09 UTC
I've added Stian to the CC: list from the Ubuntu report, since he'll need to
be involved to fix this bug.

This problem sounds like it is going to be tricky to resolve.  I haven't received
many similar bug reports, so I suspect this is not a widely seen problem.

I suspect the problem is in the way GDM is interacting with PAM and the kernel.
 PAM is the system that all programs that require authentication use in order to
verify passwords.  There really should be no way to get PAM into a state where
other login programs can not use it.  It's also possible that gdm is consuming a 
system resource and causing your system to hang or become insanely slow.

Do you have access to another machine so that you can telnet into your machine
and watch the GDM process as it fails?  It would be useful to see what processes
are running and the resources being consumed (top or ps output).  A stack trace
of the running GDM processes would be useful.  I know you use pstack on Solaris
to get this; I'm sure the command on Linux is different.  I believe the strace
command on Linux can be used to print a report showing the system calls made
by a running program.  If it is hanging in a system call, this might also
highlight the code path causing the hang.

If you don't have access to another machine, perhaps you can try starting
"/usr/bin/xterm &" in the /etc/gdm/Init/Default file which will cause an 
xterm to get launched while the login program is running.  Although login is
freezing, hopefully the xterm will still be usable.

I'll warn you that these sorts of problems tend to be difficult to debug 
without digging into the source code a bit.  So it would be helpful if you
are able to recompile the source code for testing.  I will likely need to
ask you to add gdm_debug() statements to the code to help track down where
the problem is happening.

You can also try emailing gdm-list@gnome.org.  There are a lot of people on
that list who know the innards of PAM and the kernel better than I do, so perhaps
you might get useful suggestions there.
Comment 2 Brian Cameron 2005-09-13 22:33:23 UTC
Another thing you can try is setting enable=true in the "[debug]" section of the
/etc/gdm/gdm.conf file and seeing whether any useful information is passed to the
system log (/var/log/messages).  Attaching the resulting output to this bug report
for evaluation would be a good first step.
Comment 3 Stian Jordet 2005-09-18 15:07:16 UTC
First, sorry for being so late in replying, and thank you for looking into this.
It's been a busy week.

I'm not sure if the Linux command "strace" is the same as your "pstack", but I
hope so. The thing is, when I started gdm with "strace /usr/sbin/gdm -nodaemon" it
was almost impossible to get it to hang. I must have tried logging in and out 20
times before it hung. In the (soon to be) attached output, I was lucky and it
hung on the second login. That doesn't make me any wiser. I'll attach the output
from /var/log/messages as well (which doesn't seem problematic to me either).

Two things. First, I haven't tested without debugging for ages (but will now), but I have to
start gdm with "debug=enable" in gdm.conf, or else gdm (or the X server) just
restarts when I try to log in. I know it sounds weird. Before this hang started
to happen, gdm just restarted when I pressed Enter at the password prompt. While
trying to find the reason for this, I enabled debugging, and then the restart
disappeared. It was only a couple of months ago (I think) that I started to get the hang.

Second, I was 98% sure I had removed every trace of ldap from my system and
still got the hang. But since several other Ubuntu users have reported this
problem with ldap auth, and no one without it, I might very well be mistaken.

Anyway, thanks so far :) I'll be happy to recompile or do whatever you need to
get this fixed. And I'll be quicker to respond this time :)

Thanks!
Comment 4 Stian Jordet 2005-09-18 15:08:18 UTC
Created attachment 52365 [details]
output from "strace /usr/sbin/gdm -nodaemon"
Comment 5 Stian Jordet 2005-09-18 15:13:40 UTC
Created attachment 52366 [details]
debug output from gdm

This is from the same session as the strace output. I had two logins: the first
was successful, the second hung so I had to press reset.
Comment 6 Stian Jordet 2005-09-18 15:23:46 UTC
Ok, I just confirmed that gdm still restarts when I press "enter" after entering
my password without debug enabled. I guess the two problems may be related...
I'll try again to remove every trace of ldap from my system, to
see if I can reproduce one or both of these issues without ldap.
Comment 7 Stian Jordet 2005-09-18 16:04:44 UTC
Ok, neither of these problems occurs without ldap. Sorry for the confusion. I
must have forgotten to reboot the last time I tried to check it.

Btw, everything worked fine with Ubuntu Hoary. And everything works fine from
the console (if GDM hasn't hung): if I have logged in on the console before gdm
hangs, I can use that console just fine. But I can't log on from another console
(or even remotely).

Thanks.
Comment 8 Stian Jordet 2005-09-18 19:59:36 UTC
Created attachment 52372 [details]
strace from gdm without debug enabled

This is an strace of gdm. I logged on once, gdm restarted, and then I aborted the
strace. Hope this helps?
Comment 9 Brian Cameron 2005-09-19 23:41:49 UTC
Okay, I'm a little confused.  Are you saying that the problems go away when you
remove LDAP?  If so, then this is probably not a GDM bug and has something to do
with LDAP being broken, I'd think.  Should we close this bug?
Comment 10 Stian Jordet 2005-09-20 12:24:26 UTC
First, sorry for my bad English :(

But I don't think the bug is with ldap, I think ldap is exposing a bug in gdm.
There is never, ever a problem with console logins, and it was never a problem
with older gdm (in Ubuntu Hoary). But if you think it's an ldap bug, please explain to
me why.

Best regards,
Stian
Comment 11 Brian Cameron 2005-09-20 21:45:41 UTC
GDM should not be able to mess up PAM, and from your earlier description GDM
hanging was also causing console login to hang.  It's hard to imagine how
GDM could cause this sort of problem.  It makes much more sense for this
type of problem to be caused by PAM misconfiguration, or PAM thinking it 
should be using LDAP and LDAP being broken/misconfigured.  It is very 
possible that your PAM was configured so that GDM would trigger the problem 
and console login would not.

Searching google, I notice other users complaining about ldap/pam 
misconfigurations causing hangs:

   http://albatross.madduck.net/pipermail/debian-unizh/2004-October/000355.html

Looking at this thread it looks like people are looking at PAM configuration
to figure out what the problem is, and not thinking the problem is with
GDM itself.

If this is something you want to pursue, I'd recommend looking for help on
mailing lists focused on PAM issues (mailing lists focused on configuring
LDAP might also be useful):

   http://www.kernel.org/pub/linux/libs/pam/

I think we should close this bug and reopen it if we discover that the 
problem is really caused by GDM and not PAM or LDAP.  Since you worked out
your problem by removing LDAP completely from your system, I assume you
do not really need to use LDAP and will not be tracking down the problem 
further.

Does this sound reasonable?
Comment 12 Stian Jordet 2005-09-20 22:21:20 UTC
Hmm, it doesn't seem like they came to a conclusion in that thread. Besides, there
are at least two (maybe three) others who have reported the same bug in Ubuntu.
I don't think we have all misconfigured our ldap setup.

I _do_ really need to use ldap, I was just testing with a newly created local
user to see if ldap was the cause of the problem. But I don't really need gdm,
so it's not that big a deal for me :)

So, go ahead and close it if you want. I just have one question for you: why is
gdm restarting when I press enter at the password prompt with my system setup
with ldap? 100% reproducible. Is that also ldap's fault? Should gdm restart like
that when(/if) pam/ldap is behaving wrong?
Comment 13 Brian Cameron 2005-09-21 03:07:31 UTC
I can leave the bug open for a while and see what happens.  Here are some pointers
to some other threads where people seem to be seeing the libnss-ldap and
pam-ldap modules trigger problems with GDM.

http://people.redhat.com/alikins/ldap/ldap.html
http://groups.google.com/group/linux.debian.bugs.dist/browse_thread/thread/e67e486a9098ca0a/5069e74a95053809?lnk=st&q=ldap+restarts+gdm&rnum=1&hl=en#5069e74a95053809

I suspect you will find the solution to your problem exploring down this path.
When I say PAM or LDAP is misconfigured, this could include the distro shipping
a combination of modules (ldap/pam/gdm) that just don't work well together.
Comment 14 Greg 2005-10-08 23:25:44 UTC
After upgrading from Hoary to Breezy in Ubuntu, I am also experiencing this
problem.  Everything worked fine in Hoary.

Greg
Comment 15 Magne 2005-11-13 15:50:56 UTC
Same problem here. Ubuntu Breezy. I have confirmed this to be a problem with
GDM, since KDM works just fine. I got a strange error message the first time I tried
logging in (GDM) with an LDAP user, saying: "Cannot set your user group; you will
not be able to log in. Please contact your system administrator". Running 'getent
passwd' and 'getent group' from the command line gives me all groups and users
listed in the LDAP directory, so there cannot be any misconfiguration in my
setup. In addition, I can confirm that the exact same setup worked under Hoary (GDM).
Comment 16 Brian Cameron 2005-11-15 01:26:13 UTC
This is strange.  The error message you are seeing is defined in 3 files in GDM.
 In verify-pam.c, verify-crypt.c, and verify-shadow.c.  It would be useful to
know which you are using.  I'm assuming PAM.

Looking at the code it only prints this message if the following fails:

    pwent = getpwnam (login);
    if (/* paranoia */ pwent == NULL ||
        ! gdm_setup_gids (login, pwent->pw_gid)) {
            gdm_error (_("Cannot set user group for %s"), login);


So, if you write a simple program that calls getpwnam on the login name and run
it as the gdm user, does it work?

Note that the gdm_setup_gids function will only fail if setgid (gid) returns a
value < 0, or if the initgroups function fails.  However, if either failure
happens, you should also see a syslog message like "Could not setgid %d.
Aborting" or "initgroups() failed for %s. Aborting".  But you are not mentioning
you see either of these errors, so I'm guessing the getpwnam is somehow failing.

Perhaps we can dig into this and find out why.

Comment 17 Brian Cameron 2005-11-15 01:27:54 UTC
Oh, sorry, you should run the getpwnam test as the root user.  The daemon runs as root,
not as the gdm user.  Could you test this, or let me know if you are seeing
other error messages in syslog as well?  Turning on enable=true in the debug
section of the gdm.conf file might also cause useful error messages to start
appearing in the syslog.
Comment 18 Stian Jordet 2006-02-12 23:03:58 UTC
For what it's worth, I "fixed" this by changing:

auth        sufficient    pam_unix.so likeauth nullok
auth        sufficient    pam_ldap.so use_first_pass
auth        required    pam_deny.so

into:

auth        sufficient    pam_unix.so likeauth nullok
auth        required    pam_ldap.so use_first_pass

Seems to work mostly the same, except that gdm won't hang :) Since it used to work with gdm, and it always worked with regular console logins, I still want to claim there is a gdm bug somewhere. Anyway, since I was the original reporter, the bug is fixed for me.
Comment 19 Daniel Kahn Gillmor 2006-03-14 23:51:52 UTC
I'm seeing this problem (or a slightly-related one) in version 2.8.0.5-0ubuntu1 (Ubuntu's Breezy Badger release).

I'm not authenticating against LDAP at all: I'm actually authenticating against Kerberos.  I'm using libnss-ldap and the standard Unix passwd files for a user database, but authentication for the LDAP users uses krb5.  I'm not running nscd at all.  At the text-mode console, login works fine.  kdm works fine.  gdm has a problem.

The authentication step doesn't seem to be the problem, because I'm not even getting to a password prompt.

With libnss-ldap installed, I type a username into the box for gdm and hit <tab> or <enter>.  The box goes grey (disabled) and then gdm just hangs, permanently.  I've left it running like this for several days.

There are a few different things that I have found to work around the problem.  Consider any one of the following modifications:

0) If I set Enable=true in the [debug] section of /etc/gdm/gdm.conf, the login works exactly as expected.

1) If I rebuild gdm and patch close_all_descriptors() (in daemon/misc.c) to do nothing, the login works exactly as expected.

2) If I remove libnss-ldap from the system, the login works exactly as expected (though of course it fails for LDAP-based accounts, because they no longer have an entry in the passwd db).

The fact that (1) works makes me think that gdm is clobbering an LDAP session descriptor in close_all_descriptors(), which then of course causes problems for the libnss query...

Given that I need libnss-ldap, option (2) is not really a workaround.  I'm concerned about unknown stability/security implications of option (1).  This leaves me with option (0), which I don't like because it doesn't make sense to me why it should make any difference at all.

I'd be happy to test any changes that might get this resolved.  I'd also be curious to hear the rationale behind close_all_descriptors().  It seems to me that a program with as many library hooks as gdm (using NSS, PAM, etc.) can't really responsibly say "I want to close all file descriptors I have open because I won't need them anymore."  What if the background libraries are actually using those descriptors?
Comment 20 Brian Cameron 2006-03-16 02:23:47 UTC
Yes, it sounds like you have tracked down the problem.  Would you be willing to investigate this a bit further?  It would be useful if we could identify which call (or calls) to close_all_descriptors is causing the problem.  It probably is okay to call this function, for example, in server.c and display.c since that code only gets called by the slave process that starts the Xserver.  It also is probably okay to call this function in slave.c since the slave just launches the GUI programs.  Likewise errorgui.c is probably okay since those errorgui's only get shown in the event of a fatal error that causes GDM to not start.  In other words, it's probably okay to call this function in the various slave processes, but not in the daemon itself.

I'm guessing the problem may be one of the calls in the other files (gdm.c or misc.c).  If you could identify which one, I'd be more comfortable removing some of the calls to the function than removing the function itself.  Or do you find that any call to this function hoses the system for you?
Comment 21 Daniel Kahn Gillmor 2006-03-16 19:47:09 UTC
Thanks for the prompt response!  I understand your reluctance to yank the function entirely...  I went ahead and tried removing calls to gdm_close_all_descriptors() in different places in the code.  I initially tried your suggestion of removing the calls in misc.c and in gdm.c, but that had no effect.  It turns out that removing the call in daemon/display.c (line 310 in my version here) does the trick!

To be precise: *only* the invocation of gdm_close_all_descriptors() in daemon/display.c was disabled (I re-enabled the other invocations I had experimented with), and gdm_close_all_descriptors() (the function itself) was left untouched.  With this single modification, I'm no longer seeing the misbehavior I reported on Wednesday.

I still don't understand enough of gdm's architecture (and where it invokes NSS) to know why that particular call is the troublesome one, but it works for me now.  Let me know if I can help diagnose anything further.
Comment 22 Stian Jordet 2006-03-17 22:30:39 UTC
Since this is "my" bug, I guess it's time for me to comment on it again :) I'm not able to reproduce this with gdm 2.14.0. I'll try some more, but it seems that whatever was causing me problems is gone.
Comment 23 Daniel Kahn Gillmor 2006-05-03 18:48:30 UTC
A follow-up: I am also not seeing this misbehavior any more in gdm 2.14.4-0ubuntu3, though it's not clear to me whether this is due to the Ubuntu packaging or whether upstream gdm 2.14.4 has resolved the issue.
Comment 24 Brian Cameron 2006-05-03 19:03:00 UTC
Okay, closing this bug.