GNOME Bugzilla – Bug 315846
GDM hangs after entering password almost every time
Last modified: 2006-05-03 19:03:00 UTC
This bug has been opened here: https://bugzilla.ubuntu.com/show_bug.cgi?id=14763 "GDM in Breezy is at least 9 out of 10 times hanging after I enter the password. I know this is a lousy bug report, but I really don't know what to write. It worked fine with Hoary. I'm using LDAP authentication, but I have removed every trace of it, and it's still the same. I have purged gdm and deleted every conf-file, after reinstallation the problem is the same. The (weirdest) thing is that when gdm hangs, if I switch to a console, I can't login, it does the same, just hangs after I write the password. If I'm already logged in to a console, that console works just fine. This _never_ happens if I shut down gdm first, and then log in from console. Please tell me what info I should give you to diagnose this problem. Best regards, Stian"
I've added Stian to the CC: list from the Ubuntu report, since he'll need to be involved to fix this bug. This problem sounds like it is going to be tricky to resolve. I haven't gotten a mass of similar bug reports so I suspect this is not a widely seen problem. I suspect the problem is in the way GDM is interacting with PAM and the kernel. PAM is the system that all programs that require authentication use in order to verify passwords. There really should be no way to get PAM into a state where other login programs can not use it. It's also possible that gdm is consuming a system resource and causing your system to hang or become insanely slow. Do you have access to another machine so that you can telnet into your machine and watch the GDM process as it fails? It would be useful to see what processes are running and the resources being consumed (top or ps output). A stack trace of the running GDM processes would be useful. I know you use pstack on Solaris to get this, I'm sure the command on Linux is different. I believe the trace command on Linux can be used to print a report showing system functions called on a running program. If it is hanging in a system function, this might also highlight the code path causing the hang. If you don't have access to another machine, perhaps you can try starting "/usr/bin/xterm &" in the /etc/gdm/Init/Default file which will cause an xterm to get launched while the login program is running. Although login is freezing, hopefully the xterm will still be useable. I'll warn you that these sorts of problems tend to be difficult to debug without digging into the source code a bit. So it would be helpful if you are able to recompile the source code for testing. I will likely need to ask you to add gdm_debug() statements to the code to help track down where the problem is happening. You can also try emailing gdm-list@gnome.org. 
There are a lot of people on that list who know more about the innards of PAM and the kernel than I do, so you might get useful suggestions there.
Another thing you can try is setting enable=true in the "[debug]" section of the /etc/gdm/gdm.conf file to see whether any useful information is passed to the system log (/var/log/messages). Attaching the resulting output to this bug report for evaluation would be a good first step.
First, sorry for replying so late, and thank you for looking into this. It's been a busy week. I'm not sure if the Linux command "strace" is the same as your "pstack", but I hope so. Thing is, when I started gdm with "strace /usr/sbin/gdm -nodaemon" it was almost impossible to get it to hang. I must have tried logging in and out 20 times before it hung. For the (soon to be) attached output, I was lucky and it hung on the second login. It doesn't make me any wiser. I'll attach the output from /var/log/messages as well (which doesn't look problematic to me either). Two things. First: I haven't tested without it for ages (but will now), but I have to start gdm with "debug=enable" in gdm.conf, otherwise gdm (or the X server) just restarts when I try to log in. I know it sounds weird. Before this hang started to happen, gdm just restarted when I pressed enter at the password prompt. While trying to find the reason for this, I enabled debugging, and then the restart disappeared. I only started to get the hang a couple of months ago (I think). Second: I was 98% sure I had removed every trace of ldap from my system, and still got the hang. But since several other Ubuntu users have reported this problem with ldap auth, and no one without, I might very well be mistaken. Anyway, thanks so far :) I'll be happy to recompile or do whatever you need to get this fixed. And I'll be faster this time around :) Thanks!
Created attachment 52365 [details] output from "strace /usr/sbin/gdm -nodaemon"
Created attachment 52366 [details] debug output from gdm This is from the same session as the strace output. I had two logins; the first was successful, the second hung so I had to press reset.
Ok, I just confirmed that gdm still restarts when I press "enter" after entering my password without debug enabled. I guess the two issues may be related... I'll try again to remove every trace of ldap from my system, to see if I can reproduce one or both of these issues without ldap.
Ok, neither of these problems occurs without ldap. Sorry for the confusion. I must have forgotten to reboot the last time I tried to check it. Btw, everything worked fine with Ubuntu Hoary. And everything works fine from the console, as long as GDM hasn't hung: if I have logged in on the console before gdm hangs, I can use that console just fine, but I can't log on from another console (or even remotely). Thanks.
Created attachment 52372 [details] strace from gdm without debug enabled This is an strace of gdm. I logged on once, gdm restarted, and then I aborted the strace. Hope this helps?
Okay, I'm a little confused. Are you saying that the problems go away when you remove LDAP? If so, then this is probably not a GDM bug and has something to do with LDAP being broken, I'd think. Should we close this bug?
First, sorry for my bad English :( But I don't think the bug is in ldap; I think ldap is exposing a bug in gdm. There is never, ever a problem with console logins, and it was never a problem with the older gdm (in Ubuntu Hoary). But if you think it's an ldap bug, please explain to me why. Best regards, Stian
GDM should not be able to mess up PAM, and from your earlier description GDM hanging was also causing console login to hang. It's hard to imagine how GDM could cause this sort of problem. It makes much more sense for this type of problem to be caused by PAM misconfiguration, or by PAM thinking it should be using LDAP and LDAP being broken/misconfigured. It is very possible that your PAM was configured so that GDM would trigger the problem and console login would not. Searching Google, I notice other users complaining about ldap/pam misconfigurations causing hangs: http://albatross.madduck.net/pipermail/debian-unizh/2004-October/000355.html Looking at this thread, it seems people are examining the PAM configuration to figure out what the problem is, and do not think the problem is with GDM itself. If this is something you want to pursue, I'd recommend looking for help on mailing lists focused on PAM issues (mailing lists focused on configuring LDAP might also be useful): http://www.kernel.org/pub/linux/libs/pam/ I think we should close this bug and reopen it if we discover that the problem is really caused by GDM and not PAM or LDAP. Since you worked out your problem by removing LDAP completely from your system, I assume you do not really need to use LDAP and will not be tracking down the problem further. Does this sound reasonable?
Hmm, it doesn't seem like they came to a conclusion in that thread. Besides, there are at least two (maybe three) others who have reported the same bug in Ubuntu. I don't think we have all misconfigured our ldap setups. I _do_ really need to use ldap, I was just testing with a newly created local user to see if ldap was the cause of the problem. But I don't really need gdm, so it's not that big a deal for me :) So, go ahead and close it if you want. I just have one question for you: why is gdm restarting when I press enter at the password prompt with my system set up with ldap? 100% reproducible. Is that also ldap's fault? Should gdm restart like that when(/if) pam/ldap is behaving wrong?
I can leave the bug open for a while and see what happens. Here are some pointers to other threads where people seem to be having issues with the libnss-ldap and pam-ldap modules triggering problems with GDM. http://people.redhat.com/alikins/ldap/ldap.html http://groups.google.com/group/linux.debian.bugs.dist/browse_thread/thread/e67e486a9098ca0a/5069e74a95053809?lnk=st&q=ldap+restarts+gdm&rnum=1&hl=en#5069e74a95053809 I suspect you will find the solution to your problem by exploring down this path. When I say PAM or LDAP is misconfigured, this could include the distro shipping a combination of modules (ldap/pam/gdm) that just don't work well together.
After upgrading from Hoary to Breezy in Ubuntu, I am also experiencing this problem. Everything worked fine in Hoary. Greg
Same problem here. Ubuntu Breezy. I have confirmed this to be a problem with GDM, since KDM works just fine. I got a strange error message the first time I tried logging in (GDM) with an LDAP user, saying: "Cannot set your user group; you will not be able to log in. Please contact you system administrator". Running 'getent passwd' and 'getent group' from the command line gives me all groups and users listed in the LDAP directory, so there cannot be any misconfiguration in my setup. In addition, I can confirm that the exact same setup worked under Hoary (GDM).
This is strange. The error message you are seeing is defined in three files in GDM: verify-pam.c, verify-crypt.c, and verify-shadow.c. It would be useful to know which you are using. I'm assuming PAM. Looking at the code, it only prints this message if the following fails:

    pwent = getpwnam (login);
    if (/* paranoia */ pwent == NULL ||
        ! gdm_setup_gids (login, pwent->pw_gid)) {
            gdm_error (_("Cannot set user group for %s"), login);

So, if you write a simple program that calls getpwnam on the login name and run it as the gdm user, does it work? Note that the gdm_setup_gids function will only fail if setgid (gid) returns a value < 0, or if the initgroups function fails. However, if either failure happens, you should also see a syslog message like "Could not setgid %d. Aborting" or "initgroups() failed for %s. Aborting". But you don't mention seeing either of these errors, so I'm guessing the getpwnam call is somehow failing. Perhaps we can dig into this and find out why.
Oh, sorry, you should run the getpwnam test as the root user. The daemon runs as root, not as the gdm user. Could you test this, or let me know if you are seeing other error messages in syslog as well? Turning on enable=true in the debug section of the gdm.conf file might also cause useful error messages to start appearing in the syslog.
For what it's worth, I "fixed" this by changing:

    auth sufficient pam_unix.so likeauth nullok
    auth sufficient pam_ldap.so use_first_pass
    auth required pam_deny.so

into:

    auth sufficient pam_unix.so likeauth nullok
    auth required pam_ldap.so use_first_pass

Seems to work mostly the same, except that gdm won't hang :) Since it used to work with gdm, and it always worked with regular console logins - I still want to claim there is a gdm bug somewhere. Anyway, since I was the original reporter; the bug is fixed for me.
I'm seeing this problem (or a slightly-related one) in version 2.8.0.5-0ubuntu1 (Ubuntu's Breezy Badger release). i'm not authenticating against LDAP at all: i'm actually authenticating against kerberos. i'm using libnss-ldap and the standard unix passwd files for a user database, but authentication for the LDAP users uses krb5. i'm not running nscd at all. at the text-mode console, login works fine. kdm works fine. gdm has a problem. the authentication step doesn't seem to be the problem, because i'm not even getting to a password prompt. With libnss-ldap installed, i type in a username in the box for gdm, and hit <tab> or <enter>. the box goes grey (disabled) and then gdm just hangs. permanently. i've left it running for several days like this.

There are a few different things that i have found as workarounds. Consider any one of the following modifications:

0) if I set Enabled=true in the [debug] section of /etc/gdm/gdm.conf, the login works exactly as expected.
1) if i rebuild gdm and patch it to do nothing in close_all_descriptors() (in daemon/misc.c) the login works exactly as expected.
2) if i remove libnss-ldap from the system, the login works exactly as expected (though of course it fails for LDAP-based accounts because they have no entry in the passwd db anymore)

the fact that (1) works makes me think that gdm is clobbering an LDAP session descriptor in close_all_descriptors, which then of course causes problems for the libnss query... Given that i need libnss-ldap, option (2) is not really a workaround. i'm concerned about unknown stability/security implications of option (1). This leaves me with option (0), which i don't like because it doesn't make sense to me why this should make any difference at all. I'd be happy to test any changes that might get this resolved. I'd also be curious to hear the rationale behind close_all_descriptors(). 
It seems to me like a program with as many library hooks as gdm (using nss, pam, etc) can't really responsibly say "i want to close all file descriptors i have open because i won't need them anymore." What if the background libraries are actually using those descriptors?
Yes, it sounds like you have tracked down the problem. Would you be willing to investigate this a bit further? It would be useful if we could identify which call (or calls) to close_all_descriptors is causing the problem. It probably is okay to call this function, for example, in server.c and display.c since that code only gets called by the slave process that starts the Xserver. It also is probably okay to call this function in slave.c since the slave just launches the GUI programs. Likewise errorgui.c is probably okay since those error GUIs only get shown in the event of a fatal error that causes GDM to not start. In other words, it's probably okay to call this function in the various slave processes, but not in the daemon itself. I'm guessing that one of the calls in one of the other files (gdm.c or misc.c) is causing the problem. If you could identify which one, I'd be more comfortable removing some of the calls to the function than removing the function itself. Or do you find that any call to this function hoses the system for you?
Thanks for the prompt response! I understand your reluctance to yank the function entirely... i went ahead and tried removing calls to gdm_close_all_descriptors() in different places in the code. i initially tried your suggestion of removing the calls in misc.c and in gdm.c, but that had no effect. It turns out that removing the call in daemon/display.c (line 310 in my version here) does the trick! To be precise: *only* the invocation of gdm_close_all_descriptors() in daemon/display.c was disabled (i re-enabled the other invocations i had experimented with), and gdm_close_all_descriptors() (the function itself) was left untouched. With this single modification, i'm no longer seeing the misbehavior i reported on Wednesday. i still don't understand enough of the architecture of gdm (and where it invokes the NSS) to know why that particular call is the troublesome one, but it works for me now. Let me know if i can help further diagnose anything.
Since this is "my" bug, I guess it's time for me to comment on it again :) I'm not able to reproduce this with gdm 2.14.0. I'll try some more, but it seems that whatever was causing me problems is gone.
A followup: I am also not seeing this misbehavior any more in gdm 2.14.4-0ubuntu3, though it's not clear to me whether this is due to the ubuntu packaging or whether gnome 2.14.4 has resolved the issue.
okay, closing this bug.