GNOME Bugzilla – Bug 658481
gdm 3.1.90: fallback gdm greeter session not properly terminated and blocking login
Last modified: 2011-09-09 20:18:28 UTC
When using the fallback gdm greeter session, the login doesn't really work well as it seems metacity from the gdm greeter session is still running (and therefore blocking gnome-shell). So I end up with the fail whale dialog. I'm unsure why this happens, though.
ioni reported that too. I'm guessing it's because of this hack in the PAM code: http://git.gnome.org/browse/gdm/tree/daemon/gdm-session-worker-job.c#n292 but I haven't had a chance to reproduce it yet to be sure. If that is the problem, changing the number from 3 to, say, 45 would confirm the cause. I think the right thing to do there is to not have a timeout and instead let the child finish asynchronously, potentially blocking later on in the process if it hasn't finished.
Okay, it's a bit late, so I'm not sure I'm right, but here's something that might be happening:
- gdm starts gnome-session for the greeter session
- gnome-session sees there's no dbus running, so it execs "dbus-launch --exit-with-session gnome-session..."
- user logs in
- gdm sends a TERM signal to what it believes is gnome-session, but it's actually dbus
- dbus forwards the TERM signal to gnome-session, and exits
- gnome-session tries to contact a few clients that are connected via dbus, and fails miserably because it didn't notice the bus is gone
- gnome-session waits for the timeout
- gnome-session doesn't kill metacity in the meantime (the only non-dbus client in the session...)
- gnome-shell from the user is not happy
gdm does create a dbus-daemon for the greeter session before starting gnome-session (or it's supposed to anyway)
Hrm, indeed. FWIW, I think I tried changing the timeout mentioned in comment 1, but that wasn't the issue. For some reason, we don't wait on the right child process, I guess, as dbus is clearly killed while gnome-session (from the greeter session) is still running. Here's some bits from the gnome-session debug log I kept:
    Sep 7 18:06:14: DEBUG(+): Got callback for signal 15
    [...]
    Sep 7 18:06:14: DEBUG(+): GsmAutostartApp: (pid:1952) done (status:1)
    Sep 7 18:06:14: DEBUG(+): GsmAutostartApp: (pid:1920) done (status:0)
    Sep 7 18:06:15: DEBUG(+): GsmManager: query end session timed out
I suspect that the two GsmAutostartApp quitting here, at the same time as we receive SIGTERM, are not using xsmp but dbus to talk to gnome-session, and just leave when the bus disappears (and that's why they don't reply to gnome-session). That would indicate that dbus is indeed stopped just after SIGTERM is sent.
So gdm_welcome_session_stop now does:
    if (welcome_session->priv->session != NULL) {
            gdm_session_stop_conversation (GDM_SESSION (welcome_session->priv->session), "gdm-welcome");
            gdm_session_close (GDM_SESSION (welcome_session->priv->session));
            g_object_unref (welcome_session->priv->session);
            welcome_session->priv->session = NULL;
    }
    stop_dbus_daemon (welcome_session);
So if gdm_session_stop_conversation doesn't block until gnome-session finishes, then we'll potentially kill the dbus-daemon early. That function just calls gdm_session_worker_job_stop, which does:
    res = gdm_signal_pid (session_worker_job->priv->pid, SIGTERM);
    gdm_wait_on_and_kill_pid (session_worker_job->priv->pid, 3);
Presumably the "Got callback for signal 15" from your log corresponds to the first line, and the second line is supposed to be the blocking (which I theorized was potentially insufficient with its 3 second timeout). But you're saying raising the 3 second timeout isn't helping, so there must be something more going on. Maybe DBUS_SESSION_BUS_ADDRESS isn't getting forwarded to the child, so gnome-session is starting its own bus on top of the one already started? We should get rid of the timeout regardless, I guess. It's just wrong. I'll try to reproduce tomorrow. I wanted to get to it today but got sidetracked by some shell stuff.
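For readers unfamiliar with the pattern being criticized, here is a minimal POSIX sketch of "send SIGTERM, wait a bounded time, then SIGKILL". It is not gdm's implementation of gdm_signal_pid or gdm_wait_on_and_kill_pid; the names term_wait_then_kill and spawn_sleeper are hypothetical, and the 100 ms polling interval is an arbitrary choice for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not gdm's code): send SIGTERM to a direct child,
 * poll for up to timeout_secs for it to exit, then fall back to SIGKILL.
 * Returns 0 if the child was reaped before the timeout, 1 if the timeout
 * expired and SIGKILL was sent, -1 on error. */
static int
term_wait_then_kill (pid_t pid, int timeout_secs)
{
        struct timespec tick = { 0, 100 * 1000 * 1000 }; /* 100 ms */
        int ticks_left = timeout_secs * 10;
        int status;

        if (kill (pid, SIGTERM) < 0)
                return -1;

        while (ticks_left-- > 0) {
                pid_t ret = waitpid (pid, &status, WNOHANG);
                if (ret == pid)
                        return 0;
                if (ret < 0 && errno != EINTR)
                        return -1;
                nanosleep (&tick, NULL);
        }

        /* Timeout expired: the child gets no further chance to clean up. */
        if (kill (pid, SIGKILL) < 0)
                return -1;
        while (waitpid (pid, &status, 0) < 0) {
                if (errno != EINTR)
                        return -1;
        }
        return 1;
}

/* Demo helper: fork a child that idles, optionally ignoring SIGTERM.
 * The pipe makes the parent wait until the child has set its signal
 * disposition, so the caller cannot signal it too early. */
static pid_t
spawn_sleeper (int ignore_term)
{
        int fds[2];
        char byte = 0;
        pid_t pid;

        if (pipe (fds) < 0)
                return -1;
        pid = fork ();
        if (pid == 0) {
                close (fds[0]);
                if (ignore_term)
                        signal (SIGTERM, SIG_IGN);
                if (write (fds[1], &byte, 1) != 1)
                        _exit (1);
                close (fds[1]);
                for (;;)
                        pause ();
        }
        close (fds[1]);
        if (pid > 0 && read (fds[0], &byte, 1) != 1) {
                close (fds[0]);
                return -1;
        }
        close (fds[0]);
        return pid;
}
```

A child that takes longer than the timeout to shut down is killed outright, which is why a fixed 3 second limit is risky for anything that needs an orderly exit.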
Oh, thinking about it more, what's going on is we're killing the worker managing the greeter session and not the greeter session itself. The worker then just exits its main loop and dies pretty quickly. We need to do one of:
1) make the session worker propagate SIGTERM to its child pid when getting SIGTERM, and wait for the child. This way gdm_session_worker_job_stop will magically do the right thing.
2) add a new StopProgram call to bracket StartProgram and do that instead of gdm_session_worker_job_stop
3) kill the child pid emitted at SessionStarted in gdm_welcome_session_stop instead of calling gdm_session_worker_job_stop
3 is the simplest, so I'll probably go for that tomorrow after trying to reproduce. I also sketched out a patch to get rid of the 3 second timeout in the common path tonight. I'll test that tomorrow too.
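Option 1 can be illustrated with a small POSIX sketch: a parent process (standing in for the session worker) installs a SIGTERM handler that passes the signal on to its child (standing in for the greeter session) and then waits for that child before returning. This is only an illustration of the idea, not the gdm worker code; run_worker, forward_signal, session_pid and demo_session are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Pid of the forked "session"; read from the signal handler. */
static volatile sig_atomic_t session_pid = 0;

/* Pass the received signal on to the session instead of just exiting. */
static void
forward_signal (int signum)
{
        int saved_errno = errno;

        if (session_pid > 0)
                kill ((pid_t) session_pid, signum);
        errno = saved_errno;
}

/* Hypothetical "worker": forks session_main as a child, forwards SIGTERM
 * to it, and waits for it to exit. Returns the child's wait status, or -1
 * on error. */
static int
run_worker (void (*session_main) (void))
{
        struct sigaction sa, old_sa;
        sigset_t block, old_mask;
        pid_t pid;
        int status = -1;
        int reaped = 0;

        sa.sa_handler = forward_signal;
        sigemptyset (&sa.sa_mask);
        sa.sa_flags = 0;
        if (sigaction (SIGTERM, &sa, &old_sa) < 0)
                return -1;

        /* Block SIGTERM across fork() so the handler never runs before
         * session_pid has been recorded. */
        sigemptyset (&block);
        sigaddset (&block, SIGTERM);
        sigprocmask (SIG_BLOCK, &block, &old_mask);

        pid = fork ();
        if (pid == 0) {
                signal (SIGTERM, SIG_DFL);
                sigprocmask (SIG_SETMASK, &old_mask, NULL);
                session_main ();
                _exit (0);
        }
        session_pid = pid;
        sigprocmask (SIG_SETMASK, &old_mask, NULL);

        while (pid > 0 && !reaped) {
                if (waitpid (pid, &status, 0) == pid)
                        reaped = 1;
                else if (errno != EINTR)
                        break;
        }

        session_pid = 0;
        sigaction (SIGTERM, &old_sa, NULL);
        return reaped ? status : -1;
}

/* Stand-in session for demonstration: asks its parent (the worker) to
 * terminate, then idles until a signal ends it. */
static void
demo_session (void)
{
        kill (getppid (), SIGTERM);
        for (;;)
                pause ();
}
```

With this shape, whoever signals the worker can keep waiting on the worker's pid alone, because the worker does not exit until its child has.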
*** Bug 658243 has been marked as a duplicate of this bug. ***
I just pushed fixes to gnome-session that make this work again. GDM is still wrong (likely because of what Ray is saying in comment 6), but at least gnome-session now realizes that the dbus clients are gone, and so doesn't time out on them. This means I can log in again now :-)
I can confirm that vuntz' gnome-session patches fix login. I've tested:
- gdm -> gnome
- gdm-fallback -> gnome
- gdm-fallback -> gnome-fallback
Okay, I'll be careful not to update gnome-session until I finish fixing things on my end :-)
The following fixes have been pushed:
6849f6f daemon: stop greeter session not greeter worker
d5aa5a1 daemon: don't forcible kill pam after 3 seconds
Created attachment 196139
daemon: stop greeter session not greeter worker
Since changing the greeter to run in its own PAM session, we've started killing the worker process instead of the session process right before the user's session is started. This is wrong, because it means we don't give the greeter session time to kill itself in an orderly way before we rip away its session bus.
This commit changes the welcome session stop function to kill the pid of the welcome session instead of the pid of its worker parent.
This change requires reinstrumenting all callers of gdm_welcome_session_stop to not free the session until the "stopped" signal following gdm_welcome_session_stop.
Created attachment 196140
daemon: don't forcible kill pam after 3 seconds
Right now when stopping a conversation we give it 3 seconds to die and then kill kill kill.
This commit changes the killing to be asynchronous and not time out until absolutely necessary.
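As a rough illustration of what "asynchronous" can mean here, the sketch below sends SIGTERM and returns immediately, lets the caller check for exit without blocking, and blocks only when the result is actually needed. It uses plain POSIX calls and is not the code from the attachment; the StopJob type and the stop_job_* functions are hypothetical. A GLib-based daemon would more likely register a child watch with g_child_watch_add instead of polling.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one child that has been asked to stop. */
typedef struct {
        pid_t pid;
        int   exited;   /* nonzero once the child has been reaped */
        int   status;   /* wait status, valid when exited is set */
} StopJob;

/* Ask the child to stop and return immediately. Returns 0 on success,
 * -1 if the signal could not be sent. */
static int
stop_job_begin (StopJob *job, pid_t pid)
{
        job->pid = pid;
        job->exited = 0;
        job->status = 0;
        return kill (pid, SIGTERM);
}

/* Non-blocking check: 1 if the child has exited, 0 if it is still
 * running, -1 on error. */
static int
stop_job_poll (StopJob *job)
{
        pid_t ret;

        if (job->exited)
                return 1;
        do {
                ret = waitpid (job->pid, &job->status, WNOHANG);
        } while (ret < 0 && errno == EINTR);
        if (ret == job->pid) {
                job->exited = 1;
                return 1;
        }
        return ret == 0 ? 0 : -1;
}

/* Block until the child has exited; meant for the point where the caller
 * cannot proceed without it. Returns 0 on success, -1 on error. */
static int
stop_job_finish (StopJob *job)
{
        pid_t ret;

        if (job->exited)
                return 0;
        do {
                ret = waitpid (job->pid, &job->status, 0);
        } while (ret < 0 && errno == EINTR);
        if (ret != job->pid)
                return -1;
        job->exited = 1;
        return 0;
}
```

The caller can do other work between stop_job_begin and stop_job_finish, so a slow but orderly shutdown costs nothing unless the result is needed right away.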
we also probably need a way to tell gnome-session not to show the logout button on the fail whale. we initially tried to fix that yesterday by locking down logout, but that has deeper consequences that I ran into when doing the above two fixes (we now depend on logout working since we wait for it to finish, so we can't lock it down...)
(In reply to comment #14)
> we also probably need a way to tell gnome-session not to show the logout button
> on the fail whale.
>
> we initially tried to fix that yesterday by locking down logout, but that has
> deeper consequences that I ran into when doing the above two fixes (we now
> depend on logout working since we wait for it to finish, so we can't lock it
> down...)
That should be easy. We already check if we're running in gdm in several places (look for GSM_CONSOLEKIT_SESSION_TYPE_LOGIN_WINDOW in the code). I'm leaving for the openSUSE conference, so won't have time to do it right now, but I can do that later...