Bug 658481 - gdm 3.1.90: fallback gdm greeter session not properly terminated and blocking login
Status: RESOLVED FIXED
Product: gdm
Classification: Core
Component: general
Version: 3.1.x
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: GDM maintainers
QA Contact: GDM maintainers
Duplicates: 658243
Depends on:
Blocks:
Reported: 2011-09-07 15:59 UTC by Vincent Untz
Modified: 2011-09-09 20:18 UTC
See Also:
GNOME target: 3.2
GNOME version: ---


Attachments
daemon: stop greeter session not greeter worker (10.91 KB, patch)
2011-09-09 19:00 UTC, Ray Strode [halfline]
committed
daemon: don't forcible kill pam after 3 seconds (14.21 KB, patch)
2011-09-09 19:00 UTC, Ray Strode [halfline]
committed

Description Vincent Untz 2011-09-07 15:59:58 UTC
When using the fallback gdm greeter session, login doesn't really work: it seems metacity from the gdm greeter session is still running (and therefore blocking gnome-shell).

So I end up with the fail whale dialog. I'm unsure why this happens, though.
Comment 1 Ray Strode [halfline] 2011-09-07 16:26:24 UTC
ioni reported that too. I'm guessing it's because of this hack in the PAM code:

http://git.gnome.org/browse/gdm/tree/daemon/gdm-session-worker-job.c#n292

But I haven't had a chance to reproduce it yet to be sure. If that is the problem, changing the number from 3 to, say, 45 would confirm it.

I think the right thing to do there is not to have a timeout at all, and instead let the child finish asynchronously, potentially blocking later on in the process if it still hasn't finished.
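A minimal sketch of that idea, written with plain POSIX calls rather than gdm's own helpers (all function names here are hypothetical): send SIGTERM and return immediately, poll with waitpid(WNOHANG) while other work continues, and only fall back to a blocking waitpid (with no timeout and no SIGKILL) at the point where we cannot continue while the child is still alive.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: ask the child to exit, but don't wait here. */
static void request_stop(pid_t pid)
{
    kill(pid, SIGTERM);
}

/* Non-blocking check, meant to be called from a main-loop iteration.
 * Returns true once the child has been reaped and *status is filled in. */
static bool try_reap(pid_t pid, int *status)
{
    return waitpid(pid, status, WNOHANG) == pid;
}

/* Blocking fallback for when we truly can't proceed with the child alive.
 * Note: no timeout and no SIGKILL. */
static int reap_blocking(pid_t pid)
{
    int status = 0;
    while (waitpid(pid, &status, 0) < 0 && errno == EINTR)
        ;
    return status;
}

/* Demo: returns the signal that terminated the child, or -1. */
static int stop_child_async_demo(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {               /* child: idle until a signal arrives */
        for (;;)
            pause();
    }

    request_stop(pid);

    int status = 0;
    bool reaped = false;
    struct timespec tick = { 0, 10 * 1000 * 1000 };  /* 10 ms of "other work" */
    for (int i = 0; i < 5 && !reaped; i++) {
        reaped = try_reap(pid, &status);
        if (!reaped)
            nanosleep(&tick, NULL);
    }
    if (!reaped)
        status = reap_blocking(pid);  /* block only now, as a last resort */

    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

In a GLib program such as gdm, the polling step would more naturally be a child watch on the main loop (for example g_child_watch_add()); the explicit loop above just makes the control flow visible.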
Comment 2 Vincent Untz 2011-09-07 22:32:09 UTC
Okay, it's a bit late, so I'm not sure I'm right, but here's something that might be happening:

 - gdm starts gnome-session for the greeter session
 - gnome-session sees there's no dbus running, so it execs "dbus-launch --exit-with-session gnome-session..."
 - user logs in
 - gdm sends a TERM signal to what it believes is gnome-session, but is actually dbus
 - dbus forwards the TERM signal to gnome-session, and exits
 - gnome-session tries to contact a few clients that are connected via dbus, and fails miserably because it didn't notice the bus is gone
 - gnome-session waits for the timeout
 - gnome-session doesn't kill metacity in the meantime (the only non-dbus client in the session...)
 - gnome-shell from the user is not happy
Comment 3 Ray Strode [halfline] 2011-09-08 14:00:28 UTC
gdm does create a dbus-daemon for the greeter session before starting gnome-session (or it's supposed to anyway)
Comment 4 Vincent Untz 2011-09-08 21:41:42 UTC
Hrm, indeed. FWIW, I think I tried changing the timeout mentioned in comment 1, but that wasn't the issue.

For some reason, we don't wait on the right child process, I guess, as dbus is clearly killed while gnome-session (from the greeter session) is still running.

Here's some bits from the gnome-session debug log I kept:

Sep  7 18:06:14: DEBUG(+): Got callback for signal 15
[...]
Sep  7 18:06:14: DEBUG(+): GsmAutostartApp: (pid:1952) done (status:1)
Sep  7 18:06:14: DEBUG(+): GsmAutostartApp: (pid:1920) done (status:0)
Sep  7 18:06:15: DEBUG(+): GsmManager: query end session timed out

I suspect that the two GsmAutostartApp clients quitting here, at the same time as we receive SIGTERM, talk to gnome-session over dbus rather than XSMP, and just leave when the bus disappears (which is why they don't reply to gnome-session). That would indicate that dbus is indeed stopped just after SIGTERM is sent.
Comment 5 Ray Strode [halfline] 2011-09-09 03:13:01 UTC
So gdm_welcome_session_stop now does:

if (welcome_session->priv->session != NULL) {
        gdm_session_stop_conversation (GDM_SESSION (welcome_session->priv->session),
                                       "gdm-welcome");
        gdm_session_close (GDM_SESSION (welcome_session->priv->session));

        g_object_unref (welcome_session->priv->session);
        welcome_session->priv->session = NULL;
}

stop_dbus_daemon (welcome_session);

So if gdm_session_stop_conversation doesn't block until gnome-session finishes, then we'll potentially kill the dbus-daemon early. That function just calls gdm_session_worker_job_stop, which does:

res = gdm_signal_pid (session_worker_job->priv->pid, SIGTERM); 
gdm_wait_on_and_kill_pid (session_worker_job->priv->pid, 3);

Presumably the "Got callback for signal 15" in your log comes from the first line, and the second line is supposed to do the blocking (which I theorized was potentially insufficient with its 3-second timeout).
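To make the concern about the fixed timeout concrete, here is a rough sketch of what a "signal, wait up to N seconds, then SIGKILL" step amounts to. It is a hypothetical stand-in written with plain POSIX calls, not the actual body of gdm_wait_on_and_kill_pid: a child whose orderly shutdown takes longer than the timeout simply gets killed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical "wait up to timeout_sec, then SIGKILL" helper (not gdm's
 * actual implementation). Returns the child's wait status. */
static int wait_then_kill(pid_t pid, int timeout_sec)
{
    int status = 0;
    struct timespec tick = { 0, 100 * 1000 * 1000 };  /* poll every 100 ms */
    for (int i = 0; i < timeout_sec * 10; i++) {
        if (waitpid(pid, &status, WNOHANG) == pid)
            return status;                            /* exited in time */
        nanosleep(&tick, NULL);
    }
    kill(pid, SIGKILL);                               /* out of patience */
    waitpid(pid, &status, 0);
    return status;
}

/* Demo: the child ignores SIGTERM, standing in for a session whose orderly
 * shutdown takes longer than the timeout. Returns the signal that ended it. */
static int slow_child_demo(int timeout_sec)
{
    int ready[2];
    if (pipe(ready) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        signal(SIGTERM, SIG_IGN);
        close(ready[0]);
        if (write(ready[1], "r", 1) != 1)  /* report: SIGTERM now ignored */
            _exit(1);
        for (;;)
            pause();
    }

    close(ready[1]);
    char c;
    ssize_t n = read(ready[0], &c, 1);     /* avoid racing the child's setup */
    close(ready[0]);
    if (n != 1) {
        waitpid(pid, NULL, 0);
        return -1;
    }

    kill(pid, SIGTERM);                    /* polite request: ignored */
    int status = wait_then_kill(pid, timeout_sec);
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Calling slow_child_demo(1) ends with the child killed by SIGKILL rather than exiting on its own, which is the behavior the 3-second limit risks for a slow greeter session.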

But you're saying raising the 3-second timeout doesn't help, so there must be something more going on. Maybe DBUS_SESSION_BUS_ADDRESS isn't getting forwarded to the child, so gnome-session is starting its own bus on top of the one we started?

We should get rid of the timeout regardless, I guess; it's just wrong.
I'll try to reproduce tomorrow. I wanted to get to it today but got sidetracked by some shell stuff.
Comment 6 Ray Strode [halfline] 2011-09-09 05:24:35 UTC
Oh, thinking about it more, what's going on is that we're killing the worker managing the greeter session, not the greeter session itself. The worker then just exits its main loop and dies pretty quickly. We need to do one of the following:

1) make the session worker propagate SIGTERM to its child pid when getting SIGTERM, and wait for the child.  This way gdm_session_worker_job_stop will magically do the right thing.
2) add a new StopProgram call to bracket StartProgram and call that instead of gdm_session_worker_job_stop
3) in gdm_welcome_session_stop, kill the child pid emitted with SessionStarted instead of calling gdm_session_worker_job_stop
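Option 1 can be sketched with plain POSIX process calls. This is a hypothetical illustration, not gdm code: the worker blocks SIGTERM and SIGCHLD and handles both with sigwait(), forwards SIGTERM to its session child, and exits only after reaping it, so a caller that blocks on the worker's exit automatically waits for the session too.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Worker side (hypothetical): run a session child and, on SIGTERM, forward
 * the signal to it, exiting only after the child has been reaped. */
static void run_worker(int ready_fd)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGTERM);
    sigaddset(&set, SIGCHLD);
    sigprocmask(SIG_BLOCK, &set, NULL);   /* handle both via sigwait() */

    pid_t session = fork();
    if (session < 0)
        _exit(2);
    if (session == 0) {                   /* stand-in for the greeter session */
        sigprocmask(SIG_UNBLOCK, &set, NULL);
        for (;;)
            pause();
    }

    if (write(ready_fd, "r", 1) != 1)     /* tell the daemon we're set up */
        _exit(2);

    for (;;) {
        int sig = 0, status = 0;
        if (sigwait(&set, &sig) != 0)
            _exit(2);
        if (sig == SIGTERM) {
            kill(session, SIGTERM);       /* propagate to the session */
        } else if (waitpid(session, &status, WNOHANG) == session) {
            /* Session is gone: only now may the worker exit. */
            _exit(WIFSIGNALED(status) && WTERMSIG(status) == SIGTERM ? 0 : 1);
        }
    }
}

/* Daemon side: SIGTERM the worker and block on it. Because the worker waits
 * for its child, returning here implies the session has exited as well.
 * Returns the worker's exit code (0 = session was stopped by SIGTERM). */
static int stop_worker_demo(void)
{
    int ready[2];
    if (pipe(ready) < 0)
        return -1;

    pid_t worker = fork();
    if (worker < 0)
        return -1;
    if (worker == 0) {
        close(ready[0]);
        run_worker(ready[1]);
    }

    close(ready[1]);
    char c;
    ssize_t n = read(ready[0], &c, 1);
    close(ready[0]);
    if (n != 1)
        return -1;

    kill(worker, SIGTERM);
    int status = 0;
    waitpid(worker, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The sketch uses sigwait() instead of an asynchronous signal handler: with both signals blocked, there is no window in which a SIGTERM can slip in between checking a flag and starting to wait.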

3 is the simplest, so I'll probably go for that tomorrow after trying to reproduce. I also sketched out a patch tonight to get rid of the 3-second timeout in the common path. I'll test that tomorrow too.
Comment 7 Ray Strode [halfline] 2011-09-09 05:36:46 UTC
*** Bug 658243 has been marked as a duplicate of this bug. ***
Comment 8 Vincent Untz 2011-09-09 07:58:45 UTC
I just pushed fixes to gnome-session that make this work again. GDM is still wrong (likely because of what Ray describes in comment 6), but at least gnome-session now notices that the dbus clients are gone, and so doesn't time out on them. This means I can log in again now :-)
Comment 9 Matthias Clasen 2011-09-09 12:54:00 UTC
I can confirm that vuntz' gnome-session patches fix login. I've tested

gdm -> gnome
gdm-fallback -> gnome
gdm-fallback -> gnome-fallback
Comment 10 Ray Strode [halfline] 2011-09-09 13:26:03 UTC
Okay, I'll be careful not to update gnome-session until I finish fixing things on my end :-)
Comment 11 Ray Strode [halfline] 2011-09-09 19:00:24 UTC
The following fixes have been pushed:
6849f6f daemon: stop greeter session not greeter worker
d5aa5a1 daemon: don't forcible kill pam after 3 seconds
Comment 12 Ray Strode [halfline] 2011-09-09 19:00:28 UTC
Created attachment 196139
daemon: stop greeter session not greeter worker

Since changing the greeter to run in its own PAM
session, we've started killing the worker process
instead of the session process right before the
user's session is started.

This is wrong, because it means we don't give the
greeter session time to kill itself in an orderly
way before we rip away its session bus.

This commit changes the welcome session stop function
to kill the pid of the welcome session instead of
the pid of its worker parent. This change requires
reinstrumenting all callers of gdm_welcome_session_stop
to not free the session until the "stopped" signal
following gdm_welcome_session_stop.
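The ordering this patch enforces can be sketched with plain POSIX processes. Everything here is a hypothetical stand-in, not gdm code: one child plays the greeter's dbus-daemon, another plays the greeter session, which needs the bus during its own shutdown. The stop routine signals the session itself (not a worker), waits until it has exited (roughly the role of the "stopped" signal), and only then stops the bus.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for the greeter's dbus-daemon: drains messages until EOF. */
static void run_bus(int rfd)
{
    char buf[64];
    while (read(rfd, buf, sizeof buf) > 0)
        ;
    _exit(0);
}

/* Stand-in for the greeter session: on SIGTERM it says goodbye over the
 * bus (an orderly shutdown that needs the bus) and exits 0 only if it could. */
static void run_session(int wfd, const sigset_t *term)
{
    int sig;
    signal(SIGPIPE, SIG_IGN);          /* a dead bus gives EPIPE, not death */
    sigwait(term, &sig);               /* SIGTERM is blocked; wait for it */
    _exit(write(wfd, "bye", 3) == 3 ? 0 : 1);
}

/* Stop the session itself, wait until it has stopped, and only then take
 * down its bus. Returns the session's exit code (0 = clean shutdown). */
static int stop_in_order_demo(void)
{
    int bus[2];
    sigset_t term, old;
    sigemptyset(&term);
    sigaddset(&term, SIGTERM);
    if (pipe(bus) < 0)
        return -1;
    sigprocmask(SIG_BLOCK, &term, &old);   /* children can't miss SIGTERM */

    pid_t bus_pid = fork();
    if (bus_pid == 0) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        close(bus[1]);
        run_bus(bus[0]);
    }
    pid_t session_pid = bus_pid < 0 ? -1 : fork();
    if (session_pid == 0) {
        close(bus[0]);
        run_session(bus[1], &term);
    }
    sigprocmask(SIG_SETMASK, &old, NULL);
    close(bus[0]);
    close(bus[1]);
    if (bus_pid < 0 || session_pid < 0) {
        if (bus_pid > 0)
            waitpid(bus_pid, NULL, 0);     /* no writer left: bus sees EOF */
        return -1;
    }

    int status = 0;
    kill(session_pid, SIGTERM);
    waitpid(session_pid, &status, 0);      /* the session has now stopped */

    kill(bus_pid, SIGTERM);                /* only now stop the bus */
    waitpid(bus_pid, NULL, 0);

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Swapping the two stop blocks in stop_in_order_demo, so the bus is stopped first, makes the session's final write fail with EPIPE and the function return 1, which mirrors ripping away the session bus before the greeter has finished shutting down.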
Comment 13 Ray Strode [halfline] 2011-09-09 19:00:31 UTC
Created attachment 196140
daemon: don't forcible kill pam after 3 seconds

Right now when stopping a conversation we give it 3 seconds
to die and then kill kill kill.

This commit changes the killing to be asynchronous and not
time out until absolutely necessary.
Comment 14 Ray Strode [halfline] 2011-09-09 19:42:03 UTC
we also probably need a way to tell gnome-session not to show the logout button on the fail whale.

we initially tried to fix that yesterday by locking down logout, but that has deeper consequences that I ran into when doing the above two fixes (we now depend on logout working since we wait for it to finish, so we can't lock it down...)
Comment 15 Vincent Untz 2011-09-09 20:18:28 UTC
(In reply to comment #14)
> we also probably need a way to tell gnome-session not to show the logout button
> on the fail whale.
> 
> we initially tried to fix that yesterday by locking down logout, but that has
> deeper consequences that I ran into when doing the above two fixes (we now
> depend on logout working since we wait for it to finish, so we can't lock it
> down...)

That should be easy. We already check whether we're running in gdm in several places (look for GSM_CONSOLEKIT_SESSION_TYPE_LOGIN_WINDOW in the code). I'm leaving for the openSUSE conference, so I won't have time to do it right now, but I can do it later...