Bug 700901 – Unusable shell after switching user

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 700901 - Unusable shell after switching user


Summary:	Unusable shell after switching user


Status:	RESOLVED FIXED

Product:	gnome-shell
Classification:	Core
Component:	lock-screen
Version:	3.8.x
Hardware:	Other Linux

Importance:	Normal critical
Target Milestone:	---
Assigned To:	gnome-shell-maint
QA Contact:	gnome-shell-maint

URL:
Whiteboard:

Duplicates:	700839 (view as bug list)
Depends on:
Blocks:

Reported:	2013-05-23 15:16 UTC by Jean-Benoit MARTIN
Modified:	2013-08-06 14:11 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
set screnshield to false in lock dialog (920 bytes, patch) 2013-05-23 15:43 UTC, Jean-Benoit MARTIN	none	Details \| Review
screenShield: prevent blocking the UI at inconvenient time (3.96 KB, patch) 2013-05-28 10:30 UTC, Lionel Landwerlin	none	Details \| Review
ScreenShield: remove curtain animation when hiding without animation (1.37 KB, patch) 2013-06-03 21:49 UTC, Giovanni Campagna	committed	Details \| Review
ScreenShield: when the user goes idle, check for active before pushing a modal (2.76 KB, patch) 2013-06-04 17:00 UTC, Giovanni Campagna	committed	Details \| Review

Description Jean-Benoit MARTIN 2013-05-23 15:16:27 UTC

The focus is lost after lock the screen and select log in as another user

open a terminal
lock the computer
Select Login as a different user
login as the same user

the focus on the terminal is lost

a reboot is mandatory

Comment 1 Jean-Benoit MARTIN 2013-05-23 15:43:32 UTC

Created attachment 245151 [details] [review]
set screnshield to false in lock dialog

Comment 2 Lionel Landwerlin 2013-05-23 17:10:12 UTC

Review of attachment 245151 [details] [review]:

Although there is definitely a bug here, this patch looks fairly wrong to me.
The reason is that, let's say you have a session open on tty7, when you click 'Login as another user', gdm spawns another X instance on a different tty on let's say tty1.
Your patch unlocks the session on tty7, so basically you've created a security issue.

Comment 3 Lionel Landwerlin 2013-05-23 17:27:56 UTC

My mistake, my setup is probably screwed. No security issue here.

All you need is probably to move the sync call to gdm after emitting the 'failed' signal, and the screenShield will lock itself up.

Comment 4 Jean-Benoit MARTIN 2013-05-24 09:07:42 UTC

I have tested the move the sync call of gdm

But it doesn't work, the screenshield lock is set to "true" when the user lock the screen. In switch user case the scrienshield is tested and set to false if needed.

Or a reset screenshield variables when user session is open

Comment 5 Lionel Landwerlin 2013-05-24 16:52:30 UTC

Moving the sync call to the end of the method works for me.

Comment 6 Lionel Landwerlin 2013-05-28 10:30:12 UTC

Apparently it doesn't work every time...
I dig a bit into the problem and came up with a better solution I think.

Explanations attached in the patch.

Comment 7 Lionel Landwerlin 2013-05-28 10:30:43 UTC

Created attachment 245444 [details] [review]
screenShield: prevent blocking the UI at inconvenient time

Comment 8 Jean-Benoit MARTIN 2013-05-29 07:59:25 UTC

Review of attachment 245444 [details] [review]:

Looks good to me

I have tested. it works

Comment 9 Ray Strode [halfline] 2013-06-03 20:20:02 UTC

*** Bug 700839 has been marked as a duplicate of this bug. ***

Comment 10 Ray Strode [halfline] 2013-06-03 20:26:54 UTC

some irc chatter:

<Jasper> halfline, did you see dj-death's analysis in the commit message in https://bug700901.bugzilla-attachments.gnome.org/attachment.cgi?id=245444 ?
<halfline> ah that analysis isn't quite right
<halfline> it's not because of a blocking dbus call
<halfline> it's blocking because of the dri lock
<Jasper> halfline, well right
<halfline> GL freezes as soon as you switch vt
<dj-death> wow.
<dj-death> didn't know that
<Jasper> dj-death, ajax has patches to get rid of it somewhere.
<dj-death> Jasper: mesa patches?
<Jasper> dj-death, X server patches
<dj-death> ok
...
<Jasper> dj-death, http://lists.x.org/archives/xorg-devel/2012-October/034005.html
...
<halfline> dj-death: so with your patch does the user end up waiting until the screen resets before the vt switch happens?
<dj-death> halfline: it waits for the shield's animation to finish
<Jasper> which animation?
<Jasper> Are we supposed to put the shield back down when we VT switch?
<dj-death> Jasper: the code is written in a way that log in as another user is the same signal as failed to login 
<dj-death> the signal's callback pulls the shield back down
<Jasper> dj-death, that can be changed
...
<halfline> putting it back down might not be that bad i guess
<halfline> how does it "feel" ? does it make the whole process seem too slow?
<dj-death> it feels better than an abrupt vt switch
<dj-death> the shield comes down in 300ms
<dj-death> halfline: but after the dri patch, I guess that's just a workaround
<halfline> that dri patch has been pending for years
<halfline> i wouldn't count on it landing any time soon
<halfline> oct 2012 wasn't the first time it was submitted
<halfline> dj-death: if you want to avoid the animation though, you could just change the "failed" signal to emit true or false
<halfline> (with none of the other changes)
<halfline> and then check if it's true or false in the signal handler and avoid the tween
<halfline> if it's false
<halfline> honestly though, i think having the animation might be more "right"
<halfline> especially if we could animate up the curtain on the other end
<halfline> then we'd have curtain come down, hopefully flickerless vt switch, curtain come up
<halfline> not a bad transition given we don't have wayland

Comment 11 Giovanni Campagna 2013-06-03 21:16:26 UTC

I won't comment on the idea of having the curtain fall down when you click "Log in with another user" (although it seems like papering over the problem to me, the transition from one login dialog to the other should be seamless), but either the problem analysis is wrong, or there is a state-machine bug ought to be fixed.

In particular, deactivate() is supposed to be idem-potent, and it should be possible to call it at any time (including during animations) without breaking anything. That's because it comes from login1, and can be called by the user administrator or other subsystems that have no knowledge of gnome.

So, what happens here, assuming that the lock involved is the DRI lock, not the DBus call to Gdm.
- The user clicks "Login as another user", _otherUserClicked()
- We cancel the userVerifier - that stops gdm conversations
  gdm schedules a Reset, and a few ConversationStopped *in that order*
  (first it emits the Reset, and then goes to kill the child session workers, and waits until they're reaped to emit ConversationStopped)
- We signal screenshield that the user failed to authenticate
- We start animating the curtain back to position, _resetLockScreen
  Main.sessionMode = 'lock-screen'
- DRI lock!
  meanwhile, we get the Reset and ConversationStopped
  the Reset is handled by UserVerifier._onReset and then UnlockDialog._onReset
  the DBusProxy is run_dispose()'d, further DBus signals are dropped on the ground
  we get another _resetLockScreen because of the 'failed' signal, but it does nothing because _lockScreenState != HIDDEN
- after the user authenticates, but before the VT switch, we get Unlock from login1, this is handled by deactivate(false)
  that calls _hideLockScreen, that is supposed to block the showing animation from _resetLockScreen - *bug*, we need a removeTweens() in the !animate path
  we then free the lock screen resources (protected with _hasLockScreen), and pop both modes
  Main.sessionMode = 'user'
  at this point, we still have a dialog, because _lockScreenShown (scheduled from _resetLockScreen above) hasn't been called yet, so we popModal() on it
  and we pop modal from ourselves - this should relieve all our grabs
  finally, we schedule a 0 timeout for _completeDeactivate, that destroys the dialog and finally hides everything

Now, the question becomes... what is broken in the above sequence of events?

I'm sorry, but the proposed patch just adds more complication in already hard to understand code, so I'm inclined to reject it, and try to debug this more, to find out exactly how it is broken.

PS: with regards to the analysis from bug 701516, I disagree with the conclusion.
- In _onStatusChanged, we only take the early return if _isActive or lightbox.visible, in which case _isModal is true and _becomeModal did nothing
- Even if we get the spurious presence change (it's possible that gnome-session watches login1 and computes the transition from that), the net effect of it is just destroying the dialog, which if any avoids the useless Reset signal above

Comment 12 Giovanni Campagna 2013-06-03 21:31:02 UTC

Oh, except that the invariant "_isActive implies _isModal is not true", since a2590164368fc28100a4475644f2d55827de3e7b, so _onStatusChanged (which is delivered late because of the DRI lock somewhere) happens right in the middle of deactivate() and _deactivateComplete(), which breaks.

Now, my opinion is that commit just be reverted, and everyone be happy with the previous working code, that was also correct in theory.
Otherwise, flipping the return login in _onStatusChanged should work, but that it breaks again if you add bug 699112 to the combination.

Comment 13 Giovanni Campagna 2013-06-03 21:49:42 UTC

Created attachment 245967 [details] [review]
ScreenShield: remove curtain animation when hiding without animation

If we don't remove the animation, we might leave a pending call
to _lockScreenShown() which would confuse our state tracking into
thinking we're active when we're not.

Ok, this the other path breaking the invariant. With this patch,
and a2590164368fc28100a4475644f2d55827de3e7b reverted, the bug
is fully solved.
If you have a better idea, I'm happy to consider it, but I'm
quite convinced it's hard to get any simpler.

Comment 14 Jasper St. Pierre (not reading bugmail) 2013-06-03 21:52:39 UTC

(In reply to comment #12)
> Now, my opinion is that commit just be reverted, and everyone be happy with the
> previous working code, that was also correct in theory.

It was done on purpose for several reasons. The modal should be dropped as soon as the user action is initiated.

Comment 15 Giovanni Campagna 2013-06-03 21:57:03 UTC

(In reply to comment #14)
> (In reply to comment #12)
> > Now, my opinion is that commit just be reverted, and everyone be happy with the
> > previous working code, that was also correct in theory.
> 
> It was done on purpose for several reasons. The modal should be dropped as soon
> as the user action is initiated.

And you'd let the user type on a partially invisible window? That doesn't sound right to me...

Comment 16 Giovanni Campagna 2013-06-03 21:57:03 UTC

(In reply to comment #14)
> (In reply to comment #12)
> > Now, my opinion is that commit just be reverted, and everyone be happy with the
> > previous working code, that was also correct in theory.
> 
> It was done on purpose for several reasons. The modal should be dropped as soon
> as the user action is initiated.

And you'd let the user type on a partially invisible window? That doesn't sound right to me...

Comment 17 Jasper St. Pierre (not reading bugmail) 2013-06-03 22:27:31 UTC

There's use cases for it. You can type <Super>term as soon as you enter your password instead of waiting until a 1 second animation finished.

It also makes sense so that the user timestamp is correct and tied to the Enter key. We can of course save the timestamp from the time we hit enter, but that's OK.

It's possible we should remove the cover pane on login too for the same reasons.

Comment 18 Jasper St. Pierre (not reading bugmail) 2013-06-04 06:34:26 UTC

(In reply to comment #12)
> Oh, except that the invariant "_isActive implies _isModal is not true", since
> a2590164368fc28100a4475644f2d55827de3e7b, so _onStatusChanged (which is
> delivered late because of the DRI lock somewhere) happens right in the middle
> of deactivate() and _deactivateComplete(), which breaks.

I'm curious if we could still maintain the invariant here. Why would the screen shield be _isActive while deactivated?

Comment 19 Giovanni Campagna 2013-06-04 16:55:59 UTC

(In reply to comment #18)
> (In reply to comment #12)
> > Oh, except that the invariant "_isActive implies _isModal is not true", since
> > a2590164368fc28100a4475644f2d55827de3e7b, so _onStatusChanged (which is
> > delivered late because of the DRI lock somewhere) happens right in the middle
> > of deactivate() and _deactivateComplete(), which breaks.
> 
> I'm curious if we could still maintain the invariant here. Why would the screen
> shield be _isActive while deactivated?

Uhm... because it was like so originally, and because various actors are visible, so it makes sense to be _isActive, in case something happens in the middle.
In any case...

- on further debug, this bug has nothing to do with popModal in deactivate(), because onStatusChanged happens after _deactivateComplete - the isActive comes from the _lockScreenShown that my patch prevents
- for robustness, it makes sense to invert the logic in _onStatusChanged and ignore the invariant - there is very little to lose

Comment 20 Giovanni Campagna 2013-06-04 17:00:19 UTC

Created attachment 246018 [details] [review]
ScreenShield: when the user goes idle, check for active before pushing a modal

We can't assume "isActive implies isModal", so there is a risk
of pushing a modal that nothing else will ever pop, because we
take the early return and don't activate the user active watch.

Comment 21 Matthias Clasen 2013-06-24 03:04:01 UTC

Can this get reviewed and merged ?

Comment 22 Jasper St. Pierre (not reading bugmail) 2013-06-24 03:07:15 UTC

Review of attachment 246018 [details] [review]:

::: js/ui/screenShield.js
@@ +777,3 @@
+            // so becomeModal() would take the fast path too. But that
+            // assumption is wrong during the deactivation animation, because
+            // we drop the modal early.

I don't think history needs to be in a comment like this. Remove it.

Comment 23 Giovanni Campagna 2013-07-04 13:19:44 UTC

Comment on attachment 246018 [details] [review]
ScreenShield: when the user goes idle, check for active before pushing a modal

Attachment 246018 [details] pushed as d509ab7 - ScreenShield: when the user goes idle, check for active before pushing a modal

The other patch is still needed and pending review.

Comment 24 Giovanni Campagna 2013-08-06 14:10:44 UTC

Attachment 245967 [details] pushed as 7652f42 - ScreenShield: remove curtain animation when hiding without animation
Reviewed by Jasper St. Pierre at Guadec.