After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 642497 - Try restarting window manager on Control-Alt-Delete if necessary
Try restarting window manager on Control-Alt-Delete if necessary
Status: RESOLVED FIXED
Product: gnome-session
Classification: Core
Component: general
unspecified
Other All
: Normal normal
: ---
Assigned To: Session Maintainers
Session Maintainers
: 642980 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2011-02-16 19:22 UTC by Colin Walters
Modified: 2011-03-18 19:37 UTC
See Also:
GNOME target: 3.0
GNOME version: ---


Attachments
Try restarting window manager on Control-Alt-Delete if necessary (15.40 KB, patch)
2011-02-16 19:22 UTC, Colin Walters
reviewed Details | Review
Add "fail whale" dialog, will be shown on component failure (14.69 KB, patch)
2011-03-11 14:34 UTC, Colin Walters
committed Details | Review
gsm-manager: Add comment with historical context (1.33 KB, patch)
2011-03-11 14:35 UTC, Colin Walters
committed Details | Review
Handle required component failure by showing fail whale (9.66 KB, patch)
2011-03-11 14:35 UTC, Colin Walters
committed Details | Review
Add "Try Recovery" option to fail whale (11.52 KB, patch)
2011-03-13 20:55 UTC, Colin Walters
committed Details | Review
gsm-fail-whale-dialog: Fix height request (1.25 KB, patch)
2011-03-18 19:11 UTC, Colin Walters
committed Details | Review
Consistently translate component failure message (3.23 KB, patch)
2011-03-18 19:11 UTC, Colin Walters
committed Details | Review

Description Colin Walters 2011-02-16 19:22:26 UTC
Practically speaking right now GNOME Shell is a huge new code
base and there are still some crasher bugs remaining.  This hack
allows you to press Control-Alt-Delete (well, the default for Log Out),
and if there's no window manager running (and we know a provider)
gnome-session will try restarting it then.

The reason for doing this on C-A-D instead of automatically
is detailed in the documentation comment.
Comment 1 Colin Walters 2011-02-16 19:22:28 UTC
Created attachment 181039 [details] [review]
Try restarting window manager on Control-Alt-Delete if necessary
Comment 2 Vincent Untz 2011-02-16 23:15:27 UTC
Need to think if it's the best way to deal with that:  calling _gsm_manager_maybe_initiate_recovery() in user_logout() sounds a bit weird...

At least: do we really need a WM monitor object? I'd think we could just inline the call to XGetSelectionOwner() right there.
Comment 3 Colin Walters 2011-02-17 14:18:57 UTC
(In reply to comment #2)
> Need to think if it's the best way to deal with that:  calling
> _gsm_manager_maybe_initiate_recovery() in user_logout() sounds a bit weird...

Well, it seems like the least invasive way to listen for Control-Alt-Delete right now.  I'm totally open to alternative suggestions.

One that occurred to me was to, if we lost the WM, add a global filter for *any* keypress and try to do recovery then.  But it's not clear it's a lot better than C-A-D.

One thing I'd note too is that C-A-D has the nice benefit of defaulting to a logout timeout.  So we can tell people who have stuck session problems "Press Control-Alt-Delete, and if nothing happens, wait a minute".  

> At least: do we really need a WM monitor object? I'd think we could just inline
> the call to XGetSelectionOwner() right there.

Originally this patch basically auto-restarted the WM if it died, and that necessitated a WM monitor, so I already had the code.  I like that now it encapsulates the weirdness with the WM selection at least.
Comment 4 Vincent Untz 2011-02-21 21:18:47 UTC
Comment on attachment 181039 [details] [review]
Try restarting window manager on Control-Alt-Delete if necessary

Ok, thinking a bit more about this:

 - we certainly don't want to do this in user_logout() if show_confirmation is FALSE or if forceful_logout is TRUE

 - I'd argue that we probably want to map ctrl-alt-del to a specific method that would do this. I don't think we want to do this every time user_logout() is called. We can add a mode to the Logout dbus method for this.

 - in general, I think what would make sense is to check that all required component of the session are still there, and to restart them if necessary. I think what I disliked is that it's specific to the WM.
Comment 5 William Jon McCann 2011-02-22 01:51:15 UTC
I don't buy the argument that we can't or shouldn't handle the shell crash-looping.  The whole reason for a session manager is to keep things like this on track.  Having a secret keybinding (I've never used c-a-d to logout) instead of just doing the right thing seems like an odd choice to me.

Regardless of this we'll still have to handle the case where the shell doesn't start or hangs on start so I don't really see what this patch buys us.  And if any of this stuff is happening you really want to show something to the user.

I'd say we should:
 * auto restart failed core components
 * detect fail loops
 * show a fail whale when we fail to start a core component

also see http://live.gnome.org/GnomeShell/Design/Whiteboards/ProblemReporting
Comment 6 Colin Walters 2011-02-22 15:34:58 UTC
(In reply to comment #5)
> I don't buy the argument that we can't or shouldn't handle the shell
> crash-looping.

Depends on what you mean by "handle".

> Regardless of this we'll still have to handle the case where the shell doesn't
> start or hangs on start so I don't really see what this patch buys us. 

The patch buys us something to tell users to do if the session is wedged - "Press Control-Alt-Delete, if nothing happens wait a minute".

> And if
> any of this stuff is happening you really want to show something to the user.

Agreed.

> I'd say we should:
>  * auto restart failed core components
>  * detect fail loops

Can you elaborate on what you think are good heuristics for detecting failure loops?  I considered this a bit but I just didn't want to get into it.  

> also see http://live.gnome.org/GnomeShell/Design/Whiteboards/ProblemReporting

The whiteboard has a lot of good stuff on it.  But I want to point out that I was going for a specific failure condition I've seen a lot:

* gnome-shell crashes on startup due to something random (missing symbols from xulrunner, gfx driver, gtk mismatch, etc.)

The patch is pretty targeted at this specific thing, I admit.

In your whiteboard here, I would have to:

* Detect the failure (my patch does this)
* Implement fail loop heuristics (not implemented, depending on complexity could be a day to several or more of hacking)
* Implement basic fail whale dialog (first version may be easy?  But there's no mockups so I'm not sure.  Maybe just text with a "log out" button?)

Note without the shell though, we have little hope of showing something to the user in a reasonable way.  *Especially* if we're in the middle of an auto-restart loop, since the perceived effect here is all the windows vanishing for a second, then reappearing, then vanishing, etc.
Comment 7 William Jon McCann 2011-02-22 17:43:59 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > I don't buy the argument that we can't or shouldn't handle the shell
> > crash-looping.
> 
> Depends on what you mean by "handle".
> 
> > Regardless of this we'll still have to handle the case where the shell doesn't
> > start or hangs on start so I don't really see what this patch buys us. 
> 
> The patch buys us something to tell users to do if the session is wedged -
> "Press Control-Alt-Delete, if nothing happens wait a minute".
> 
> > And if
> > any of this stuff is happening you really want to show something to the user.
> 
> Agreed.
> 
> > I'd say we should:
> >  * auto restart failed core components
> >  * detect fail loops
> 
> Can you elaborate on what you think are good heuristics for detecting failure
> loops?  I considered this a bit but I just didn't want to get into it.  

We already have some lame code in gnome-session to stop trying to restart a component that is fail looping.  We should probably hook into that and branch on whether the component is part of the core or an app.

> > also see http://live.gnome.org/GnomeShell/Design/Whiteboards/ProblemReporting
> 
> The whiteboard has a lot of good stuff on it.  But I want to point out that I
> was going for a specific failure condition I've seen a lot:
> 
> * gnome-shell crashes on startup due to something random (missing symbols from
> xulrunner, gfx driver, gtk mismatch, etc.)

Yeah I've seen that too.  Similar with gnome-settings-daemon (it loads plugins too).  There are two main failure cases I think: fail looping and hanging.  The first we can detect already as mentioned above and the second we can detect if one of the startup phases of the session doesn't complete (iirc).

> The patch is pretty targeted at this specific thing, I admit.
> 
> In your whiteboard here, I would have to:
> 
> * Detect the failure (my patch does this)
> * Implement fail loop heuristics (not implemented, depending on complexity
> could be a day to several or more of hacking)
> * Implement basic fail whale dialog (first version may be easy?  But there's no
> mockups so I'm not sure.  Maybe just text with a "log out" button?)

Yeah I've been meaning to add those to the wiki page.  I'll try to do that.

> Note without the shell though, we have little hope of showing something to the
> user in a reasonable way.  *Especially* if we're in the middle of an
> auto-restart loop, since the perceived effect here is all the windows vanishing
> for a second, then reappearing, then vanishing, etc.

Yeah we have a similar problem with the fallback dialog.  It looks really assy without a window decoration.  We probably need to do client side drawing or something there.
Comment 8 Colin Walters 2011-02-22 19:50:28 UTC
One idea actually is that the session can write a file which GDM then picks up and displays.  Basically we just drop you back to the login screen, and show the fail whale there.
Comment 9 Matthias Clasen 2011-03-09 19:49:17 UTC
So, currently, if the shell crashes for any reason, you end up without a window manager and with no keyboard focus, which is really unacceptable. Tentatively putting on blocker for now.
Comment 10 Colin Walters 2011-03-09 19:58:36 UTC
*** Bug 642980 has been marked as a duplicate of this bug. ***
Comment 11 Colin Walters 2011-03-11 14:34:55 UTC
Created attachment 183143 [details] [review]
Add "fail whale" dialog, will be shown on component failure

Named the "fail whale" after Twitter, this dialog tells the user we
hit a serious problem and offers to log out.  Will be used
by following patches.
Comment 12 Colin Walters 2011-03-11 14:35:05 UTC
Created attachment 183144 [details] [review]
gsm-manager: Add comment with historical context
Comment 13 Colin Walters 2011-03-11 14:35:09 UTC
Created attachment 183145 [details] [review]
Handle required component failure by showing fail whale

When a required component either times out on startup, or
is killed by a fatal signal, toss up a fail whale dialog.
Comment 14 Matthias Clasen 2011-03-11 15:39:51 UTC
Should this use the sad computer we show in the 'sorry, fallback' dialog ?
Comment 15 Colin Walters 2011-03-13 20:55:09 UTC
Created attachment 183302 [details] [review]
Add "Try Recovery" option to fail whale

Also resize the fail whale dialog to a minimum of 640x480
to ensure visibility.
Comment 16 Matthias Clasen 2011-03-16 15:47:56 UTC
Review of attachment 183143 [details] [review]:

Looks good to me; we need to get this landed soon, and with an announcement / string freeze break request, since it adds new, highly visible strings.
Comment 17 Matthias Clasen 2011-03-16 15:48:44 UTC
Review of attachment 183143 [details] [review]:

splinter should let me update the status without adding a comment, really
Comment 18 Matthias Clasen 2011-03-16 15:50:39 UTC
Review of attachment 183144 [details] [review]:

I think it would be great if we also had some _developer_ documentation about what their responsibilities are when their thingie starts in certain phases of the session, and what the supported ways of interacting with the sm are. But this is a start, at least...
Comment 19 Matthias Clasen 2011-03-16 15:55:42 UTC
Review of attachment 183145 [details] [review]:

::: gnome-session/gsm-manager.c
@@ +115,3 @@
         GsmManagerPhase         phase;
         guint                   phase_timeout_id;
+        GSList                 *required_apps;

Seems the required_apps list only ever gets added to. Do we need to clean that up somewhere, and is it conceivable that apps might loose their required status during the lifetime of a session ?

@@ +3775,3 @@
 
+static gboolean
+_gsm_manager_add_autostart_app_internal (GsmManager *manager,

No need for the underscore, since its static, right ?
I think it would be even nicer to shorten this to just add_autostart_app(), but not a big deal. Other statics in here don't seem to carry the gsm_manager prefix.
Comment 20 Matthias Clasen 2011-03-16 15:56:40 UTC
Review of attachment 183302 [details] [review]:

::: gnome-session/gsm-manager.c
@@ +241,3 @@
+        }
+        return res;
+}

Factoring start_app_or_warn out looks like a nice, independent fix that could be committed separately.
Comment 21 Matthias Clasen 2011-03-16 15:57:50 UTC
Review of attachment 183302 [details] [review]:

set status to reviewed
Comment 22 Colin Walters 2011-03-16 20:44:28 UTC
(In reply to comment #19)
> Review of attachment 183145 [details] [review]:
> 
> ::: gnome-session/gsm-manager.c
> @@ +115,3 @@
>          GsmManagerPhase         phase;
>          guint                   phase_timeout_id;
> +        GSList                 *required_apps;
> 
> Seems the required_apps list only ever gets added to. Do we need to clean that
> up somewhere,

I guess we can slow down logout a little by freeing it in dispose(), yes.

> and is it conceivable that apps might loose their required status
> during the lifetime of a session ?

I can't think how that would be a useful concept.

> @@ +3775,3 @@
> 
> +static gboolean
> +_gsm_manager_add_autostart_app_internal (GsmManager *manager,
> 
> No need for the underscore, since its static, right ?
> I think it would be even nicer to shorten this to just add_autostart_app(), but
> not a big deal. Other statics in here don't seem to carry the gsm_manager
> prefix.

Fixed, thanks.
Comment 23 Colin Walters 2011-03-17 16:14:59 UTC
Attachment 183143 [details] pushed as a2fedc4 - Add "fail whale" dialog, will be shown on component failure
Attachment 183144 [details] pushed as c01831c - gsm-manager: Add comment with historical context
Attachment 183145 [details] pushed as 9a4ff4b - Handle required component failure by showing fail whale
Attachment 183302 [details] pushed as 2e92bb0 - Add "Try Recovery" option to fail whale
Comment 24 Vincent Untz 2011-03-18 14:13:17 UTC
Review of attachment 183302 [details] [review]:

One thing that isn't clear to me is why we need to set the minimum dialog size to ensure visibility.

::: gnome-session/gsm-fail-whale-dialog.c
@@ +167,3 @@
+                *minimum_size = 480;
+        if (natural_size && *natural_size < 640)
+                *natural_size = 480;

You check for 640 but use 480. Typo, I guess?
Comment 25 Vincent Untz 2011-03-18 14:24:40 UTC
Comment on attachment 183145 [details] [review]
Handle required component failure by showing fail whale

>+                on_required_app_failure (manager, app, "Killed by signal");                

and later on:

>+                                on_required_app_failure (manager, app, _("Timed out"));

One string is translated and not the other. Please decide what's right :-)
Comment 26 Colin Walters 2011-03-18 19:11:45 UTC
Created attachment 183757 [details] [review]
gsm-fail-whale-dialog: Fix height request

We want a minimum of 480 here.
Comment 27 Colin Walters 2011-03-18 19:11:47 UTC
Created attachment 183758 [details] [review]
Consistently translate component failure message
Comment 28 Vincent Untz 2011-03-18 19:24:22 UTC
Review of attachment 183757 [details] [review]:

Thanks, please push!
Comment 29 Vincent Untz 2011-03-18 19:27:53 UTC
Review of attachment 183758 [details] [review]:

Hrm, I don't think the strings from gsm-process-helper.c appear anywhere in the UI -- they just appear in a g_warning(), as far as I can tell. Also, "Killed by signal" in gsm-manager.c is still untranslated, which sounds wrong.
Comment 30 Colin Walters 2011-03-18 19:30:54 UTC
(In reply to comment #29)
> Review of attachment 183758 [details] [review]:
> 
> Hrm, I don't think the strings from gsm-process-helper.c appear anywhere in the
> UI -- they just appear in a g_warning(), as far as I can tell. 

True - but following the rules from my DDL post, they should be translated since they're g_error_ and thus potentially could appear in a dialog.

> Also, "Killed by
> signal" in gsm-manager.c is still untranslated, which sounds wrong.

Oops, fixed!  Good catch.
Comment 31 Colin Walters 2011-03-18 19:37:17 UTC
Attachment 183757 [details] pushed as c0f8768 - gsm-fail-whale-dialog: Fix height request
Attachment 183758 [details] pushed as a18e2a2 - Consistently translate component failure message