After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 645251 - gsm-manager: recovery design improvements
gsm-manager: recovery design improvements
Status: RESOLVED FIXED
Product: gnome-session
Classification: Core
Component: general
unspecified
Other All
: Normal normal
: ---
Assigned To: Session Maintainers
Session Maintainers
[gnome3-important]
: 609774 634762 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2011-03-19 19:16 UTC by Colin Walters
Modified: 2011-03-22 20:02 UTC
See Also:
GNOME target: 3.0
GNOME version: ---


Attachments
gsm-manager: Don't fail whale autorestart components (1.49 KB, patch)
2011-03-19 19:16 UTC, Colin Walters
needs-work Details | Review
gsm-manager: (14.12 KB, patch)
2011-03-21 19:32 UTC, Colin Walters
none Details | Review
gsm-manager: Improve the experience when required components fail (28.46 KB, patch)
2011-03-21 22:14 UTC, William Jon McCann
none Details | Review
gsm-manager: Improve the experience when required components fail (28.55 KB, patch)
2011-03-21 22:56 UTC, William Jon McCann
reviewed Details | Review
gsm-manager/gsm-app: Fix restart logic (5.11 KB, patch)
2011-03-21 23:48 UTC, Colin Walters
committed Details | Review
gsm-manager: Improve the experience when required components fail (29.61 KB, patch)
2011-03-22 18:37 UTC, William Jon McCann
committed Details | Review

Description Colin Walters 2011-03-19 19:16:42 UTC
This is the least bad thing for now; we could also just ignore
autorestart, and wedge it into "Try Recovery" later.
Comment 1 Colin Walters 2011-03-19 19:16:44 UTC
Created attachment 183809 [details] [review]
gsm-manager: Don't fail whale autorestart components
Comment 2 William Jon McCann 2011-03-19 22:02:27 UTC
Try recovery should be removed as well.  It makes no sense.  The computer should try to recover and not bother me if it can.
Comment 3 Colin Walters 2011-03-19 23:08:34 UTC
(In reply to comment #2)
> Try recovery should be removed as well.  It makes no sense.  The computer
> should try to recover and not bother me if it can.

Well, the reason I didn't just restart is that it won't be transparent.  For example, you will lose any persistent notifications, application tracking, also the _WM_STATE stickiness for windows on other monitors...basically we aren't stateless.  That's for the shell, I'm not sure about other things like gnome-settings-daemon.

So while we can restart some stuff automatically again it won't be 100% transparent, and there are definitely cases where it will leave you in a buggy state.  Do you want a restart anyways and...no notification?  Or try to restart but show the fail dialog?
Comment 4 Colin Walters 2011-03-19 23:18:57 UTC
Actually I forgot that at least in Fedora at least, if abrt is running (which it will be if you've logged in successfully, then at least there's something).

So...change it so that:

* If required app fails to start up the first time, just offer log out
* If required app fails after login, try restart, and if it fails again within say a minute, offer log out?
Comment 5 William Jon McCann 2011-03-19 23:26:59 UTC
(In reply to comment #3)
> (In reply to comment #2)
> > Try recovery should be removed as well.  It makes no sense.  The computer
> > should try to recover and not bother me if it can.
> 
> Well, the reason I didn't just restart is that it won't be transparent.  For
> example, you will lose any persistent notifications, application tracking, also
> the _WM_STATE stickiness for windows on other monitors...basically we aren't
> stateless.  That's for the shell, I'm not sure about other things like
> gnome-settings-daemon.

That's why you need to show a different dialog in that case.  Something that says "Shit, we crashed.  Sorry.  But we restarted.  Please file a bug if you want."

What is the point of the logout timer?
Comment 6 William Jon McCann 2011-03-20 00:08:02 UTC
Added mockups to http://live.gnome.org/GnomeShell/Design/Whiteboards/ProblemReporting

1. If a session component fails and the system cannot recover/start it after a number of tries, display a fail whale and offer the options to report a bug and log out.

2. If a session component fails and the system is able to recover/restart it, display a notification to inform the user and beg for a bug report.
Comment 7 William Jon McCann 2011-03-20 05:13:49 UTC
*** Bug 609774 has been marked as a duplicate of this bug. ***
Comment 8 William Jon McCann 2011-03-20 05:39:19 UTC
*** Bug 634762 has been marked as a duplicate of this bug. ***
Comment 9 Colin Walters 2011-03-20 18:21:12 UTC
(In reply to comment #5)
>
> 
> That's why you need to show a different dialog in that case.  Something that
> says "Shit, we crashed.  Sorry.  But we restarted.  Please file a bug if you
> want."

Ok; these look like good designs, though to implement the first one is there a reason not to just drop back down to gdm, like you suggest earlier (shell problems handled by gdm?).

For the restart one, for 3.0 in Fedora I guess we'll just rely on the ABRT notification?  It's gross since it uses packages, but...
 
> What is the point of the logout timer?

The idea was for situations like graphics driver failure where we aren't showing anything to the screen; I guess though we can just classify this similar to lockup and hope that control-alt-delete recovers, if not then you have to hold the power button.

So let me see if I can break this down into stuff for 3.0, and stuff for 3.2:

3.0:
* change gnome-shell.desktop to have X-GNOME-Autorestart
* tweak the restart limit stuff in gsm-app.c to be saner (once per minute seems a lot better to me than 20 times as fast as possible, then never again; but open to suggestions here)
* if we fail past the recovery limit, then show the fail whale dialog
* remove try recovery from fail whale

3.2:
* take over UI from abrt
* have some way to tell gdm we blew up so it can display something?
* more stuff to come
Comment 10 William Jon McCann 2011-03-20 18:25:06 UTC
Seems fine.  And most importantly,

3.0: Fix the dialog to look more like mine :)  We should already have that icon in the session module.
Comment 11 Vincent Untz 2011-03-21 09:59:18 UTC
Comment on attachment 183809 [details] [review]
gsm-manager: Don't fail whale autorestart components

Based on comment #9, my understanding is that this will be implemented another way.
Comment 12 William Jon McCann 2011-03-21 13:49:32 UTC
computer-fail has been added to the icon theme so we can use that here too.
Comment 13 Colin Walters 2011-03-21 19:32:50 UTC
Created attachment 183989 [details] [review]
gsm-manager:

Design updates.
Comment 14 William Jon McCann 2011-03-21 22:14:16 UTC
Created attachment 184015 [details] [review]
gsm-manager: Improve the experience when required components fail

Instead of just failing and leaving the user no way out, present
a "fail whale" that allows the user to logout and try again.
Comment 15 William Jon McCann 2011-03-21 22:56:08 UTC
Created attachment 184022 [details] [review]
gsm-manager: Improve the experience when required components fail

Instead of just failing and leaving the user no way out, present
a "fail whale" that allows the user to logout and try again.

Fix label justification
Comment 16 Colin Walters 2011-03-21 23:48:12 UTC
Created attachment 184024 [details] [review]
gsm-manager/gsm-app: Fix restart logic

We only care about restart time; if the app crashes within
a minute of initial start, we should still restart.

Also, ignore "died" for autorestart apps; we'll lose
the connection and do the restart there.
Comment 17 Vincent Untz 2011-03-22 00:41:55 UTC
Comment on attachment 184022 [details] [review]
gsm-manager: Improve the experience when required components fail

One thing I wonder: if the window manager is not dead, but only something like g-s-d dies, it might be nice to give a way to users to save their documents... Not sure how to best deal with that, though.

(sorry, splinter doesn't work for this patch here :/)

>-/* Shared with gsm-logout-dialog.c */
>-#define AUTOMATIC_ACTION_TIMEOUT 60

I've no major objection removing the timeout; I'm just wondering if users depending on a11y will be able to use the dialog (if things crashed). If no, they might get stuck; the timeout could possibly help.

In general, I don't know if we really need the whole _window_move_resize_window() function. We could just call gdk_window_move_resize(), since this should not be called really often.

>+static void
>+update_geometry (GsmFailWhaleDialog *fail_dialog)
>+{
>+        int monitor;
>+        GdkScreen *screen;
> 
>-        g_type_class_add_private (klass, sizeof (GsmFailWhaleDialogPrivate));
>+        screen = gdk_screen_get_default ();

Please use gtk_widget_get_screen (GTK_WIDGET (fail_dialog))

>+static void
>+gsm_fail_whale_dialog_realize (GtkWidget *widget)
>+{
>+        if (GTK_WIDGET_CLASS (gsm_fail_whale_dialog_parent_class)->realize) {
>+                GTK_WIDGET_CLASS (gsm_fail_whale_dialog_parent_class)->realize (widget);
>+        }
>+
>+        _window_override_user_time (GSM_FAIL_WHALE_DIALOG (widget));

Why do we need this?

>+        update_geometry (GSM_FAIL_WHALE_DIALOG (widget));
>+        _window_move_resize_window (GSM_FAIL_WHALE_DIALOG (widget), TRUE, TRUE);
>+
>+        g_signal_connect (gtk_window_get_screen (GTK_WINDOW (widget)),
>+                          "size_changed",
>+                          G_CALLBACK (on_screen_size_changed),
>+                          widget);

Just to be on the safe side, you need to disconnect this signal handler on unrealize.

>diff --git a/gnome-session/gsm-fail-whale-dialog.h b/gnome-session/gsm-fail-whale-dialog.h
>index 98f7288..85aedbc 100644
>--- a/gnome-session/gsm-fail-whale-dialog.h
>+++ b/gnome-session/gsm-fail-whale-dialog.h
>@@ -40,26 +40,24 @@ typedef struct _GsmFailWhaleDialogClass    GsmFailWhaleDialogClass;
> typedef struct _GsmFailWhaleDialogPrivate  GsmFailWhaleDialogPrivate;
> 
> typedef enum {
>-        GSM_FAIL_WHALE_DIALOG_FAIL_TYPE_RECOVERABLE,
>         GSM_FAIL_WHALE_DIALOG_FAIL_TYPE_FATAL
> } GsmFailWhaleDialogFailType;

I'd simply get rid of the enum; it's not needed anymore if there's only one type and no use for a second type.
Comment 18 Vincent Untz 2011-03-22 00:43:05 UTC
Comment on attachment 184024 [details] [review]
gsm-manager/gsm-app: Fix restart logic

This one is fine, and can be committed once we get the first part in.
Comment 19 William Jon McCann 2011-03-22 05:11:35 UTC
(In reply to comment #17)
> (From update of attachment 184022 [details] [review])
> One thing I wonder: if the window manager is not dead, but only something like
> g-s-d dies, it might be nice to give a way to users to save their documents...
> Not sure how to best deal with that, though.

Basically we're already in the deeply hosed regime.  Just like at logout, apps should not expect that the user will have an opportunity to interact with them.  For 3.2 we *really* need to figure out how to make apps just save state automatically.  Anyway, if g-s-d dies and it was running long enough for the user to do work it is very likely to restart fine.

> (sorry, splinter doesn't work for this patch here :/)
> 
> >-/* Shared with gsm-logout-dialog.c */
> >-#define AUTOMATIC_ACTION_TIMEOUT 60
> 
> I've no major objection removing the timeout; I'm just wondering if users
> depending on a11y will be able to use the dialog (if things crashed). If no,
> they might get stuck; the timeout could possibly help.

We don't want the user to walk away from the computer and come back and see the login screen - that really sucks.  They should at least know it died.  And eventually have the opportunity to file a bug (we need problem reporting built into GNOME first).

> In general, I don't know if we really need the whole
> _window_move_resize_window() function. We could just call
> gdk_window_move_resize(), since this should not be called really often.

We probably don't need it all.  I copied it from screensaver for simplicity since I am sure it works.

> >+static void
> >+update_geometry (GsmFailWhaleDialog *fail_dialog)
> >+{
> >+        int monitor;
> >+        GdkScreen *screen;
> > 
> >-        g_type_class_add_private (klass, sizeof (GsmFailWhaleDialogPrivate));
> >+        screen = gdk_screen_get_default ();
> 
> Please use gtk_widget_get_screen (GTK_WIDGET (fail_dialog))
> 
> >+static void
> >+gsm_fail_whale_dialog_realize (GtkWidget *widget)
> >+{
> >+        if (GTK_WIDGET_CLASS (gsm_fail_whale_dialog_parent_class)->realize) {
> >+                GTK_WIDGET_CLASS (gsm_fail_whale_dialog_parent_class)->realize (widget);
> >+        }
> >+
> >+        _window_override_user_time (GSM_FAIL_WHALE_DIALOG (widget));
> 
> Why do we need this?

Copied from the screensaver.  My recollection is that it bypassed focus stealing.  Don't know if we need it here but wanted to go with what I knew worked.

> >+        update_geometry (GSM_FAIL_WHALE_DIALOG (widget));
> >+        _window_move_resize_window (GSM_FAIL_WHALE_DIALOG (widget), TRUE, TRUE);
> >+
> >+        g_signal_connect (gtk_window_get_screen (GTK_WINDOW (widget)),
> >+                          "size_changed",
> >+                          G_CALLBACK (on_screen_size_changed),
> >+                          widget);
> 
> Just to be on the safe side, you need to disconnect this signal handler on
> unrealize.

Indeed.  Forgot to add that.
Comment 20 William Jon McCann 2011-03-22 18:37:57 UTC
Created attachment 184112 [details] [review]
gsm-manager: Improve the experience when required components fail

Instead of just failing and leaving the user no way out, present
a "fail whale" that allows the user to logout and try again.
Comment 21 Vincent Untz 2011-03-22 20:02:09 UTC
Thanks, committed!