GNOME Bugzilla – Bug 648384
restart by typing "r" in the gnome-shell run-dialog 2x in one minute brings up the "oops" window
Last modified: 2021-07-05 14:33:26 UTC
Restarting gnome-shell by bringing up the run dialog and typing "r" twice within one minute brings up the "oops" dialog. Environment: Ubuntu 11.04 Beta, GNOME Shell 3.0.0.2 installed from the Ricotz PPA.
If I alt-F4 out of the "oops dialog" I can do gnome-shell --restart or "r" from the run dialog, and no more "oops." So it appears if I restart twice in a minute, I get the "oops dialog," but if I alt-F4 out of the dialog, the bug disappears and I can restart g-s without further issue.
Same issue on Fedora 15.
Yeah this is quite bad. The designed/intended behavior is for the "fail whale" to only be shown for unrecoverable errors. Basically when the user session oops its pants. So it should certainly not be shown when:
a. a component is manually stopped and successfully restarted
b. a component fails/crashes and can be restarted successfully

IIRC there is a limit on the number of times a component can crash before this screen is shown. I suspect that was in order to catch fail loops where a component seems to restart successfully but then fails again quickly. Obviously the key here is understanding what "quickly" means. However, the current implementation does not use any kind of time window for diagnosis.

So there are two bugs, I suppose. The first is not correctly detecting a successful recovery. The second is not using a time window to detect fail loops.
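To make the "time window" idea concrete, here is a minimal sketch in Python of a sliding-window fail-loop detector. It is not gnome-session code: the class name `CrashWindow`, the threshold of 2 failures and the 60 second window are all assumptions chosen for illustration.

```python
import time
from collections import deque


class CrashWindow:
    """Track recent failures of one component inside a sliding time window.

    Hypothetical helper: names and default thresholds are illustrative only.
    """

    def __init__(self, max_failures=2, window_seconds=60.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.clock = clock
        self._failures = deque()  # timestamps of recent failures, oldest first

    def record_failure(self):
        """Record a failure; return True if this looks like a fail loop."""
        now = self.clock()
        self._failures.append(now)
        # Forget failures that are older than the window.
        while self._failures and now - self._failures[0] > self.window_seconds:
            self._failures.popleft()
        # A fail loop is more than max_failures failures inside the window.
        return len(self._failures) > self.max_failures
```

With such a detector, two failures a long time apart would never add up to a fail whale, while a component that dies three times in a minute would.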
Since the bug was originally opened, gnome-session was changed to only show the fail whale if a required component died with a signal, rather than generically exit (gsm-autostart-app.c:924). This covers the "custom-wm --replace" and "alt-f2 r" use cases, as they call exit(); additionally all unexpected exit paths should end with abort(), so we get traces. The only case not covered is replacing gnome-settings-daemon with a jhbuilt version (or restarting it), as you need to send int or term to release the dbus name, and if done twice you get the fail whale. Not a big deal, for a dev replacing g-s-d at runtime, and if g-s-d goes down, you really need the fail whale (apps like to crash without it). Anyway fixing it is just a matter of adding the replace flag to RequestName. So my question... is this really a blocker for 3.4?
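The distinction between "died with a signal" and "called exit()" can be illustrated with a small Python sketch. This is not the check in gsm-autostart-app.c; it only shows the same distinction as seen by a parent process, using the `subprocess` convention that a child killed by signal N has return code -N. The function names are made up for the example.

```python
import signal
import subprocess
import sys


def died_by_signal(returncode):
    """subprocess reports death by signal N as the negative return code -N."""
    return returncode < 0


def describe_exit(returncode):
    """Return a short human-readable description of how a child ended."""
    if died_by_signal(returncode):
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


def run_and_describe(code):
    """Run a Python snippet in a child process and describe how it ended."""
    proc = subprocess.run([sys.executable, "-c", code])
    return describe_exit(proc.returncode)
```

Under the behavior described above, only the "killed by" case would lead to the fail whale; a plain exit(), including a non-zero one, would not.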
It's not working, though. If I hit alt-f2 r in the shell, I see: gnome-session[19681]: WARNING: Application 'gnome-shell.desktop' killed by signal
Found the code problem (it's not restarted because it dies with a signal, but because it disconnects from XSMP), here is a fix for gnome-shell. A similar fix would be needed for gnome-settings-daemon too (along with the allow_replace+replace flags when requesting the name).
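For readers unfamiliar with the allow_replace and replace flags, the following toy model in Python sketches how they interact in a D-Bus RequestName call. The flag and reply values follow the D-Bus specification; everything else (`ToyBus`, the owner strings, the name `org.example.Daemon`) is invented for the example and does not talk to a real bus.

```python
from enum import IntEnum, IntFlag


class NameFlag(IntFlag):
    ALLOW_REPLACEMENT = 0x1  # current owner agrees to be replaced
    REPLACE_EXISTING = 0x2   # requester wants to take over the name
    DO_NOT_QUEUE = 0x4       # fail instead of waiting in the queue


class Reply(IntEnum):
    PRIMARY_OWNER = 1
    IN_QUEUE = 2
    EXISTS = 3
    ALREADY_OWNER = 4


class ToyBus:
    """Toy model of name ownership; it does not track the wait queue."""

    def __init__(self):
        self._owners = {}  # name -> (owner, flags used when acquiring)

    def request_name(self, owner, name, flags):
        current = self._owners.get(name)
        if current is None:
            self._owners[name] = (owner, flags)
            return Reply.PRIMARY_OWNER
        current_owner, current_flags = current
        if current_owner == owner:
            self._owners[name] = (owner, flags)
            return Reply.ALREADY_OWNER
        # Replacement needs consent from both sides.
        if (flags & NameFlag.REPLACE_EXISTING
                and current_flags & NameFlag.ALLOW_REPLACEMENT):
            self._owners[name] = (owner, flags)
            return Reply.PRIMARY_OWNER
        if flags & NameFlag.DO_NOT_QUEUE:
            return Reply.EXISTS
        return Reply.IN_QUEUE
```

In this model, a replacement daemon takes over the name without the old instance being killed first only when the old instance asked for ALLOW_REPLACEMENT and the new one asks for REPLACE_EXISTING.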
Created attachment 209398
Don't set the autorestart hint for gnome-session

If the autorestart hint is set, the process is forcefully killed and restarted every time it disconnects from XSMP, which breaks replacing the wm and breaks alt-f2 r. If instead the hint is not set, the process is monitored via SIGCHLD and only restarted when it dies by a signal.
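The two monitoring policies described in the patch can be summarized with a small decision function. This Python sketch is only a model of the behavior as described in the comment above, not of the actual gnome-session code; the `Event` enum and `session_restarts` are hypothetical names.

```python
from enum import Enum


class Event(Enum):
    XSMP_DISCONNECT = "xsmp-disconnect"    # client dropped its XSMP connection
    EXITED = "exited"                      # process called exit()
    KILLED_BY_SIGNAL = "killed-by-signal"  # process died from a signal


def session_restarts(event, autorestart_hint):
    """Return True if the session manager would restart the component.

    Models only the two cases described in the patch: with the hint,
    restart on XSMP disconnect; without it, restart on death by signal
    (observed via SIGCHLD).
    """
    if autorestart_hint:
        return event is Event.XSMP_DISCONNECT
    return event is Event.KILLED_BY_SIGNAL
```

In this model alt-f2 r, where the shell drops its XSMP connection while re-executing itself, triggers a competing restart only when the hint is set.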
*** Bug 671694 has been marked as a duplicate of this bug. ***
Review of attachment 209398: Makes sense.
Comment on attachment 209398 (Don't set the autorestart hint for gnome-session): Attachment 209398 pushed as 9bb9999 - Don't set the autorestart hint for gnome-session. Not closing since I don't have a 3.4 gnome-shell started by gnome-session that I can test.
I think this is causing bug 648384, which is a major regression when gnome-shell crashes. The right fix is to tell gnome-session to not autorestart when "r" is typed in the run dialog; telling this all the time is wrong.
Why does autorestart affect fail whaling ?
(In reply to comment #12)
> Why does autorestart affect fail whaling ?

Alt+F2 'r' kills XSMP, so gnome-session tries to start a new gnome-shell, but by that time gnome-shell has already re-captured DBus/XSMP, so the gnome-session-started shell fails to start, which means we have a fail whale.
(In reply to comment #12)
> Why does autorestart affect fail whaling ?

See bug 648384 comment 6.

Stepping back a bit, I think it's wrong to not try to restart gnome-shell before showing the fail whale dialog, when gnome-shell crashes. It's better to try to recover the session at least once.
(In reply to comment #11)
> I think this is causing bug 648384, which is a major regression when
> gnome-shell crashes.
>
> The right fix is to tell gnome-session to not autorestart when "r" is typed in
> the run dialog; telling this all the time is wrong.

Wait, what is "this"? This is bug 648384. Anyway...

(In reply to comment #14)
> (In reply to comment #12)
> > Why does autorestart affect fail whaling ?
>
> See bug 648384 comment 6.
>
> Stepping back a bit, I think it's wrong to not try to restart gnome-shell
> before showing the fail whale dialog, when gnome-shell crashes. It's better to
> try to recover the session at least once.

gnome-shell is an autostarted app (GsmAutostartApp), so it is watched via SIGCHLD, which is turned into ::app-died and restarted at gsm-manager.c:605 (as far as I understand this code).
(In reply to comment #15)
> (In reply to comment #11)
> > I think this is causing bug 648384, which is a major regression when
> > gnome-shell crashes.
> >
> > The right fix is to tell gnome-session to not autorestart when "r" is typed in
> > the run dialog; telling this all the time is wrong.
>
> Wait, what is "this"? This is bug 648384.

By "this", I meant "telling gnome-session to not autorestart".

[...]
> gnome-shell is an autostarted app (GsmAutostartApp), so it is watched via
> SIGCHLD, which is turned into ::app-died and restarted at gsm-manager.c:605 (as
> far as I understand this code)

See discussion in the other bug. Short summary: gnome-shell doesn't crash, but does exit(1), so ::app-died is not used.
(In reply to comment #16)
> > gnome-shell is an autostarted app (GsmAutostartApp), so it is watched via
> > SIGCHLD, which is turned into ::app-died and restarted at gsm-manager.c:605 (as
> > far as I understand this code)
>
> See discussion in the other bug. Short summary: gnome-shell doesn't crash, but
> does exit(1), so ::app-died is not used.

The other bug being?

And anyway, why does gnome-shell exit instead of calling abort(), which gives us stack traces? What code path are we talking about, btw? There is no exit(1) in gnome-shell, except for repeatable launching failures. (Yes, it's bad we don't have a fail whale there, but it's really an edge case.)
(In reply to comment #17)
> (In reply to comment #16)
> > > gnome-shell is an autostarted app (GsmAutostartApp), so it is watched via
> > > SIGCHLD, which is turned into ::app-died and restarted at gsm-manager.c:605 (as
> > > far as I understand this code)
> >
> > See discussion in the other bug. Short summary: gnome-shell doesn't crash, but
> > does exit(1), so ::app-died is not used.
>
> The other bug being?

Grr, I keep pasting the wrong bug number, sorry. I'm talking about bug 672419.

> And anyway, why does gnome-shell exit instead of calling abort(), which gives
> us stack traces?
> What code path are we talking about, btw?

I think this happens when a typelib is not found, see attachment 210132 from bug 672419.

> There is no exit(1) in gnome-shell,
> except for repeatable launching failures. (Yes, it's bad we don't have a fail
> whale there, but it's really an edge case)

That's exactly what occurs in bug 672419: launching failure because of a typelib not being found.
(In reply to comment #18)
> (In reply to comment #17)
> > There is no exit(1) in gnome-shell,
> > except for repeatable launching failures. (Yes, it's bad we don't have a fail
> > whale there, but it's really an edge case)
>
> That's exactly what occurs in bug 672419: launching failure because of a
> typelib not being found.

I see that now. I'm inclined toward "exit(!= 0) => crash", rather than resetting Autostart=True and reengineering gnome-session to fix this again. Btw, it only affects autostarted apps, which are like "session daemons" and are expected to be always around, so restarting once a minute on exit(1) is not that risky.
Removing the GNOME 3.4 target.
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new ticket at https://gitlab.gnome.org/GNOME/gnome-shell/-/issues/ Thank you for your understanding and your help.