Bug 672419 - shell crashed at login and ended up in a dead end session
Status: RESOLVED OBSOLETE
Product: gnome-session
Classification: Core
Component: gnome-session
Version: unspecified
Hardware: Other Linux
Importance: High blocker
Target Milestone: ---
Assigned To: Session Maintainers
QA Contact: Session Maintainers
Duplicates: 645928 674840
Depends on:
Blocks:
 
 
Reported: 2012-03-20 01:02 UTC by William Jon McCann
Modified: 2021-06-14 18:22 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
screenshot (839.45 KB, image/png)
2012-03-20 01:02 UTC, William Jon McCann
xsession errors trimmed of noisy warnings (54.30 KB, text/plain)
2012-03-20 01:20 UTC, William Jon McCann
gsm: Properly move to next phase if an app dies on startup (3.02 KB, patch)
2012-03-23 16:39 UTC, Vincent Untz
Status: committed
gsm: Share code to restart an app (3.07 KB, patch)
2012-03-23 16:39 UTC, Vincent Untz
Status: committed
gsm: Stop disconnecting "registered" signal for GsmApp (1.47 KB, patch)
2012-03-23 16:39 UTC, Vincent Untz
Status: committed
gsm: Pass exit code in "exited" signal of GsmApp (4.20 KB, patch)
2012-03-23 16:39 UTC, Vincent Untz
Status: committed
gsm: Consider that a required component that exits with 1 has crashed (1.33 KB, patch)
2012-03-23 16:39 UTC, Vincent Untz
Status: committed
gsm: Remove duplicated code (1.94 KB, patch)
2012-03-23 16:39 UTC, Vincent Untz
Status: committed
gsm: Pass signal id in "died" signal of GsmApp (3.83 KB, patch)
2012-03-23 16:39 UTC, Vincent Untz
Status: committed
gsm: On an app crash, only depend on autorestart for apps with a client (2.62 KB, patch)
2012-03-23 16:39 UTC, Vincent Untz
Status: needs-work
manager: treat non-0 exit status for required components as fail (1.60 KB, patch)
2012-07-17 02:22 UTC, Ray Strode [halfline]
Status: committed
var log messages (442.81 KB, text/plain)
2013-03-13 14:12 UTC, Nick G

Description William Jon McCann 2012-03-20 01:02:00 UTC
Created attachment 210131 [details]
screenshot

I am testing a jhbuild session and after logging in I see nothing but an empty background. It is entirely possible that I've misconfigured something but even so we should *never* end up in a state like this where there is no way out.
Comment 1 William Jon McCann 2012-03-20 01:20:26 UTC
Created attachment 210132 [details]
xsession errors trimmed of noisy warnings
Comment 2 Matthias Clasen 2012-03-21 11:10:56 UTC
I've faced this too, today. The problem in my case was a 3.3.92 shell + gcr without introspection data. So, I guess to reproduce, just remove the gcr gir
Comment 3 Vincent Untz 2012-03-21 11:26:20 UTC
gnome-session[23923]: DEBUG(+): GsmXSMPClient: getting restart style
gnome-session[23923]: DEBUG(+): GsmManager: autorestart not set, not restarting application

Is gnome-shell telling us to not autorestart?
Comment 4 Vincent Untz 2012-03-21 11:28:25 UTC
Likely caused by http://git.gnome.org/browse/gnome-shell/commit/?id=9bb9999b46cc2c759d4e0a5c5f7515e32eafc0f0 which was an attempt to fix bug 648384.
Comment 5 Matthias Clasen 2012-03-21 12:52:32 UTC
That a component is set to autorestart or not should certainly not affect whether we show a fail whale ?!
Comment 6 Vincent Untz 2012-03-21 13:00:09 UTC
(In reply to comment #5)
> That a component is set to autorestart or not should certainly not affect
> whether we show a fail whale ?!

Because if we don't need autorestart, then it means we don't care if the component goes away (also, fail whale is triggered if the app crashes twice in a minute, which can't happen without autorestart).
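
The rule described here (the fail whale is triggered when an app crashes twice in a minute, which requires a restart in between) could be sketched roughly as follows. This is an illustration only, not gnome-session's code: `CrashTracker`, its parameters, and the sliding-window bookkeeping are all assumptions.

```python
from collections import deque


class CrashTracker:
    """Hypothetical helper: decides when repeated crashes warrant the fail whale."""

    def __init__(self, max_crashes: int = 2, window_seconds: float = 60.0):
        # Assumed defaults, taken from "crashes twice in a minute".
        self.max_crashes = max_crashes
        self.window_seconds = window_seconds
        self._crash_times = deque()

    def record_crash(self, now: float) -> bool:
        """Record a crash at time `now` (in seconds).

        Returns True if enough crashes fell inside the window that the
        fail whale should be shown instead of restarting again.
        """
        self._crash_times.append(now)
        # Forget crashes that are older than the window.
        while self._crash_times and now - self._crash_times[0] > self.window_seconds:
            self._crash_times.popleft()
        return len(self._crash_times) >= self.max_crashes
```

With a rule like this, a component that is never restarted can only ever record one crash, so the threshold is never reached.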
Comment 7 Matthias Clasen 2012-03-21 13:47:46 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > That a component is set to autorestart or not should certainly not affect
> > whether we show a fail whale ?!
> 
> Because if we don't need autorestart, then it means we don't care if the
> component goes away (also, fail whale is triggered if the app crashes twice in
> a minute, which can't happen without autorestart).

My expectation would be that we show the fail whale right away for a required component that is not set to autorestart, and show it after the second crash for one that is set to autorestart. But then we are back in the alt-f2-r-fail-whales territory where this thing started...
Comment 8 Vincent Untz 2012-03-21 14:09:06 UTC
I just noticed this:
gnome-session[23923]: DEBUG(+): GsmAutostartApp: (pid:24094) done (status:1)

This actually means that gnome-shell exited properly (as in WIFEXITED), and didn't crash; that's why it's not restarted. A real crash would result in an automatic restart of the required component.
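
To make the WIFEXITED distinction concrete, here is a small sketch that spawns two child processes and classifies their wait status. It uses Python's `os` wrappers around the POSIX macros rather than gnome-session's C code; `run_child`, `describe`, and the two child snippets are hypothetical names chosen for the example.

```python
import os
import signal
import sys

# Child snippets: one calls exit(1), the other kills itself with SIGABRT.
EXIT_ONE = "import sys; sys.exit(1)"
ABORT = f"import os; os.kill(os.getpid(), {int(signal.SIGABRT)})"


def run_child(code: str) -> int:
    """Run a Python snippet in a child process and return its raw wait status."""
    pid = os.posix_spawn(sys.executable, [sys.executable, "-c", code], os.environ)
    _, status = os.waitpid(pid, 0)
    return status


def describe(status: int) -> str:
    """Classify a raw wait status the way WIFEXITED()/WIFSIGNALED() do in C."""
    if os.WIFEXITED(status):
        return f"exited normally with status {os.WEXITSTATUS(status)}"
    if os.WIFSIGNALED(status):
        return f"killed by signal {os.WTERMSIG(status)}"
    return "stopped or continued"


if __name__ == "__main__":
    print(describe(run_child(EXIT_ONE)))
    print(describe(run_child(ABORT)))
```

A process that calls exit(1) lands in the first branch even if it did so because of an internal error, which is why a status of 1 alone does not look like a crash to the session manager.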
Comment 9 Ray Strode [halfline] 2012-03-21 15:08:32 UTC
an exit status of 1 is effectively the same as a crash for most X apps, so probably should be treated the same as a crash.
Comment 10 Vincent Untz 2012-03-21 15:17:42 UTC
(In reply to comment #9)
> an exit status of 1 is effectively the same as a crash for most X apps, so
> probably should be treated the same as a crash.

My point is not about the exit status. It's that if this is a crash, WIFEXITED() should return false, and WIFSIGNALED() should return true. Why would this be different for most X apps?
Comment 11 Ray Strode [halfline] 2012-03-21 15:37:23 UTC
because when X crashes the app because of BadDrawable (or whatever) it does exit(1) instead of raise(SIGABRT).  Same with libdbus, when it crashes the app it does exit(1) too.
Comment 12 Vincent Untz 2012-03-21 16:07:05 UTC
Ah, didn't know that. I'm a bit reluctant to still assume "exit(1) == crash", though, as it's a perfectly valid exit status otherwise.

Also, most crashes I've experienced during the last 10 years or so are app-specific, and not related to BadDrawable (or similar X errors), so I wouldn't think it's that much of an issue here.

If people care really strongly about this, though, we can try this out -- but I don't think it's wise to change this just before a stable release :-)
Comment 13 Vincent Untz 2012-03-23 16:39:12 UTC
Created attachment 210446 [details] [review]
gsm: Properly move to next phase if an app dies on startup

There is no reason to wait for the timeout if an app dies and fails to
be restarted.

Also, only do this if we're in a startup phase.
Comment 14 Vincent Untz 2012-03-23 16:39:16 UTC
Created attachment 210447 [details] [review]
gsm: Share code to restart an app
Comment 15 Vincent Untz 2012-03-23 16:39:20 UTC
Created attachment 210448 [details] [review]
gsm: Stop disconnecting "registered" signal for GsmApp

The reason we were doing this is that the code to move to the next phase
when an app is registered was not checking for the current phase. This
is done now.
Comment 16 Vincent Untz 2012-03-23 16:39:24 UTC
Created attachment 210449 [details] [review]
gsm: Pass exit code in "exited" signal of GsmApp
Comment 17 Vincent Untz 2012-03-23 16:39:28 UTC
Created attachment 210450 [details] [review]
gsm: Consider that a required component that exits with 1 has crashed

This way, we will attempt to restart it.
Comment 18 Vincent Untz 2012-03-23 16:39:32 UTC
Created attachment 210451 [details] [review]
gsm: Remove duplicated code
Comment 19 Vincent Untz 2012-03-23 16:39:36 UTC
Created attachment 210452 [details] [review]
gsm: Pass signal id in "died" signal of GsmApp
Comment 20 Vincent Untz 2012-03-23 16:39:40 UTC
Created attachment 210453 [details] [review]
gsm: On an app crash, only depend on autorestart for apps with a client

If an app has no registered client, the autorestart behavior cannot
work (since it occurs when the client gets disconnected). So if there's
no registered client, just proceed with a manual restart.
Comment 21 Vincent Untz 2012-03-23 16:41:31 UTC
This patch series is an attempt to fix this; it will only consider that exit(1) = crash for required components.

It needs some testing, though -- I've barely played with it.
Comment 22 Vincent Untz 2012-03-23 16:50:05 UTC
Comment on attachment 210453 [details] [review]
gsm: On an app crash, only depend on autorestart for apps with a client

This part needs to be clever: there might be a client already being started but not registered yet.
Comment 23 Vincent Untz 2012-03-26 11:21:05 UTC
So, any opinion on pushing this for 3.4.0 (at least the patches up to attachment 210450 [details] [review])?

Again, I'm a bit reluctant to push this at this point because I'd prefer to have some real testing over a longer period of time.

FWIW, another thing we could do to reduce the risks would be to consider that exit(1)=crash only during the startup phases.
Comment 24 Ray Strode [halfline] 2012-03-26 20:01:04 UTC
seems tight for 3.4.0
Comment 25 Vincent Untz 2012-03-27 08:39:06 UTC
So I've released 3.4.0 without this to be on the safe side, and then pushed the code reorg in the patch series that shouldn't affect the behavior of gnome-session.

We're left with attachment 210450 [details] [review] (comment 17). I'll likely push this to 3.5.x, but I'm not sure this will get enough testing to help decide whether we want this in 3.4.1.
Comment 26 Vincent Untz 2012-04-26 06:58:01 UTC
*** Bug 674840 has been marked as a duplicate of this bug. ***
Comment 27 Ray Strode [halfline] 2012-06-04 22:09:46 UTC
Comment on attachment 210450 [details] [review]
gsm: Consider that a required component that exits with 1 has crashed

Attachment 210450 [details] pushed as c6e23f8 - gsm: Consider that a required component that exits with 1 has crashed
Comment 28 William Jon McCann 2012-06-26 13:37:30 UTC
*** Bug 645928 has been marked as a duplicate of this bug. ***
Comment 29 William Jon McCann 2012-06-26 13:39:02 UTC
Today I'm getting this:
gnome-session[12056]: DEBUG(+): GsmAutostartApp: (pid:12238) done (status:127)
gnome-session[12056]: DEBUG(+): App gnome-shell.desktop exited with 127

And ending up in a dead end.
Comment 30 Vincent Untz 2012-06-26 14:05:20 UTC
And do you know why gnome-shell exits with 127?

I don't think it's reasonable to consider that all exit codes != 0 mean a crash in general, but maybe we can do that for the shell?
Comment 31 William Jon McCann 2012-06-26 14:15:09 UTC
A mismatch between mutter and the shell. But the fact remains that we never want the shell to fail and not show the fail whale.
Comment 32 Ray Strode [halfline] 2012-06-27 20:06:40 UTC
127 means "command not found" to the shell, so we're probably running the command through a shell and the program wasn't installed where it thought it was.

I don't think it's "wrong" to consider anything but 0 as a failure, fwiw. Certainly the test command, the if command, etc. treat them all as false.
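
Both points can be checked with a short sketch, assuming a POSIX `/bin/sh` is available. The helper name `shell_exit_status` and the command name `no-such-command-672419` are made up for the example.

```python
import subprocess


def shell_exit_status(command: str) -> int:
    """Run `command` through /bin/sh and return the exit status the shell reports."""
    result = subprocess.run(
        ["/bin/sh", "-c", command],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


if __name__ == "__main__":
    # A command the shell cannot find is reported as 127.
    print(shell_exit_status("no-such-command-672419"))
    # `if` treats any non-zero status as false, so the else branch runs.
    print(shell_exit_status("if no-such-command-672419; then exit 0; else exit 42; fi"))
```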
Comment 33 Ray Strode [halfline] 2012-07-17 02:13:36 UTC
"the shell" in comment 32 meant e.g. bash, yay for namespace clashes
Comment 34 Ray Strode [halfline] 2012-07-17 02:22:05 UTC
Created attachment 218964 [details] [review]
manager: treat non-0 exit status for required components as fail

The only exit status that truly, definitely means 'success' is 0.

Anything else is almost certainly a failure of some sort.  For
required components, we can be extra sure that's true, so enforce
it there.

This avoids cases where exec() fails in a subshell, and other cases.
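
As a rough sketch of the policy this commit message describes (not the actual patch), the decision could look like the following. `ExitInfo` and `treat_as_failure` are hypothetical names, and treating death by signal as a failure for every component is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExitInfo:
    """Hypothetical record of how a component ended."""
    exit_status: Optional[int] = None  # set when the process exited normally
    signal: Optional[int] = None       # set when the process was killed by a signal


def treat_as_failure(info: ExitInfo, required: bool) -> bool:
    """Decide whether a component's exit should count as a failure."""
    if info.signal is not None:
        # Assumption: death by signal counts as a crash for any component.
        return True
    if info.exit_status == 0:
        # 0 is the only status that definitely means success.
        return False
    # Non-zero exit: only enforced as a failure for required components.
    return required
```

Under this rule the exit statuses seen earlier in this bug (1 and 127 from gnome-shell) both count as failures, because gnome-shell is a required component.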
Comment 35 Ray Strode [halfline] 2012-07-17 02:23:14 UTC
Comment on attachment 218964 [details] [review]
manager: treat non-0 exit status for required components as fail

(pushed as e79b73a3)
Comment 36 André Klapper 2012-08-20 13:22:33 UTC
All patches committed.
Any specific reasons / undone work to not close this ticket?
Comment 37 Ray Strode [halfline] 2012-08-20 17:04:07 UTC
I believe there's still one patch pending that I need to look at and finish up.
Comment 38 Nick G 2013-03-13 14:04:35 UTC
I'm just wondering if this is what I'm seeing on a Fedora 18 session.

I get GDM coming up and it looks fine, then the desktop background I chose appears after logging in... then after about 30 seconds of no activity I get an "Oops something went wrong screen" telling me to logout.

Interestingly it shows a cut out on the back oops screen where the gnome-shell top bar should appear but it instead shows the wallpaper through.

I was in the middle of delivering a Linux training session which makes this look doubly bad - I can't figure out what's wrong at the moment and have taken to launching nautilus and metacity --replace from a separate tty and then using ALT+SPACE and selecting 'Close' to remove the Oops screen in order to get at a nautilus window.

I'm happy to post logs or try things as I really need to be able to use this for work.
Comment 39 Nick G 2013-03-13 14:12:47 UTC
Created attachment 238789 [details]
var log messages

I didn't get a .xsession-errors file, but I did see a lot about gnome-screensaver and shell stuff in my /var/log/messages, so I've attached that in the hope of resolving this.
Comment 40 koko 2014-04-09 13:57:41 UTC
might be related to: https://bugzilla.gnome.org/show_bug.cgi?id=727817
Comment 41 c_breitf 2014-05-20 18:21:46 UTC
Version 3.12.2
When logging in as one user on GNOME Wayland, it saves the session for the next user.
This time I got a blank screen after reboot; on autologin as a different user, still a blank screen. I had to kill the X server via a tty.
Comment 42 Kjartan Maraas 2017-04-06 21:32:51 UTC
Removing the GNOME 3.4 target. Is the last patch still needed?
Comment 43 André Klapper 2021-06-14 18:22:58 UTC
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org.
As part of that, we are mass-closing older open tickets in bugzilla.gnome.org
which have not seen updates for a longer time (resources are unfortunately
quite limited so not every ticket can get handled).

If you can still reproduce the situation described in this ticket in a recent
and supported software version of gnome-session, then please follow
  https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines
and create a new ticket at
  https://gitlab.gnome.org/GNOME/gnome-session/-/issues/

Thank you for your understanding and your help.