After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 752388 - gdm doesn't always restart cleanly
gdm doesn't always restart cleanly
Status: RESOLVED FIXED
Product: gdm
Classification: Core
Component: general
unspecified
Other All
: Normal normal
: ---
Assigned To: GDM maintainers
GDM maintainers
Depends on:
Blocks:
 
 
Reported: 2015-07-14 20:27 UTC by Ray Strode [halfline]
Modified: 2015-07-14 21:02 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Make gdm-session-worker exit cleanly (3.91 KB, patch)
2015-07-14 20:27 UTC, Ray Strode [halfline]
committed Details | Review
gdm-x-session: Do not cancel cancelable on SIGTERM (3.75 KB, patch)
2015-07-14 20:27 UTC, Ray Strode [halfline]
none Details | Review
gdm-x-session: kill subprocesses on sigterm (7.55 KB, patch)
2015-07-14 20:59 UTC, Ray Strode [halfline]
committed Details | Review
gdm-wayland-session: kill subprocesses on sigterm (4.68 KB, patch)
2015-07-14 21:02 UTC, Ray Strode [halfline]
committed Details | Review

Description Ray Strode [halfline] 2015-07-14 20:27:48 UTC
Sometimes when gdm is stopped, it's not done so in an orderly manner.
The main daemon process dies before all its children have been reaped.

This leads to a race that makes restart sometimes fail.
Comment 1 Ray Strode [halfline] 2015-07-14 20:27:52 UTC
Created attachment 307431 [details] [review]
Make gdm-session-worker exit cleanly

Calling gdm_session_stop_conversation() in gdm_launch_environment_stop()
sends a SIGTERM to gdm-session-worker without waiting for it to die. The
next step is calling gdm_session_close() to close the session, which
stops all conversations of that session object, sending a 2nd SIGTERM to
gdm-session-worker, this time waiting on its PID.

On gdm-session-worker side, the first SIGTERM is caught by
on_shutdown_signal(), its custom SIGTERM handler, which quits the
mainloop and unrefs the worker object. Quiting the mainloop replaces the
custom SIGTERM handler with the system default one (exit immediately).
During the worker object class finalization gdm-session-worker may
receive the 2nd SIGTERM, which leads to its immediate termination,
without waiting for its children, which in turn leads to the main gdm
process exit.

Since systemd relies on the SIGCHLD from the main gdm process to tell
when the service has stopped, this behavior breaks any unit that has a
Conflicts=gdm.service entry and relies on the X server not being around
when it is started.

This commit removes the call to gdm_session_stop_conversation() in
gdm_launch_environment_stop() and leaves it to be stopped in
gdm_session_close().

[endlessm/eos-shell#4921]
Comment 2 Ray Strode [halfline] 2015-07-14 20:27:56 UTC
Created attachment 307432 [details] [review]
gdm-x-session: Do not cancel cancelable on SIGTERM

<dsd> Jasper: can you explain the logic bug? just curious
<Jasper> dsd, GCancellable is a way of cancelling an ongoing operation.
<Jasper> dsd, so we start up a wait_async, which says "call this
         callback when the subprocess exits". And then when we get a
         SIGTERM, we fire off the cancellable, saying "no no, don't
         actually wait for to finish, let's just cancel that now"
<Jasper> dsd, the callback is called, but wait_finish returns FALSE,
         meaning that the wait was cancelled. And then the callback says
         "oh, cool, we're done here" and clears the x_session object.
<Jasper> dsd, I'd have to inspect closer to determine a proper fix
         should we still cancel the wait but not actually clear the
         object, so we wait synchronously in cleanup?)
<Jasper> Depends on what else uses the cancellable.
<dsd> ahh
<dsd> got it
<Jasper> dsd, the other thing that should fix it would be moving the
         g_clear_object above the "out" label here:
         https://git.gnome.org/browse/gdm/tree/daemon/gdm-x-session.c?h=gnome-3-16#n185
<Jasper> And doing that for all of session / bus / x subprocesses.
<Jasper> Maybe that's the cleaner fix.
<Jasper> Then we could also remove the main_loop_quit from the sigterm
         handler as well and simply relying on the cancellable.

[endlessm/eos-shell#4921]
Comment 3 Ray Strode [halfline] 2015-07-14 20:57:09 UTC
Review of attachment 307431 [details] [review]:

yea seems right, thanks for the analysis and fix
Comment 4 Ray Strode [halfline] 2015-07-14 20:58:38 UTC
Review of attachment 307432 [details] [review]:

I agree with Jasper that the second fix is better, since in theory the cancellable can be used for other things.  I don't think it's okay to drop the main_loop_quit since the handlers that call g_main_loop_quit may not be hooked up yet at the time SIGTERM is called
Comment 5 Ray Strode [halfline] 2015-07-14 20:59:40 UTC
Created attachment 307435 [details] [review]
gdm-x-session: kill subprocesses on sigterm

<dsd> Jasper: can you explain the logic bug? just curious
<Jasper> dsd, GCancellable is a way of cancelling an ongoing operation.
<Jasper> dsd, so we start up a wait_async, which says "call this
         callback when the subprocess exits". And then when we get a
         SIGTERM, we fire off the cancellable, saying "no no, don't
         actually wait for to finish, let's just cancel that now"
<Jasper> dsd, the callback is called, but wait_finish returns FALSE,
         meaning that the wait was cancelled. And then the callback says
         "oh, cool, we're done here" and clears the x_session object.
<Jasper> dsd, I'd have to inspect closer to determine a proper fix
         should we still cancel the wait but not actually clear the
         object, so we wait synchronously in cleanup?)
<Jasper> Depends on what else uses the cancellable.
<dsd> ahh
<dsd> got it
<Jasper> dsd, the other thing that should fix it would be moving the
         g_clear_object above the "out" label here:
         https://git.gnome.org/browse/gdm/tree/daemon/gdm-x-session.c?h=gnome-3-16#n185
<Jasper> And doing that for all of session / bus / x subprocesses.
<Jasper> Maybe that's the cleaner fix.

Based on a patch by João Paulo Rechi Vita <jprvita@endlessm.com>

[endlessm/eos-shell#4921]
Comment 6 Ray Strode [halfline] 2015-07-14 21:00:25 UTC
Attachment 307431 [details] pushed as 1871802 - Make gdm-session-worker exit cleanly
Attachment 307435 [details] pushed as 90a095e - gdm-x-session: kill subprocesses on sigterm
Comment 7 Ray Strode [halfline] 2015-07-14 21:02:26 UTC
Created attachment 307436 [details] [review]
gdm-wayland-session: kill subprocesses on sigterm

This is like commit 90a095e1a3ef26391da835ebe155deae26eeac5f
but for wayland sessions.

(probably should deduplicate that code at some point, but
 that's a project for another day)
Comment 8 Ray Strode [halfline] 2015-07-14 21:02:49 UTC
Attachment 307436 [details] pushed as 8f21c90 - gdm-wayland-session: kill subprocesses on sigterm