GNOME Bugzilla – Bug 598211
gnome-session crashed with SIGSEGV in SmsDie()
Last modified: 2018-05-27 19:58:20 UTC
Several Ubuntu users seem to be reporting this crash at the moment:
+ Trace 218254
Unfortunately a few symbols are missing, but it sort-of shows what is going on
This is apparently due to an unsafe downcast in xsmp_stop() function. I just replaced the C cast, by a GSM_XSMP_CLIENT(client), and now it's works just fine here. See the attached patch. Regards, Romain.
Created attachment 145330 [details] [review] patch which solves the problem
Created attachment 145331 [details] Complete backtrace
Good catch there. Thanks! We will likely apply this in Ubuntu
Hmmm, is that really the issue, or is it just a coincedence that it worked after? This type of cast is used elsewhere throughout gnome-session without any issues. I'm not saying that your patch isn't correct.
mhhh... I had a quick look at gobject code, in order to understand in more details how cast are handled.. and now... I've a doubt... a GObject cast is just safe because it ensures that the cast is possible and has a sense... (and can be used dynamically) but then re-downcast the GTypeInstance to a c type... so finally why it's works ? ... weird stuff... I meant its works all the time now for me. (and before this patch its failed on each shutdown/reboot/logout) a gnome-session dev around to confirm/explain or fix the issue ?
@Chris: Could you test this patch ?
I can't test it as I've never recreated it, and the downstream reporters aren't being that responsive to my request for information
Okay, I can confirm this patch is wrong, its fails again here, I've exactly the same backtrace, but now I've some warnings : Oct 14 00:17:48 genbox gnome-session[3708]: WARNING: Unable to query client: Client is not registered Oct 14 00:17:48 genbox gnome-session[3708]: WARNING: Unable to query client: Client is not registered weird... I'm sorry again :)
Created attachment 145401 [details] [review] throw an error in xsmp_stop() when client->priv->conn is NULL There is probably a cleaner way to solve the issue (for example why client->priv->conn is NULL, and what's the origin ?) But its should solves **temporarily** the problem and avoids the segfault. xsmp_stop() must throw an error when client->priv->conn is NULL, imho...
I found a weird stuff : Apparently when I set my autostart apps (System-> Startup applications) as default (only gnome apps) I've this reply with dbus : mrpouet@genbox ~ % dbus-send --session --dest=org.gnome.SessionManager --type=method_call --print-reply=yes /org/gnome/SessionManager org.gnome.SessionManager.GetClients array [ /org/gnome/SessionManager/Client1 /org/gnome/SessionManager/Client2 /org/gnome/SessionManager/Client3 /org/gnome/SessionManager/Client4 /org/gnome/SessionManager/Client5 /org/gnome/SessionManager/Client6 /org/gnome/SessionManager/Client7 /org/gnome/SessionManager/Client8 /org/gnome/SessionManager/Client9 ] And seriously all work just fine. Now assuming I want to start fusion-icon or cairo-dock at startup, I've this reply : mrpouet@genbox ~ % dbus-send --session --dest=org.gnome.SessionManager --type=method_call --print-reply=yes /org/gnome/SessionManager org.gnome.SessionManager.GetClients array [ /org/gnome/SessionManager/Client16384 /org/gnome/SessionManager/Client1 /org/gnome/SessionManager/Client2 /org/gnome/SessionManager/Client3 /org/gnome/SessionManager/Client4 /org/gnome/SessionManager/Client5 /org/gnome/SessionManager/Client6 /org/gnome/SessionManager/Client7 /org/gnome/SessionManager/Client8 /org/gnome/SessionManager/Client9 ] there are some clients, with strange id (Client16384), and for each client with a strange id, a call to GetAppId() or GetUnixProcessId() just fails (using d-feet and dbus). And apparently when I've these strange clients, gnome-session segfaults because its unable to authenticate them... but as I said using default startup applications (so gnome applications) all work fine, and logout/reboot/shutdown are fast.
that points at a lack of robustness in handling misbehaving clients somewhere inside gnome-session. Would be good to see what gnome-session puts into /var/log/messages when that happens. To turn on debug output, use 'killall -USR1 gnome-session'
I "re-started" my /var/log/messages with logrotate especially for this bug :) .I compressed it (24Mb -> 16 K), see attachment. the log contains : - the complete session_end phase (I clicked on logout) in debug mode. - the full backtrace handled by gdm (containing the backtrace of gnome-session too) ask me for more details ;)
Created attachment 145441 [details] the full session_end phase using debug mode
Apparently the problem is caused by metacity, in my case I use compiz, so I've a different WindowManager : - Because there aren't keys X-GNOME-AutoRestart in /usr/share/applications/metacity.desktop, egg_desktop_file_get_boolean will always return FALSE for this key (so for this part it's completely correct) - Then have a look in gnome-session/gsm-manager.c : function _disconnect_client() : app_restart = gsm_app_peek_autorestart (app); client_restart_hint = gsm_client_peek_restart_style_hint (client); /* allow legacy clients to override the app info */ if (! app_restart && client_restart_hint != GSM_CLIENT_RESTART_IMMEDIATELY) { g_debug ("GsmManager: autorestart not set, not restarting application"); goto out; } As I said we're sure that app_restart is FALSE, in my case g_debug message is NEVER displayed in my logs. But when I set debug mode from my session (using compiz), I can see that gnome-session starts/stops metacity infinitely and I don't see "GsmManager: autorestart not set, not restarting application". (I searched it with grep,emacs,gedit => nothing) ==> In other words, client_restart_hint is ALWAYS equals to GSM_CLIENT_RESTART_IMMEDIATELY for metacity client... which is very annoying in the case we don't use metacity as default WM. ==> This stuff also explains why there is many Clients with strange ids. @Matthias : Finally this is a gnome-session issue or a metacity issue ? (however gnome-session shoudn't crash, and shound't "freeze" ...)
The metacity issue is already reported at bug 588119, but I'm not convinced that issue is related to this crash (although I've not really had time to look at this in any great depth yet). We already carry a patch to fix the Metacity issue in Ubuntu, and still see users reporting this crash.
I didn't say that I'm sure that issue is related to this crash but it should be related... When I use metacity as WM, all work just fine here (no floods in syslogs, no freezes and no crashes), however see in attachment, I attached a patch to **avoid** the segfault (by throwing an error in xsmp_stop() ), but as I said it's probably not perfect. The segfault should be avoided anyway, but as Matthias said that points at a lack of robustness in handling misbehaving clients somewhere inside gnome-session.
PS: If you prefer it's just an ascertainment
I understand what is happening here now, and your earlier patch is probably enough to fix the issue. What is happening is that a new client (probably metacity in your case) is opening an ICE connection in the GSM_MANAGER_PHASE_END_SESSION phase, which causes a new GsmXSMPClient to be added to the client store. The GSM_MANAGER_PHASE_EXIT phase then begins before the client has had a chance to establish a xsmp connection, which means that client->priv->conn will not be initialized at the point that xsmp_stop is called on the new unregistered client. This is easily reproducible by running something that uses xsmp (such as Metacity) in GDB, and breaking on IceProtocolSetup. Once this call has been reached (and the client interrupted at this point), logging out of your session will make gnome-session crash with 100% repeatability.
The real fix is probably here: gsm-xsmp-server.c: * FIXME: it would probably make more sense to not create a * GsmXSMPClient object until accept_xsmp_connection, below (and to do * the timing-out here in xsmp.c). But that is 2.29 material. The patch above should work ok in the meantime.
Agreed, that would probably make more sense in the longer term. For now, we've applied the patch in Ubuntu. Thanks!
Another strange thing is sometimes we've some "zombie" Clients (ie: An empty appID and a PID == 0) in this case the timeout between "ending phase END SESSION" and "ending phase EXIT" is executed but that all ! finally we've to wait until the timeout is executed , in other words we've to wait 10s for some logout/reboot/shutdown , but that's a random problem. So now I set the debugging mode for each new connection on gnome. And I've found a strange thing Matthias, see yourself : gnome-session[2352]: DEBUG(+): GsmXsmpServer: accept_ice_connection() gnome-session[2352]: DEBUG(+): GsmXSMPClient: Setting up new connection gnome-session[2352]: DEBUG(+): GsmXSMPClient: New client '0x703290 []' gnome-session[2352]: DEBUG(+): GsmStore: Adding object id /org/gnome/SessionManager/Client33 to store gnome-session[2352]: DEBUG(+): GsmManager: Client added: /org/gnome/SessionManager/Client33 gnome-session[2352]: DEBUG(+): GsmXsmpServer: ice_io_error_handler (0x6b4010) gnome-session[2352]: DEBUG(+): GsmXSMPClient: IceProcessMessagesIOError on '0x703290 []' gnome-session[2352]: DEBUG(+): GsmXSMPClient: client_protocol_timeout for client '0x703290 []' in ICE status 3 gnome-session[2352]: DEBUG(+): GsmXsmpServer: ice_io_error_handler (0x7048a0) gnome-session[2352]: DEBUG(+): GsmXSMPClient: IceProcessMessagesIOError on '0x7033d0 [gnome-terminal 10d37eee6a7b72628612562143824376500000023520022]' gnome-session[2352]: DEBUG(+): GsmManager: disconnect client gnome-session[2352]: DEBUG(+): GsmManager: disconnect client: /org/gnome/SessionManager/Client9 gnome-session[2352]: DEBUG(+): GsmManager: unable to find application for client - not restarting gnome-session[2352]: DEBUG(+): GsmStore: Unreffing object: 0x7033d0 gnome-session[2352]: DEBUG(+): GsmManager: Client removed: /org/gnome/SessionManager/Client9 gnome-session[2352]: DEBUG(+): GsmClient: disposing /org/gnome/SessionManager/Client9 gnome-session[2352]: DEBUG(+): GsmXSMPClient: xsmp_finalize (0x7033d0 [gnome-terminal 10d37eee6a7b72628612562143824376500000023520022]) ---- The IceConnection is accepted by accept_ice_connection, which creates a new GsmXsmpClient with the IceConn, then the client is added in the GsmStore of GsmXsmpServer. gsm_store_add() emits the "ADDED" signal, then gsm-manager.c:on_store_client_added() is executed... then this function connects some signal handler... BUT now an IceError is throwed, handled by GsmXsmpClient itself. With simple words what is the default behaviour of gsm_xsmp_client.c:client_iochannel_watch() ? it changes the status of the client, then emits the "disconnect" client... BUT the "disconnect" signal handler isn't connect yet (made by gsm-manager on a new xsmp registration) => so the client is dead but not removed from the table... I Wrote a patch to fix that and it seems to work, I meant I had one zombie client per day... and since 1.5 days nothing... as a FIXME said I just connected the "on_client_disconnect" handler to the "disconnect" handler in gsm-manager.c:on_store_client_added(). You'll find the patch in attachment, it's still theoric but I'm pretty sure that it should work... (I meant the sighandler should be executed in this case...) I hope it will help you :) Regards, Romain
Created attachment 146113 [details] [review] gnome-session-2.28.0-do-not-keep-zombie-clients.patch
I also thought that would be nice to move the GsmXsmpClient instance out of accept_ice_connection() but some changes must to be done... for a release bugfix I thought too much code would have to change... imho but probably I'm wrong :)
PS: client_protocol_timeout for client '0x703290 []' does exactly the same thing as gsm_xsmp_client.c:client_iochannel_watch() ;)
My patch works, see the following proof : gnome-session[1684]: DEBUG(+): GsmXsmpServer: accept_ice_connection() gnome-session[1684]: DEBUG(+): GsmXSMPClient: Setting up new connection gnome-session[1684]: DEBUG(+): GsmXSMPClient: New client '0x704470 []' gnome-session[1684]: DEBUG(+): GsmStore: Adding object id /org/gnome/SessionManager/Client14 to store gnome-session[1684]: DEBUG(+): GsmManager: Client added: /org/gnome/SessionManager/Client14 gnome-session[1684]: DEBUG(+): GsmXsmpServer: ice_io_error_handler (0x711570) gnome-session[1684]: DEBUG(+): GsmXSMPClient: IceProcessMessagesIOError on '0x704470 []' gnome-session[1684]: DEBUG(+): GsmManager: disconnect client gnome-session[1684]: DEBUG(+): GsmManager: disconnect client: /org/gnome/SessionManager/Client14 gnome-session[1684]: DEBUG(+): GsmManager: unable to find application for client - not restarting gnome-session[1684]: DEBUG(+): GsmStore: Unreffing object: 0x704470 gnome-session[1684]: DEBUG(+): GsmManager: Client removed: /org/gnome/SessionManager/Client14 gnome-session[1684]: DEBUG(+): GsmClient: disposing /org/gnome/SessionManager/Client14 gnome-session[1684]: DEBUG(+): GsmXSMPClient: xsmp_finalize (0x704470 []) -------------------- Client4 was a zombie client, now its disconnected and removed from the table ;) I've no zombie clients anymore :)
Comment on attachment 145401 [details] [review] throw an error in xsmp_stop() when client->priv->conn is NULL I committed this, even if this is not the real fix.
Comment on attachment 146113 [details] [review] gnome-session-2.28.0-do-not-keep-zombie-clients.patch As discussed on IRC with Romain, this is not the right way to fix this. Romain will try to fix the XSMP server to implement what Matthias suggested.
Created attachment 152834 [details] [review] gnome-session-xsmp-server-ice-conn.patch
As discussed on IRC with Vincent and Matthias, see the above patch. It works like a charm since 3 days, it "solves" the FIXME said by Matthias (above), and zombie clients are not present anymore :) This patch is probably not perfect ;)
I've committed your patch, with some slight changes. The most visible one is that the timeout is not removed in auth_iochannel_watch, but when destroying the data. That should not affect the behavior, though.
This appears to have been recently assigned the id CVE-2017-11171: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-11171