After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 598211 - gnome-session crashed with SIGSEGV in SmsDie()
gnome-session crashed with SIGSEGV in SmsDie()
Status: RESOLVED FIXED
Product: gnome-session
Classification: Core
Component: gnome-session
2.28.x
Other Linux
: Normal critical
: ---
Assigned To: Session Maintainers
Session Maintainers
Depends on:
Blocks:
 
 
Reported: 2009-10-12 21:44 UTC by Chris Coulson
Modified: 2018-05-27 19:58 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
patch which solves the problem (824 bytes, patch)
2009-10-13 08:55 UTC, Romain Perier
none Details | Review
Complete backtrace (9.61 KB, text/plain)
2009-10-13 08:56 UTC, Romain Perier
  Details
throw an error in xsmp_stop() when client->priv->conn is NULL (1.15 KB, patch)
2009-10-14 09:15 UTC, Romain Perier
committed Details | Review
the full session_end phase using debug mode (13.64 KB, application/x-lzma)
2009-10-14 17:34 UTC, Romain Perier
  Details
gnome-session-2.28.0-do-not-keep-zombie-clients.patch (1.09 KB, patch)
2009-10-23 16:46 UTC, Romain Perier
needs-work Details | Review
gnome-session-xsmp-server-ice-conn.patch (8.80 KB, patch)
2010-02-02 13:44 UTC, Romain Perier
committed Details | Review

Description Chris Coulson 2009-10-12 21:44:35 UTC
Several Ubuntu users seem to be reporting this crash at the moment:

  • #0 SmsDie
    at ../../src/sm_manager.c line 312
  • #1 xsmp_stop
    at gsm-xsmp-client.c line 741
  • #2 _client_stop
    at gsm-manager.c line 734
  • #3 g_hash_table_find
    from /lib/libglib-2.0.so.0
  • #4 start_phase
    at gsm-manager.c line 754
  • #5 end_phase
    at gsm-manager.c line 488
  • #6 on_client_end_session_response
    at gsm-manager.c line 2035
  • #7 g_closure_invoke
    from /usr/lib/libgobject-2.0.so.0
  • #8 ??
    from /usr/lib/libgobject-2.0.so.0
  • #9 g_signal_emit_valist
    from /usr/lib/libgobject-2.0.so.0
  • #10 g_signal_emit
    from /usr/lib/libgobject-2.0.so.0
  • #11 gsm_client_end_session_response
    at gsm-client.c line 539
  • #12 client_dbus_filter_function
    at gsm-dbus-client.c line 192
  • #13 dbus_connection_dispatch
    at dbus-connection.c line 4446
  • #14 message_queue_dispatch
    at dbus-gmain.c line 101
  • #15 g_main_context_dispatch
    from /lib/libglib-2.0.so.0
  • #16 ??
    from /lib/libglib-2.0.so.0
  • #17 g_main_loop_run
    from /lib/libglib-2.0.so.0
  • #18 IA__gtk_main
    at /build/buildd/gtk+2.0-2.18.0/gtk/gtkmain.c line 1205
  • #19 main
    at main.c line 529

Unfortunately a few symbols are missing, but it sort-of shows what is going on
Comment 1 Romain Perier 2009-10-13 08:54:40 UTC
This is apparently due to an unsafe downcast in xsmp_stop() function.
I just replaced the C cast, by a GSM_XSMP_CLIENT(client), and now it's works just fine here.

See the attached patch.

Regards,
Romain.
Comment 2 Romain Perier 2009-10-13 08:55:14 UTC
Created attachment 145330 [details] [review]
patch which solves the problem
Comment 3 Romain Perier 2009-10-13 08:56:59 UTC
Created attachment 145331 [details]
Complete backtrace
Comment 4 Chris Coulson 2009-10-13 16:57:59 UTC
Good catch there. Thanks!

We will likely apply this in Ubuntu
Comment 5 Chris Coulson 2009-10-13 17:13:52 UTC
Hmmm, is that really the issue, or is it just a coincedence that it worked after? This type of cast is used elsewhere throughout gnome-session without any issues. I'm not saying that your patch isn't correct.
Comment 6 Romain Perier 2009-10-13 20:48:28 UTC
mhhh... I had a quick look at gobject code, in order to understand in more details how cast are handled.. and now... I've a doubt...
a GObject cast is just safe because it ensures that the cast is possible and has a sense... (and can be used dynamically) but then re-downcast the GTypeInstance to a c type...
so finally why it's works ? ... weird stuff... I meant its works all the time now for me. (and before this patch its failed on each shutdown/reboot/logout)

a gnome-session dev around to confirm/explain or fix the issue ?
Comment 7 Romain Perier 2009-10-13 20:49:49 UTC
@Chris: Could you test this patch ?
Comment 8 Chris Coulson 2009-10-13 20:51:29 UTC
I can't test it as I've never recreated it, and the downstream reporters aren't being that responsive to my request for information
Comment 9 Romain Perier 2009-10-14 08:20:41 UTC
Okay, I can confirm this patch is wrong, its fails again here, I've exactly the same backtrace, but now I've some warnings :

Oct 14 00:17:48 genbox gnome-session[3708]: WARNING: Unable to query client: Client is not registered
Oct 14 00:17:48 genbox gnome-session[3708]: WARNING: Unable to query client: Client is not registered

weird... I'm sorry again :)
Comment 10 Romain Perier 2009-10-14 09:15:47 UTC
Created attachment 145401 [details] [review]
throw an error in xsmp_stop() when client->priv->conn is NULL

There is probably a cleaner way to solve the issue (for example why client->priv->conn is NULL, and what's the origin ?)
But its should solves **temporarily** the problem and avoids the segfault. xsmp_stop() must throw an error when client->priv->conn is NULL, imho...
Comment 11 Romain Perier 2009-10-14 10:02:51 UTC
I found a weird stuff :

Apparently when I set my autostart apps (System-> Startup applications) as default (only gnome apps) I've this reply with dbus :

mrpouet@genbox ~ % dbus-send --session --dest=org.gnome.SessionManager --type=method_call --print-reply=yes /org/gnome/SessionManager org.gnome.SessionManager.GetClients

   array [
      /org/gnome/SessionManager/Client1      /org/gnome/SessionManager/Client2      /org/gnome/SessionManager/Client3      /org/gnome/SessionManager/Client4      /org/gnome/SessionManager/Client5      /org/gnome/SessionManager/Client6      /org/gnome/SessionManager/Client7      /org/gnome/SessionManager/Client8      /org/gnome/SessionManager/Client9   ]

And seriously all work just fine. Now assuming I want to start fusion-icon or cairo-dock at startup, I've this reply :

mrpouet@genbox ~ % dbus-send --session --dest=org.gnome.SessionManager --type=method_call --print-reply=yes /org/gnome/SessionManager org.gnome.SessionManager.GetClients

   array [ /org/gnome/SessionManager/Client16384
      /org/gnome/SessionManager/Client1      /org/gnome/SessionManager/Client2      /org/gnome/SessionManager/Client3      /org/gnome/SessionManager/Client4      /org/gnome/SessionManager/Client5      /org/gnome/SessionManager/Client6      /org/gnome/SessionManager/Client7      /org/gnome/SessionManager/Client8      /org/gnome/SessionManager/Client9   ]

there are some clients, with strange id (Client16384), and for each client with a strange id, a call to GetAppId() or GetUnixProcessId() just fails (using d-feet and dbus). And apparently when I've these strange clients, gnome-session segfaults because its unable to authenticate them... but as I said using default startup applications (so gnome applications) all work fine, and logout/reboot/shutdown are fast.
Comment 12 Matthias Clasen 2009-10-14 15:11:45 UTC
that points at a lack of robustness in handling misbehaving clients somewhere inside gnome-session.

Would be good to see what gnome-session puts into /var/log/messages when that happens. To turn on debug output, use 'killall -USR1 gnome-session'
Comment 13 Romain Perier 2009-10-14 17:33:06 UTC
I "re-started" my /var/log/messages with logrotate especially for this bug :) .I compressed it (24Mb -> 16 K), see attachment.

the log contains :
- the complete session_end phase (I clicked on logout) in debug mode.
- the full backtrace handled by gdm (containing the backtrace of gnome-session too)

ask me for more details ;)
Comment 14 Romain Perier 2009-10-14 17:34:13 UTC
Created attachment 145441 [details]
the full session_end phase using debug mode
Comment 15 Romain Perier 2009-10-15 13:38:27 UTC
Apparently the problem is caused by metacity, in my case I use compiz, so I've a different WindowManager :

- Because there aren't keys X-GNOME-AutoRestart in /usr/share/applications/metacity.desktop, egg_desktop_file_get_boolean will always return FALSE for this key (so for this part it's completely correct)

- Then have a look in gnome-session/gsm-manager.c : function _disconnect_client() :

        app_restart = gsm_app_peek_autorestart (app);
        client_restart_hint = gsm_client_peek_restart_style_hint (client);

        /* allow legacy clients to override the app info */
        if (! app_restart
            && client_restart_hint != GSM_CLIENT_RESTART_IMMEDIATELY) {
                g_debug ("GsmManager: autorestart not set, not restarting application");
                goto out;
        }

As I said we're sure that app_restart is FALSE, in my case g_debug message is NEVER displayed in my logs.

But when I set debug mode from my session (using compiz), I can see that gnome-session starts/stops metacity infinitely and I don't see "GsmManager: autorestart not set, not restarting application". (I searched it with grep,emacs,gedit => nothing)

==> In other words, client_restart_hint is ALWAYS equals to GSM_CLIENT_RESTART_IMMEDIATELY for metacity client... which is very annoying in the case we don't use metacity as default WM.

==> This stuff also explains why there is many Clients with strange ids.

@Matthias : Finally this is a gnome-session issue or a metacity issue ? (however gnome-session shoudn't crash, and shound't "freeze" ...)
Comment 16 Chris Coulson 2009-10-15 14:27:57 UTC
The metacity issue is already reported at bug 588119, but I'm not convinced that issue is related to this crash (although I've not really had time to look at this in any great depth yet). We already carry a patch to fix the Metacity issue in Ubuntu, and still see users reporting this crash.
Comment 17 Romain Perier 2009-10-15 14:52:00 UTC
I didn't say that I'm sure that issue is related to this crash but it should be related...

When I use metacity as WM, all work just fine here (no floods in syslogs, no freezes and no crashes), however see in attachment, I attached a patch to **avoid** the segfault (by throwing an error in xsmp_stop() ), but as I said it's probably not perfect. The segfault should be avoided anyway, but as Matthias said that points at a lack of robustness in handling misbehaving clients somewhere
inside gnome-session.
Comment 18 Romain Perier 2009-10-15 14:53:51 UTC
PS: If you prefer it's just an ascertainment
Comment 19 Chris Coulson 2009-10-22 00:38:18 UTC
I understand what is happening here now, and your earlier patch is probably enough to fix the issue. What is happening is that a new client (probably metacity in your case) is opening an ICE connection in the GSM_MANAGER_PHASE_END_SESSION phase, which causes a new GsmXSMPClient to be added to the client store. The GSM_MANAGER_PHASE_EXIT phase then begins before the client has had a chance to establish a xsmp connection, which means that client->priv->conn will not be initialized at the point that xsmp_stop is called on the new unregistered client.

This is easily reproducible by running something that uses xsmp (such as Metacity) in GDB, and breaking on IceProtocolSetup. Once this call has been reached (and the client interrupted at this point), logging out of your session will make gnome-session crash with 100% repeatability.
Comment 20 Matthias Clasen 2009-10-23 14:55:00 UTC
The real fix is probably here:

gsm-xsmp-server.c: 

 * FIXME: it would probably make more sense to not create a
 * GsmXSMPClient object until accept_xsmp_connection, below (and to do
 * the timing-out here in xsmp.c).

But that is 2.29 material. The patch above should work ok in the meantime.
Comment 21 Chris Coulson 2009-10-23 16:04:50 UTC
Agreed, that would probably make more sense in the longer term. For now, we've applied the patch in Ubuntu.

Thanks!
Comment 22 Romain Perier 2009-10-23 16:45:59 UTC
Another strange thing is sometimes we've some "zombie" Clients (ie: An empty appID and a PID == 0) in this case the timeout between "ending phase END SESSION" and "ending phase EXIT" is executed but that all ! finally we've to wait until the timeout is executed , in other words we've to wait 10s for some logout/reboot/shutdown , but that's a random problem.

So now I set the debugging mode for each new connection on gnome.
And I've found a strange thing Matthias, see yourself :

gnome-session[2352]: DEBUG(+): GsmXsmpServer: accept_ice_connection()
gnome-session[2352]: DEBUG(+): GsmXSMPClient: Setting up new connection
gnome-session[2352]: DEBUG(+): GsmXSMPClient: New client '0x703290 []'
gnome-session[2352]: DEBUG(+): GsmStore: Adding object id /org/gnome/SessionManager/Client33 to store
gnome-session[2352]: DEBUG(+): GsmManager: Client added: /org/gnome/SessionManager/Client33
gnome-session[2352]: DEBUG(+): GsmXsmpServer: ice_io_error_handler (0x6b4010)
gnome-session[2352]: DEBUG(+): GsmXSMPClient: IceProcessMessagesIOError on '0x703290 []'
gnome-session[2352]: DEBUG(+): GsmXSMPClient: client_protocol_timeout for client '0x703290 []' in ICE status 3
gnome-session[2352]: DEBUG(+): GsmXsmpServer: ice_io_error_handler (0x7048a0)
gnome-session[2352]: DEBUG(+): GsmXSMPClient: IceProcessMessagesIOError on '0x7033d0 [gnome-terminal 10d37eee6a7b72628612562143824376500000023520022]'
gnome-session[2352]: DEBUG(+): GsmManager: disconnect client
gnome-session[2352]: DEBUG(+): GsmManager: disconnect client: /org/gnome/SessionManager/Client9
gnome-session[2352]: DEBUG(+): GsmManager: unable to find application for client - not restarting
gnome-session[2352]: DEBUG(+): GsmStore: Unreffing object: 0x7033d0
gnome-session[2352]: DEBUG(+): GsmManager: Client removed: /org/gnome/SessionManager/Client9
gnome-session[2352]: DEBUG(+): GsmClient: disposing /org/gnome/SessionManager/Client9
gnome-session[2352]: DEBUG(+): GsmXSMPClient: xsmp_finalize (0x7033d0 [gnome-terminal 10d37eee6a7b72628612562143824376500000023520022])
----


The IceConnection is accepted by accept_ice_connection, which creates a new GsmXsmpClient with the IceConn, then the client is added in the GsmStore of GsmXsmpServer.  gsm_store_add() emits the "ADDED" signal, then gsm-manager.c:on_store_client_added() is executed... then this function connects some signal handler... BUT now an IceError is throwed, handled by GsmXsmpClient itself.
With simple words what is the default behaviour of gsm_xsmp_client.c:client_iochannel_watch() ?
it changes the status of the client, then emits the "disconnect" client...
BUT the "disconnect" signal handler isn't connect yet (made by gsm-manager on a new xsmp registration) => so the client is dead but not removed from the table...

I Wrote a patch to fix that and it seems to work, I meant I had one zombie client per day... and since 1.5 days nothing... as a FIXME said I just connected the "on_client_disconnect" handler to the "disconnect" handler in gsm-manager.c:on_store_client_added().

You'll find the patch in attachment, it's still theoric but I'm pretty sure that it should work... (I meant the sighandler should be executed in this case...) I hope it will help you :)

Regards,
Romain
Comment 23 Romain Perier 2009-10-23 16:46:58 UTC
Created attachment 146113 [details] [review]
gnome-session-2.28.0-do-not-keep-zombie-clients.patch
Comment 24 Romain Perier 2009-10-23 16:48:48 UTC
I also thought that would be nice to move the GsmXsmpClient instance out of accept_ice_connection() but some changes must to be done... for a release bugfix I thought too much code would have to change... imho
but probably I'm wrong :)
Comment 25 Romain Perier 2009-10-23 16:50:10 UTC
PS: client_protocol_timeout for
client '0x703290 []'  does exactly the same thing as gsm_xsmp_client.c:client_iochannel_watch() ;)
Comment 26 Romain Perier 2009-10-24 11:16:31 UTC
My patch works, see the following proof :

gnome-session[1684]: DEBUG(+): GsmXsmpServer: accept_ice_connection()
gnome-session[1684]: DEBUG(+): GsmXSMPClient: Setting up new connection
gnome-session[1684]: DEBUG(+): GsmXSMPClient: New client '0x704470 []'
gnome-session[1684]: DEBUG(+): GsmStore: Adding object id /org/gnome/SessionManager/Client14 to store
gnome-session[1684]: DEBUG(+): GsmManager: Client added: /org/gnome/SessionManager/Client14
gnome-session[1684]: DEBUG(+): GsmXsmpServer: ice_io_error_handler (0x711570)
gnome-session[1684]: DEBUG(+): GsmXSMPClient: IceProcessMessagesIOError on '0x704470 []'
gnome-session[1684]: DEBUG(+): GsmManager: disconnect client
gnome-session[1684]: DEBUG(+): GsmManager: disconnect client: /org/gnome/SessionManager/Client14
gnome-session[1684]: DEBUG(+): GsmManager: unable to find application for client - not restarting
gnome-session[1684]: DEBUG(+): GsmStore: Unreffing object: 0x704470
gnome-session[1684]: DEBUG(+): GsmManager: Client removed: /org/gnome/SessionManager/Client14
gnome-session[1684]: DEBUG(+): GsmClient: disposing /org/gnome/SessionManager/Client14
gnome-session[1684]: DEBUG(+): GsmXSMPClient: xsmp_finalize (0x704470 [])

--------------------
Client4 was a zombie client, now its disconnected and removed from the table ;)

I've no zombie clients anymore :)
Comment 27 Vincent Untz 2010-01-27 15:26:23 UTC
Comment on attachment 145401 [details] [review]
throw an error in xsmp_stop() when client->priv->conn is NULL

I committed this, even if this is not the real fix.
Comment 28 Vincent Untz 2010-01-27 15:26:54 UTC
Comment on attachment 146113 [details] [review]
gnome-session-2.28.0-do-not-keep-zombie-clients.patch

As discussed on IRC with Romain, this is not the right way to fix this. Romain will try to fix the XSMP server to implement what Matthias suggested.
Comment 29 Romain Perier 2010-02-02 13:44:37 UTC
Created attachment 152834 [details] [review]
gnome-session-xsmp-server-ice-conn.patch
Comment 30 Romain Perier 2010-02-02 13:47:00 UTC
As discussed on IRC with Vincent and Matthias, see the above patch.
It works like a charm since 3 days, it "solves" the FIXME said by Matthias (above), and zombie clients are not present anymore :)


This patch is probably not perfect ;)
Comment 31 Vincent Untz 2010-03-09 00:39:55 UTC
I've committed your patch, with some slight changes. The most visible one is that the timeout is not removed in auth_iochannel_watch, but when destroying the data. That should not affect the behavior, though.
Comment 32 Alan Coopersmith 2018-05-27 19:58:20 UTC
This appears to have been recently assigned the id CVE-2017-11171:
https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-11171