Bug 517526 – segfault in keyfile code on shutdown

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 517526 - segfault in keyfile code on shutdown


Summary:	segfault in keyfile code on shutdown


Status:	RESOLVED FIXED

Product:	gdm
Classification:	Core
Component:	general
Version:	2.20.x
Hardware:	Other All

Importance:	Normal normal
Target Milestone:	---
Assigned To:	GDM maintainers
QA Contact:	GDM maintainers

URL:
Whiteboard:

Duplicates:	524250 526055 532176 (view as bug list)
Depends on:
Blocks:

Reported:	2008-02-19 20:01 UTC by icare40
Modified:	2008-09-03 19:17 UTC

See Also:
GNOME target:	---
GNOME version:	2.19/2.20

Attachments
patch fixing issue (477 bytes, patch) 2008-04-30 09:53 UTC, Brian Cameron	none	Details \| Review
2nd try for a patch that fixes the problem (2.28 KB, patch) 2008-04-30 19:44 UTC, Brian Cameron	rejected	Details \| Review
patch to test (1.95 KB, patch) 2008-05-09 17:17 UTC, Brian Cameron	none	Details \| Review
updated patch to test (3.63 KB, patch) 2008-05-09 18:00 UTC, Brian Cameron	committed	Details \| Review
use g_free, not g_ptr_array_free (442 bytes, patch) 2008-06-03 10:12 UTC, Frederic Crozat	committed	Details \| Review
Fix assertion failed during shutdown process (1.11 KB, patch) 2008-08-10 18:11 UTC, Pascal Brochart	none	Details \| Review
Fix double assertion failed during shutdown process (2.71 KB, patch) 2008-08-11 12:13 UTC, Pascal Brochart	none	Details \| Review
Minor change to previous patch "Fix assertion failed during shutdown process" (2.71 KB, patch) 2008-08-11 19:42 UTC, Pascal Brochart	none	Details \| Review
Exit after launching the shutdown command (535 bytes, patch) 2008-08-29 15:00 UTC, Josselin Mouette	none	Details \| Review
Remove signal handlers in gdm_final_cleanup and exit after the reboot/shutdown sequence (1.06 KB, patch) 2008-09-03 11:14 UTC, Josselin Mouette	committed	Details \| Review

Description icare40 2008-02-19 20:01:45 UTC

Please describe the problem:
when I shutdown or reboot my computer I can read this message on tty1
Feb 18 17:42:29 laptop-cedric kernel: gdm[2461]: segfault at 10e71e08 ip b78140d5 sp bfb96310 error 4 in libglib-2.0.so.0.1400.5[b77f2000+a0000]
Feb 18 22:16:35 laptop-cedric kernel: gdm[2467]: segfault at 10e71e08 ip b77d60d5 sp bfc59810 error 4 in libglib-2.0.so.0.1400.5[b77b4000+a0000]
Feb 19 13:06:17 laptop-cedric kernel: gdm[2483]: segfault at 10e71e08 ip b78730d5 sp bf8f4d90 error 4 in libglib-2.0.so.0.1400.5[b7851000+a0000]


Steps to reproduce:
every time when a shutdown or reboot

Actual results:


Expected results:


Does this happen every time?


Other information:
I use debian testing whit 2.6.24 git 22 kernel

Comment 1 Brian Cameron 2008-02-19 20:21:04 UTC

What version of GDM are you using?  Could you try to upgrade to the latest GDM 2.20.3 version if you are using an older version and see if the problem still happens.

Also, is there a stacktrace you can provide?  GDM may leave a core file in your root (/) directory or in /var/lib/gdm, or /var/lib/log/gdm.

Comment 2 icare40 2008-02-19 21:23:03 UTC

(In reply to comment #1)
> What version of GDM are you using?  Could you try to upgrade to the latest GDM
> 2.20.3 version if you are using an older version and see if the problem still
> happens.
> 
> Also, is there a stacktrace you can provide?  GDM may leave a core file in your
> root (/) directory or in /var/lib/gdm, or /var/lib/log/gdm.
> 

Yes I use the last version 2.20.3-1 but I don't find the core file!!

Comment 3 Brian Cameron 2008-02-19 21:46:54 UTC

I don't know how things work on debian, but I know on Solaris, I have to monkey around with my "coreadm" settings (/etc/coreadm) by running the coreadm tool to get system programs like GDM to generate core files.

Comment 4 Sam Morris 2008-03-04 19:25:19 UTC

Several Debian bug reports have been filed about this. The one with the most information is at <http://bugs.debian.org/459024>. I am trying to work out how to make gdm generate a core file.

Comment 5 Sam Morris 2008-03-04 20:37:40 UTC

I can't get GDM to generate a core file. I placed 'ulimit -c unlimited' in /etc/init.d/gdm, but no core file appears in /var/lib/gdm.

I also tried attaching to gdm with gdb, but when I restart having done this, gdm does not segafault! Instead, it exits normally.

Comment 6 Brian Cameron 2008-03-05 00:21:57 UTC

Looking at the log in the above bug report, I see that the last gdm messages
indicate that GDM is in the function gdm_cleanup_children.  The last message indicates that it is checking the GDM_KEY_DYNAMIC_XSERVERS.  

Look in the file daemon/gdm.c, in the function gdm_cleanup_children.  Notice
that GDM should print another debug message after the start_autopsy section.
But we don't see any further messages.

It might be useful if you were to add some additional debug messages in this function to see exactly how far in this function GDM gets before it crashes.  
This would probably highlight the problem.

Comment 7 Paul Wise 2008-03-08 07:33:38 UTC

I'm on Debian and have a similar problem; segfault at shutdown and on reload.

If I have gdm waiting for a login, then run /etc/init.d/gdm reload on a virtual console, I get a segfault on one of the 2 gdm processes running.

I was able to capture a full backtrace and a core file by running two gdb instances and attaching them to the running gdm processes. I rebuild gdm with debugging symbols, but unfortunately the Debian package doesn't seem to support building without optimisation so some parts are optimised out.

Here is the backtrace, the core file is attached.

Program received signal SIGSEGV, Segmentation fault.
slab_allocator_free_chunk (chunk_size=<value optimized out>, mem=0x80e87a8) at /tmp/buildd/glib2.0-2.14.6/glib/gslice.c:1091
1091	/tmp/buildd/glib2.0-2.14.6/glib/gslice.c: No such file or directory.
	in /tmp/buildd/glib2.0-2.14.6/glib/gslice.c

+ Trace 191719

#0 slab_allocator_free_chunk
at /tmp/buildd/glib2.0-2.14.6/glib/gslice.c line 1091
#1 magazine_cache_push_magazine
at /tmp/buildd/glib2.0-2.14.6/glib/gslice.c line 614
#2 thread_memory_magazine2_unload
at /tmp/buildd/glib2.0-2.14.6/glib/gslice.c line 744
#3 IA__g_slice_free1
at /tmp/buildd/glib2.0-2.14.6/glib/gslice.c line 868
#4 IA__g_list_free_1
at /tmp/buildd/glib2.0-2.14.6/glib/glist.c line 59
#5 g_key_file_remove_key_value_pair_node
at /tmp/buildd/glib2.0-2.14.6/glib/gkeyfile.c line 3038
#6 g_key_file_remove_group_node
at /tmp/buildd/glib2.0-2.14.6/glib/gkeyfile.c line 3100
#7 g_key_file_clear
at /tmp/buildd/glib2.0-2.14.6/glib/gkeyfile.c line 236
#8 IA__g_key_file_free
at /tmp/buildd/glib2.0-2.14.6/glib/gkeyfile.c line 704
#9 gdm_config_free
at gdm-config.c line 606
#10 gdm_daemon_config_close
at gdm-daemon-config.c line 2343
#11 gdm_final_cleanup
at gdm.c line 394
#12 gdm_restart_now
at gdm.c line 1170
#13 gdm_safe_restart
at gdm.c line 1195
#14 mainloop_sig_callback
at gdm.c line 1304
#15 ve_signal_dispatch
at ve-signal.c line 72
#16 IA__g_main_context_dispatch
at /tmp/buildd/glib2.0-2.14.6/glib/gmain.c line 2061
#17 g_main_context_iterate
at /tmp/buildd/glib2.0-2.14.6/glib/gmain.c line 2694
#18 IA__g_main_loop_run
at /tmp/buildd/glib2.0-2.14.6/glib/gmain.c line 2898
#19 main
at gdm.c line 1790

Comment 8 Paul Wise 2008-03-08 07:49:17 UTC

Core file is here since it is too big for bugzilla:

http://bonedaddy.net/pabs3/files/tmp/gdm.core

I'm using Debian GNU/Linux sid with gdm 2.20.3-2, rebuilt with debugging symbols.

Comment 9 Brian Cameron 2008-03-08 21:37:39 UTC

This seems a bit odd.  I wonder if gdm_final_cleanup is getting called more than once?  I wonder if it would help to make sure to set config->mandatory_key_file, config->default_key_file, config->custom_key_file to NULL after calling g_key_file_free?  This is in the function gdm_config_free in common/gdm-config.c

Could you try this?

Comment 10 Paul Wise 2008-03-08 23:48:00 UTC

Applied the following patch and still got a segfault.

--- gdm-2.20.3.orig/common/gdm-config.c
+++ gdm-2.20.3/common/gdm-config.c
@@ -601,12 +601,15 @@
 
 	if (config->mandatory_key_file != NULL) {
 		g_key_file_free (config->mandatory_key_file);
+		config->mandatory_key_file = NULL;
 	}
 	if (config->default_key_file != NULL) {
 		g_key_file_free (config->default_key_file);
+		config->default_key_file = NULL;
 	}
 	if (config->custom_key_file != NULL) {
 		g_key_file_free (config->custom_key_file);
+		config->custom_key_file = NULL;
 	}
 	if (config->value_hash != NULL) {
 		g_hash_table_destroy (config->value_hash);

Here is the backtrace and core file for pid 15362 (parent gdm):

http://bonedaddy.net/pabs3/files/tmp/gdm.15362.core

Continuing.

Program received signal SIGUSR1, User defined signal 1.

+ Trace 191807

Thread 3075151648 (LWP 15362)

#0 slab_allocator_free_chunk
at /tmp/buildd/glib2.0-2.14.6/glib/gslice.c line 1091
#1 magazine_cache_push_magazine
at /tmp/buildd/glib2.0-2.14.6/glib/gslice.c line 614
#2 thread_memory_magazine2_unload
at /tmp/buildd/glib2.0-2.14.6/glib/gslice.c line 744
#3 IA__g_slice_free1
at /tmp/buildd/glib2.0-2.14.6/glib/gslice.c line 868
#4 IA__g_list_free_1
at /tmp/buildd/glib2.0-2.14.6/glib/glist.c line 59
#5 g_key_file_remove_key_value_pair_node
at /tmp/buildd/glib2.0-2.14.6/glib/gkeyfile.c line 3038
#6 g_key_file_remove_group_node
at /tmp/buildd/glib2.0-2.14.6/glib/gkeyfile.c line 3100
#7 g_key_file_clear
at /tmp/buildd/glib2.0-2.14.6/glib/gkeyfile.c line 236
#8 IA__g_key_file_free
at /tmp/buildd/glib2.0-2.14.6/glib/gkeyfile.c line 704
#9 gdm_config_free
at gdm-config.c line 607
#10 gdm_daemon_config_close
at gdm-daemon-config.c line 2343
#11 gdm_final_cleanup
at gdm.c line 394
#12 gdm_restart_now
at gdm.c line 1170
#13 gdm_safe_restart
at gdm.c line 1195
#14 mainloop_sig_callback
at gdm.c line 1304
#15 ve_signal_dispatch
at ve-signal.c line 72
#16 IA__g_main_context_dispatch
at /tmp/buildd/glib2.0-2.14.6/glib/gmain.c line 2061
#17 g_main_context_iterate
at /tmp/buildd/glib2.0-2.14.6/glib/gmain.c line 2694
#18 IA__g_main_loop_run
at /tmp/buildd/glib2.0-2.14.6/glib/gmain.c line 2898
#19 main
at gdm.c line 1790

Couldn't get registers: No such process.
Continuing.
Couldn't get registers: No such process.
The program is running.  Quit anyway (and detach it)? (y or n) Not confirmed.
Continuing.
Couldn't get registers: No such process.
The program being debugged has been started already.
Start it from the beginning? (y or n) Program not restarted.
Continuing.
Couldn't get registers: No such process.
The program is running.  Quit anyway (and detach it)? (y or n) Detaching from program: /usr/sbin/gdm, process 15363
Quitting: ptrace: No such process.

Comment 11 Ray Strode [halfline] 2008-03-09 04:54:23 UTC

can you 

export G_SLICE=always-malloc

before starting gdm and get a backtrace again?  gslice sometimes makes gdb give bogus backtraces.

Comment 12 Paul Wise 2008-03-09 06:50:56 UTC

I couldn't reproduce it after restarting it with G_SLICE=always-malloc. After restarting it without G_SLICE=always-malloc I could reproduce it again.

Comment 13 Sam Morris 2008-03-09 08:59:14 UTC

How about G_SLICE=debug-blocks ?

Comment 14 Paul Wise 2008-03-10 04:06:21 UTC

Cannot even start GDM when I use G_SLICE=debug-blocks:

Starting GNOME Display Manager: gdm
GSlice: MemChecker: attempt to release non-allocated block: 0x80f44d8 size=12
/lib/lsb/init-functions: line 53:  4345 Aborted                 /sbin/start-stop-daemon --start --nicelevel 0 --quiet --exec /usr/sbin/gdm --oknodo --pidfile /dev/null --

Comment 15 Brian Cameron 2008-03-10 05:47:46 UTC

Thanks for all the information, and for testing the patch I suggested. 
The gdb output you provided shows two stack traces.  The first one (with the SIGUSR1) is with the keyfile issue.  I wonder if this is a red herring.  Perhaps the real issue is with the gdm_fdgetc call from the pam conversation function.

This seems odd.  This code is normally called when the daemon is trying to talk to the greeter, as if the daemon thinks it needed to restart the greeter, but this might be failing because the system is really shutting down.

If so, this might be related to bug #471479.  Not sure.

Comment 16 Paul Wise 2008-03-10 06:09:50 UTC

Wouldn't the second one be the current gdm being requested to shut down, so a new instance can be started? It is a SIGTERM after all.

Comment 17 Brian Cameron 2008-03-10 06:18:37 UTC

Good catch.  I think you are right.  I'm just confused why the code would be crashing in the memory handling when we free the keyfile.  Looking at the code, I just don't see why this would happen...unless there is a bug in the glib gkeyfile stuff.

Comment 18 Brian Cameron 2008-04-06 23:53:03 UTC

*** Bug 524250 has been marked as a duplicate of this bug. ***

Comment 19 Brian Cameron 2008-04-06 23:54:04 UTC

I suspect this bug might be related to bug #471479, but not sure.

Comment 20 Chris Tillman 2008-04-28 05:21:55 UTC

I noticed something that may contribute to an understanding here. That is, that when users which were created post-installation are logged in, and then log out, GDM reliably logs out and redisplays the login screen. At least on my machine, it's only the _first_ user created on my system, which was created within the Debian installer, that crashes on log out or on switch to one of the text consoles. I even created another user and moved all the files from the old one into the new one, resetting group and ownership, and the new user never has a problem when logging out. The first user still does, almost every time. So this may point to some sort of permissions error, where the first user created has a special type of access that gets her into trouble.

Comment 21 Paul Wise 2008-04-28 05:42:50 UTC

Chris: ah, problem with that theory is that the gdm crashes I got were when no user was logged in through gdm, and I did /etc/init.d/gdm reload as root from a virtual console.

Comment 22 Brian Cameron 2008-04-29 00:26:28 UTC

*** Bug 526055 has been marked as a duplicate of this bug. ***

Comment 23 Brian Cameron 2008-04-30 09:53:16 UTC

Created attachment 110150 [details] [review]
patch fixing issue


I was able to spend some time looking into this problem, and I think the attached patch should fix the problem.  Could other people who are experiencing this problem test the attached patch and let me know if it also resolves their problems?

Thanks.

Comment 24 Paul Wise 2008-04-30 10:40:25 UTC

I still get the issue (segfault on /etc/init.d/gdm reload) when applying the patch to Debian's gdm 2.20.5-1.

I note that Debian has a lot of patches to gdm, maybe it is one of those?

Comment 25 Brian Cameron 2008-04-30 19:43:10 UTC

Okay, I think I understand what is going on here.  I recently discovered this problem is also happening on Solaris, which makes it easier to debug.

It seems what is happening is that GDM is calling gdm_final_cleanup.  Then while GDM is in the middle of freeing the memory, it receives a signal, such as a 
SIGABT which causes the main_daemon_abrt function to be called.  Note that
main_daemon_abrt calls gdm_final_cleanup again.  Then when it tries to free
the keyfile memory again, it seg-faults.

I have written a second patch which I think fixes the problem.  Using this patch I no longer see core files on Solaris.  This patch sets local variables equal to
the memory that we intend to free, and then sets the structure variables to NULL
before doing the free.  This way if the function is called again, we do not try
to free the same data structures again.

This function could probably be made more thread safe.  If someone wants to
spend further time making this code smarter, I'd be happy to accept a better
patch.  However, this patch at least seems to fix GDM so it does not crash
consistently on shutdown.

Could you please test this patch and let me know if it works better.

Comment 26 Brian Cameron 2008-04-30 19:44:41 UTC

Created attachment 110185 [details] [review]
2nd try for a patch that fixes the problem


Note I had previously committed my last patch, which didn't work.  I backed out that modification from the GDM 2.20 branch and have committed this patch instead.

Comment 27 Paul Wise 2008-05-01 02:22:43 UTC

Wouldn't a mutex be the right way to fix this?

Comment 28 Brian Cameron 2008-05-01 04:25:58 UTC

To be honest, I'm not really sure what the "right" way to fix this problem would be.  The complications of writing thread-safe code that needs to work within the context of signal handlers has been an ongoing source of problems within GDM.

I realize that my patch isn't that great, so if you or anybody else wants to provide a better patch that would be great too.  However, I think in the meantime  my patch should at least make it stop crashing so consistently.

Can people verify that my patch improves the situation.  That way we can at least be more confident that we've identified the source of the problem.

Comment 29 Pascal Brochart 2008-05-04 18:49:40 UTC

I try the 2nd patch and it doesn't work for me (linux debian x86 with gdm 2.20.5).
There is always a segfault on shutdown/restart.

Comment 30 Paul Wise 2008-05-05 03:40:23 UTC

The second patch prevents Debian gdm 2.20.5-1 from segfaulting on /etc/init.d/gdm reload but gdm exits instead of restarting when /etc/init.d/gdm reload is run.

Comment 31 Brian Cameron 2008-05-05 16:55:35 UTC

Pascal, did you make sure to restart GDM by running gdm-restart or rebooting after updating GDM with the patch?  If so, and it is still crashing, then could you please provide a stack trace and any other additional information about the segfault that you continue to see.  To be honest, I am a bit mystified if you are seeing a consistent segfault with the same stack trace after applying the patch.

Paul, I am glad to hear that the patch fixes the core dumping issue.  Could you explain what "/etc/init.d/gdm reload" does?  The /etc/init.d/gdm script is not a 
part of GDM, and is likely something added by your distro.  I use Solaris, which does not use any /etc/init.d/gdm scripts so I am unfamiliar with it.  Does it just run gdm-restart?  If so, I am not sure why it is broken, gdm-restart seems 
to work reasonably for me.

Note, I do notice that gdm-restart does not always behave so well immediately after installing a new version of GDM.  I notice I generally have to reboot once after installing a new version before gdm-restart works right.

At any rate, Paul, could you provide more information about what specifically isn't working when you try to reload?  Perhaps turn on GDM debug and share the output, that might help.

Comment 32 Paul Wise 2008-05-06 03:22:56 UTC

/etc/init.d/gdm reload sends the USR1 signal to the gdm process so that it reloads its configuration and restarts.

I'll try to get debug output for you later today.

PS: Debian gdm has a several patches that may be interfering with this.

Comment 33 Pascal Brochart 2008-05-07 14:25:33 UTC

After more testing, I can say this:

I rebuild GDM-2.20.5-1 on Debian Testing/Lenny with debugging symbols and patch "gdm-try2.diff".
A segfault only occurs when GDM shutdown/restart the computer (Actions menu).
It's really strange because the segfault appears when the script "K01gdm" is not launched yet.

When the shutdown/restart is done from gnome (gnome-session), there is no segfault but only one of the two processes of GDM was stopped (with K01gdm).
The last chance script "sendsigs" launched just before the shutdown sends a signal -15 (a warning of GDM appears) and after a KILL -9 (the second processe is killed).

And when I run "init 6" or "init 0" (on TTY or Gnome Terminal), GDM is stopped properly.

For the moment, I can't get a backtrace but I will try as soon as possible.

Comment 34 Pascal Brochart 2008-05-08 22:43:15 UTC

Here is the backtrace (when shutdown/restart is done from GDM):

Program received signal SIGSEGV, Segmentation fault.

+ Trace 197269

Thread 3075020576 (LWP 2831)

#0 gdm_config_lookup_entry
at gdm-config.c line 725
#1 gdm_daemon_config_key_to_string
at gdm-daemon-config.c line 612
#2 gdm_daemon_config_key_to_string_per_display
at gdm-daemon-config.c line 556
#3 gdm_daemon_config_get_value_bool_per_display
at gdm-daemon-config.c line 467
#4 mainloop_sig_callback
at gdm.c line 936
#5 g_main_context_dispatch
from /usr/lib/libglib-2.0.so.0
#6 ??
from /usr/lib/libglib-2.0.so.0
#7 ??
#8 ??
#0 gdm_config_lookup_entry
at gdm-config.c line 725

Comment 35 Brian Cameron 2008-05-09 16:49:01 UTC

*** Bug 532176 has been marked as a duplicate of this bug. ***

Comment 36 Brian Cameron 2008-05-09 17:17:32 UTC

Created attachment 110654 [details] [review]
patch to test

That looks like a new issue.  I notice that the code is always calling gdm_daemon_config_get_value_bool_per_display regardless of the signal, or whether it will use the variable.  Since mainloop_sig_callback calls 
gdm_cleanup_children in a while() loop, it is probably bad to always refer to the configuration data in the signal handler - unless dealing with a signal where we are confident that GDM isn't in the process of going down.

I am providing a patch which moves the call to get the sysmenu configuration option so it is only done in the one signal cases which actually seems to need it (when processing chooser).  In the cases when GDM receives this signals, the configuration should be okay.

I removed the belt-and-braces code that tries to avoid remanaging the display if the configuration is not set up right.  There shouldn't be any harm in restarting the GUI even if the configuration isn't right.  If, for example, the reboot configuration command is NULL, this would just mean that the GUI would restart and the reboot wouldn't happen.

Could you test this patch and see if it makes things work better?

Comment 37 Brian Cameron 2008-05-09 18:00:19 UTC

Created attachment 110658 [details] [review]
updated patch to test


That last patch is wrong.  After reviewing the code a bit more, it is necessary to reset status from REBOOT/HALT/SUSPEND to REMANAGE if the configuration doesn't support reboot/halt/suspend.

However, this patch is smarter and only accesses the configuration values in the reboot/halt/suspend/chooser cases.  I think this may fix your crashing issue. 
Can you test it?

If this doesn't fix the crashing issue, it may be necessary to cache the various configuration options needed in the signal handlers into globals or something so we don't have to worry about them getting freed out from under us during the shutdown sequence.

Comment 38 Pascal Brochart 2008-05-09 21:10:18 UTC

The last patch works fine for me, good !
With this one and "gdm-try2.diff",  there is no more segfault.

Comment 39 Brian Cameron 2008-05-09 21:26:58 UTC

Great. I committed the patch upstream, so hopefully this issue is now fixed.  I'm marking this bug as fixed.  Reopen or file a new bug if you find more issues.
I will likely do a new GDM 2.20 release with these fixes since these seem to be causing users a lot of issues.

Comment 40 Josselin Mouette 2008-05-11 00:29:07 UTC

The patch fixes some of the issues (especially the SIGABRT upon shutdown) but I’m still getting several segfaults during the shutdown process.

Comment 41 Pascal Brochart 2008-05-11 18:35:55 UTC

The patch on the file "common/gdm-config.c" is missing in GDM-2.20.5-2/debian unstable (it resolves at least one segfault).
But there are still problems...

Ex: Sending a SIGTERM to GDM, the following error appears:

g_hash_table_lookup_extended: assertion `hash_table != NULL'

Comment 42 Brian Cameron 2008-05-12 18:21:43 UTC

Pascal & Josselin:  Could you please provide more detail on the segfault issues you are seeing.  Could you both provide a stacktrace?  I'm reopening this bug until we get more clarity that the issues are actually resolved.

By the way, I am going to do a 2.20.6 release since there are a lot of fixes that I'd like to make more available, so it might be good to check again with the new release when it comes out.  That way, we make sure that everybody is testing with the same codebase/patches, etc.

Comment 43 Frederic Crozat 2008-06-03 10:03:53 UTC

I can confirm gdm 2.20.6 fixes the crash but I'm sometime getting the following assertion when stopping it :

***MEMORY-ERROR***: gdm-binary[5799]: GSlice: assertion failed: sinfo->n_allocated > 0

Program received signal SIGABRT, Aborted.
0xffffe410 in __kernel_vsyscall ()

+ Trace 199468

#0 __kernel_vsyscall
#1 raise
from /lib/i686/libc.so.6
#2 abort
from /lib/i686/libc.so.6
#3 mem_error
at gslice.c line 1197
#4 slab_allocator_free_chunk
at gslice.c line 1079
#5 magazine_cache_push_magazine
at gslice.c line 614
#6 thread_memory_magazine2_unload
at gslice.c line 744
#7 IA__g_slice_free1
at gslice.c line 868
#8 IA__g_list_free_1
at glist.c line 78
#9 g_key_file_remove_key_value_pair_node
at gkeyfile.c line 3058
#10 g_key_file_remove_group_node
at gkeyfile.c line 3120
#11 g_key_file_clear
at gkeyfile.c line 236
#12 IA__g_key_file_free
at gkeyfile.c line 705
#13 gdm_config_free
at gdm-config.c line 639
#14 gdm_daemon_config_close
at gdm-daemon-config.c line 2362
#15 mainloop_sig_callback
at gdm.c line 1306
#16 IA__g_main_context_dispatch
at gmain.c line 2009
#17 g_main_context_iterate
at gmain.c line 2642
#18 IA__g_main_loop_run
at gmain.c line 2850
#19 main
at gdm.c line 1816
#20 __libc_start_main
at libc-start.c line 220

Comment 44 Frederic Crozat 2008-06-03 10:12:33 UTC

Created attachment 112037 [details] [review]
use g_free, not g_ptr_array_free

this patch fixes invalid memory free. It allows gdm to be run with G_SLICE=debug-blocks (which was triggering the error) and fixes assertion I posted some minutes ago.

Comment 45 Brian Cameron 2008-06-03 16:44:30 UTC

Thanks for the patch.  This is now committed to the 2.20 branch.  

So, is anybody else experiencing similar crashing or assertion problems on exit.  Hopefully we have now fixed all the problems in this area.  If I don't hear any more issues, I will close this bug in a week or so.

Comment 46 Josselin Mouette 2008-06-10 20:11:02 UTC

Even with 2.20.6 and Frédéric’s patch, I still get an invalid free on shutdown. However simply stopping gdm does not seem to trigger it, so I can’t get a stack trace.

Comment 47 Brian Cameron 2008-06-10 21:14:08 UTC

That's disappointing.  I'm not seeing any more issues myself using the latest GDM.  Hopefully someone who is seeing this problem can provide a stack trace.
Otherwise, it's really hard to debug.

On Solaris, you need to configure coreadm to get core files from system processes like GDM.  You might need to do something similar on your system.

Comment 48 bugreports 2008-06-14 15:26:43 UTC

I am also still seeing crashes on shutting down the machine were gdm either hangs or crashes. This is on debian sid 2.20.6 + patch + own compile - no use.

Comment 49 Josselin Mouette 2008-06-17 15:49:15 UTC

Also, something that I suspected and that was confirmed by another user: the crash does not occur at the time of the gdm shutdown, but later in the shutdown process, when init kills all remaining processes. Which is probably why there is no core file created.

The error message is:

Asking all remaining processes to terminate...done.
gdm[3459]: GLib-CRITICAL: g_hash_table_lookup_extended: assertion 'hash_table != NULL' failed
gdm[3459]: WARNING: Request for invalid configuration key xdmcp/Enable=false
*** glibc detected *** /usr/sbin/gdm: double free or corruption (!prev): 0x0000000000664cf0 ****

Comment 50 Brian Cameron 2008-06-17 16:29:49 UTC

I notice that there are a few calls where GDM_KEY_XDMCP is references in daemon/gdm.c.  One in gdm_final_cleanup, one in deal_with_x_crashes, two in main, and one in gdm_handle_message.

Based on our experience so far, I'd bet the problem is happening to the place where we call in gdm_final_cleanup.

Since we can't get a core file, then the next best thing is to debug via gdm_debug statements.  More time-consuming, but hopefully will highlight the problem.

One thing we should do is figure out exactly which reference to the xdmcp/Enable key is causing this problem.  I would add gdm_debug calls immediately before each reference to GDM_KEY_XDMCP in daemon/gdm.c, so we can identify which one is causing the problem.  Also, I notice that there are a half-dozen places where the message "Request for invalid configuration key" is shown in daemon/gdm-daemon-config.c.  If we could modify the messages so they are different (perhaps add a unique number to the end of each string), then it would be easier to figure out what is going wrong.

Could you add such gdm_debug messages to the code, and re-run, and let me know what you find?

Comment 51 Neon Knight 2008-08-04 16:31:44 UTC

My system have this bug too (gdm 2.27 gnome 2.22.3 i386 core duo2 2.24.19 kernel)
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=486657 have attached screenshot
I hope it help you, couse it is very annoying problem.

Comment 52 Brian Cameron 2008-08-04 22:46:52 UTC

Thanksf or the reference to the Debug bug report.  However, I don't see any additional information there that isn't already reported in this report.  As I say in Comment #50, someone who is experiencing this problem needs to do some debug work to track down the problem.

Comment 53 Neon Knight 2008-08-05 15:48:16 UTC

I am not guru in linux, I make my first steps. But I am J2EE developer,
If someone give me detail TO:DO list, I will try to get debug report (on ubuntu 8.04.1).

Comment 54 Morita Sho 2008-08-07 10:22:48 UTC

Hi,

I have the same problem on Debian GNU/Linux unstable.
Unfortunately, I can't use gdb because of the problem is gone on gdb.
Thus I put so many g_warning to many places for investigation.

While restarting the computer, I noticed that gdm_final_cleanup() is called twice.
The first one was called from restart_machine(), and
the second one was called from mainloop_sig_callback().

gdm_final_cleanup() has following code:
	if (gdm_daemon_config_get_value_bool (GDM_KEY_XDMCP))
		gdm_xdmcp_close ();

That seems to cause following errors since daemon_config were already free'd in first gdm_final_cleanup() call.
gdm[3459]: GLib-CRITICAL: g_hash_table_lookup_extended: assertion 'hash_table != NULL' failed
gdm[3459]: WARNING: Request for invalid configuration key xdmcp/Enable=false

Presumably, restart_machine() executes "/sbin/reboot" and then calls gdm_final_cleanup().
Executing /sbin/reboot causes to send a SIGTERM signal to processes including gdm.
When gdm has got a SIGTERM, mainloop_sig_callback() will be called to handle the signal.
mainloop_sig_callback() calls gdm_final_cleanup() then exit(EXIT_SUCCESS).

Comment 55 Brian Cameron 2008-08-07 21:41:21 UTC

Oh, thanks for tracking this down.  This does sound like the origin of the many problems we've fixed in this bug report.  I think we should remove the call from restart_machine and just let it cleanup once.  

Would you be able to test this small change, and let us know if it solves your problem?  If so, I can put this into the 2.20 branch and re-spin a new release.

Comment 56 Morita Sho 2008-08-08 01:01:24 UTC

I think we should also consider about halt_machine since it calls gdm_final_cleanup.
I removed the call for gdm_final_cleanup from restart_machine and halt_machine and I tested shutdown/restart repeatedly.
As a result, no error messages is shown in the console nor syslog while shutdown/restarting.
The problem has been solved for me by this small changes.

Comment 57 Pascal Brochart 2008-08-10 18:11:57 UTC

Created attachment 116298 [details] [review]
Fix assertion failed during shutdown process

It does not work for me, gdm freezes during the shutdown process.
However, I propose a patch which resolved for me this problem and one other (here is the output of daemon.log):


Aug 10 18:45:36 debian init: Switching to runlevel: 6
Aug 10 18:45:36 debian gdm-binary[3073]: CRITICAL: gdm_connection_close: assertion `conn != NULL' failed
Aug 10 18:45:36 debian last message repeated 2 times
Aug 10 18:45:36 debian gdm-binary[3073]: GLib-CRITICAL: g_hash_table_lookup_extended: assertion `hash_table != NULL' failed
Aug 10 18:45:36 debian gdm-binary[3073]: WARNING: Request for invalid configuration key xdmcp/PingIntervalSeconds=15
Aug 10 18:45:36 debian gdm-binary[3073]: GLib-CRITICAL: g_hash_table_lookup_extended: assertion `hash_table != NULL' failed
Aug 10 18:45:36 debian gdm-binary[3073]: WARNING: Request for invalid configuration key xdmcp/PingIntervalSeconds=15
Aug 10 18:45:37 debian gdm-binary[3073]: GLib-CRITICAL: g_hash_table_lookup_extended: assertion `hash_table != NULL' failed
Aug 10 18:45:37 debian gdm-binary[3073]: WARNING: Request for invalid configuration key daemon/ServAuthDir=/var/lib/gdm
Aug 10 18:45:37 debian gdm-binary[3073]: WARNING: gdm_slave_send| : impossible d'ouvrir la FIFO
Aug 10 18:45:37 debian gdm-binary[3073]: GLib-CRITICAL: g_hash_table_lookup_extended: assertion `hash_table != NULL' failed
Aug 10 18:45:37 debian gdm-binary[3073]: WARNING: Request for invalid configuration key daemon/ServAuthDir=/var/lib/gdm
Aug 10 18:45:37 debian gdm-binary[3073]: WARNING: gdm_auth_secure_display| : impossible d'ouvrir  de manière sûre
Aug 10 18:45:37 debian gdm-binary[3073]: GLib-CRITICAL: g_hash_table_lookup_extended: assertion `hash_table != NULL' failed
Aug 10 18:45:37 debian gdm-binary[3073]: WARNING: Request for invalid configuration key daemon/ConsoleCannotHandle=am,ar,az,bn,el,fa,gu,hi,ja,ko,ml,mr,pa,ta,zh
Aug 10 18:45:37 debian gdm-binary[3073]: GLib-CRITICAL: g_hash_table_lookup_extended: assertion `hash_table != NULL' failed
Aug 10 18:45:37 debian gdm-binary[3073]: WARNING: Request for invalid configuration key daemon/ConsoleNotify=true


someone can test this patch ?

Comment 58 Pascal Brochart 2008-08-10 21:22:55 UTC

This patch is wrong for the first bug because the function "mainloop_sig_callback()" do nothing when a SIGTERM signal is sent.
Maybe a solution is to call gdm_final_cleanup in "mainloop_sig_callback()" when the system is not in a Shutdown/Restart process.

Comment 59 Pascal Brochart 2008-08-11 12:13:16 UTC

Created attachment 116339 [details] [review]
Fix double assertion failed during shutdown process

I propose another patch, maybe this one is more acceptable...

Comment 60 Morita Sho 2008-08-11 13:26:45 UTC

Oops! I only tested shutdown/restart from gnome-session for my patch, forgot to test from Action menu in gdm...
When I shutdown/restart from Action menu in gdm, I see the same errors as Comment #57.

The last patch Pascal proposed works fine. I can see no errors/warnings in the console nor syslog.
Thanks for your patch.

Comment 61 Pascal Brochart 2008-08-11 19:42:47 UTC

Created attachment 116380 [details] [review]
Minor change to previous patch "Fix assertion failed during shutdown process"

I am happy to see that somebody can reproduce the second problem.
It was similar to the first, gdm_cleanup_children() tries to un-manage display and other but gdm_final_cleanup() has already done. 
I re-uploaded the patch with minor change (affection to the variable should be done before the execution of gdm_final_cleanup).

Comment 62 Brian Cameron 2008-08-11 21:41:00 UTC

I'd like to understand this code a bit more and fix it properly.

I don't really like the proposed patch.  It tried to work around these problems by setting flags and checking them.  This just moves the race condition around.
While it possibly makes the code less likely to fall into the race condition, we should just fix the code so it isn't racy.

Also, adding "in_final_cleanup" to the halt_machine, restart_machine, custom_cmd_restart functions makes no logical sense to me.  Nothing about these 
three functions has anything to do with whether we are in a "final cleanup" state or not.  You can obviously shut down GDM without having selected shutdown or reboot.

So we seem to have discovered is the following:

1) When running shutdown, reboot, suspend from the Actions Menu from the GDM 
   GUI, now gdm_final_cleanup only gets called once.  I am guessing there are
   no shutdown warnings or problems in the latest GDM 2.20 release running this 
   way.  Note that in this case, the GDM GUI exits with a return code like
   DISPLAY_HALT, and exits.  The GDM daemon picks this up when it
   notices the SIGCHLD.  I'm guessing that in this case there is no SIGTERM
   generated.
   
2) When running shutdown, reboot, suspend from the gnome-session, it causes
   gdm_final_cleanup to get called twice, causing problems.  

   I suspect the first time is in gdm_do_logout_action, called by 
   gdm_try_logout_action, called when when the GDM daemon receives the 
   SET_LOGOUT_ACTION message (refer to sup_handle_set_logout_action function).
   Then the second time is in when the TERM signal is received.

Is that what other people are seeing?  Have I summarized what is going on well?

However, perhaps I am confused.  The attached patch seems to indicate that adding racy belts and braces to gdm_cleanup_children will fix this problem. 
If my above analysis is correct, then this function is only called in case #1
which didn't have any problems before.  Or do we also get the SIGCHLD in case
#2?

Or perhaps the latest patch just works because case #2 calls
gdm_try_logout_action, which calls halt_machine which sets the 
"in_final_cleanup" flag, so that when the SIGTERM happens, it doesn't try to call gdm_final_cleanup again?  If so, then I suspect that the code changes to gdm_cleanup_children aren't really needed to make this hacky racy patch work.

Perhaps it would be better to fix case #2 so that we don't have so much complicated logic.  Why don't we just make the main daemon send a SIGTERM to
itself in this case and just let the SIGTERM handler logic just take care of
all cleanup needed?

Something that looks odd to me is the following.  I notice that the halt_machine/restart_machine/suspend_machine functions are called in 2 places: gdm_cleanup_children and gdm_do_logout_action.  gdm_try_logout_action calls 
gdm_do_logout_action.

Now note that gdm_cleanup_children has a switch statement where it will call 
functions like restart_machine, halt_machine or gdm_try_logout_action.  Then at 
the end of the function it calls gdm_try_logout_action again.  So it looks to 
me like the child handler could call these functions multiple times also.

Comment 63 Pascal Brochart 2008-08-21 14:33:15 UTC

Maybe this will be more comprehensible with this ouput from GDM-2.20.7 (original version without patchs).
Reboot from Action Menu in GDM:

Aug 21 15:16:15 debian gdm-binary[4229]: DEBUG: term_quit: Final cleanup
Aug 21 15:16:15 debian gdm-binary[4229]: DEBUG: gdm_slave_quick_exit: Will kill everything from the display
Aug 21 15:16:15 debian gdm-binary[4229]: DEBUG: Running gdm_verify_cleanup and pamh != NULL
Aug 21 15:16:15 debian gdm-binary[4229]: DEBUG: gdm_server_stop: Server for :0 going down!
Aug 21 15:16:15 debian gdm-binary[4229]: DEBUG: gdm_server_stop: Killing server pid 4237
Aug 21 15:16:16 debian gdm-binary[4229]: DEBUG: gdm_server_stop: Server pid 4237 dead
Aug 21 15:16:16 debian gdm-binary[4229]: DEBUG: gdm_slave_quick_exit: Killed everything from the display

*** Ok, no problem here...


Aug 21 15:16:16 debian gdm-binary[4228]: DEBUG: mainloop_sig_callback: Got signal 17
Aug 21 15:16:16 debian gdm-binary[4228]: DEBUG: gdm_cleanup_children: child 4229 returned 8
Aug 21 15:16:16 debian gdm-binary[4228]: DEBUG: Looking up per display value for greeter/SystemMenu=true
Aug 21 15:16:16 debian gdm-binary[4228]: DEBUG: Attempting to parse key string: greeter/SystemMenu=true
Aug 21 15:16:16 debian gdm-binary[4228]: DEBUG: Attempting to parse key string: greeter/SystemMenu=true
Aug 21 15:16:16 debian gdm-binary[4228]: DEBUG: Requesting group=greeter key=SystemMenu locale=(null)
Aug 21 15:16:16 debian gdm-binary[4228]: DEBUG: Attempting to parse key string: greeter/SystemMenu=true
Aug 21 15:16:16 debian gdm-binary[4228]: DEBUG: Attempting to parse key string: daemon/RebootCommand=/usr/bin/reboot;/sbin/reboot;/sbin/shutdown -r now;/usr/sbin/shutdown -r now
Aug 21 15:16:16 debian gdm-binary[4228]: DEBUG: Redémarrage de l'ordinateur en cours...
Aug 21 15:16:16 debian gdm-binary[4228]: DEBUG: Attempting to parse key string: daemon/RebootCommand=/usr/bin/reboot;/sbin/reboot;/sbin/shutdown -r now;/usr/sbin/shutdown -r now
Aug 21 15:16:16 debian gdm-binary[4228]: DEBUG: Running /sbin/shutdown -r now "Rebooted via gdm."
Aug 21 15:16:18 debian init: Switching to runlevel: 6

*** A signal 17 (SIGCHLD) is sent "to mainloop_sig_callback" and "gdm_cleanup_children" is executed.


Aug 21 15:16:18 debian gdm-binary[4228]: DEBUG: gdm_final_cleanup
Aug 21 15:16:18 debian gdm-binary[4228]: DEBUG: gdm_display_unmanage: Stopping :0 (slave pid: 0)
Aug 21 15:16:18 debian gdm-binary[4228]: DEBUG: gdm_display_unmanage: Display stopped
Aug 21 15:16:18 debian gdm-binary[4228]: DEBUG: Attempting to parse key string: xdmcp/Enable=false
Aug 21 15:16:18 debian gdm-binary[4228]: DEBUG: Attempting to parse key string: daemon/ServAuthDir=/var/lib/gdm

*** We are in "start_autopsy" label with status=DISPLAY_REBOOT.
*** So restart_machine is executed and gdm_display is stopped (gdm_final_cleanup is called by restart_machine).

Aug 21 15:16:18 debian gdm-binary[4228]: DEBUG: gdm_child_action: In remanage
Aug 21 15:16:18 debian gdm-binary[4228]: DEBUG: gdm_display_manage: Managing :0
Aug 21 15:16:18 debian gdm-binary[4228]: DEBUG: loop check: last_start 1219324566, last_loop 1219324566, now: 1219324578, retry_count: 1
Aug 21 15:16:18 debian gdm-binary[4228]: DEBUG: Forking slave process
Aug 21 15:16:18 debian gdm-binary[4265]: CRITICAL: gdm_connection_close: assertion `conn != NULL' failed
Aug 21 15:16:18 debian last message repeated 2 times
Aug 21 15:16:18 debian gdm-binary[4265]: DEBUG: Attempting to parse key string: xdmcp/PingIntervalSeconds=15

...

*** Value of status = DISPLAY_REMANAGE just after the call to restart_machine, so gdm_display is restarted and another slave process is created but we are on a SHUTDOWN/REBOOT PROCESS:

        case DISPLAY_REBOOT:    /* Restart machine */
                restart_machine ();

                status = DISPLAY_REMANAGE;
                goto start_autopsy;
                break;

It's true or not ?
I don't understand why there is no condition to affect "DISPLAY_REMANAGE" to "status" in this case ?

Comment 64 Brian Cameron 2008-08-25 15:48:13 UTC

Hmmm.  It doesn't seem like it makes sense to try to REMANAGE the display when doing a restart or halt operation.  Perhaps we need to add a DISPLAY_EXIT which just calls exit, and set status to DISPLAY_EXIT.  Could you try something like that and see if that fixes the problem?

Comment 65 Pascal Brochart 2008-08-26 22:04:49 UTC

I have tested something like this:

in gdm.h

#define DISPLAY_EXIT 255	/* Exit GDM */

in gdm.c

        case DISPLAY_REBOOT:    /* Restart machine */
                restart_machine ();

                status = DISPLAY_EXIT;
                goto start_autopsy;
                break;

        case DISPLAY_HALT:      /* Halt machine */
                halt_machine ();

                status = DISPLAY_EXIT;
                goto start_autopsy;
                break;

...

        case DISPLAY_EXIT:
                _exit(EXIT_SUCCESS);
                break;


After rebooting or halting machine with action menu in GDM, the errors i mentionned in Comment #63 are disappeared.

Comment 66 Brian Cameron 2008-08-28 02:37:28 UTC

That looks good to me.  Can others verify that this addresses the problem?  If
we have this problem fixed, I think it's time to do another GDM 2.20 release.  I
think this is the most serious known regression in the 2.20 code.

Comment 67 Morita Sho 2008-08-29 03:40:07 UTC

After applying the changes proposed in Comment #65, I can see no errors during shutdown or rebooting from the Action menu in gdm.

However, when running shutdown or reboot from the gnome-session, I still got the following errors.
  GLib-CRITICAL: g_hash_table_lookup_extended: assertion 'hash_table != NULL' failed
  WARNING: Request for invalid configuration key xdmcp/Enable=false

I think we need to add a bit more code to prevent the multiple call for gdm_final_cleanup.

Comment 68 Brian Cameron 2008-08-29 04:51:39 UTC

Could you propose a patch that fixes the problem that you see?

Comment 69 Morita Sho 2008-08-29 09:46:36 UTC

I would like to summarize this problem again.

AFAIK, there is two problems.

1) When running shutdown, reboot, suspend from the Actions Menu from the GDM 
   GUI, it tries to REMANAGE the display after the call to restart_machine,
   halt_machine, suspend_machine. It causes the error mentioned in
   Comment #57.

   => Solution: Just calls exit after the call to restart_machine, halt_machine,
      suspend_machine. (Mentioned in Comment #65.)

2) When running shutdown, reboot, suspend from the gnome-session, it causes
   gdm_final_cleanup to get called twice. The error mentioned in Comment #54
   and Comment #67 will appear.

   => Solution: Setting flags and checking them. (Mentioned in Comment #61.)

Perhaps merging the change in Comment #65 and Comment #61 solves the problem properly.
However, Comment #62 says that the proposed patch in Comment #61 is racy and should be fixed.
Hmmmm... Unfortunately, I'm not sure how to make it isn't racy so I can't propose a patch.

Comment 70 Josselin Mouette 2008-08-29 15:00:39 UTC

Created attachment 117589 [details] [review]
Exit after launching the shutdown command

The changes proposed in Comment #65 are not enough since they don’t cover all code paths. However the simple attached patch fixes the issue for me.

Comment 71 Brian Cameron 2008-08-29 18:05:11 UTC

Josselin, are you saying that your patch as well as patch #65 are needed? I
suspect only your patch in #70 should fix the problem.  Pascal, can you verify 
that Josselin's patch also fixes your issues?

I prefer the solution in patch #70.  I think it is a more simple way to fix
things.

Thoughts?  Please let me know right away, if possible.  I'd like to do a new GDM 2.20 release with this bug fix quickly if we can verify this is fixed properly.

Comment 72 Josselin Mouette 2008-08-29 19:57:55 UTC

(In reply to comment #71)
> Josselin, are you saying that your patch as well as patch #65 [edit] are needed? I
> suspect only your patch in #70 should fix the problem.

Yes, my patch should be enough, at least to fix the crash that happens when gdm receives SIGTERM after the termination code is executed.

There are still a few error messages in the syslog, as described in comment #67, so this looks like a different issue (although much less visible to the user).

Comment 73 Brian Cameron 2008-08-30 01:21:35 UTC

I think we should still try to resolve the error messages in the syslog from comment #67.  Can we track down which hash table code is causing this issue?

If I were to guess, it is almost surely a hash table that is in the common/gdm-config.c code.  Note the error message complains about
g_hash_table_lookup_extended, which is used in gdm_config_peek_value and the internal_set_value (called by gdm_config_set_value and gdm_config_process_entry)
functions.  The error seems to happen when trying to access the xdmcp/Enable key.  In the daemon code, this is only accessed via GDM_KEY_XDMCP key in 5 places in the GDM code, in gdm and one place in misc.c. 

Perhaps if one of you could add an additional debug message to just before these 6 places so we can make sure we understand which one is causing the issue, and whether this is a problem?

Comment 74 Josselin Mouette 2008-08-30 02:56:52 UTC

Actually the error messages in my syslog predate the upgrade to a version with the patch #117589, and they don’t appear anymore in the tests I’ve just conducted. So I think we’re fine now.

The only remaining messages in the syslog are some "Gtk-WARNING: Ignoring the separator setting" when the session lasts less than 10 seconds. But as the code in gdm.c already says "FIXME: this is really bad", I guess you already know it could be improved :)

Comment 75 Pascal Brochart 2008-08-30 11:25:12 UTC

Yes, this patch resolves also the second bug, it prevents multiple calls to gdm_final_cleanup because it exits before.
Why not, but setting status to DISPLAY_REMANAGE and "go to start_autopsy" should be removed to do really properly :)

Comment 76 Josselin Mouette 2008-08-30 13:10:18 UTC

(In reply to comment #75)
> Why not, but setting status to DISPLAY_REMANAGE and "go to start_autopsy"
> should be removed to do really properly :)

I’m not sure, since this code will still be executed if the shutdown command is not found.

Comment 77 Morita Sho 2008-08-30 14:09:10 UTC

That patch works fine for me. Thank you.

Can I make a suggestion?
I suggest to put signal(SIGTERM, SIG_IGN) after the execution of shutdown/reboot command.
Otherwise, it is possible that a SIGTERM signal are arrived between gdm_final_cleanup and _exit.
I'm not sure, but it causes multiple calls for gdm_final_cleanup, therefore it is better to ignoring SIGTERM signal.

Comment 78 Josselin Mouette 2008-08-30 14:13:58 UTC

(In reply to comment #77)
> I suggest to put signal(SIGTERM, SIG_IGN) after the execution of
> shutdown/reboot command.
> Otherwise, it is possible that a SIGTERM signal are arrived between
> gdm_final_cleanup and _exit.

That’s definitely a good idea. It should even be done at the beginning of gdm_final_cleanup so that nothing in this function can be executed twice.

Comment 79 Brian Cameron 2008-09-02 16:27:59 UTC

Could you provide an updated final patch that includes the signal() call?  Then I will commit and respin a new release for everybody to test. 

Thanks.

Comment 80 Brian Cameron 2008-09-02 21:56:06 UTC

In other words, I'd like you to provide an updated patch that has been tested and we make sure works properly before I commit and respin a new GDM release.  If you could do this, that would be great.

Comment 81 Josselin Mouette 2008-09-03 11:14:12 UTC

Created attachment 117920 [details] [review]
Remove signal handlers in gdm_final_cleanup and exit after the reboot/shutdown sequence

Here is an updated patch doing what is suggested.

Comment 82 Brian Cameron 2008-09-03 19:17:10 UTC

I think this should fix the bug.  I committed the patch to the 2.20 branch and spun the new 2.20.8 release.  If there are any more issues, please reopen or file a new bug.