GNOME Bugzilla – Bug 517526
segfault in keyfile code on shutdown
Last modified: 2008-09-03 19:17:10 UTC
Please describe the problem: when I shutdown or reboot my computer I can read this message on tty1 Feb 18 17:42:29 laptop-cedric kernel: gdm[2461]: segfault at 10e71e08 ip b78140d5 sp bfb96310 error 4 in libglib-2.0.so.0.1400.5[b77f2000+a0000] Feb 18 22:16:35 laptop-cedric kernel: gdm[2467]: segfault at 10e71e08 ip b77d60d5 sp bfc59810 error 4 in libglib-2.0.so.0.1400.5[b77b4000+a0000] Feb 19 13:06:17 laptop-cedric kernel: gdm[2483]: segfault at 10e71e08 ip b78730d5 sp bf8f4d90 error 4 in libglib-2.0.so.0.1400.5[b7851000+a0000] Steps to reproduce: every time when a shutdown or reboot Actual results: Expected results: Does this happen every time? Other information: I use debian testing whit 2.6.24 git 22 kernel
What version of GDM are you using? Could you try to upgrade to the latest GDM 2.20.3 version if you are using an older version and see if the problem still happens. Also, is there a stacktrace you can provide? GDM may leave a core file in your root (/) directory or in /var/lib/gdm, or /var/lib/log/gdm.
(In reply to comment #1) > What version of GDM are you using? Could you try to upgrade to the latest GDM > 2.20.3 version if you are using an older version and see if the problem still > happens. > > Also, is there a stacktrace you can provide? GDM may leave a core file in your > root (/) directory or in /var/lib/gdm, or /var/lib/log/gdm. > Yes I use the last version 2.20.3-1 but I don't find the core file!!
I don't know how things work on debian, but I know on Solaris, I have to monkey around with my "coreadm" settings (/etc/coreadm) by running the coreadm tool to get system programs like GDM to generate core files.
Several Debian bug reports have been filed about this. The one with the most information is at <http://bugs.debian.org/459024>. I am trying to work out how to make gdm generate a core file.
I can't get GDM to generate a core file. I placed 'ulimit -c unlimited' in /etc/init.d/gdm, but no core file appears in /var/lib/gdm. I also tried attaching to gdm with gdb, but when I restart having done this, gdm does not segafault! Instead, it exits normally.
Looking at the log in the above bug report, I see that the last gdm messages indicate that GDM is in the function gdm_cleanup_children. The last message indicates that it is checking the GDM_KEY_DYNAMIC_XSERVERS. Look in the file daemon/gdm.c, in the function gdm_cleanup_children. Notice that GDM should print another debug message after the start_autopsy section. But we don't see any further messages. It might be useful if you were to add some additional debug messages in this function to see exactly how far in this function GDM gets before it crashes. This would probably highlight the problem.
I'm on Debian and have a similar problem; segfault at shutdown and on reload. If I have gdm waiting for a login, then run /etc/init.d/gdm reload on a virtual console, I get a segfault on one of the 2 gdm processes running. I was able to capture a full backtrace and a core file by running two gdb instances and attaching them to the running gdm processes. I rebuild gdm with debugging symbols, but unfortunately the Debian package doesn't seem to support building without optimisation so some parts are optimised out. Here is the backtrace, the core file is attached. Program received signal SIGSEGV, Segmentation fault. slab_allocator_free_chunk (chunk_size=<value optimized out>, mem=0x80e87a8) at /tmp/buildd/glib2.0-2.14.6/glib/gslice.c:1091 1091 /tmp/buildd/glib2.0-2.14.6/glib/gslice.c: No such file or directory. in /tmp/buildd/glib2.0-2.14.6/glib/gslice.c
+ Trace 191719
Core file is here since it is too big for bugzilla: http://bonedaddy.net/pabs3/files/tmp/gdm.core I'm using Debian GNU/Linux sid with gdm 2.20.3-2, rebuilt with debugging symbols.
This seems a bit odd. I wonder if gdm_final_cleanup is getting called more than once? I wonder if it would help to make sure to set config->mandatory_key_file, config->default_key_file, config->custom_key_file to NULL after calling g_key_file_free? This is in the function gdm_config_free in common/gdm-config.c Could you try this?
Applied the following patch and still got a segfault. --- gdm-2.20.3.orig/common/gdm-config.c +++ gdm-2.20.3/common/gdm-config.c @@ -601,12 +601,15 @@ if (config->mandatory_key_file != NULL) { g_key_file_free (config->mandatory_key_file); + config->mandatory_key_file = NULL; } if (config->default_key_file != NULL) { g_key_file_free (config->default_key_file); + config->default_key_file = NULL; } if (config->custom_key_file != NULL) { g_key_file_free (config->custom_key_file); + config->custom_key_file = NULL; } if (config->value_hash != NULL) { g_hash_table_destroy (config->value_hash); Here is the backtrace and core file for pid 15362 (parent gdm): http://bonedaddy.net/pabs3/files/tmp/gdm.15362.core Continuing. Program received signal SIGUSR1, User defined signal 1.
+ Trace 191807
Thread 3075151648 (LWP 15362)
Couldn't get registers: No such process. Continuing. Couldn't get registers: No such process. The program is running. Quit anyway (and detach it)? (y or n) Not confirmed. Continuing. Couldn't get registers: No such process. The program being debugged has been started already. Start it from the beginning? (y or n) Program not restarted. Continuing. Couldn't get registers: No such process. The program is running. Quit anyway (and detach it)? (y or n) Detaching from program: /usr/sbin/gdm, process 15363 Quitting: ptrace: No such process.
can you export G_SLICE=always-malloc before starting gdm and get a backtrace again? gslice sometimes makes gdb give bogus backtraces.
I couldn't reproduce it after restarting it with G_SLICE=always-malloc. After restarting it without G_SLICE=always-malloc I could reproduce it again.
How about G_SLICE=debug-blocks ?
Cannot even start GDM when I use G_SLICE=debug-blocks: Starting GNOME Display Manager: gdm GSlice: MemChecker: attempt to release non-allocated block: 0x80f44d8 size=12 /lib/lsb/init-functions: line 53: 4345 Aborted /sbin/start-stop-daemon --start --nicelevel 0 --quiet --exec /usr/sbin/gdm --oknodo --pidfile /dev/null --
Thanks for all the information, and for testing the patch I suggested. The gdb output you provided shows two stack traces. The first one (with the SIGUSR1) is with the keyfile issue. I wonder if this is a red herring. Perhaps the real issue is with the gdm_fdgetc call from the pam conversation function. This seems odd. This code is normally called when the daemon is trying to talk to the greeter, as if the daemon thinks it needed to restart the greeter, but this might be failing because the system is really shutting down. If so, this might be related to bug #471479. Not sure.
Wouldn't the second one be the current gdm being requested to shut down, so a new instance can be started? It is a SIGTERM after all.
Good catch. I think you are right. I'm just confused why the code would be crashing in the memory handling when we free the keyfile. Looking at the code, I just don't see why this would happen...unless there is a bug in the glib gkeyfile stuff.
*** Bug 524250 has been marked as a duplicate of this bug. ***
I suspect this bug might be related to bug #471479, but not sure.
I noticed something that may contribute to an understanding here. That is, that when users which were created post-installation are logged in, and then log out, GDM reliably logs out and redisplays the login screen. At least on my machine, it's only the _first_ user created on my system, which was created within the Debian installer, that crashes on log out or on switch to one of the text consoles. I even created another user and moved all the files from the old one into the new one, resetting group and ownership, and the new user never has a problem when logging out. The first user still does, almost every time. So this may point to some sort of permissions error, where the first user created has a special type of access that gets her into trouble.
Chris: ah, problem with that theory is that the gdm crashes I got were when no user was logged in through gdm, and I did /etc/init.d/gdm reload as root from a virtual console.
*** Bug 526055 has been marked as a duplicate of this bug. ***
Created attachment 110150 [details] [review] patch fixing issue I was able to spend some time looking into this problem, and I think the attached patch should fix the problem. Could other people who are experiencing this problem test the attached patch and let me know if it also resolves their problems? Thanks.
I still get the issue (segfault on /etc/init.d/gdm reload) when applying the patch to Debian's gdm 2.20.5-1. I note that Debian has a lot of patches to gdm, maybe it is one of those?
Okay, I think I understand what is going on here. I recently discovered this problem is also happening on Solaris, which makes it easier to debug. It seems what is happening is that GDM is calling gdm_final_cleanup. Then while GDM is in the middle of freeing the memory, it receives a signal, such as a SIGABT which causes the main_daemon_abrt function to be called. Note that main_daemon_abrt calls gdm_final_cleanup again. Then when it tries to free the keyfile memory again, it seg-faults. I have written a second patch which I think fixes the problem. Using this patch I no longer see core files on Solaris. This patch sets local variables equal to the memory that we intend to free, and then sets the structure variables to NULL before doing the free. This way if the function is called again, we do not try to free the same data structures again. This function could probably be made more thread safe. If someone wants to spend further time making this code smarter, I'd be happy to accept a better patch. However, this patch at least seems to fix GDM so it does not crash consistently on shutdown. Could you please test this patch and let me know if it works better.
Created attachment 110185 [details] [review] 2nd try for a patch that fixes the problem Note I had previously committed my last patch, which didn't work. I backed out that modification from the GDM 2.20 branch and have committed this patch instead.
Wouldn't a mutex be the right way to fix this?
To be honest, I'm not really sure what the "right" way to fix this problem would be. The complications of writing thread-safe code that needs to work within the context of signal handlers has been an ongoing source of problems within GDM. I realize that my patch isn't that great, so if you or anybody else wants to provide a better patch that would be great too. However, I think in the meantime my patch should at least make it stop crashing so consistently. Can people verify that my patch improves the situation. That way we can at least be more confident that we've identified the source of the problem.
I try the 2nd patch and it doesn't work for me (linux debian x86 with gdm 2.20.5). There is always a segfault on shutdown/restart.
The second patch prevents Debian gdm 2.20.5-1 from segfaulting on /etc/init.d/gdm reload but gdm exits instead of restarting when /etc/init.d/gdm reload is run.
Pascal, did you make sure to restart GDM by running gdm-restart or rebooting after updating GDM with the patch? If so, and it is still crashing, then could you please provide a stack trace and any other additional information about the segfault that you continue to see. To be honest, I am a bit mystified if you are seeing a consistent segfault with the same stack trace after applying the patch. Paul, I am glad to hear that the patch fixes the core dumping issue. Could you explain what "/etc/init.d/gdm reload" does? The /etc/init.d/gdm script is not a part of GDM, and is likely something added by your distro. I use Solaris, which does not use any /etc/init.d/gdm scripts so I am unfamiliar with it. Does it just run gdm-restart? If so, I am not sure why it is broken, gdm-restart seems to work reasonably for me. Note, I do notice that gdm-restart does not always behave so well immediately after installing a new version of GDM. I notice I generally have to reboot once after installing a new version before gdm-restart works right. At any rate, Paul, could you provide more information about what specifically isn't working when you try to reload? Perhaps turn on GDM debug and share the output, that might help.
/etc/init.d/gdm reload sends the USR1 signal to the gdm process so that it reloads its configuration and restarts. I'll try to get debug output for you later today. PS: Debian gdm has a several patches that may be interfering with this.
After more testing, I can say this: I rebuild GDM-2.20.5-1 on Debian Testing/Lenny with debugging symbols and patch "gdm-try2.diff". A segfault only occurs when GDM shutdown/restart the computer (Actions menu). It's really strange because the segfault appears when the script "K01gdm" is not launched yet. When the shutdown/restart is done from gnome (gnome-session), there is no segfault but only one of the two processes of GDM was stopped (with K01gdm). The last chance script "sendsigs" launched just before the shutdown sends a signal -15 (a warning of GDM appears) and after a KILL -9 (the second processe is killed). And when I run "init 6" or "init 0" (on TTY or Gnome Terminal), GDM is stopped properly. For the moment, I can't get a backtrace but I will try as soon as possible.
Here is the backtrace (when shutdown/restart is done from GDM): Program received signal SIGSEGV, Segmentation fault.
+ Trace 197269
Thread 3075020576 (LWP 2831)
*** Bug 532176 has been marked as a duplicate of this bug. ***
Created attachment 110654 [details] [review] patch to test That looks like a new issue. I notice that the code is always calling gdm_daemon_config_get_value_bool_per_display regardless of the signal, or whether it will use the variable. Since mainloop_sig_callback calls gdm_cleanup_children in a while() loop, it is probably bad to always refer to the configuration data in the signal handler - unless dealing with a signal where we are confident that GDM isn't in the process of going down. I am providing a patch which moves the call to get the sysmenu configuration option so it is only done in the one signal cases which actually seems to need it (when processing chooser). In the cases when GDM receives this signals, the configuration should be okay. I removed the belt-and-braces code that tries to avoid remanaging the display if the configuration is not set up right. There shouldn't be any harm in restarting the GUI even if the configuration isn't right. If, for example, the reboot configuration command is NULL, this would just mean that the GUI would restart and the reboot wouldn't happen. Could you test this patch and see if it makes things work better?
Created attachment 110658 [details] [review] updated patch to test That last patch is wrong. After reviewing the code a bit more, it is necessary to reset status from REBOOT/HALT/SUSPEND to REMANAGE if the configuration doesn't support reboot/halt/suspend. However, this patch is smarter and only accesses the configuration values in the reboot/halt/suspend/chooser cases. I think this may fix your crashing issue. Can you test it? If this doesn't fix the crashing issue, it may be necessary to cache the various configuration options needed in the signal handlers into globals or something so we don't have to worry about them getting freed out from under us during the shutdown sequence.
The last patch works fine for me, good ! With this one and "gdm-try2.diff", there is no more segfault.
Great. I committed the patch upstream, so hopefully this issue is now fixed. I'm marking this bug as fixed. Reopen or file a new bug if you find more issues. I will likely do a new GDM 2.20 release with these fixes since these seem to be causing users a lot of issues.
The patch fixes some of the issues (especially the SIGABRT upon shutdown) but I’m still getting several segfaults during the shutdown process.
The patch on the file "common/gdm-config.c" is missing in GDM-2.20.5-2/debian unstable (it resolves at least one segfault). But there are still problems... Ex: Sending a SIGTERM to GDM, the following error appears: g_hash_table_lookup_extended: assertion `hash_table != NULL'
Pascal & Josselin: Could you please provide more detail on the segfault issues you are seeing. Could you both provide a stacktrace? I'm reopening this bug until we get more clarity that the issues are actually resolved. By the way, I am going to do a 2.20.6 release since there are a lot of fixes that I'd like to make more available, so it might be good to check again with the new release when it comes out. That way, we make sure that everybody is testing with the same codebase/patches, etc.
I can confirm gdm 2.20.6 fixes the crash but I'm sometime getting the following assertion when stopping it : ***MEMORY-ERROR***: gdm-binary[5799]: GSlice: assertion failed: sinfo->n_allocated > 0 Program received signal SIGABRT, Aborted. 0xffffe410 in __kernel_vsyscall ()
+ Trace 199468
Created attachment 112037 [details] [review] use g_free, not g_ptr_array_free this patch fixes invalid memory free. It allows gdm to be run with G_SLICE=debug-blocks (which was triggering the error) and fixes assertion I posted some minutes ago.
Thanks for the patch. This is now committed to the 2.20 branch. So, is anybody else experiencing similar crashing or assertion problems on exit. Hopefully we have now fixed all the problems in this area. If I don't hear any more issues, I will close this bug in a week or so.
Even with 2.20.6 and Frédéric’s patch, I still get an invalid free on shutdown. However simply stopping gdm does not seem to trigger it, so I can’t get a stack trace.
That's disappointing. I'm not seeing any more issues myself using the latest GDM. Hopefully someone who is seeing this problem can provide a stack trace. Otherwise, it's really hard to debug. On Solaris, you need to configure coreadm to get core files from system processes like GDM. You might need to do something similar on your system.
I am also still seeing crashes on shutting down the machine were gdm either hangs or crashes. This is on debian sid 2.20.6 + patch + own compile - no use.
Also, something that I suspected and that was confirmed by another user: the crash does not occur at the time of the gdm shutdown, but later in the shutdown process, when init kills all remaining processes. Which is probably why there is no core file created. The error message is: Asking all remaining processes to terminate...done. gdm[3459]: GLib-CRITICAL: g_hash_table_lookup_extended: assertion 'hash_table != NULL' failed gdm[3459]: WARNING: Request for invalid configuration key xdmcp/Enable=false *** glibc detected *** /usr/sbin/gdm: double free or corruption (!prev): 0x0000000000664cf0 ****
I notice that there are a few calls where GDM_KEY_XDMCP is references in daemon/gdm.c. One in gdm_final_cleanup, one in deal_with_x_crashes, two in main, and one in gdm_handle_message. Based on our experience so far, I'd bet the problem is happening to the place where we call in gdm_final_cleanup. Since we can't get a core file, then the next best thing is to debug via gdm_debug statements. More time-consuming, but hopefully will highlight the problem. One thing we should do is figure out exactly which reference to the xdmcp/Enable key is causing this problem. I would add gdm_debug calls immediately before each reference to GDM_KEY_XDMCP in daemon/gdm.c, so we can identify which one is causing the problem. Also, I notice that there are a half-dozen places where the message "Request for invalid configuration key" is shown in daemon/gdm-daemon-config.c. If we could modify the messages so they are different (perhaps add a unique number to the end of each string), then it would be easier to figure out what is going wrong. Could you add such gdm_debug messages to the code, and re-run, and let me know what you find?
My system have this bug too (gdm 2.27 gnome 2.22.3 i386 core duo2 2.24.19 kernel) http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=486657 have attached screenshot I hope it help you, couse it is very annoying problem.
Thanksf or the reference to the Debug bug report. However, I don't see any additional information there that isn't already reported in this report. As I say in Comment #50, someone who is experiencing this problem needs to do some debug work to track down the problem.
I am not guru in linux, I make my first steps. But I am J2EE developer, If someone give me detail TO:DO list, I will try to get debug report (on ubuntu 8.04.1).
Hi, I have the same problem on Debian GNU/Linux unstable. Unfortunately, I can't use gdb because of the problem is gone on gdb. Thus I put so many g_warning to many places for investigation. While restarting the computer, I noticed that gdm_final_cleanup() is called twice. The first one was called from restart_machine(), and the second one was called from mainloop_sig_callback(). gdm_final_cleanup() has following code: if (gdm_daemon_config_get_value_bool (GDM_KEY_XDMCP)) gdm_xdmcp_close (); That seems to cause following errors since daemon_config were already free'd in first gdm_final_cleanup() call. gdm[3459]: GLib-CRITICAL: g_hash_table_lookup_extended: assertion 'hash_table != NULL' failed gdm[3459]: WARNING: Request for invalid configuration key xdmcp/Enable=false Presumably, restart_machine() executes "/sbin/reboot" and then calls gdm_final_cleanup(). Executing /sbin/reboot causes to send a SIGTERM signal to processes including gdm. When gdm has got a SIGTERM, mainloop_sig_callback() will be called to handle the signal. mainloop_sig_callback() calls gdm_final_cleanup() then exit(EXIT_SUCCESS).
Oh, thanks for tracking this down. This does sound like the origin of the many problems we've fixed in this bug report. I think we should remove the call from restart_machine and just let it cleanup once. Would you be able to test this small change, and let us know if it solves your problem? If so, I can put this into the 2.20 branch and re-spin a new release.
I think we should also consider about halt_machine since it calls gdm_final_cleanup. I removed the call for gdm_final_cleanup from restart_machine and halt_machine and I tested shutdown/restart repeatedly. As a result, no error messages is shown in the console nor syslog while shutdown/restarting. The problem has been solved for me by this small changes.
Created attachment 116298 [details] [review] Fix assertion failed during shutdown process It does not work for me, gdm freezes during the shutdown process. However, I propose a patch which resolved for me this problem and one other (here is the output of daemon.log): Aug 10 18:45:36 debian init: Switching to runlevel: 6 Aug 10 18:45:36 debian gdm-binary[3073]: CRITICAL: gdm_connection_close: assertion `conn != NULL' failed Aug 10 18:45:36 debian last message repeated 2 times Aug 10 18:45:36 debian gdm-binary[3073]: GLib-CRITICAL: g_hash_table_lookup_extended: assertion `hash_table != NULL' failed Aug 10 18:45:36 debian gdm-binary[3073]: WARNING: Request for invalid configuration key xdmcp/PingIntervalSeconds=15 Aug 10 18:45:36 debian gdm-binary[3073]: GLib-CRITICAL: g_hash_table_lookup_extended: assertion `hash_table != NULL' failed Aug 10 18:45:36 debian gdm-binary[3073]: WARNING: Request for invalid configuration key xdmcp/PingIntervalSeconds=15 Aug 10 18:45:37 debian gdm-binary[3073]: GLib-CRITICAL: g_hash_table_lookup_extended: assertion `hash_table != NULL' failed Aug 10 18:45:37 debian gdm-binary[3073]: WARNING: Request for invalid configuration key daemon/ServAuthDir=/var/lib/gdm Aug 10 18:45:37 debian gdm-binary[3073]: WARNING: gdm_slave_send| : impossible d'ouvrir la FIFO Aug 10 18:45:37 debian gdm-binary[3073]: GLib-CRITICAL: g_hash_table_lookup_extended: assertion `hash_table != NULL' failed Aug 10 18:45:37 debian gdm-binary[3073]: WARNING: Request for invalid configuration key daemon/ServAuthDir=/var/lib/gdm Aug 10 18:45:37 debian gdm-binary[3073]: WARNING: gdm_auth_secure_display| : impossible d'ouvrir de manière sûre Aug 10 18:45:37 debian gdm-binary[3073]: GLib-CRITICAL: g_hash_table_lookup_extended: assertion `hash_table != NULL' failed Aug 10 18:45:37 debian gdm-binary[3073]: WARNING: Request for invalid configuration key daemon/ConsoleCannotHandle=am,ar,az,bn,el,fa,gu,hi,ja,ko,ml,mr,pa,ta,zh Aug 10 18:45:37 debian gdm-binary[3073]: GLib-CRITICAL: g_hash_table_lookup_extended: assertion `hash_table != NULL' failed Aug 10 18:45:37 debian gdm-binary[3073]: WARNING: Request for invalid configuration key daemon/ConsoleNotify=true someone can test this patch ?
This patch is wrong for the first bug because the function "mainloop_sig_callback()" do nothing when a SIGTERM signal is sent. Maybe a solution is to call gdm_final_cleanup in "mainloop_sig_callback()" when the system is not in a Shutdown/Restart process.
Created attachment 116339 [details] [review] Fix double assertion failed during shutdown process I propose another patch, maybe this one is more acceptable...
Oops! I only tested shutdown/restart from gnome-session for my patch, forgot to test from Action menu in gdm... When I shutdown/restart from Action menu in gdm, I see the same errors as Comment #57. The last patch Pascal proposed works fine. I can see no errors/warnings in the console nor syslog. Thanks for your patch.
Created attachment 116380 [details] [review] Minor change to previous patch "Fix assertion failed during shutdown process" I am happy to see that somebody can reproduce the second problem. It was similar to the first, gdm_cleanup_children() tries to un-manage display and other but gdm_final_cleanup() has already done. I re-uploaded the patch with minor change (affection to the variable should be done before the execution of gdm_final_cleanup).
I'd like to understand this code a bit more and fix it properly. I don't really like the proposed patch. It tried to work around these problems by setting flags and checking them. This just moves the race condition around. While it possibly makes the code less likely to fall into the race condition, we should just fix the code so it isn't racy. Also, adding "in_final_cleanup" to the halt_machine, restart_machine, custom_cmd_restart functions makes no logical sense to me. Nothing about these three functions has anything to do with whether we are in a "final cleanup" state or not. You can obviously shut down GDM without having selected shutdown or reboot. So we seem to have discovered is the following: 1) When running shutdown, reboot, suspend from the Actions Menu from the GDM GUI, now gdm_final_cleanup only gets called once. I am guessing there are no shutdown warnings or problems in the latest GDM 2.20 release running this way. Note that in this case, the GDM GUI exits with a return code like DISPLAY_HALT, and exits. The GDM daemon picks this up when it notices the SIGCHLD. I'm guessing that in this case there is no SIGTERM generated. 2) When running shutdown, reboot, suspend from the gnome-session, it causes gdm_final_cleanup to get called twice, causing problems. I suspect the first time is in gdm_do_logout_action, called by gdm_try_logout_action, called when when the GDM daemon receives the SET_LOGOUT_ACTION message (refer to sup_handle_set_logout_action function). Then the second time is in when the TERM signal is received. Is that what other people are seeing? Have I summarized what is going on well? However, perhaps I am confused. The attached patch seems to indicate that adding racy belts and braces to gdm_cleanup_children will fix this problem. If my above analysis is correct, then this function is only called in case #1 which didn't have any problems before. Or do we also get the SIGCHLD in case #2? Or perhaps the latest patch just works because case #2 calls gdm_try_logout_action, which calls halt_machine which sets the "in_final_cleanup" flag, so that when the SIGTERM happens, it doesn't try to call gdm_final_cleanup again? If so, then I suspect that the code changes to gdm_cleanup_children aren't really needed to make this hacky racy patch work. Perhaps it would be better to fix case #2 so that we don't have so much complicated logic. Why don't we just make the main daemon send a SIGTERM to itself in this case and just let the SIGTERM handler logic just take care of all cleanup needed? Something that looks odd to me is the following. I notice that the halt_machine/restart_machine/suspend_machine functions are called in 2 places: gdm_cleanup_children and gdm_do_logout_action. gdm_try_logout_action calls gdm_do_logout_action. Now note that gdm_cleanup_children has a switch statement where it will call functions like restart_machine, halt_machine or gdm_try_logout_action. Then at the end of the function it calls gdm_try_logout_action again. So it looks to me like the child handler could call these functions multiple times also.
Maybe this will be more comprehensible with this ouput from GDM-2.20.7 (original version without patchs). Reboot from Action Menu in GDM: Aug 21 15:16:15 debian gdm-binary[4229]: DEBUG: term_quit: Final cleanup Aug 21 15:16:15 debian gdm-binary[4229]: DEBUG: gdm_slave_quick_exit: Will kill everything from the display Aug 21 15:16:15 debian gdm-binary[4229]: DEBUG: Running gdm_verify_cleanup and pamh != NULL Aug 21 15:16:15 debian gdm-binary[4229]: DEBUG: gdm_server_stop: Server for :0 going down! Aug 21 15:16:15 debian gdm-binary[4229]: DEBUG: gdm_server_stop: Killing server pid 4237 Aug 21 15:16:16 debian gdm-binary[4229]: DEBUG: gdm_server_stop: Server pid 4237 dead Aug 21 15:16:16 debian gdm-binary[4229]: DEBUG: gdm_slave_quick_exit: Killed everything from the display *** Ok, no problem here... Aug 21 15:16:16 debian gdm-binary[4228]: DEBUG: mainloop_sig_callback: Got signal 17 Aug 21 15:16:16 debian gdm-binary[4228]: DEBUG: gdm_cleanup_children: child 4229 returned 8 Aug 21 15:16:16 debian gdm-binary[4228]: DEBUG: Looking up per display value for greeter/SystemMenu=true Aug 21 15:16:16 debian gdm-binary[4228]: DEBUG: Attempting to parse key string: greeter/SystemMenu=true Aug 21 15:16:16 debian gdm-binary[4228]: DEBUG: Attempting to parse key string: greeter/SystemMenu=true Aug 21 15:16:16 debian gdm-binary[4228]: DEBUG: Requesting group=greeter key=SystemMenu locale=(null) Aug 21 15:16:16 debian gdm-binary[4228]: DEBUG: Attempting to parse key string: greeter/SystemMenu=true Aug 21 15:16:16 debian gdm-binary[4228]: DEBUG: Attempting to parse key string: daemon/RebootCommand=/usr/bin/reboot;/sbin/reboot;/sbin/shutdown -r now;/usr/sbin/shutdown -r now Aug 21 15:16:16 debian gdm-binary[4228]: DEBUG: Redémarrage de l'ordinateur en cours... Aug 21 15:16:16 debian gdm-binary[4228]: DEBUG: Attempting to parse key string: daemon/RebootCommand=/usr/bin/reboot;/sbin/reboot;/sbin/shutdown -r now;/usr/sbin/shutdown -r now Aug 21 15:16:16 debian gdm-binary[4228]: DEBUG: Running /sbin/shutdown -r now "Rebooted via gdm." Aug 21 15:16:18 debian init: Switching to runlevel: 6 *** A signal 17 (SIGCHLD) is sent "to mainloop_sig_callback" and "gdm_cleanup_children" is executed. Aug 21 15:16:18 debian gdm-binary[4228]: DEBUG: gdm_final_cleanup Aug 21 15:16:18 debian gdm-binary[4228]: DEBUG: gdm_display_unmanage: Stopping :0 (slave pid: 0) Aug 21 15:16:18 debian gdm-binary[4228]: DEBUG: gdm_display_unmanage: Display stopped Aug 21 15:16:18 debian gdm-binary[4228]: DEBUG: Attempting to parse key string: xdmcp/Enable=false Aug 21 15:16:18 debian gdm-binary[4228]: DEBUG: Attempting to parse key string: daemon/ServAuthDir=/var/lib/gdm *** We are in "start_autopsy" label with status=DISPLAY_REBOOT. *** So restart_machine is executed and gdm_display is stopped (gdm_final_cleanup is called by restart_machine). Aug 21 15:16:18 debian gdm-binary[4228]: DEBUG: gdm_child_action: In remanage Aug 21 15:16:18 debian gdm-binary[4228]: DEBUG: gdm_display_manage: Managing :0 Aug 21 15:16:18 debian gdm-binary[4228]: DEBUG: loop check: last_start 1219324566, last_loop 1219324566, now: 1219324578, retry_count: 1 Aug 21 15:16:18 debian gdm-binary[4228]: DEBUG: Forking slave process Aug 21 15:16:18 debian gdm-binary[4265]: CRITICAL: gdm_connection_close: assertion `conn != NULL' failed Aug 21 15:16:18 debian last message repeated 2 times Aug 21 15:16:18 debian gdm-binary[4265]: DEBUG: Attempting to parse key string: xdmcp/PingIntervalSeconds=15 ... *** Value of status = DISPLAY_REMANAGE just after the call to restart_machine, so gdm_display is restarted and another slave process is created but we are on a SHUTDOWN/REBOOT PROCESS: case DISPLAY_REBOOT: /* Restart machine */ restart_machine (); status = DISPLAY_REMANAGE; goto start_autopsy; break; It's true or not ? I don't understand why there is no condition to affect "DISPLAY_REMANAGE" to "status" in this case ?
Hmmm. It doesn't seem like it makes sense to try to REMANAGE the display when doing a restart or halt operation. Perhaps we need to add a DISPLAY_EXIT which just calls exit, and set status to DISPLAY_EXIT. Could you try something like that and see if that fixes the problem?
I have tested something like this: in gdm.h #define DISPLAY_EXIT 255 /* Exit GDM */ in gdm.c case DISPLAY_REBOOT: /* Restart machine */ restart_machine (); status = DISPLAY_EXIT; goto start_autopsy; break; case DISPLAY_HALT: /* Halt machine */ halt_machine (); status = DISPLAY_EXIT; goto start_autopsy; break; ... case DISPLAY_EXIT: _exit(EXIT_SUCCESS); break; After rebooting or halting machine with action menu in GDM, the errors i mentionned in Comment #63 are disappeared.
That looks good to me. Can others verify that this addresses the problem? If we have this problem fixed, I think it's time to do another GDM 2.20 release. I think this is the most serious known regression in the 2.20 code.
After applying the changes proposed in Comment #65, I can see no errors during shutdown or rebooting from the Action menu in gdm. However, when running shutdown or reboot from the gnome-session, I still got the following errors. GLib-CRITICAL: g_hash_table_lookup_extended: assertion 'hash_table != NULL' failed WARNING: Request for invalid configuration key xdmcp/Enable=false I think we need to add a bit more code to prevent the multiple call for gdm_final_cleanup.
Could you propose a patch that fixes the problem that you see?
I would like to summarize this problem again. AFAIK, there is two problems. 1) When running shutdown, reboot, suspend from the Actions Menu from the GDM GUI, it tries to REMANAGE the display after the call to restart_machine, halt_machine, suspend_machine. It causes the error mentioned in Comment #57. => Solution: Just calls exit after the call to restart_machine, halt_machine, suspend_machine. (Mentioned in Comment #65.) 2) When running shutdown, reboot, suspend from the gnome-session, it causes gdm_final_cleanup to get called twice. The error mentioned in Comment #54 and Comment #67 will appear. => Solution: Setting flags and checking them. (Mentioned in Comment #61.) Perhaps merging the change in Comment #65 and Comment #61 solves the problem properly. However, Comment #62 says that the proposed patch in Comment #61 is racy and should be fixed. Hmmmm... Unfortunately, I'm not sure how to make it isn't racy so I can't propose a patch.
Created attachment 117589 [details] [review] Exit after launching the shutdown command The changes proposed in Comment #65 are not enough since they don’t cover all code paths. However the simple attached patch fixes the issue for me.
Josselin, are you saying that your patch as well as patch #65 are needed? I suspect only your patch in #70 should fix the problem. Pascal, can you verify that Josselin's patch also fixes your issues? I prefer the solution in patch #70. I think it is a more simple way to fix things. Thoughts? Please let me know right away, if possible. I'd like to do a new GDM 2.20 release with this bug fix quickly if we can verify this is fixed properly.
(In reply to comment #71) > Josselin, are you saying that your patch as well as patch #65 [edit] are needed? I > suspect only your patch in #70 should fix the problem. Yes, my patch should be enough, at least to fix the crash that happens when gdm receives SIGTERM after the termination code is executed. There are still a few error messages in the syslog, as described in comment #67, so this looks like a different issue (although much less visible to the user).
I think we should still try to resolve the error messages in the syslog from comment #67. Can we track down which hash table code is causing this issue? If I were to guess, it is almost surely a hash table that is in the common/gdm-config.c code. Note the error message complains about g_hash_table_lookup_extended, which is used in gdm_config_peek_value and the internal_set_value (called by gdm_config_set_value and gdm_config_process_entry) functions. The error seems to happen when trying to access the xdmcp/Enable key. In the daemon code, this is only accessed via GDM_KEY_XDMCP key in 5 places in the GDM code, in gdm and one place in misc.c. Perhaps if one of you could add an additional debug message to just before these 6 places so we can make sure we understand which one is causing the issue, and whether this is a problem?
Actually the error messages in my syslog predate the upgrade to a version with the patch #117589, and they don’t appear anymore in the tests I’ve just conducted. So I think we’re fine now. The only remaining messages in the syslog are some "Gtk-WARNING: Ignoring the separator setting" when the session lasts less than 10 seconds. But as the code in gdm.c already says "FIXME: this is really bad", I guess you already know it could be improved :)
Yes, this patch resolves also the second bug, it prevents multiple calls to gdm_final_cleanup because it exits before. Why not, but setting status to DISPLAY_REMANAGE and "go to start_autopsy" should be removed to do really properly :)
(In reply to comment #75) > Why not, but setting status to DISPLAY_REMANAGE and "go to start_autopsy" > should be removed to do really properly :) I’m not sure, since this code will still be executed if the shutdown command is not found.
That patch works fine for me. Thank you. Can I make a suggestion? I suggest to put signal(SIGTERM, SIG_IGN) after the execution of shutdown/reboot command. Otherwise, it is possible that a SIGTERM signal are arrived between gdm_final_cleanup and _exit. I'm not sure, but it causes multiple calls for gdm_final_cleanup, therefore it is better to ignoring SIGTERM signal.
(In reply to comment #77) > I suggest to put signal(SIGTERM, SIG_IGN) after the execution of > shutdown/reboot command. > Otherwise, it is possible that a SIGTERM signal are arrived between > gdm_final_cleanup and _exit. That’s definitely a good idea. It should even be done at the beginning of gdm_final_cleanup so that nothing in this function can be executed twice.
Could you provide an updated final patch that includes the signal() call? Then I will commit and respin a new release for everybody to test. Thanks.
In other words, I'd like you to provide an updated patch that has been tested and we make sure works properly before I commit and respin a new GDM release. If you could do this, that would be great.
Created attachment 117920 [details] [review] Remove signal handlers in gdm_final_cleanup and exit after the reboot/shutdown sequence Here is an updated patch doing what is suggested.
I think this should fix the bug. I committed the patch to the 2.20 branch and spun the new 2.20.8 release. If there are any more issues, please reopen or file a new bug.