GNOME Bugzilla – Bug 143590
signal handler in gconfd calls malloc, causing login to hang
Last modified: 2006-07-05 13:07:20 UTC
The following steps, specific to Solaris, reproduce the problem:

- log in to GNOME 2.6
- log out

Repeat this a number of times (say, 10). After that, the user is not able to log in again, and the hourglass stays up forever. If you telnet to the machine, you will see that gconfd is left over from the previous login session and that gconf-sanity-check-2 is waiting for gconfd to respond. gconfd is stuck in the middle of a signal handler and is not able to respond:

(dbx) where
current thread: t@1
=>[1] __lwp_park(0xcbef0000, 0x0), at 0xcc0348d0
  [2] mutex_lock_queue(0xcbef0000, 0x0, 0xcc013550), at 0xcc03067f
  [3] slow_lock(0xcc013550, 0xcbef0000), at 0xcc030e95
  [4] _ti_mutex_lock(0xcc013550), at 0xcc030f50
  [5] malloc(0x2c, 0x1, 0xcc077a25, 0xcc2ddc90, 0x0, 0x1), at 0xcbfa0b4c
  [6] standard_malloc(0x0, 0x8046ca4, 0x805a378, 0x8046d18, 0x80e9ff8, 0x0, 0xcc0988c1), at 0xcc077a38
  [7] _g_gnulib_vasprintf(0x1, 0x0, 0x8046dc0, 0x8056425), at 0xcc0988df
  [8] call_user_handler(0x1, 0x0, 0x8046dc0), at 0xcc02e9fb
  [9] sigacthandler(), at 0xcc02eb21
  ---- called from signal handler with signal 1 (SIGHUP) ------
  [10] _free_unlocked(0x84b2008), at 0xcbfa17c4
  [11] free(0x84b2008, 0xcc00f000, 0xcc077a6b, 0xcbfa1774, 0xcc013550, 0xcc0d6370), at 0xcbfa1766

Looking at gconfd.c, the signal handler for SIGHUP calls

  gconf_log (GCL_INFO, _("Received signal %d, shutting down cleanly"), signo);

which calls dgettext (the stack trace above has dgettext removed), and gconf_log in turn calls g_strdup_vprintf(), which calls malloc. According to Richard Stevens' book and also http://www.linux.com/howtos/Secure-Programs-HOWTO/signals.shtml, it is unsafe to call malloc in a signal handler. I believe this is the problem I have seen.

I tried very hard to reproduce the problem on a Linux box running GNOME 2.6 (JDS on SuSE), but I could not. Whether something is specific to or different on Linux, I don't know.
When I removed the call to gconf_log in the SIGHUP case, the problem went away. I am logging this bug so that people with more experience with signals may contribute to it.
Is this reproducible on GNOME 2.8 as well?
Is it reproducible on gnome 2.10 or higher?
Not sure at the moment, as we have yet to QA GNOME 2.10, but I will certainly update this bug when we do. We have a Sun-specific patch for GNOME 2.6 on Solaris that avoids this problem by not calling gconf_log(), rather than fixing it. If no one else in the Linux community is seeing this, could it be due to differences between Solaris and Linux in signal handling? I don't know.
Closing this bug report as no further information has been provided. Please feel free to reopen this bug if you can provide the information Christian and Farzaneh asked for. Thanks a lot in advance!
Thanks for closing this. I no longer see this in GNOME 2.14. Apologies for not responding to the last two requests :)