GNOME Bugzilla – Bug 789237
Call gjs_dumpstack on aborts and traps and optionally on segfaults
Last modified: 2017-10-24 06:19:34 UTC
In order to debug properly some crashes which are triggered by JS code we might need to see the the JS stack. Adding some debug bits in order to get this possible.
Created attachment 361927 [details] [review] main: call gjs_dumpstack on aborts and traps and optionally on segfaults In order to debug issues triggered by JS code we might need to see the stack of it, this is not normally visible in static stack traces, thus we need to call gjs_dumpstack () before dying the process. Intercepting signals SIGABRT and SIGTRAP (needed for catching fatal glib errors) by default, while introducing a new 'backtrace-segfaults' flag for the SHELL_DEBUG environment variable to do the same on SIGSEGV and SIGBUS. In any case after dumping the stack we raise the signal again, in order to make the system aware of it.
Review of attachment 361927 [details] [review]: I think this is reasonable, but I'll wouldn't mind some gnome-shell maintainer have a say if there is a reason why we shouldn't. ::: src/main.c @@ +334,3 @@ +dump_gjs_stack_on_signal_handler (int signo, + siginfo_t *si, + void *unused) nit: alignment
(In reply to Jonas Ådahl from comment #2) > I wouldn't mind some gnome-shell maintainer have a say if there > is a reason why we shouldn't. FWIW, I can't think of one.
Review of attachment 361927 [details] [review]: ::: src/main.c @@ +336,3 @@ + void *unused) +{ + gjs_dumpstack (); this might end up getting stuck if any of the locks it acquires in gjs and glib are being held already but can't think of any other way and the process is dying for these signals anyway @@ +350,3 @@ + + sa.sa_flags = SA_SIGINFO | SA_RESETHAND; + sa.sa_sigaction = dump_gjs_stack_on_signal_handler; could use sa_handler since we aren't using anything but the signal number @@ +492,3 @@ + { + dump_gjs_stack_on_signal (SIGSEGV); + dump_gjs_stack_on_signal (SIGBUS); perhaps catch SIGFPE too
(In reply to Florian Müllner from comment #3) > (In reply to Jonas Ådahl from comment #2) > > I wouldn't mind some gnome-shell maintainer have a say if there > > is a reason why we shouldn't. > > FWIW, I can't think of one. only thing I can think of would be if it overwrites any of the signals mutter already catches itself in meta_init() but that's not the case
(In reply to Rui Matos from comment #5) > (In reply to Florian Müllner from comment #3) > > (In reply to Jonas Ådahl from comment #2) > > > I wouldn't mind some gnome-shell maintainer have a say if there > > > is a reason why we shouldn't. > > > > FWIW, I can't think of one. > > only thing I can think of would be if it overwrites any of the signals > mutter already catches itself in meta_init() but that's not the case Exactly... It's the only thing I was a bit worried about, but it's not happening. Also GLib unix signal handlers doesn't allow to catch these anyways. To be fair, I was thinking to include this directly in Gjs, protected under more granulated environment variables (or debug calls, as it's something planned to replace GJS_DEBUG_* env vars), but while this might be useful, it's a bit more risky to be in a lib. So, let me know if you have some opinions on that... While I'd start anyway by adding this to the shell first (possibly to 3.26 too, so we can have some user reports based on this already in this cycle). Also, some context, the reason why we protect segfaults with an environment varialble is that gjs_dumpstack allocates memory, and this might break other coredumps... It looks possible to rewrite it by avoiding the GString usage (and just using write's), but it might be pointless, since mozjs might play with memory allocations anyways. (In reply to Rui Matos from comment #4) > @@ +350,3 @@ > + > + sa.sa_flags = SA_SIGINFO | SA_RESETHAND; > + sa.sa_sigaction = dump_gjs_stack_on_signal_handler; > > could use sa_handler since we aren't using anything but the signal number Agree, I was undecided, then I said, well let's take everything, maybe we want to log the memory position too or something else. But I guess we can just dumpstack here (and so I did) avoiding any other addresses related info. > @@ +492,3 @@ > + { > + dump_gjs_stack_on_signal (SIGSEGV); > + dump_gjs_stack_on_signal (SIGBUS); > > perhaps catch SIGFPE too Oh, right... This was meant to be added (also per Jonas discussion, who suggested to do that to catch some issues we had with not-a-number triggered crashes). So yes... Since SIGPIPE is already ignored by mutter, I guess the others in the list that might happen are SIGSYS (but well, this seems to be a bit too much low leve for what we care here, but who knows... JS code might still trigger it) and SIGIOT (probably just an alias in most archs).
Created attachment 361984 [details] [review] main: call gjs_dumpstack on aborts and traps and optionally on segfaults Updated as discussed
Review of attachment 361984 [details] [review]: . ::: src/main.c @@ +350,3 @@ + sa.sa_sigaction = dump_gjs_stack_on_signal_handler; + + sigaction (signo, &sa, NULL); Want me to trigger a warning in case this fails too?
Created attachment 361988 [details] [review] main: call gjs_dumpstack on aborts and traps and optionally on segfaults Ouch, added missing space before brackets.
Created attachment 361990 [details] [review] main: call gjs_dumpstack on aborts and traps and optionally on segfaults In order to debug issues triggered by JS code we might need to see the stack of it, this is not normally visible in static stack traces, thus we need to call gjs_dumpstack () before dying the process. Intercepting signals SIGABRT, SIGTRAP (needed for catching fatal glib errors) SIGFPE and SIGIOT by default, while introducing a new 'backtrace-segfaults' flag for the SHELL_DEBUG environment variable to do the same on SIGSEGV and SIGBUS (this is a precaution to avoid that we corrupt the stack for automatic errors trackers). In any case after dumping the stack we raise the signal again, in order to make the system aware of it.
(In reply to Marco Trevisan (Treviño) from comment #6) > (In reply to Rui Matos from comment #5) > > (In reply to Florian Müllner from comment #3) > > > (In reply to Jonas Ådahl from comment #2) > > > > I wouldn't mind some gnome-shell maintainer have a say if there > > > > is a reason why we shouldn't. > > > > > > FWIW, I can't think of one. > > > > only thing I can think of would be if it overwrites any of the signals > > mutter already catches itself in meta_init() but that's not the case > > Exactly... It's the only thing I was a bit worried about, but it's not > happening. > Also GLib unix signal handlers doesn't allow to catch these anyways. It's not happening because its not aborting inside any of those locked regions. It'd have to crash somewhere probably inside mozjs or gjs, and if we did, we might risk getting stuck. Couldn't we mitigate that by 1) unregistering the signal handler when triggered, then 2) adding an alarm() to retrigger the signal (probably while logging a warning about how we were stuck)?
1) is what SA_RESETHAND is supposed to do...
In fact, if I for example change the handler to be: static void dump_gjs_stack_on_signal_handler (int signo) { gjs_dumpstack (); g_assert(TRUE == FALSE); raise (signo); } This will still raise the proper signal, and print the stack just once.
Created attachment 361991 [details] [review] main: call gjs_dumpstack on aborts and traps and optionally on segfaults Updated the patch to use sa_handler instead (I forgot to change callback name before)
Review of attachment 361991 [details] [review]: ::: src/main.c @@ +334,3 @@ +dump_gjs_stack_on_signal_handler (int signo) +{ + gjs_dumpstack (); We'll still freeze if we abort inside a locked gjs/mozjs block, assuming gjs_dumpstack() does or will at any point acquire such a lock. Thus, we should do something like static void dump_gjs_stack_alarm_sigaction (int signo, siginfo_t *, void *data) { g_warning ("Failed to dump Javascript stack, got stuck"); raise (GPOINTER_TO_INT (data)); } static void dump_gjs_stack_on_signal_handler (int signo) { struct sigaction sa; memset (..); sa.sa_flags = SA_RESET_HAND; sa.sa_sigaction = dump_gjs_stack_alarm_sigaction; sigaction(...ALARM); alarm (4); gjs_dumpstack (); alarm (0); ..unsetsigacton... raise(..); }
(In reply to Jonas Ådahl from comment #15) > Review of attachment 361991 [details] [review] [review]: > > ::: src/main.c > @@ +334,3 @@ > +dump_gjs_stack_on_signal_handler (int signo) > +{ > + gjs_dumpstack (); > > We'll still freeze if we abort inside a locked gjs/mozjs block, assuming > gjs_dumpstack() does or will at any point acquire such a lock. > > Thus, we should do something like > sigaction(...ALARM); This is very reasonable... BUT, if I do that (https://pastebin.ubuntu.com/25782704/) the second handler crashes on call of sigaction, quite likely because this is allocating the struct sigaction. Maybe keeping a static around works though.
Nevermind, it was wrong the API :) But still, i guess it would be safer to allocate that data before.
Created attachment 361998 [details] [review] main: call gjs_dumpstack on aborts and traps and optionally on segfaults Updating the patch as requested, tested working using g_usleep (G_USEC_PER_SEC * 7); instead of dumping the stack... I would still probably prefer to keep the data statically allocated first. What you think?
Review of attachment 361998 [details] [review]: On my phone so cant select lines to comment on, but you can pass the cought signal as void *data with GINT_TO_POINTER then get it back with GPOINTER_TO_INT in the handler. No need for anything static nor allocated. Also, any reason to pass SA_RESETHAND in the alarm handler too?
I forgot the flag... But without static allocation it was crashing to me. All the times. Anyway it's weekend there (and in the other part of the world too), I'll fix this ASAP. But enjoy the free time now ;-) PS: no thoughts about the fact it wouldn't be better to allocate memory there (especially for segfaults)?
(In reply to Marco Trevisan (Treviño) from comment #20) > I forgot the flag... But without static allocation it was crashing to me. > All the times. > > Anyway it's weekend there (and in the other part of the world too), I'll fix > this ASAP. But enjoy the free time now ;-) > > PS: no thoughts about the fact it wouldn't be better to allocate memory > there (especially for segfaults)? If you just use GINT_TO_POINTER (signum) and pass that, doesn't that work? It doesn't allocate anything.
(In reply to Jonas Ådahl from comment #21) > If you just use GINT_TO_POINTER (signum) and pass that, doesn't that work? > It doesn't allocate anything. I'm not referring that, but the fact that there we do struct sigaction sa; Which still allocates that space, not that the dumpstack won't allocate anyways, but still... Anyway I can't pass anything to sigaction, the signature is int sigaction(int signum, const struct sigaction *act, struct sigaction *oldact); (thus the crash)
(In reply to Marco Trevisan (Treviño) from comment #22) > (In reply to Jonas Ådahl from comment #21) > > If you just use GINT_TO_POINTER (signum) and pass that, doesn't that work? > > It doesn't allocate anything. > > I'm not referring that, but the fact that there we do > > struct sigaction sa; > > Which still allocates that space, not that the dumpstack won't allocate > anyways, but still... > > Anyway I can't pass anything to sigaction, the signature is > > int sigaction(int signum, const struct sigaction *act, > struct sigaction *oldact); > > (thus the crash) Ah, the data passed to the callback is not some passable data, but a 'ucontext_t *' for some reason passed to 'void *' and referred to as 'data'. We'll have to go with the static variable then.
Review of attachment 361998 [details] [review]: Looks good. Just a spelling error to fix before pushing. ::: src/main.c @@ +39,3 @@ static gboolean is_gdm_mode = FALSE; static char *session_mode = NULL; +static int cought_signal = 0; spelling: s/cought/caught/
Created attachment 362089 [details] [review] main: call gjs_dumpstack on aborts and traps and optionally on segfaults In order to debug issues triggered by JS code we might need to see the stack of it, this is not normally visible in static stack traces, thus we need to call gjs_dumpstack () before dying the process. Intercepting signals SIGABRT, SIGTRAP (needed for catching fatal glib errors) SIGFPE and SIGIOT by default, while introducing a new 'backtrace-segfaults' flag for the SHELL_DEBUG environment variable to do the same on SIGSEGV and SIGBUS (this is a precaution to avoid that we corrupt the stack for automatic errors trackers). In any case after dumping the stack we raise the signal again, in order to make the system aware of it.
The following fix has been pushed: f90b225 main: call gjs_dumpstack on aborts and traps and optionally on segfaults
Created attachment 362090 [details] [review] main: call gjs_dumpstack on aborts and traps and optionally on segfaults In order to debug issues triggered by JS code we might need to see the stack of it, this is not normally visible in static stack traces, thus we need to call gjs_dumpstack () before dying the process. Intercepting signals SIGABRT, SIGTRAP (needed for catching fatal glib errors) SIGFPE and SIGIOT by default, while introducing a new 'backtrace-segfaults' flag for the SHELL_DEBUG environment variable to do the same on SIGSEGV and SIGBUS (this is a precaution to avoid that we corrupt the stack for automatic errors trackers). In any case after dumping the stack we raise the signal again, in order to make the system aware of it.
Created attachment 362134 [details] [review] main: use SA_NODEFER to track signals to being able to raise from alarm Some cleanups to the previous patch and using SA_NODEFER in order to be able to properly raise signals from the alarm handler.
Review of attachment 362134 [details] [review]: ::: src/main.c @@ +49,3 @@ }; static int _shell_debug; +static int _tracked_signals[NSIG] = { 0 }; This should be gboolean list right? That is how it's used below, while the index is the signal. @@ +350,1 @@ + /* Ignore all the signals starting this point, a part the one we'll raise */ IMHO we should explain *why* we do this, not what we do. What we do is documented by the code itself. @@ +350,2 @@ + /* Ignore all the signals starting this point, a part the one we'll raise */ + for (i = 0; i < G_N_ELEMENTS (_tracked_signals); ++i) IIIRC G_N_ELEMENTS returns an unsigned int, so you'll need to explicitly cast it to (int) before comparing to avoid the warning. @@ +354,3 @@ + { + signal (i, SIG_IGN); + _tracked_signals[i] = FALSE; Why set this to false? It's not like it was reset to its initial state. We also will never enter a loop like this more than once, so it doesn't matter if it stays TRUE. @@ +360,3 @@ + /* Waiting at least 5 seconds for the dumpstack, if it fails, we raise the error */ + caught_signal = signo; + signal (SIGALRM, dump_gjs_stack_alarm_sigaction); According to SIGNAL(2): The only portable use of signal() is to set a signal's disposition to SIG_DFL or SIG_IGN. The semantics when using signal() to establish a signal handler vary across systems (and POSIX.1 explicitly permits this variation); do not use it for this purpose.
Review of attachment 362134 [details] [review]: ::: src/main.c @@ +49,3 @@ }; static int _shell_debug; +static int _tracked_signals[NSIG] = { 0 }; Right...
Created attachment 362139 [details] [review] main: use SA_NODEFER to track signals to being able to raise from alarm Updated to address Jonas comments
Created attachment 362140 [details] [review] main: use SA_NODEFER to track signals to being able to raise from alarm not really... Now it should be. :-@
Review of attachment 362140 [details] [review]: lgtm
The following fix has been pushed: 891bc45 main: use SA_NODEFER to track signals to being able to raise from alarm
Created attachment 362142 [details] [review] main: use SA_NODEFER to track signals to being able to raise from alarm After we receive one of the tracked signals, in case we get stuck inside the gjs_dumpstack () call, we use an alarm to raise the previously emitted signal, however without using SA_NODEFER, the raise inside the alarm handler will be ignored. To avoid to handle new signals caused by the handler calls, once we get the first signal, we just ignore them all as they could only lead to dirty traces. Also, cleaning up a bit the code, and disabling the shell log handler in dump_gjs_stack_alarm_sigaction since this might lead to a new gjs_dumpstack () request.