After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 789237 - Call gjs_dumpstack on aborts and traps and optionally on segfaults
Call gjs_dumpstack on aborts and traps and optionally on segfaults
Status: RESOLVED FIXED
Product: gnome-shell
Classification: Core
Component: general
unspecified
Other All
: Normal normal
: ---
Assigned To: gnome-shell-maint
gnome-shell-maint
Depends on:
Blocks:
 
 
Reported: 2017-10-20 10:44 UTC by Marco Trevisan (Treviño)
Modified: 2017-10-24 06:19 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
main: call gjs_dumpstack on aborts and traps and optionally on segfaults (2.54 KB, patch)
2017-10-20 10:44 UTC, Marco Trevisan (Treviño)
none Details | Review
main: call gjs_dumpstack on aborts and traps and optionally on segfaults (2.65 KB, patch)
2017-10-20 21:03 UTC, Marco Trevisan (Treviño)
none Details | Review
main: call gjs_dumpstack on aborts and traps and optionally on segfaults (2.66 KB, patch)
2017-10-21 00:21 UTC, Marco Trevisan (Treviño)
none Details | Review
main: call gjs_dumpstack on aborts and traps and optionally on segfaults (2.66 KB, patch)
2017-10-21 01:31 UTC, Marco Trevisan (Treviño)
none Details | Review
main: call gjs_dumpstack on aborts and traps and optionally on segfaults (2.65 KB, patch)
2017-10-21 02:34 UTC, Marco Trevisan (Treviño)
none Details | Review
main: call gjs_dumpstack on aborts and traps and optionally on segfaults (3.50 KB, patch)
2017-10-21 03:31 UTC, Marco Trevisan (Treviño)
none Details | Review
main: call gjs_dumpstack on aborts and traps and optionally on segfaults (3.50 KB, patch)
2017-10-23 10:03 UTC, Marco Trevisan (Treviño)
committed Details | Review
main: call gjs_dumpstack on aborts and traps and optionally on segfaults (3.50 KB, patch)
2017-10-23 10:06 UTC, Marco Trevisan (Treviño)
committed Details | Review
main: use SA_NODEFER to track signals to being able to raise from alarm (3.19 KB, patch)
2017-10-24 05:29 UTC, Marco Trevisan (Treviño)
none Details | Review
main: use SA_NODEFER to track signals to being able to raise from alarm (3.19 KB, patch)
2017-10-24 06:02 UTC, Marco Trevisan (Treviño)
none Details | Review
main: use SA_NODEFER to track signals to being able to raise from alarm (3.34 KB, patch)
2017-10-24 06:04 UTC, Marco Trevisan (Treviño)
committed Details | Review
main: use SA_NODEFER to track signals to being able to raise from alarm (3.34 KB, patch)
2017-10-24 06:19 UTC, Marco Trevisan (Treviño)
committed Details | Review

Description Marco Trevisan (Treviño) 2017-10-20 10:44:24 UTC
In order to debug properly some crashes which are triggered by JS code
we might need to see the the JS stack.

Adding some debug bits in order to get this possible.
Comment 1 Marco Trevisan (Treviño) 2017-10-20 10:44:29 UTC
Created attachment 361927 [details] [review]
main: call gjs_dumpstack on aborts and traps and optionally on segfaults

In order to debug issues triggered by JS code we might need to
see the stack of it, this is not normally visible in static stack traces,
thus we need to call gjs_dumpstack () before dying the process.

Intercepting signals SIGABRT and SIGTRAP (needed for catching fatal glib
errors) by default, while introducing a new 'backtrace-segfaults' flag for
the SHELL_DEBUG environment variable to do the same on SIGSEGV and SIGBUS.

In any case after dumping the stack we raise the signal again, in order
to make the system aware of it.
Comment 2 Jonas Ådahl 2017-10-20 12:51:32 UTC
Review of attachment 361927 [details] [review]:

I think this is reasonable, but I'll wouldn't mind some gnome-shell maintainer have a say if there is a reason why we shouldn't.

::: src/main.c
@@ +334,3 @@
+dump_gjs_stack_on_signal_handler (int signo,
+                                  siginfo_t *si,
+                                  void *unused)

nit: alignment
Comment 3 Florian Müllner 2017-10-20 12:54:51 UTC
(In reply to Jonas Ådahl from comment #2)
> I wouldn't mind some gnome-shell maintainer have a say if there 
> is a reason why we shouldn't.

FWIW, I can't think of one.
Comment 4 Rui Matos 2017-10-20 14:59:17 UTC
Review of attachment 361927 [details] [review]:

::: src/main.c
@@ +336,3 @@
+                                  void *unused)
+{
+  gjs_dumpstack ();

this might end up getting stuck if any of the locks it acquires in gjs and glib are being held already but can't think of any other way and the process is dying for these signals anyway

@@ +350,3 @@
+
+  sa.sa_flags     = SA_SIGINFO | SA_RESETHAND;
+  sa.sa_sigaction = dump_gjs_stack_on_signal_handler;

could use sa_handler since we aren't using anything but the signal number

@@ +492,3 @@
+    {
+      dump_gjs_stack_on_signal (SIGSEGV);
+      dump_gjs_stack_on_signal (SIGBUS);

perhaps catch SIGFPE too
Comment 5 Rui Matos 2017-10-20 15:01:04 UTC
(In reply to Florian Müllner from comment #3)
> (In reply to Jonas Ådahl from comment #2)
> > I wouldn't mind some gnome-shell maintainer have a say if there 
> > is a reason why we shouldn't.
> 
> FWIW, I can't think of one.

only thing I can think of would be if it overwrites any of the signals mutter already catches itself in meta_init() but that's not the case
Comment 6 Marco Trevisan (Treviño) 2017-10-20 20:01:27 UTC
(In reply to Rui Matos from comment #5)
> (In reply to Florian Müllner from comment #3)
> > (In reply to Jonas Ådahl from comment #2)
> > > I wouldn't mind some gnome-shell maintainer have a say if there 
> > > is a reason why we shouldn't.
> > 
> > FWIW, I can't think of one.
> 
> only thing I can think of would be if it overwrites any of the signals
> mutter already catches itself in meta_init() but that's not the case

Exactly... It's the only thing I was a bit worried about, but it's not happening.
Also GLib unix signal handlers doesn't allow to catch these anyways.

To be fair, I was thinking to include this directly in Gjs, protected under more granulated environment variables (or debug calls, as it's something planned to replace GJS_DEBUG_* env vars), but while this might be useful, it's a bit more risky to be in a lib. So, let me know if you have some opinions on that...

While I'd start anyway by adding this to the shell first (possibly to 3.26 too, so we can have some user reports based on this already in this cycle).

Also, some context, the reason why we protect segfaults with an environment varialble is that gjs_dumpstack allocates memory, and this might break other coredumps... It looks possible to rewrite it by avoiding the GString usage (and just using write's), but it might be pointless, since mozjs might play with memory allocations anyways.

(In reply to Rui Matos from comment #4)
> @@ +350,3 @@
> +
> +  sa.sa_flags     = SA_SIGINFO | SA_RESETHAND;
> +  sa.sa_sigaction = dump_gjs_stack_on_signal_handler;
> 
> could use sa_handler since we aren't using anything but the signal number

Agree, I was undecided, then I said, well let's take everything, maybe we want to log the memory position too or something else. But I guess we can just dumpstack here (and so I did) avoiding any other addresses related info.

> @@ +492,3 @@
> +    {
> +      dump_gjs_stack_on_signal (SIGSEGV);
> +      dump_gjs_stack_on_signal (SIGBUS);
> 
> perhaps catch SIGFPE too

Oh, right... This was meant to be added (also per Jonas discussion, who suggested to do that to catch some issues we had with not-a-number triggered crashes). So yes... 

Since SIGPIPE is already ignored by mutter, I guess the others in the list that might happen are SIGSYS (but well, this seems to be a bit too much low leve for what we care here, but who knows... JS code might still trigger it) and SIGIOT (probably just an alias in most archs).
Comment 7 Marco Trevisan (Treviño) 2017-10-20 21:03:29 UTC
Created attachment 361984 [details] [review]
main: call gjs_dumpstack on aborts and traps and optionally on segfaults

Updated as discussed
Comment 8 Marco Trevisan (Treviño) 2017-10-20 21:05:05 UTC
Review of attachment 361984 [details] [review]:

.

::: src/main.c
@@ +350,3 @@
+  sa.sa_sigaction = dump_gjs_stack_on_signal_handler;
+
+  sigaction (signo, &sa, NULL);

Want me to trigger a warning in case this fails too?
Comment 9 Marco Trevisan (Treviño) 2017-10-21 00:21:10 UTC
Created attachment 361988 [details] [review]
main: call gjs_dumpstack on aborts and traps and optionally on segfaults

Ouch, added missing space before brackets.
Comment 10 Marco Trevisan (Treviño) 2017-10-21 01:31:12 UTC
Created attachment 361990 [details] [review]
main: call gjs_dumpstack on aborts and traps and optionally on segfaults

In order to debug issues triggered by JS code we might need to
see the stack of it, this is not normally visible in static stack traces,
thus we need to call gjs_dumpstack () before dying the process.

Intercepting signals SIGABRT, SIGTRAP (needed for catching fatal glib
errors) SIGFPE and SIGIOT by default, while introducing a new
'backtrace-segfaults' flag for the SHELL_DEBUG environment variable to
do the same on SIGSEGV and SIGBUS (this is a precaution to avoid that we
corrupt the stack for automatic errors trackers).

In any case after dumping the stack we raise the signal again, in order
to make the system aware of it.
Comment 11 Jonas Ådahl 2017-10-21 01:55:39 UTC
(In reply to Marco Trevisan (Treviño) from comment #6)
> (In reply to Rui Matos from comment #5)
> > (In reply to Florian Müllner from comment #3)
> > > (In reply to Jonas Ådahl from comment #2)
> > > > I wouldn't mind some gnome-shell maintainer have a say if there 
> > > > is a reason why we shouldn't.
> > > 
> > > FWIW, I can't think of one.
> > 
> > only thing I can think of would be if it overwrites any of the signals
> > mutter already catches itself in meta_init() but that's not the case
> 
> Exactly... It's the only thing I was a bit worried about, but it's not
> happening.
> Also GLib unix signal handlers doesn't allow to catch these anyways.

It's not happening because its not aborting inside any of those locked regions. It'd have to crash somewhere probably inside mozjs or gjs, and if we did, we might risk getting stuck.

Couldn't we mitigate that by 1) unregistering the signal handler when triggered, then 2) adding an alarm() to retrigger the signal (probably while logging a warning about how we were stuck)?
Comment 12 Marco Trevisan (Treviño) 2017-10-21 02:25:18 UTC
1) is what SA_RESETHAND is supposed to do...
Comment 13 Marco Trevisan (Treviño) 2017-10-21 02:32:22 UTC
In fact, if I for example change the handler to be:

static void
dump_gjs_stack_on_signal_handler (int signo)
{
  gjs_dumpstack ();
  g_assert(TRUE == FALSE);

  raise (signo);
}

This will still raise the proper signal, and print the stack just once.
Comment 14 Marco Trevisan (Treviño) 2017-10-21 02:34:26 UTC
Created attachment 361991 [details] [review]
main: call gjs_dumpstack on aborts and traps and optionally on segfaults

Updated the patch to use sa_handler instead (I forgot to change callback name before)
Comment 15 Jonas Ådahl 2017-10-21 02:45:59 UTC
Review of attachment 361991 [details] [review]:

::: src/main.c
@@ +334,3 @@
+dump_gjs_stack_on_signal_handler (int signo)
+{
+  gjs_dumpstack ();

We'll still freeze if we abort inside a locked gjs/mozjs block, assuming gjs_dumpstack() does or will at any point acquire such a lock.

Thus, we should do something like

static void
dump_gjs_stack_alarm_sigaction (int signo, siginfo_t *, void *data)
{
  g_warning ("Failed to dump Javascript stack, got stuck");
  raise (GPOINTER_TO_INT (data));
}

static void
dump_gjs_stack_on_signal_handler (int signo)
{
  struct sigaction sa;

  memset (..);
  sa.sa_flags = SA_RESET_HAND;
  sa.sa_sigaction = dump_gjs_stack_alarm_sigaction;
  sigaction(...ALARM);

  alarm (4);
  gjs_dumpstack ();
  alarm (0);

  ..unsetsigacton...

  raise(..);
}
Comment 16 Marco Trevisan (Treviño) 2017-10-21 03:18:01 UTC
(In reply to Jonas Ådahl from comment #15)
> Review of attachment 361991 [details] [review] [review]:
> 
> ::: src/main.c
> @@ +334,3 @@
> +dump_gjs_stack_on_signal_handler (int signo)
> +{
> +  gjs_dumpstack ();
> 
> We'll still freeze if we abort inside a locked gjs/mozjs block, assuming
> gjs_dumpstack() does or will at any point acquire such a lock.
> 
> Thus, we should do something like
>   sigaction(...ALARM);

This is very reasonable... BUT, if I do that (https://pastebin.ubuntu.com/25782704/) the second handler crashes on call of sigaction, quite likely because this is allocating the struct sigaction.
Maybe keeping a static around works though.
Comment 17 Marco Trevisan (Treviño) 2017-10-21 03:27:00 UTC
Nevermind, it was wrong the API :)

But still, i guess it would be safer to allocate that data before.
Comment 18 Marco Trevisan (Treviño) 2017-10-21 03:31:39 UTC
Created attachment 361998 [details] [review]
main: call gjs_dumpstack on aborts and traps and optionally on segfaults

Updating the patch as requested, tested working using
g_usleep (G_USEC_PER_SEC * 7); instead of dumping the stack...

I would still probably prefer to keep the data statically allocated first.
What you think?
Comment 19 Jonas Ådahl 2017-10-21 04:40:54 UTC
Review of attachment 361998 [details] [review]:

On my phone so cant select lines to comment on, but you can pass the cought signal as void *data with GINT_TO_POINTER then get it back with GPOINTER_TO_INT in the handler. No need for anything static nor allocated.

Also, any reason to pass SA_RESETHAND in the alarm handler too?
Comment 20 Marco Trevisan (Treviño) 2017-10-21 07:43:55 UTC
I forgot the flag... But without static allocation it was crashing to me. All the times.

Anyway it's weekend there (and in the other part of the world too), I'll fix this ASAP. But enjoy the free time now ;-)

PS: no thoughts about the fact it wouldn't be better to allocate memory there (especially for segfaults)?
Comment 21 Jonas Ådahl 2017-10-21 09:25:11 UTC
(In reply to Marco Trevisan (Treviño) from comment #20)
> I forgot the flag... But without static allocation it was crashing to me.
> All the times.
> 
> Anyway it's weekend there (and in the other part of the world too), I'll fix
> this ASAP. But enjoy the free time now ;-)
> 
> PS: no thoughts about the fact it wouldn't be better to allocate memory
> there (especially for segfaults)?

If you just use GINT_TO_POINTER (signum) and pass that, doesn't that work? It doesn't allocate anything.
Comment 22 Marco Trevisan (Treviño) 2017-10-21 20:32:17 UTC
(In reply to Jonas Ådahl from comment #21)
> If you just use GINT_TO_POINTER (signum) and pass that, doesn't that work?
> It doesn't allocate anything.

I'm not referring that, but the fact that there we do

 struct sigaction sa;

Which still allocates that space, not that the dumpstack won't allocate anyways, but still...

Anyway I can't pass anything to sigaction, the signature is 

  int sigaction(int signum, const struct sigaction *act,
                struct sigaction *oldact);

(thus the crash)
Comment 23 Jonas Ådahl 2017-10-23 07:20:28 UTC
(In reply to Marco Trevisan (Treviño) from comment #22)
> (In reply to Jonas Ådahl from comment #21)
> > If you just use GINT_TO_POINTER (signum) and pass that, doesn't that work?
> > It doesn't allocate anything.
> 
> I'm not referring that, but the fact that there we do
> 
>  struct sigaction sa;
> 
> Which still allocates that space, not that the dumpstack won't allocate
> anyways, but still...
> 
> Anyway I can't pass anything to sigaction, the signature is 
> 
>   int sigaction(int signum, const struct sigaction *act,
>                 struct sigaction *oldact);
> 
> (thus the crash)

Ah, the data passed to the callback is not some passable data, but a 'ucontext_t *' for some reason passed to 'void *' and referred to as 'data'. We'll have to go with the static variable then.
Comment 24 Jonas Ådahl 2017-10-23 07:22:09 UTC
Review of attachment 361998 [details] [review]:

Looks good. Just a spelling error to fix before pushing.

::: src/main.c
@@ +39,3 @@
 static gboolean is_gdm_mode = FALSE;
 static char *session_mode = NULL;
+static int cought_signal = 0;

spelling: s/cought/caught/
Comment 25 Marco Trevisan (Treviño) 2017-10-23 10:03:55 UTC
Created attachment 362089 [details] [review]
main: call gjs_dumpstack on aborts and traps and optionally on segfaults

In order to debug issues triggered by JS code we might need to
see the stack of it, this is not normally visible in static stack traces,
thus we need to call gjs_dumpstack () before dying the process.

Intercepting signals SIGABRT, SIGTRAP (needed for catching fatal glib
errors) SIGFPE and SIGIOT by default, while introducing a new
'backtrace-segfaults' flag for the SHELL_DEBUG environment variable to
do the same on SIGSEGV and SIGBUS (this is a precaution to avoid that we
corrupt the stack for automatic errors trackers).

In any case after dumping the stack we raise the signal again, in order
to make the system aware of it.
Comment 26 Marco Trevisan (Treviño) 2017-10-23 10:06:20 UTC
The following fix has been pushed:
f90b225 main: call gjs_dumpstack on aborts and traps and optionally on segfaults
Comment 27 Marco Trevisan (Treviño) 2017-10-23 10:06:30 UTC
Created attachment 362090 [details] [review]
main: call gjs_dumpstack on aborts and traps and optionally on segfaults

In order to debug issues triggered by JS code we might need to
see the stack of it, this is not normally visible in static stack traces,
thus we need to call gjs_dumpstack () before dying the process.

Intercepting signals SIGABRT, SIGTRAP (needed for catching fatal glib
errors) SIGFPE and SIGIOT by default, while introducing a new
'backtrace-segfaults' flag for the SHELL_DEBUG environment variable to
do the same on SIGSEGV and SIGBUS (this is a precaution to avoid that we
corrupt the stack for automatic errors trackers).

In any case after dumping the stack we raise the signal again, in order
to make the system aware of it.
Comment 28 Marco Trevisan (Treviño) 2017-10-24 05:29:44 UTC
Created attachment 362134 [details] [review]
main: use SA_NODEFER to track signals to being able to raise from alarm

Some cleanups to the previous patch and using SA_NODEFER in order to be
able to properly raise signals from the alarm handler.
Comment 29 Jonas Ådahl 2017-10-24 05:44:56 UTC
Review of attachment 362134 [details] [review]:

::: src/main.c
@@ +49,3 @@
 };
 static int _shell_debug;
+static int _tracked_signals[NSIG] = { 0 };

This should be gboolean list right? That is how it's used below, while the index is the signal.

@@ +350,1 @@
+  /* Ignore all the signals starting this point, a part the one we'll raise */

IMHO we should explain *why* we do this, not what we do. What we do is documented by the code itself.

@@ +350,2 @@
+  /* Ignore all the signals starting this point, a part the one we'll raise */
+  for (i = 0; i < G_N_ELEMENTS (_tracked_signals); ++i)

IIIRC G_N_ELEMENTS returns an unsigned int, so you'll need to explicitly cast it to (int) before comparing to avoid the warning.

@@ +354,3 @@
+        {
+          signal (i, SIG_IGN);
+          _tracked_signals[i] = FALSE;

Why set this to false? It's not like it was reset to its initial state. We also will never enter a loop like this more than once, so it doesn't matter if it stays TRUE.

@@ +360,3 @@
+  /* Waiting at least 5 seconds for the dumpstack, if it fails, we raise the error */
+  caught_signal = signo;
+  signal (SIGALRM, dump_gjs_stack_alarm_sigaction);

According to SIGNAL(2):

       The only portable use of signal() is to set a signal's  disposition  to
       SIG_DFL  or  SIG_IGN.  The semantics when using signal() to establish a
       signal handler vary across systems (and POSIX.1 explicitly permits this
       variation); do not use it for this purpose.
Comment 30 Marco Trevisan (Treviño) 2017-10-24 05:51:06 UTC
Review of attachment 362134 [details] [review]:

::: src/main.c
@@ +49,3 @@
 };
 static int _shell_debug;
+static int _tracked_signals[NSIG] = { 0 };

Right...
Comment 31 Marco Trevisan (Treviño) 2017-10-24 06:02:30 UTC
Created attachment 362139 [details] [review]
main: use SA_NODEFER to track signals to being able to raise from alarm

Updated to address Jonas comments
Comment 32 Marco Trevisan (Treviño) 2017-10-24 06:04:51 UTC
Created attachment 362140 [details] [review]
main: use SA_NODEFER to track signals to being able to raise from alarm

not really... Now it should be. :-@
Comment 33 Jonas Ådahl 2017-10-24 06:09:53 UTC
Review of attachment 362140 [details] [review]:

lgtm
Comment 34 Marco Trevisan (Treviño) 2017-10-24 06:19:27 UTC
The following fix has been pushed:
891bc45 main: use SA_NODEFER to track signals to being able to raise from alarm
Comment 35 Marco Trevisan (Treviño) 2017-10-24 06:19:34 UTC
Created attachment 362142 [details] [review]
main: use SA_NODEFER to track signals to being able to raise from alarm

After we receive one of the tracked signals, in case we get stuck inside
the gjs_dumpstack () call, we use an alarm to raise the previously emitted
signal, however without using SA_NODEFER, the raise inside the alarm handler
will be ignored.

To avoid to handle new signals caused by the handler calls, once we get the
first signal, we just ignore them all as they could only lead to dirty traces.

Also, cleaning up a bit the code, and disabling the shell log handler
in dump_gjs_stack_alarm_sigaction since this might lead to a new
gjs_dumpstack () request.