Bug 671997 – Unix signal handling assumes that volatile 1-byte writes are atomic

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 671997 - Unix signal handling assumes that volatile 1-byte writes are atomic


Summary:	Unix signal handling assumes that volatile 1-byte writes are atomic


Status:	RESOLVED FIXED

Product:	glib
Classification:	Platform
Component:	mainloop
Version:	2.31.x
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	gtkdev
QA Contact:	gtkdev

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2012-03-13 15:31 UTC by Simon McVittie
Modified:	2012-03-16 21:11 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
gmain: Use sig_atomic_t for list of pending Unix signals (1.01 KB, patch) 2012-03-13 18:51 UTC, Colin Walters	none	Details \| Review
gmain: Use sig_atomic_t for list of pending Unix signals (1.90 KB, patch) 2012-03-16 20:16 UTC, Colin Walters	committed	Details \| Review

Description Simon McVittie 2012-03-13 15:31:55 UTC

According to sigaction(3posix):

       If  the  signal  occurs  other  than  as the result of calling abort(),
       kill(), or raise(), the behavior is undefined  if  the  signal  handler
       [...] refers to  any  object  with  static
       storage  duration  other  than by assigning a value to a static storage
       duration variable of type volatile sig_atomic_t.

I believe the rationale for this is to avoid situations like this on architectures where bytes are not set individually:

* normal code wants to load/modify/store a byte
  which shares a machine word with the one used by the signal handler,
  perhaps this:

      volatile gchar unix_signal_pending[NSIG]

  (suppose it's setting u_s_p[1] 1 to 0, for instance)

* it gets as far as executing the load and maybe the modify

* a signal arrives and the normal code is interrupted

* the signal handler sets u_s_p[2] to 1, which is really a 32-bit
  load/modify/store to u_s_p[0..3] or similar

* back to normal code

* the store happens and the change to u_s_p[2] from the signal
  handler has been lost!

but I could be wrong. It's undefined behaviour, anyway.

Until 2.30 this was a pipe-to-self, which is well-understood and async-signal-safe. I believe replacing the volatile gchar and gboolean variables written by the signal handler with volatile sig_atomic_t would fix this; so would returning to a pipe-to-self setup.

Comment 1 Colin Walters 2012-03-13 15:47:27 UTC

Is this something you saw in practice or you're just theorizing?

Changing to use sig_atomic_t makes sense to me, I assume it would expand to word size on most non-x86 architectures.

Hm, so a concrete problem that might happen here on an architecture which can only do atomic word assignment would be that we might miss signals.

Comment 2 Simon McVittie 2012-03-13 18:17:45 UTC

(In reply to comment #1)
> Is this something you saw in practice or you're just theorizing?

Just theorizing. I was reviewing a faulty signal handler in Telepathy code (which can't use GLib's facilities for this yet, since our policy is to only depend on stable-branches), started to say "look in D-Bus or GLib for an example of how to do it Right™", then realized that GLib actually has undefined behaviour. (D-Bus uses a pipe-to-self; if GLib's implementation used sig_atomic_t, GLib and D-Bus between them would provide examples of both the well-defined approaches to writing a signal handler).

> Changing to use sig_atomic_t makes sense to me, I assume it would expand to
> word size on most non-x86 architectures.

It seems to be int on all glibc platforms, in practice. I don't know about other Unixes.

The glibc manual claims that on "all of the machines that the GNU C library supports, and on all POSIX systems we know of", int, everything smaller than int, and all pointer types are atomic.

One of the answers to <http://stackoverflow.com/questions/9606725/linux-why-is-sig-atomic-t-typedefed-to-int> claims that the glibc manual is wrong, citing AVR as a problem case (AVR is an 8-bit microcontroller - not to be confused with AVR32 - so it's not interesting for either glibc or GLib, but Linux has apparently been ported to it).

> Hm, so a concrete problem that might happen here on an architecture which can
> only do atomic word assignment would be that we might miss signals.

Right, that's the "direction" I described: our variables are smaller than a sig_atomic_t.

The other "direction" is a more common cause for problems, although it isn't possible in this particular code (assuming gboolean == int is no larger than a machine word): if you try to access a type larger than a machine word in a signal handler (most commonly int64 or double on 32-bit platforms), the two or more read/write operations needed to get/set the whole value can be interleaved with the two or more operations needed to set/get it in the main program.

Comment 3 Simon McVittie 2012-03-13 18:28:30 UTC

I sometimes think it'd be good for GLib (and D-Bus, and other portable projects) to document what architectural assumptions they make, beyond what C guarantees: there are plenty of platforms allowed by C99 that we don't support because they're rare and difficult. (For instance, platforms where a function pointer is bigger than a data pointer, or where NULL pointers aren't all-bits-zero, are allowed by C, but GLib is unlikely to ever work on such platforms.)

Comment 4 Dan Winship 2012-03-13 18:47:21 UTC

(In reply to comment #3)
> I sometimes think it'd be good for GLib (and D-Bus, and other portable
> projects) to document what architectural assumptions they make, beyond what C
> guarantees

I started doing that a long time ago (though more for "GNOME in general" than for any particular package) but never did anything with it. http://mysterion.org/~danw/c-assumptions.txt. Feel free to run with that.

Comment 5 Colin Walters 2012-03-13 18:51:12 UTC

Created attachment 209638 [details] [review]
gmain: Use sig_atomic_t for list of pending Unix signals

Pointed out by: Simon McVittie <simon.mcvittie@collabora.co.uk>

Comment 6 Colin Walters 2012-03-13 19:03:02 UTC

One concern I have about the above is what platforms provide sig_atomic_t...but I guess if it fails later we can add a configure test (fallback to int?) or something.

Comment 7 Matthias Clasen 2012-03-14 10:35:47 UTC

Why would we make this more complicated ? writing a single byte has been the traditional way of doing this. Some committee inventing a new type for it won't break that...

Comment 8 Colin Walters 2012-03-14 13:41:36 UTC

(In reply to comment #7)
> Why would we make this more complicated ? writing a single byte has been the
> traditional way of doing this. Some committee inventing a new type for it won't
> break that...

Writing a single byte...to a pipe?  Yes, but we're not using a pipe anymore.

And writing a single byte to an array of bytes *is* flawed as smcv says.  The cost to turn it into an array of ints is negligible.

Comment 9 Matthias Clasen 2012-03-15 22:07:03 UTC

Of course, I was talking about pipes...

Comment 10 Colin Walters 2012-03-16 14:13:49 UTC

(In reply to comment #9)
> Of course, I was talking about pipes...

Well we *are* using an eventfd which gives us the "safely wake up main thread" semantics.  

There's actually two ways to use pipes for signal handling:

1) The byte you write to the pipe is the signal number.  This is how DBus does it.
2) In the signal handler, you set a static int variable, and write a fixed byte to the pipe (this is how GLib *used* to do it)

What we've changed in GLib is replacing the static-byte-to-pipe with an eventfd, which is strictly more efficient.  However we introduced a bug by using 'char' for the array of signals.

But if there's a platform which can't do atomic int-sized assignments (and I bet there are some 64 bit platforms like that), then GLib has always been buggy in this respect up until recently, since with the introduction of GChildWatchSource in the GLIB_2_4_0 timeframe:

static void
g_child_watch_signal_handler (int signum)
{
  child_watch_count ++;

  if (child_watch_init_state == CHILD_WATCH_INITIALIZED_THREADED)
    {
      write (child_watch_wake_up_pipe[1], "B", 1);
    }
  else
    {
      /* We count on the signal interrupting the poll in the same thread.
       */
    }
}

That increment wouldn't be atomic; it needs sig_atomic_t.

Comment 11 Colin Walters 2012-03-16 14:16:36 UTC

Bottom line is I believe my patch fixes the reported bug here, and after that what GLib is doing for signals is about the best we can do given the present kernel/libc APIs.

Comment 12 Allison Karlitskaya (desrt) 2012-03-16 17:36:46 UTC

The patch looks good with one exception: are we sure sig_atomic_t exists on all platforms?

Comment 13 Colin Walters 2012-03-16 20:16:10 UTC

Created attachment 209948 [details] [review]
gmain: Use sig_atomic_t for list of pending Unix signals

Now with a configure check

Comment 14 Allison Karlitskaya (desrt) 2012-03-16 20:30:20 UTC

Without commenting on the validity of your auto-fu (which I assume was copy/pasted), this looks good.

Comment 15 Colin Walters 2012-03-16 21:11:16 UTC

FWIW said auto-fu wasn't copy-pasted exactly, but I was reading
AC_DEFUN([jm_AC_TYPE_LONG_LONG], ...) from acinclude.m4 as I was
writing.

Attachment 209948 [details] pushed as 6833385 - gmain: Use sig_atomic_t for list of pending Unix signals