After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 671997 - Unix signal handling assumes that volatile 1-byte writes are atomic
Unix signal handling assumes that volatile 1-byte writes are atomic
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: mainloop
2.31.x
Other Linux
: Normal normal
: ---
Assigned To: gtkdev
gtkdev
Depends on:
Blocks:
 
 
Reported: 2012-03-13 15:31 UTC by Simon McVittie
Modified: 2012-03-16 21:11 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
gmain: Use sig_atomic_t for list of pending Unix signals (1.01 KB, patch)
2012-03-13 18:51 UTC, Colin Walters
none Details | Review
gmain: Use sig_atomic_t for list of pending Unix signals (1.90 KB, patch)
2012-03-16 20:16 UTC, Colin Walters
committed Details | Review

Description Simon McVittie 2012-03-13 15:31:55 UTC
According to sigaction(3posix):

       If  the  signal  occurs  other  than  as the result of calling abort(),
       kill(), or raise(), the behavior is undefined  if  the  signal  handler
       [...] refers to  any  object  with  static
       storage  duration  other  than by assigning a value to a static storage
       duration variable of type volatile sig_atomic_t.

I believe the rationale for this is to avoid situations like this on architectures where bytes are not set individually:

* normal code wants to load/modify/store a byte
  which shares a machine word with the one used by the signal handler,
  perhaps this:

      volatile gchar unix_signal_pending[NSIG]

  (suppose it's setting u_s_p[1] 1 to 0, for instance)

* it gets as far as executing the load and maybe the modify

* a signal arrives and the normal code is interrupted

* the signal handler sets u_s_p[2] to 1, which is really a 32-bit
  load/modify/store to u_s_p[0..3] or similar

* back to normal code

* the store happens and the change to u_s_p[2] from the signal
  handler has been lost!

but I could be wrong. It's undefined behaviour, anyway.

Until 2.30 this was a pipe-to-self, which is well-understood and async-signal-safe. I believe replacing the volatile gchar and gboolean variables written by the signal handler with volatile sig_atomic_t would fix this; so would returning to a pipe-to-self setup.
Comment 1 Colin Walters 2012-03-13 15:47:27 UTC
Is this something you saw in practice or you're just theorizing?

Changing to use sig_atomic_t makes sense to me, I assume it would expand to word size on most non-x86 architectures.

Hm, so a concrete problem that might happen here on an architecture which can only do atomic word assignment would be that we might miss signals.
Comment 2 Simon McVittie 2012-03-13 18:17:45 UTC
(In reply to comment #1)
> Is this something you saw in practice or you're just theorizing?

Just theorizing. I was reviewing a faulty signal handler in Telepathy code (which can't use GLib's facilities for this yet, since our policy is to only depend on stable-branches), started to say "look in D-Bus or GLib for an example of how to do it Right™", then realized that GLib actually has undefined behaviour. (D-Bus uses a pipe-to-self; if GLib's implementation used sig_atomic_t, GLib and D-Bus between them would provide examples of both the well-defined approaches to writing a signal handler).

> Changing to use sig_atomic_t makes sense to me, I assume it would expand to
> word size on most non-x86 architectures.

It seems to be int on all glibc platforms, in practice. I don't know about other Unixes.

The glibc manual claims that on "all of the machines that the GNU C library supports, and on all POSIX systems we know of", int, everything smaller than int, and all pointer types are atomic.

One of the answers to <http://stackoverflow.com/questions/9606725/linux-why-is-sig-atomic-t-typedefed-to-int> claims that the glibc manual is wrong, citing AVR as a problem case (AVR is an 8-bit microcontroller - not to be confused with AVR32 - so it's not interesting for either glibc or GLib, but Linux has apparently been ported to it).

> Hm, so a concrete problem that might happen here on an architecture which can
> only do atomic word assignment would be that we might miss signals.

Right, that's the "direction" I described: our variables are smaller than a sig_atomic_t.

The other "direction" is a more common cause for problems, although it isn't possible in this particular code (assuming gboolean == int is no larger than a machine word): if you try to access a type larger than a machine word in a signal handler (most commonly int64 or double on 32-bit platforms), the two or more read/write operations needed to get/set the whole value can be interleaved with the two or more operations needed to set/get it in the main program.
Comment 3 Simon McVittie 2012-03-13 18:28:30 UTC
I sometimes think it'd be good for GLib (and D-Bus, and other portable projects) to document what architectural assumptions they make, beyond what C guarantees: there are plenty of platforms allowed by C99 that we don't support because they're rare and difficult. (For instance, platforms where a function pointer is bigger than a data pointer, or where NULL pointers aren't all-bits-zero, are allowed by C, but GLib is unlikely to ever work on such platforms.)
Comment 4 Dan Winship 2012-03-13 18:47:21 UTC
(In reply to comment #3)
> I sometimes think it'd be good for GLib (and D-Bus, and other portable
> projects) to document what architectural assumptions they make, beyond what C
> guarantees

I started doing that a long time ago (though more for "GNOME in general" than for any particular package) but never did anything with it. http://mysterion.org/~danw/c-assumptions.txt. Feel free to run with that.
Comment 5 Colin Walters 2012-03-13 18:51:12 UTC
Created attachment 209638 [details] [review]
gmain: Use sig_atomic_t for list of pending Unix signals

Pointed out by: Simon McVittie <simon.mcvittie@collabora.co.uk>
Comment 6 Colin Walters 2012-03-13 19:03:02 UTC
One concern I have about the above is what platforms provide sig_atomic_t...but I guess if it fails later we can add a configure test (fallback to int?) or something.
Comment 7 Matthias Clasen 2012-03-14 10:35:47 UTC
Why would we make this more complicated ? writing a single byte has been the traditional way of doing this. Some committee inventing a new type for it won't break that...
Comment 8 Colin Walters 2012-03-14 13:41:36 UTC
(In reply to comment #7)
> Why would we make this more complicated ? writing a single byte has been the
> traditional way of doing this. Some committee inventing a new type for it won't
> break that...

Writing a single byte...to a pipe?  Yes, but we're not using a pipe anymore.

And writing a single byte to an array of bytes *is* flawed as smcv says.  The cost to turn it into an array of ints is negligible.
Comment 9 Matthias Clasen 2012-03-15 22:07:03 UTC
Of course, I was talking about pipes...
Comment 10 Colin Walters 2012-03-16 14:13:49 UTC
(In reply to comment #9)
> Of course, I was talking about pipes...

Well we *are* using an eventfd which gives us the "safely wake up main thread" semantics.  

There's actually two ways to use pipes for signal handling:

1) The byte you write to the pipe is the signal number.  This is how DBus does it.
2) In the signal handler, you set a static int variable, and write a fixed byte to the pipe (this is how GLib *used* to do it)

What we've changed in GLib is replacing the static-byte-to-pipe with an eventfd, which is strictly more efficient.  However we introduced a bug by using 'char' for the array of signals.

But if there's a platform which can't do atomic int-sized assignments (and I bet there are some 64 bit platforms like that), then GLib has always been buggy in this respect up until recently, since with the introduction of GChildWatchSource in the GLIB_2_4_0 timeframe:

static void
g_child_watch_signal_handler (int signum)
{
  child_watch_count ++;

  if (child_watch_init_state == CHILD_WATCH_INITIALIZED_THREADED)
    {
      write (child_watch_wake_up_pipe[1], "B", 1);
    }
  else
    {
      /* We count on the signal interrupting the poll in the same thread.
       */
    }
}

That increment wouldn't be atomic; it needs sig_atomic_t.
Comment 11 Colin Walters 2012-03-16 14:16:36 UTC
Bottom line is I believe my patch fixes the reported bug here, and after that what GLib is doing for signals is about the best we can do given the present kernel/libc APIs.
Comment 12 Allison Karlitskaya (desrt) 2012-03-16 17:36:46 UTC
The patch looks good with one exception: are we sure sig_atomic_t exists on all platforms?
Comment 13 Colin Walters 2012-03-16 20:16:10 UTC
Created attachment 209948 [details] [review]
gmain: Use sig_atomic_t for list of pending Unix signals

Now with a configure check
Comment 14 Allison Karlitskaya (desrt) 2012-03-16 20:30:20 UTC
Without commenting on the validity of your auto-fu (which I assume was copy/pasted), this looks good.
Comment 15 Colin Walters 2012-03-16 21:11:16 UTC
FWIW said auto-fu wasn't copy-pasted exactly, but I was reading
AC_DEFUN([jm_AC_TYPE_LONG_LONG], ...) from acinclude.m4 as I was
writing.

Attachment 209948 [details] pushed as 6833385 - gmain: Use sig_atomic_t for list of pending Unix signals