GNOME Bugzilla – Bug 671997
Unix signal handling assumes that volatile 1-byte writes are atomic
Last modified: 2012-03-16 21:11:19 UTC
According to sigaction(3posix): If the signal occurs other than as the result of calling abort(), kill(), or raise(), the behavior is undefined if the signal handler [...] refers to any object with static storage duration other than by assigning a value to a static storage duration variable of type volatile sig_atomic_t. I believe the rationale for this is to avoid situations like this on architectures where bytes are not set individually: * normal code wants to load/modify/store a byte which shares a machine word with the one used by the signal handler, perhaps this: volatile gchar unix_signal_pending[NSIG] (suppose it's setting u_s_p[1] 1 to 0, for instance) * it gets as far as executing the load and maybe the modify * a signal arrives and the normal code is interrupted * the signal handler sets u_s_p[2] to 1, which is really a 32-bit load/modify/store to u_s_p[0..3] or similar * back to normal code * the store happens and the change to u_s_p[2] from the signal handler has been lost! but I could be wrong. It's undefined behaviour, anyway. Until 2.30 this was a pipe-to-self, which is well-understood and async-signal-safe. I believe replacing the volatile gchar and gboolean variables written by the signal handler with volatile sig_atomic_t would fix this; so would returning to a pipe-to-self setup.
Is this something you saw in practice or you're just theorizing? Changing to use sig_atomic_t makes sense to me, I assume it would expand to word size on most non-x86 architectures. Hm, so a concrete problem that might happen here on an architecture which can only do atomic word assignment would be that we might miss signals.
(In reply to comment #1) > Is this something you saw in practice or you're just theorizing? Just theorizing. I was reviewing a faulty signal handler in Telepathy code (which can't use GLib's facilities for this yet, since our policy is to only depend on stable-branches), started to say "look in D-Bus or GLib for an example of how to do it Right™", then realized that GLib actually has undefined behaviour. (D-Bus uses a pipe-to-self; if GLib's implementation used sig_atomic_t, GLib and D-Bus between them would provide examples of both the well-defined approaches to writing a signal handler). > Changing to use sig_atomic_t makes sense to me, I assume it would expand to > word size on most non-x86 architectures. It seems to be int on all glibc platforms, in practice. I don't know about other Unixes. The glibc manual claims that on "all of the machines that the GNU C library supports, and on all POSIX systems we know of", int, everything smaller than int, and all pointer types are atomic. One of the answers to <http://stackoverflow.com/questions/9606725/linux-why-is-sig-atomic-t-typedefed-to-int> claims that the glibc manual is wrong, citing AVR as a problem case (AVR is an 8-bit microcontroller - not to be confused with AVR32 - so it's not interesting for either glibc or GLib, but Linux has apparently been ported to it). > Hm, so a concrete problem that might happen here on an architecture which can > only do atomic word assignment would be that we might miss signals. Right, that's the "direction" I described: our variables are smaller than a sig_atomic_t. The other "direction" is a more common cause for problems, although it isn't possible in this particular code (assuming gboolean == int is no larger than a machine word): if you try to access a type larger than a machine word in a signal handler (most commonly int64 or double on 32-bit platforms), the two or more read/write operations needed to get/set the whole value can be interleaved with the two or more operations needed to set/get it in the main program.
I sometimes think it'd be good for GLib (and D-Bus, and other portable projects) to document what architectural assumptions they make, beyond what C guarantees: there are plenty of platforms allowed by C99 that we don't support because they're rare and difficult. (For instance, platforms where a function pointer is bigger than a data pointer, or where NULL pointers aren't all-bits-zero, are allowed by C, but GLib is unlikely to ever work on such platforms.)
(In reply to comment #3) > I sometimes think it'd be good for GLib (and D-Bus, and other portable > projects) to document what architectural assumptions they make, beyond what C > guarantees I started doing that a long time ago (though more for "GNOME in general" than for any particular package) but never did anything with it. http://mysterion.org/~danw/c-assumptions.txt. Feel free to run with that.
Created attachment 209638 [details] [review] gmain: Use sig_atomic_t for list of pending Unix signals Pointed out by: Simon McVittie <simon.mcvittie@collabora.co.uk>
One concern I have about the above is what platforms provide sig_atomic_t...but I guess if it fails later we can add a configure test (fallback to int?) or something.
Why would we make this more complicated ? writing a single byte has been the traditional way of doing this. Some committee inventing a new type for it won't break that...
(In reply to comment #7) > Why would we make this more complicated ? writing a single byte has been the > traditional way of doing this. Some committee inventing a new type for it won't > break that... Writing a single byte...to a pipe? Yes, but we're not using a pipe anymore. And writing a single byte to an array of bytes *is* flawed as smcv says. The cost to turn it into an array of ints is negligible.
Of course, I was talking about pipes...
(In reply to comment #9) > Of course, I was talking about pipes... Well we *are* using an eventfd which gives us the "safely wake up main thread" semantics. There's actually two ways to use pipes for signal handling: 1) The byte you write to the pipe is the signal number. This is how DBus does it. 2) In the signal handler, you set a static int variable, and write a fixed byte to the pipe (this is how GLib *used* to do it) What we've changed in GLib is replacing the static-byte-to-pipe with an eventfd, which is strictly more efficient. However we introduced a bug by using 'char' for the array of signals. But if there's a platform which can't do atomic int-sized assignments (and I bet there are some 64 bit platforms like that), then GLib has always been buggy in this respect up until recently, since with the introduction of GChildWatchSource in the GLIB_2_4_0 timeframe: static void g_child_watch_signal_handler (int signum) { child_watch_count ++; if (child_watch_init_state == CHILD_WATCH_INITIALIZED_THREADED) { write (child_watch_wake_up_pipe[1], "B", 1); } else { /* We count on the signal interrupting the poll in the same thread. */ } } That increment wouldn't be atomic; it needs sig_atomic_t.
Bottom line is I believe my patch fixes the reported bug here, and after that what GLib is doing for signals is about the best we can do given the present kernel/libc APIs.
The patch looks good with one exception: are we sure sig_atomic_t exists on all platforms?
Created attachment 209948 [details] [review] gmain: Use sig_atomic_t for list of pending Unix signals Now with a configure check
Without commenting on the validity of your auto-fu (which I assume was copy/pasted), this looks good.
FWIW said auto-fu wasn't copy-pasted exactly, but I was reading AC_DEFUN([jm_AC_TYPE_LONG_LONG], ...) from acinclude.m4 as I was writing. Attachment 209948 [details] pushed as 6833385 - gmain: Use sig_atomic_t for list of pending Unix signals