GNOME Bugzilla – Bug 688406
GStaticMutex broke ABI on at least ARM EABI during 2.31.x
Last modified: 2014-01-20 15:13:46 UTC
In GLib 2.24 (current Debian stable), GStaticMutex on pthreads platforms was defined as: struct _GStaticMutex { struct _GMutex *runtime_mutex; union { char pad[sizeof (pthread_mutex_t)]; /* [1] */ double dummy_double; void *dummy_pointer; long dummy_long; } static_mutex; }; [1] The sizeof() is actually evaluated at GLib's configure time: it's 24 bytes on Debian armel, which is ARM EABI. (I suspect that this had been the case long before 2.24, and remained the case until the simplification of mutexes during 2.31, but 2.24 is the version I'm immediately interested in). In GLib 2.32 (next Debian stable) it changed to: typedef struct { GMutex *mutex; #ifndef G_OS_WIN32 /* only for ABI compatibility reasons */ pthread_mutex_t unused; #endif } GStaticMutex; This looks OK at first glance, and on most platforms, it is. However, on at least Debian armel, doubles are 64-bit-aligned. With 32-bit pointers, this introduces 32 bits of padding between runtime_mutex and static_mutex. That padding is no longer present in 2.32. We can either go back to the 2.24 ABI (and rebuild everything in Debian that has a GStaticMutex, or a larger struct that embeds one), go forward to the 2.32 ABI (no GLib changes, equal number of rebuilds in Debian), or go back to the 2.24 ABI in Debian but not upstream. Any of these options is probably going to break things: the question is which one breaks least. Going back to the 2.24 ABI can be achieved by replacing the pthread_mutex_t with a union containing at least pthread_mutex_t and double; the safest thing would be to reintroduce the other dummy members from 2.24 as well. There might be other similar ABI changes in the rest of GThread - I haven't been able to check them all yet. I would also like to add some G_STATIC_ASSERT() statements to assert that the size and (on at least gcc) alignment of each of these structs are the same as those of its previous versions (possibly excluding the GLib 2.24 version of the affected structs on ARM), so the build will fail on any as-yet undiscovered platforms where ABI broke. A brief survey of major binary distributions that I know work on ARM, if 2.31 was indeed the cutoff point for this: * Ubuntu 11.x and older had the old ABI, 12.04 LTS and 12.10 have the new ABI. * Debian 6 has the old ABI, Debian 7 will have the new ABI unless we revert it * Fedora 17 (ARM as "secondary architecture") has the new ABI * OpenSUSE 12.1 had the old ABI, 12.2 has the new ABI (Anything described as "new ABI" might in fact be in an inconsistent state, depending how enthusiastically that distribution rebuilds packages whose dependencies have changed; Debian testing, the part of the archive that will become Debian 7, is certainly in an inconsistent state at the moment.)
Ouch. In any case, I lean towards staying with the new status quo. I'd wager that (things being as they are) the number of ARM binaries existing built with the new ABI is at least an order of magnitude higher than those existing against the old ABI. Do you know: are doubles 8-aligned on armhf?
I believe armel and armhf (ARM EABI with software floating point and ARM EABI VFP floating point, respectively) both have naturally-aligned doubles, i.e. 64-bit alignment.
For reference, this is <http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=674156>.
Ugh. I'm inclined to care a whole lot more now that I found out that this is a real in-the-wild bug and not just some theoretical curiosity... In reality, though: do you really think we should actually change it back now?
(In reply to comment #1) > In any case, I lean towards staying with the new status quo. I'd wager that > (things being as they are) the number of ARM binaries existing built with the > new ABI is at least an order of magnitude higher than those existing against > the old ABI. But why didn't anyone notice the ABI break before? Either there aren't really lots of people using new-glib on those platforms, or else none of them require backwards-ABI-compatibility, and so they wouldn't notice if we switched back to the old ABI in the next release.
no, I guess there's a third possibility, which is that ARM wasn't popular enough before the ABI change for there to be many users, so there were no complaints, but now it is, and the new ARM users would be affected by changing the ABI back.
(In reply to comment #6) > no, I guess there's a third possibility, which is that ARM wasn't popular > enough before the ABI change for there to be many users, so there were no > complaints, but now it is, and the new ARM users would be affected by changing > the ABI back. This is the theory that I subscribe to, although it seems that at least one person noticed...
Created attachment 229122 [details] non-GLib-dependent test for whether ABI broke between 2.30 and 2.32 The affected data structures seem to be GStaticMutex, GStaticRWLock and GStaticRecMutex, plus anything in a third-party library that embeds one of those structs in its own public struct. Based on testing on the Debian porterboxes, ABI broke on all the 32-bit RISC architectures, because RISC CPUs like to align doubles at 64-bit boundaries: armel (ARM EABI with software floating-point, little-endian) armhf (ARM EABI with hardware floating-point, little-endian) mipsel (mips little-endian) powerpc s390 (S/390 31-bit) sparc ABI did not break on: amd64 (x86-64) i386 ia64 sh4 (Super-H) s390x (S/390 64-bit) Debian architectures and ports where I didn't test because there was no working porterbox: alpha armeb (ARM EABI, big-endian) hppa m68k mips (mips, big-endian) powerpcspe ppc64 sparc64 (of which only mips is a release architecture, so Debian doesn't particularly care about the rest). I suspect mips will need a rebuild too. I didn't bother testing on the non-Linux ports, because they're only on x86, and I suspect they have the same ABI as Linux.
(In reply to comment #4) > In reality, though: do you really think we should actually change it back now? I don't know; probably not? The old ABI was around for years, but the new ABI has already been in two stable-branches of GLib, so either way we're going to get breakage. Of the affected architectures, ARM and PowerPC seem like the most relevant: the rest are pretty obscure.
Created attachment 229124 [details] [review] glib-private.h: add _glib_alignof --- I'm not making this into API because I don't know whether it's 100% portable.
Created attachment 229125 [details] [review] GThread: statically assert that various thread-related things don't break ABI In fact several of these static assertions fail on 32-bit RISC architectures, but if we decide that keeping the old ABI is unachievable, it seems clearer to have them all for now, then remove the ones that don't actually hold. --- What I have in mind is something like +#if defined(__arm__) || defined(__powerpc__) || ... +# define _232_BROKE_ABI_HERE +#endif ... +#ifndef _232_BROKE_ABI_HERE G_STATIC_ASSERT (sizeof (_230...) == sizeof (G...)); G_STATIC_ASSERT (..._g_alignof...) +#endif so that if 2.30 also broke ABI on architectures we don't know about, users of those architectures find out about it.
Created attachment 229126 [details] [review] GHashTable: statically assert that GHashTableIter works as intended --- Not really the same issue, but while I'm adding static assertions...
Created attachment 229127 [details] [review] glib-init: make static assertions about platform assumptions
Comment on attachment 229125 [details] [review] GThread: statically assert that various thread-related things don't break ABI This clearly can't be applied, because we did break ABI, so its assertions sometimes fail. The other three patches here are only tangentially related to the original topic of the bug, but it would be nice for this sort of assumption to be statically asserted - hopefully doing more of that can avoid bugs like this in future.
Thanks for committing those patches, Matthias. I think this bug can now be resolved WONTFIX or OBSOLETE or something. GLib broke ABI, but it was long enough ago that the damage has already been done and reverting the change would be worse than keeping it; the only useful thing we can do about it is "don't do it again", for instance by accompanying every future struct reorganization with some appropriate G_STATIC_ASSERT calls.
I agree. I was wondering if we want to add some more static assertions in the thread area, but lets close this now.