GNOME Bugzilla – Bug 730932
statically assert that reasonable assumptions about enums are true
Last modified: 2016-11-22 19:36:21 UTC
As pointed out in recent discussion on gtk-devel-list, ISO C compilers are allowed to make enum GLogLevelFlags { ... some values in the range -128 to 127 } char-sized, leading to brokenness if a user-defined log level flag has value 256.

Perhaps more seriously, that means that if libfoo v1 has

/* all values certainly fit in a char */
enum Foo { FOO_0 = 0, ..., FOO_127 = 127 }

and libfoo v2 has

/* it is certainly not true that all values fit in a char */
enum Foo { FOO_0 = 0, ..., FOO_256 = 256, ... }

then that's an ABI break: any struct containing a Foo just became larger. (More realistically, Foo could be an error code that is intended to be extended over time as more potential error situations are discovered, like GIOErrorEnum.)

In practice, I suspect that in relevant compilers, all enums whose values fit in the range INT32_MIN to INT32_MAX (inclusive) are the same size as int. ISO C allows compilers to break that assumption, but those that do would break code that works fine in other compilers, making the compiler look bad, for no significant benefit. I conjecture that such compilers are not popular.

I think we should statically assert that that assumption holds. If we discover a platform where that static assertion fails, GLib won't compile there, and we can decide what to do about that platform (and raise it as a portability red flag for the majority of GLib-based-library writers who are probably not aware that this could possibly be an issue).

Similarly, the GVariant docs effectively guarantee that int is at least 32 bits. Let's statically assert that, too.
Created attachment 277447 [details] [review]
glib-init: statically assert that "small" enums are all int-sized

ISO C allows compilers to make enums smaller than int if their enumerated values would all fit in the range of a smaller type. In practice, I suspect that in relevant compilers, all enums whose values fit in the range INT32_MIN to INT32_MAX (inclusive) are the same size as int. ISO C allows compilers to break that assumption, but those that do would break code that works fine in other compilers, making the compiler look bad, for no significant benefit. I conjecture that such compilers are not popular.

Let's statically assert that my assumption holds. If all goes well, GLib will continue to compile on every relevant platform; if it fails to compile on some platform as a result of this change, then there are probably a lot of naive uses of enums that need auditing for this assumption.
Created attachment 277449 [details] [review]
glib-init: statically assert that int is at least 32 bits long

The GVariant documentation says that types of no more than 32 bits may be assumed to be promoted to int by the usual promotions. If we're going to document that, we should statically assert that it's true.
Quoting myself, originally posted in the mailing list thread starting at <https://mail.gnome.org/archives/gtk-devel-list/2014-May/msg00029.html>:

> Some other departures from ISO C that I am aware of:
>
> * ISO C does not guarantee that null pointer constants are
>   all-bits-zero, or that there is only one representation of a null
>   pointer. GLib assumes that there is exactly one representation of a
>   null pointer, NULL, and that it is all-bits-zero.

I don't think we can statically assert this, because of the way null pointer constants work in C. We could have it as a runtime assertion if someone wants to implement one, but it's probably not worth it.

> * ISO C does not guarantee that signed integers use twos-complement for
>   negative numbers - they are allowed to use sign-and-magnitude or some
>   even weirder representation. There is almost certainly code in GLib
>   that assumes that they do use twos-complement.

Also probably not worth it. CPU designers use twos-complement for a reason.

> * ISO C does not guarantee that 8-, 16-, 32- and 64-bit types exist
>   (only that *if they do*, int8_t etc. are defined appropriately).
>   GLib assumes that they do exist; it probably also assumes that
>   char, short, int are exactly 8, 16, 32 bits respectively, and that
>   long is either 32 or 64 bits.

I added an assertion that int is at least 32 bits. We could add

/* If these assertions fail, relaxing them might be OK, but they
 * act as early-warning that your platform is probably the first
 * without these properties to have been tested. */
G_STATIC_ASSERT (CHAR_BIT == 8);
G_STATIC_ASSERT (sizeof (short) == 2);
G_STATIC_ASSERT (sizeof (int) == 4);
G_STATIC_ASSERT (sizeof (long) == 4 || sizeof (long) == 8);
G_STATIC_ASSERT (sizeof (void *) == 4 || sizeof (void *) == 8);
G_STATIC_ASSERT (sizeof (void *) == sizeof (size_t));

if we want to get early warning of spectacularly odd platforms?

> * ISO C does not guarantee that data pointers (e.g. void *) have the
>   same size and representation as function pointers (e.g.
>   void (*) (void)), or even that all function pointers are the same
>   size/representation.

I already added static assertions for this one.
Review of attachment 277447 [details] [review]: Looks fine. Maybe we find the opposite: compilers that allocate larger than int. Let's see if anything fun turns up...
Review of attachment 277449 [details] [review]: I'd assert exactly equal, in fact. I don't know of anywhere that this isn't true...
Review of attachment 277447 [details] [review]:

This doesn't seem to be a hard assumption that we fundamentally make in GLib in the same way that some of the other described assumptions are. I suspect that if this assertion ever triggers, the fix is going to be to remove the assertion, then wait until something else breaks on the affected platform...

E.g. - GLogLevelFlags is not going to be a problem until someone adds hundreds of custom log levels ... probably never.

(Also, there is a lot of somewhat complicated code in GObjectIntrospection to figure out what the compiler does for choosing enum types, though it's never been exercised in anger AFAIK)
(In reply to comment #6)
> This doesn't seem to be a hard assumption that we fundamentally make in GLib in
> the same way that some of the other described assumptions are.
> I suspect that if this assertion ever triggers, the fix is going to be to
> remove the assertion, then wait until something else breaks on the affected
> platform...

This is really an assertion about "things that don't force library authors to bump SONAME" rather than "things that work".

If this assertion ever triggers, the correct fix would be to add

..., _MY_ENUM_ENSURE_INT_SIZED = G_MAXINT }

to each enum that (a) is intended to be able to expand "indefinitely", and (b) could conceivably appear in a struct. The vague idea was that if that assertion appears in GLib, then libraries depending on GLib can make that change without it being an ABI break on that platform, because their dependency (GLib) wouldn't previously have compiled on that platform anyway, so they had no functional ABI there; and once a reasonable spectrum of libraries have made that change, the assertion can be removed.

I could be wrong, but based on the limited C++ code I've worked on, C++ library authors seem to be in the habit of putting a large dummy value in enums already, to ensure that on the hypothetical platforms where enums can be smaller than int, they have the freedom to expand the values later.

The ideal outcome for me is that we demonstrate (via the assertion not failing) that that has never been necessary on any currently-relevant platform, and so we don't need to. The less-ideal outcome is that people who care about the platform where it fails have to mark enums as above; the bad outcome (that I'm trying to avoid) is that perfectly reasonable code randomly breaks ABI on that platform, and nobody knows until they waste a lot of time debugging it.
(In reply to comment #5)
> I'd assert [sizeof (int) is] exactly equal [to sizeof (gint32)],
> in fact. I don't know of anywhere that this isn't
> true...

According to <https://en.wikipedia.org/wiki/64-bit_computing>, ILP64 and SILP64 platforms do exist (Solaris on SPARC64 and "Classic" UNICOS, respectively). However, I don't think GLib would compile on such platforms anyway, because the GLib API includes both g(u)int16 and g(u)int32, and configure.ac only tries short, int and long as candidates for the underlying type for these. short can't be both 16-bit and 32-bit, so one of these types would have to not exist, breaking GLib (the g(u)int16 and/or g(u)int32 typedef would be missing).

Commit db0e43d25a5 "gatomic: use GCC C11-style atomics, if available" assumes that int is exactly 4 bytes long, and should really statically assert it.

I also notice that configure.ac quietly assumes that char is 8 bits long. Now that we have G_STATIC_ASSERT as a cheap and straightforward way to verify this, it would be good to have a G_STATIC_ASSERT that says so.

Please note that I'm not saying that GLib should actually *support* bizarre and probably hypothetical platforms that violate reasonable assumptions, only that it should fail to compile in an obvious way if those reasonable assumptions are not met.
Created attachment 281364 [details] [review]
gatomic: statically assert that our assumptions hold

This code assumes that int is exactly 32 bits, and that pointers
are either 4 or 8 bits, on platforms with __ATOMIC_SEQ_CST.
In practice this is going to be true.

---

> Commit db0e43d25a5 "gatomic: use GCC C11-style atomics, if available" assumes
> that int is exactly 4 bytes long, and should really statically assert it.

Done.
Created attachment 281365 [details] [review]
glib-init: statically assert that we have 8-bit bytes

configure.ac assumes this.

---

> I also notice that configure.ac quietly assumes that char is 8 bits long. Now
> that we have G_STATIC_ASSERT as a cheap and straightforward way to verify
> this, it would be good to have a G_STATIC_ASSERT that says so.

Here it is.
(In reply to comment #6)
> This doesn't seem to be a hard assumption that we fundamentally make in GLib in
> the same way that some of the other described assumptions are.

Is that a veto, or should I merge it anyway?

(In reply to comment #6)
> E.g. - GLogLevelFlags is not going to be a problem until someone
> adds hundreds of custom log levels ... probably never.

The places most likely to fail (largest enums) seem to be error enums. GIOErrorEnum and GDBusError in GIO, and TpError in telepathy-glib, have around 50 codes each, so they're within an order of magnitude of crossing the "fits in a signed char" boundary if any real compiler packs enums into chars.

The place those could break if extended beyond 127 is if you have something like this:

typedef struct {
  int bar;
  GDBusError error_code;
  char foo;
} ThisThingIsAPI;

and a change to the size of error_code changes the size of ThisThingIsAPI and the offset of foo. This would be unusual usage, I agree; but it would be nice to know whether this concern actually affects any practical platform (in which case library designers should be adding dummy members to their enums to reserve the full range), or whether it's purely theoretical.

In general, if there are things that ISO C does not actually guarantee, but that are true on every practical platform, part of the general folklore of things you can safely rely on (i.e. pervasive assumptions in the GLib-based stack), and easy to G_STATIC_ASSERT, then I would like to do so.

(In reply to comment #8)
> (In reply to comment #5)
> > I'd assert [sizeof (int) is] exactly equal [to sizeof (gint32)],
> > in fact. I don't know of anywhere that this isn't
> > true...
>
> According to <https://en.wikipedia.org/wiki/64-bit_computing>, ILP64 and SILP64
> platforms do exist (Solaris on SPARC64 and "Classic" UNICOS, respectively).
>
> However, I don't think GLib would compile on such platforms anyway

Please clarify policy: does GLib theoretically support ILP64/SILP64 platforms in addition to LP64 (most 64-bit Unix, including x86-64 Linux) and LLP64 (x86-64 Windows), or does it consider int > 32 bits to be silly and not worth supporting?

According to the Solaris 64-bit Developer's Guide <http://docs.oracle.com/cd/E19455-01/806-0477/806-0477.pdf>, Solaris in 2000 was LP64 (same as x86-64 Linux), even on 64-bit SPARC - the ILP64 port may have been a preliminary version or something. UNICOS doesn't sound amazingly relevant.
Review of attachment 281364 [details] [review]: Looks right to me.
Review of attachment 281365 [details] [review]: Looks fine.
(In reply to comment #9)
> Created an attachment (id=281364) [details] [review]
> gatomic: statically assert that our assumptions hold
>
> This code assumes that int is exactly 32 bits, and that pointers
> are either 4 or 8 bits, on platforms with __ATOMIC_SEQ_CST.
                    ^^^^
                    bytes
Comment on attachment 281364 [details] [review]
gatomic: statically assert that our assumptions hold

(In reply to comment #14)
> > are either 4 or 8 bits, on platforms with __ATOMIC_SEQ_CST.
>
> ^^^^ bytes

Oops, yes :-) Pushed, with a corrected commit message that consistently talks about bytes.

Commit 7269d75, for 2.41.3.
Comment on attachment 281365 [details] [review] glib-init: statically assert that we have 8-bit bytes Commit 9060a85, for 2.41.3
Comment on attachment 277449 [details] [review]
glib-init: statically assert that int is at least 32 bits long

Ryan asked for this check to be strengthened from >= to ==, but as noted above, platforms do exist where that is not true. Maintainer decision, please:

* those platforms are silly and will never be supported: switch to ==
* in theory those platforms could be supported: keep >=
This breaks the build of anjuta - http://build.gnome.org/continuous/buildmaster/builds/2014/07/23/19/build/log-anjuta.txt

===
In file included from ../../../../plugins/symbol-db/anjuta-tags/jscript.c:19:0:
../../../../plugins/symbol-db/anjuta-tags/general.h:60:22: error: expected ')' before '__attribute__'
 # define __unused__ __attribute__((unused))
                      ^
/usr/include/glib-2.0/glib/gmacros.h:86:18: note: in expansion of macro '__unused__'
 __attribute__((__unused__))
                ^
/usr/include/glib-2.0/glib/gmacros.h:175:120: note: in expansion of macro 'G_GNUC_UNUSED'
 #define G_STATIC_ASSERT(expr) typedef char G_PASTE (_GStaticAssertCompileTimeAssertion_, __COUNTER__)[(expr) ? 1 : -1] G_GNUC_UNUSED
                                                                                                                        ^
/usr/include/glib-2.0/glib/gatomic.h:91:1: note: in expansion of macro 'G_STATIC_ASSERT'
 G_STATIC_ASSERT (sizeof (gint) == 4);
 ^
In file included from /usr/lib/glib-2.0/include/glibconfig.h:9:0,
                 from /usr/include/glib-2.0/glib/gtypes.h:32,
                 from /usr/include/glib-2.0/glib/galloca.h:32,
                 from /usr/include/glib-2.0/glib.h:30,
                 from ../../../../plugins/symbol-db/anjuta-tags/js-parser/jstypes.h:4,
                 from ../../../../plugins/symbol-db/anjuta-tags/jscript.c:28:
/usr/include/glib-2.0/glib/gmacros.h:86:29: error: expected ',' or ';' before ')' token
 __attribute__((__unused__))
                           ^
===

So anjuta (or a legacy library in its code base) includes a header file that redefines __unused__ and then includes glib.h, and since G_STATIC_ASSERT actually expands to code, the redefinition breaks it. What anjuta is doing is pretty clearly wrong - the breakage is that it's redefining a GCC keyword in the reserved double-underscore space.
But is it a good idea to put assertions that will never trigger on *ANY* reasonable platform into a public header file? This is presumably our first usage of the somewhat-demanding G_STATIC_ASSERT macro in a header file. I'll file a bug against Anjuta, but this seems to be inviting problems.
Filed bug 733600 against Anjuta.
Created attachment 281474 [details] [review] Revert "gatomic: statically assert that our assumptions hold" This reverts commit 7269d75321b9d2a967a59fb35f243397f577eb41. Adding G_STATIC_ASSERT() into a header file caused compilation problems with at least one app (Anjuta). Reverting to keep GNOME continuous testing running.
Attachment 281474 [details] pushed as d0083f7 - Revert "gatomic: statically assert that our assumptions hold"
Created attachment 302848 [details] [review]
gatomic: statically assert that our assumptions hold

This code assumes that int is exactly 32 bits, and that pointers
are either 4 or 8 bits, on platforms with __ATOMIC_SEQ_CST.
In practice this is going to be true.

A previous attempt at this assertion placed the G_STATIC_ASSERT
at the top level in gatomic.h, but that broke anjuta, which
redefined __unused__ at the time. These assertions are about the
platform/compiler ABI, so it's sufficient to check them once,
while compiling GLib itself; accordingly, move them to the
implementation.

---

This should avoid the problems that caused Attachment #281364 [details] to be reverted.
Created attachment 302849 [details] [review]
glib-init: statically assert that int is at least 32 bits long

The GVariant documentation says that types of no more than 32 bits may be assumed to be promoted to int by the usual promotions. If we're going to document that, we should statically assert that it's true.

We cannot assert that int is *exactly* 32 bits long unless we intend to rule out support for ILP64 platforms (Solaris on SPARC64 at some point, according to Wikipedia, although as of 2000 it was documented to be LP64) and SILP64 platforms ("Classic" UNICOS, again according to Wikipedia). Whether we are going to do that or not, this weaker assertion still seems reasonable.

---

This is one option, if the GLib maintainers care about these strange platforms.
Created attachment 302850 [details] [review]
glib-init: statically assert that int is exactly 32 bits long

The GVariant documentation says that types of no more than 32 bits may be assumed to be promoted to int by the usual promotions. If we're going to document that, we should statically assert that it's true, i.e. sizeof (int) >= sizeof (int32_t).

All reasonable modern platforms are either ILP32 (32-bit platforms), LP64 (64-bit Linux, *BSD etc.), or LLP64 (64-bit Windows): there have been ILP64 platforms in the past, but ILP64 has the compelling disadvantage that {signed char, short, int} can't possibly provide all of {int8_t, int16_t, int32_t} unless int is 32 bits long.

---

This is an alternative to Attachment #302849 [details], if the GLib maintainers' consensus is that ILP64 platforms are silly and should not be supported. I don't mind which one is applied, but I would like to apply one or the other.
(In reply to Simon McVittie from comment #22)
> This code assumes that int is exactly 32 bits, and that pointers
> are either 4 or 8 bits, on platforms with __ATOMIC_SEQ_CST.

You brought back the bits/bytes confusion :)
(In reply to Simon McVittie from comment #3)
> > * ISO C does not guarantee that signed integers use twos-complement for
> >   negative numbers - they are allowed to use sign-and-magnitude or some
> >   even weirder representation. There is almost certainly code in GLib
> >   that assumes that they do use twos-complement.
>
> Also probably not worth it. CPU designers use twos-complement for a reason.

FWIW, I noticed recently that GLib assumes IEEE 754 floating point representation as well, in that it assumes that gdoubles have the same binary representation as D-Bus 'd' values. (IIRC, this means we are not portable to VAXes :-)
Since error codes have been brought up as a possible example of abi breaking by enum growth: GError treats error codes as ints, and that is the only guarantee we should make for enums of (G)Error codes: fits in an int.
(In reply to Dan Winship from comment #25)
> (In reply to Simon McVittie from comment #22)
> > This code assumes that int is exactly 32 bits, and that pointers
> > are either 4 or 8 bits, on platforms with __ATOMIC_SEQ_CST.
>
> You brought back the bits/bytes confusion :)

Oops, yes. I'll triple-check that before committing anything.
(In reply to Matthias Clasen from comment #27)
> Since error codes have been brought up as a possible example of abi breaking
> by enum growth: GError treats error codes as ints, and that is the only
> guarantee we should make for enums of (G)Error codes: fits in an int.

To be usable, GError codes must have sizeof (MyErrorEnum) <= sizeof (int), indeed. However, what I'm looking at here is the minimum size allocated by the compiler for an enum: sizeof (MyEnum) >= sizeof (some minimum type), but what is that minimum in practice?

Is your position that the designers of C APIs should always be using some ugly idiom like this for any "extendable" flags/enum, and that it is an ABI break to extend any enum not of this form past 256 entries or 8 flags, in case that changes its effective size or alignment on some platform?

typedef enum {
  MY_COLOR_RED,   /* first value that conceptually makes sense */
  MY_COLOR_BLUE,
  ...
  MY_COLOR_BEIGE, /* last value that conceptually makes sense */
  _MY_COLOR_PADDING = G_MAXINT32
} MyColor;

typedef enum {
  MY_ADJECTIVE_DEPRECATED = (1 << 0),
  MY_ADJECTIVE_INADVISABLE = (1 << 1),
  ...
  MY_ADJECTIVE_CONFUSING = (1 << 5), /* or whatever */
  _MY_ADJECTIVE_PADDING = (1 << 31)
} MyAdjective;

In practice, nobody in the GLib ecosystem currently does this, AFAICS. In practice, I strongly suspect that all reasonable platforms will make enums at least int-sized, and hence API designers do not need to do this - which is why I would like a static assertion, to confirm that this is true on all platforms where GLib is used.
(In reply to Dan Winship from comment #26)
> FWIW, I noticed recently that GLib assumes IEEE 754 floating point
> representation as well, in that it assumes that gdoubles have the same
> binary representation as D-Bus 'd' values. (IIRC, this means we are not
> portable to VAXes :-)

libdbus also assumes that a native double is an IEEE 754 double, and both libdbus and GDBus assume that byteswapping a double works the same as byteswapping a 64-bit integer occupying the same 8 bytes.

I think this assumption may actually have failed on ARM OABI, which assumed the ABI of a particular (rare) floating-point unit, which was not the same endianness as the CPU's integers (it might even have been middle-endian like VAXen). However, using D-Bus across a network between dissimilar non-x86 machines is vanishingly rare, so nobody noticed.
(In reply to Simon McVittie from comment #30) > I think this assumption may actually have failed on ARM OABI, which assumed > the ABI of a particular (rare) floating-point unit (For clarity: this does not affect ARM EABI, which uses either software floating point or a more sensible FPU ABI, both with the same endianness as integers.)
Review of attachment 302848 [details] [review]: LGTM.
Comment on attachment 302848 [details] [review] gatomic: statically assert that our assumptions hold Attachment 302848 [details] pushed as 1c47f62 - gatomic: statically assert that our assumptions hold
Review of attachment 302849 [details] [review]: OK.
Review of attachment 302850 [details] [review]: Sorry, I meant this one LGTM.
Comment on attachment 277447 [details] [review]
glib-init: statically assert that "small" enums are all int-sized

5227630
Comment on attachment 302849 [details] [review] glib-init: statically assert that int is at least 32 bits long The consensus seems to be that ILP64 and SILP64 are unsupported, so using the other version instead.
Comment on attachment 302850 [details] [review] glib-init: statically assert that int is exactly 32 bits long be4fd3d
Landed for 2.51.1