GNOME Bugzilla – Bug 503096
add inline function for overflow-resistant arithmetic operations
Last modified: 2017-01-20 10:04:06 UTC
Working on bug 452361, I needed a way to add/subtract ints and detect overflows and cap at G_MININT/G_MAXINT. Was expecting to find in glib but didn't. Here's what I've got so far: +/* Overflow-safe arithmetic */ + +static inline int +int_add (int a, int b) +{ + int c = a + b; + + if (G_UNLIKELY ((a < 0) == (b < 0) && (a < 0) != (c < 0))) + c = (a < 0) ? G_MININT : G_MAXINT; + + return c; +} + +static inline int +int_sub (int a, int b) +{ + int c = a - b; + + if (G_UNLIKELY ((a < 0) != (b < 0) && (a < 0) != (c < 0))) + c = (a < 0) ? G_MININT : G_MAXINT; + + return c; +} I'm not actually quite sure about the checks. And just working on int is quite arbitrary, but that's what I needed. On a machine with a long larger than int, may be faster to just do the math in long and check.
The code above may lead to undefined behavior (according to C standard) because of integer overflow. My proposal is: G_INLINE_FUNC int g_int_add_with_saturation(int a, int b) { if (G_UNLIKELY (b >= 0 && a > G_MAXINT - b)) return G_MAXINT; else if (G_UNLIKELY (b < 0 && a < G_MININT - b)) return G_MININT; else return a + b; } G_INLINE_FUNC int g_int_sub_with_saturation(int a, int b) { if (G_UNLIKELY (b < 0 && a > G_MAXINT + b)) return G_MAXINT; else if (G_UNLIKELY (b >= 0 && a < G_MININT + b)) return G_MININT; else return a - b; }
Created attachment 117281 [details] [review] Implementation of g_int_add_with_saturation/g_int_sub_with_saturation
Created attachment 303083 [details] overflow safe functions The attached header provides the following API for gint/glong/gint32/gint64 and unsigned variants/gsize: gboolean g_int_add (gint *res, gint a, gint b); gint g_int_add_sat (gint a, gint b); gboolean g_int_sub (gint *res, gint a, gint b); gint g_int_sub_sat (gint a, gint b); gboolean g_int_mul (gint *res, gint a, gint b); gint g_int_mul_sat (gint a, gint b); The functions return FALSE in case of an overflow or the saturated value. In case of GCC/Clang a generic variant is provided. The type is chosen based on the first argument (which is why the 'sat' one also takes a pointer here): gint foo; g_add (&foo, 42, 42); // -> g_int_add (&foo, 42, 42) g_add_sat (&foo, 42, 42); // -> *(&foo) = g_int_add_sat (42, 42) In case of GCC >= 5 and Clang >= 3.4 the implementation uses the builtin overflow safe functions [0] for performance. In all cases the portable fallback implementations are provided with a leading "_" for testing. Any feedback welcome. I can provide a patch against glib (including tests and documentation) if wanted. [0] http://clang.llvm.org/docs/LanguageExtensions.html#checked-arithmetic-builtins
We first need to hear from glib maintainers whether they want to take something like this.
(In reply to Behdad Esfahbod from comment #4) > We first need to hear from glib maintainers whether they want to take > something like this. Let's start with the basic rule: at least 2 applications will use it. It seems reasonable to me, but identifying those consumers is important.
(In reply to Colin Walters from comment #5) > Let's start with the basic rule: at least 2 applications will use it. It > seems reasonable to me, but identifying those consumers is important. Really this is the sort of thing that lots of code should be using... Like, any code that does "if (current_offset + untrusted_user_provided_length > max_size)" is wrong, and I know people have submitted patches to fix some of those in the past. (Seeing specific examples would help nail down the best API for it though.)
(In reply to Dan Winship from comment #6) > (Seeing specific examples would help nail down the best API for it though.) One example is bug #756026, where GLib's tzdata file loader currently calculates offsets into the file based on values in the file, which could lead to out of bounds accesses. Most of these accesses are done as loads of structs from the file, so an API which is useful for doing checks like g_malloc_n()'s, which takes n_blocks and n_block_bytes, would be good. More discussion is also in bug #756903, which is implementing the same feature but in a different way.
*** Bug 756903 has been marked as a duplicate of this bug. ***
I'm absolutely convinced that we should do at least a subset of this, because we hit this problem over and over again. I proposed my own patch in bug 756903. There, ebassi compares the two approaches and concludes: (In reply to Dan Winship from comment #6) > (In reply to Emmanuele Bassi (:ebassi) from comment #5) > > To be fair, I kind of like the approach in bug 503096 the most > > Which part of it exactly? Looking function-y rather than macro-y? Supporting > both int and uint? Having int and long versions in addition to int32 and > int64 versions? Having clamped versions in addition to the > give-the-wrong-answer-and-return-false versions? Having subtraction support? All of them. :-) - Type safe inlined functions instead of macros: ✓ (I want the compiler to kick me if I'm doing something stupid) - Supporting signed and unsigned integers: ✓ - Supporting more types: ✓ (though this is the one thing I can live without, as long as we have support for sized integers) - Safe clamped versions: ✓ (saves a branch to handle failure) - More arithmetic operators: ✓ (I'll admit that overflowing addition is rarer, but still very much possible in existing code, see bug 755514) In reply to some of the raised points: I personally prefer the macro look because it matches with functions like GUINT_TO_POINTER() and GUINT_FROM_LE() whereas having a function like g_uint_* splits the type name in half. That said, these "macros" probably really are inline functions, so there is more truth in using a name that looks like that. Even in the version that I implemented the macros evaluate directly to inline calls. Maybe we could break convention and start using function names like guint_checked_add() or so to reflect that. I'm not convinced that signed math support is required here. It seems the case where we want to do these operations over and over again is multiplication and addition in order to make calculations for memory offsets, typically against file contents. For the same reason, I don't think we need saturating variants. Feedback/comments?
(In reply to Allison Ryan Lortie (desrt) from comment #9) > I'm absolutely convinced that we should do at least a subset of this, > because we hit this problem over and over again. > > I proposed my own patch in bug 756903. > > There, ebassi compares the two approaches and concludes: > > (In reply to Dan Winship from comment #6) > > (In reply to Emmanuele Bassi (:ebassi) from comment #5) > > > To be fair, I kind of like the approach in bug 503096 the most > > > > Which part of it exactly? Looking function-y rather than macro-y? Supporting > > both int and uint? Having int and long versions in addition to int32 and > > int64 versions? Having clamped versions in addition to the > > give-the-wrong-answer-and-return-false versions? Having subtraction support? > > All of them. :-) > > - Type safe inlined functions instead of macros: ✓ (I want the compiler to > kick me if I'm doing something stupid) > - Supporting signed and unsigned integers: ✓ > - Supporting more types: ✓ (though this is the one thing I can live > without, as long as we have support for sized integers) > - Safe clamped versions: ✓ (saves a branch to handle failure) > - More arithmetic operators: ✓ (I'll admit that overflowing addition is > rarer, but still very much possible in existing code, see bug 755514) > > > In reply to some of the raised points: I personally prefer the macro look > because it matches with functions like GUINT_TO_POINTER() and > GUINT_FROM_LE() whereas having a function like g_uint_* splits the type name > in half. Agreed. > That said, these "macros" probably really are inline functions, so there is > more truth in using a name that looks like that. Even in the version that I > implemented the macros evaluate directly to inline calls. Maybe we could > break convention and start using function names like guint_checked_add() or > so to reflect that. What would be the downside of having a macro expand into an inline function call? Breaking convention would likely cause problems with introspection or languages like Vala. I’m not in favour of that. > I'm not convinced that signed math support is required here. It seems the > case where we want to do these operations over and over again is > multiplication and addition in order to make calculations for memory > offsets, typically against file contents. For the same reason, I don't > think we need saturating variants. Agreed. Let’s start small with the agreed big use case (loading stuff from mmap()ed files) and expand if needed later.
(In reply to Philip Withnall from comment #10) > Breaking convention would likely cause problems with introspection or > languages like Vala. I’m not in favour of that. Of this, I worry an amount of about 0%. Vala's GLib wrappers are already hand-written and deviate from the GLib namespacing and even API quite a bit. Other languages using introspection would definitely use native numeral types which have none of the undefined behaviour of C when it comes to integer overflowing. > > I'm not convinced that signed math support is required here. It seems the > > case where we want to do these operations over and over again is > > multiplication and addition in order to make calculations for memory > > offsets, typically against file contents. For the same reason, I don't > > think we need saturating variants. > > Agreed. Let’s start small with the agreed big use case (loading stuff from > mmap()ed files) and expand if needed later. I agree on this approach; it's definitely safer than a large API drop.
I think I changed my mind overnight about the macro thing. My argument that we don't have precedent for g_int_* is simply false because of g_int_hash(), g_int_equal(). g_atomic_int_* has also established a precedent for this sort of thing looking like a function rather than a macro. The ALL_CAPS thing is also not very nice to look at. I'm going to incorporate the review comments from the other bug report, lowercase-ify the macros and submit an updated patch here. If Behdad can tell me that he still will definitely use the signed/clamped versions of these things if we add them then I'm happy to add them, but for lack of that, I'd prefer to keep it to the set of things that we know we will actually use (which is 32/64bit unsigned addition and multiplication).
Created attachment 314308 [details] [review] macros: add dummy __has_builtin() Add a dummy definition for Clang's __has_builtin() macro. This will allow us to use __has_builtin() unconditionally, in the same way as we already do for __has_feature().
Created attachment 314309 [details] [review] GLib: add bounds-checked unsigned int arithmetic Add some helpers for builds-checked unsigned integer arithmetic to GLib. These will be based on compiler intrinsics where they are available, falling back to standard manual checks otherwise. The fallback case needs to be implemented as a function (which we do inline) because we cannot rely on statement expressions. We also implement the intrinsics case as an inline in order to avoid people accidentally writing non-portable code which depends on static evaluation of the builtin. For now there is only support for addition and multiplication for guint, guint64 and gsize. It may make sense to add support for subtraction or for the signed equivalents of those types in the future if we find a use for that.
Created attachment 314310 [details] [review] tests: test bounds-checked int arithmetic Add some simple testcases for the new bounds-checked integer arithmetic helpers. Include a second build of the testcase to make sure we test the fallback code even if we are on a compiler that supports the intrinsics.
Review of attachment 314309 [details] [review]: ::: docs/reference/glib/glib-sections.txt @@ +350,3 @@ +<TITLE>Bounds-checked integer arithmetic</TITLE> +<FILE>checkedmath</FILE> +GUINT_CHECKED_ADD I've already spotted that I missed this and corrected it locally.
Review of attachment 314308 [details] [review]: ++
Review of attachment 314309 [details] [review]: ::: glib/docs.c @@ +1442,3 @@ + * The helpers may be macros, normal functions or inlines. They may be + * implemented with inline assembly or compiler intrinsics where + * available. All the documentation comments here need a ‘Since:’ line. @@ +1452,3 @@ + * + * Performs a checked addition of @a and @b, storing the result in + * @dest. What happens to @dest if %FALSE is returned? Is it left unmodified, filled with garbage, or filled with some well-defined but not-useful value like the remainder after the overflow? (Same for the other documentation comments.) ::: glib/gtypes.h @@ +384,3 @@ +#if __GNUC__ >= 5 +#define _GLIB_HAVE_BUILTIN_OVERFLOW_CHECKS +#elif __has_builtin(__builtin_uadd_overflow) You’re using `__has_builtin(__builtin_uadd_overflow)` as a proxy for checking all of the builtins you use. Are you happy with that? Are there any platforms/compilers which would implement (for example) __builtin_uadd_overflow but not __builtin_umul_overflow?
Review of attachment 314310 [details] [review]: ::: glib/tests/overflow.c @@ +1,1 @@ +#include <glib.h> Missing copyright/licence header. @@ +20,3 @@ + { FALSE, 0, G_MAXUINT, 1 }, + { FALSE, 0, 1, G_MAXUINT }, + { FALSE, 0, G_MAXUINT, G_MAXUINT } I haven't reviewed the test vectors in detail.
Review of attachment 314309 [details] [review]: ::: glib/docs.c @@ +1442,3 @@ + * The helpers may be macros, normal functions or inlines. They may be + * implemented with inline assembly or compiler intrinsics where + * available. Thanks @@ +1452,3 @@ + * + * Performs a checked addition of @a and @b, storing the result in + * @dest. The fact is that you get the overflowed result. This is defined by GCC in an odd way, with talk of infinite precision integer spaces and casting. In practice, I'd prefer to make it undefined. If you're using these macros, probably you're wanting to throw an error if overflow ever happens. I'll be more explicit about that. ::: glib/gtypes.h @@ +384,3 @@ +#if __GNUC__ >= 5 +#define _GLIB_HAVE_BUILTIN_OVERFLOW_CHECKS +#elif __has_builtin(__builtin_uadd_overflow) I'm happy with that, yes. I don't know of any such compiler, and the theoretical possibility of one existing is not enough to make me want to make this more complex than it needs to be.
(In reply to Allison Ryan Lortie (desrt) from comment #20) > @@ +1452,3 @@ > + * > + * Performs a checked addition of @a and @b, storing the result in > + * @dest. > > The fact is that you get the overflowed result. > > This is defined by GCC in an odd way, with talk of infinite precision > integer spaces and casting. > > In practice, I'd prefer to make it undefined. If you're using these macros, > probably you're wanting to throw an error if overflow ever happens. I'll be > more explicit about that. ++ for being explicit about the undefinedness. > ::: glib/gtypes.h > @@ +384,3 @@ > +#if __GNUC__ >= 5 > +#define _GLIB_HAVE_BUILTIN_OVERFLOW_CHECKS > +#elif __has_builtin(__builtin_uadd_overflow) > > I'm happy with that, yes. > > I don't know of any such compiler, and the theoretical possibility of one > existing is not enough to make me want to make this more complex than it > needs to be. ++ for not over-complicating this.
Attachment 314308 [details] pushed as 89bda59 - macros: add dummy __has_builtin() Attachment 314309 [details] pushed as d0219f2 - GLib: add bounds-checked unsigned int arithmetic Attachment 314310 [details] pushed as 7dd9ffb - tests: test bounds-checked int arithmetic Thanks for the reviews. If there is a desire to expand these macros to cover new types or operations please open a separate bug about that and specify two usecases.
(In reply to Allison Ryan Lortie (desrt) from comment #12) > I think I changed my mind overnight about the macro thing. My argument that > we don't have precedent for g_int_* is simply false because of g_int_hash(), > g_int_equal(). g_atomic_int_* has also established a precedent for this > sort of thing looking like a function rather than a macro. The ALL_CAPS > thing is also not very nice to look at. > > I'm going to incorporate the review comments from the other bug report, > lowercase-ify the macros and submit an updated patch here. If Behdad can > tell me that he still will definitely use the signed/clamped versions of > these things if we add them then I'm happy to add them, Yes, the original need, in Pango, is still outstanding. Thanks! > but for lack of > that, I'd prefer to keep it to the set of things that we know we will > actually use (which is 32/64bit unsigned addition and multiplication).
static inline gboolean _GLIB_CHECKED_MUL_U32 (guint32 *dest, guint32 a, guint32 b) { *dest = a * b; return !a || *dest / a == b; } In HarfBuzz I implemented the overflow check this way: static inline bool _hb_unsigned_int_mul_overflows (unsigned int count, unsigned int size) { return (size > 0) && (count >= ((unsigned int) -1) / size); } the idea was that if the second argument is constant, the division is optimized away.
After patch 'GLib: add bounds-checked unsigned int arithmetic' I started to see following error: /mnt/wd20ezrx-00dc0b0/JHBuild/3.20/install/include/glib-2.0/glib/gtypes.h:423: syntax error, unexpected identifier in ' G_STATIC_ASSERT(sizeof (unsigned long long) == sizeof (guint64));' at 'G_STATIC_ASSERT' /mnt/wd20ezrx-00dc0b0/JHBuild/3.20/install/include/glib-2.0/glib/gtypes.h:423: syntax error, unexpected ')', expecting identifier or '(' or '*' or ';' in ' G_STATIC_ASSERT(sizeof (unsigned long long) == sizeof (guint64));' at ')' g-ir-scanner: Soup: warning: 5 warnings suppressed (use --warn-all to see them)
Created attachment 314616 [details] [review] math macros: move static assert to function scope We have a static assert at file scope in gtypes.h which is causing trouble with some compilers. Move it to the scope of a near-by function to prevent trouble.
Created attachment 314620 [details] [review] math macros: optimise fallback multiplication case Add an optimisation (suggested by Behdad) that will cause the fallback case of the checked multiplication inlines to be faster in the case that the second argument is a constant. We somewhat randomly choose to make it the second argument. Is it more likely that checked_mul(x, 2) or checked_mul(2, x) is what people would naturally write? I would probably personally write the second option, but we take the first possibility since there is prior art. It also doesn't matter too much -- this is only an optimisation on the fallback path. In any case, we also add a note about this implementation detail to the docs in order to hint people in the right direction.
(In reply to Alberts Muktupāvels from comment #25) > After patch 'GLib: add bounds-checked unsigned int arithmetic' I started to > see following error: > > /mnt/wd20ezrx-00dc0b0/JHBuild/3.20/install/include/glib-2.0/glib/gtypes.h: > 423: syntax error, unexpected identifier in ' G_STATIC_ASSERT(sizeof > (unsigned long long) == sizeof (guint64));' at 'G_STATIC_ASSERT' > /mnt/wd20ezrx-00dc0b0/JHBuild/3.20/install/include/glib-2.0/glib/gtypes.h: > 423: syntax error, unexpected ')', expecting identifier or '(' or '*' or ';' > in ' G_STATIC_ASSERT(sizeof (unsigned long long) == sizeof (guint64));' at > ')' > g-ir-scanner: Soup: warning: 5 warnings suppressed (use --warn-all to see > them) What compiler is that with? I thought G_STATIC_ASSERT was supposed to work anywhere a typedef is valid.
Review of attachment 314620 [details] [review]: I have no preference between checked_mul(x, 2) and hecked_mul(2, x). ::: glib/gtypes.h @@ +462,2 @@ static inline gboolean _GLIB_CHECKED_ADD_U32 (guint32 *dest, guint32 a, guint32 b) { *dest = a + b; return *dest >= a; } Could implement the same optimisation here: return a < G_MAXUINT32 - b @@ +466,2 @@ static inline gboolean _GLIB_CHECKED_ADD_U64 (guint64 *dest, guint64 a, guint64 b) { *dest = a + b; return *dest >= a; } And here.
(In reply to Philip Withnall from comment #28) > (In reply to Alberts Muktupāvels from comment #25) > > After patch 'GLib: add bounds-checked unsigned int arithmetic' I started to > > see following error: > > > > /mnt/wd20ezrx-00dc0b0/JHBuild/3.20/install/include/glib-2.0/glib/gtypes.h: > > 423: syntax error, unexpected identifier in ' G_STATIC_ASSERT(sizeof > > (unsigned long long) == sizeof (guint64));' at 'G_STATIC_ASSERT' > > /mnt/wd20ezrx-00dc0b0/JHBuild/3.20/install/include/glib-2.0/glib/gtypes.h: > > 423: syntax error, unexpected ')', expecting identifier or '(' or '*' or ';' > > in ' G_STATIC_ASSERT(sizeof (unsigned long long) == sizeof (guint64));' at > > ')' > > g-ir-scanner: Soup: warning: 5 warnings suppressed (use --warn-all to see > > them) > > What compiler is that with? I thought G_STATIC_ASSERT was supposed to work > anywhere a typedef is valid. gcc 5.2.1
(In reply to Philip Withnall from comment #28) > What compiler is that with? I thought G_STATIC_ASSERT was supposed to work > anywhere a typedef is valid. It is. But note the tail end of the errors: > > g-ir-scanner: Soup: warning: 5 warnings suppressed (use --warn-all to see > > them) maybe something having to do with how g-ir-scanner invokes things?
(In reply to Allison Ryan Lortie (desrt) from comment #27) > Created attachment 314620 [details] [review] [review] > math macros: optimise fallback multiplication case > > Add an optimisation (suggested by Behdad) that will cause the fallback > case of the checked multiplication inlines to be faster in the case that > the second argument is a constant. > > We somewhat randomly choose to make it the second argument. Is it more > likely that > > checked_mul(x, 2) > > or > > checked_mul(2, x) > > is what people would naturally write? I would probably personally write > the second option, but we take the first possibility since there is > prior art. FWIW, we ordered them that way because that follows calloc()'s parameter orders: void *calloc(size_t nmemb, size_t size); where the second item is constant.
There's a problem WRT Comment #29 (the new inline functions:- '_GLIB_CHECKED_WHATEVER_U32' etc) insofar as they've been added to glib/gtypes.h. This is causing compile problems for MSVC. Currently, the 'inline' keyword gets mapped correctly (for various compilers) in glib/gutils.h. But we can't #include that in gtypes.h because gutils.h already #includes gtypes.h (so we end up with a circular dependency). After some discussion on the gtk-devel list, a suggestion was made to move the 'inline' definitions out of gutils.h and into gtypes.h. I've created a patch which achieves this. If I can figure out how to add it here I'll attach it. Otherwise I'll post it to the gtk-devel list. My patch fixes the compile issue for MSVC but it needs to get tested for the other compilers before getting adopted upstream. Current thinking though is that it shouldn't cause any problems.
Created attachment 314913 [details] [review] Move some 'inline' definitions out of gutils.h 'inline' patch submitted
see bug 757374 which is waiting for a review