GNOME Bugzilla – Bug 659881
libffi marshling code may not be strict aliasing safe
Last modified: 2018-05-24 13:24:59 UTC
This has to do with the type of enums.
Created attachment 197297 [details] [review] tests: Add a some torture test cases for the generic marshaller In particular this has a test case for a G_TYPE_ENUM which we were running into a failure on ppc64.
<dcbw> walters: check this <dcbw> void <dcbw> g_value_set_enum (GValue *value, <dcbw> gint v_enum) <dcbw> { <dcbw> g_return_if_fail (G_VALUE_HOLDS_ENUM (value)); <dcbw> <dcbw> value->data[0].v_long = v_enum; <dcbw> } <dcbw> static ffi_type * <dcbw> value_to_ffi_type (const GValue *gvalue, gpointer *value) <dcbw> { <dcbw> ... <dcbw> case G_TYPE_ENUM: <dcbw> rettype = &ffi_type_sint; <dcbw> *value = (gpointer)&(gvalue->data[0].v_int); <dcbw> break; <dcbw> TYPE_ENUM needs to pull from v_long <walters> ah, indeed <dcbw> in value_to_ffi_type <walters> so the 64 bit is at play here <dcbw> yeah <walters> i did suspect our mapping most <walters> dcbw, want to do the patch? <dcbw> sure <walters> hmm though <walters> so the second line definitely needs to be long, since that's how we put it in the union <dcbw> yeah, we need to move the G_TYPE_ENUM down to the G_TYPE_LONG <dcbw> since it's really <dcbw> case G_TYPE_BOOLEAN: <dcbw> case G_TYPE_CHAR: <dcbw> case G_TYPE_INT: <dcbw> case G_TYPE_ENUM: <dcbw> rettype = &ffi_type_sint; <dcbw> *value = (gpointer)&(gvalue->data[0].v_int); <dcbw> break; <dcbw> I just elided that part <dcbw> it to this <dcbw> case G_TYPE_LONG: <dcbw> case G_TYPE_ENUM: <dcbw> rettype = &ffi_type_slong; <dcbw> *value = (gpointer)&(gvalue->data[0].v_long); <dcbw> break; <dcbw> so maybe in this case the problem is BE vs. LE instead of 32 vs. 64 <walters> but the type chosen for the enum is dependent on ABI; it looks like the SysV ABI says enum is always restricted to 32 bit? <dcbw> walters: you mean the FFI type chosen is dependent on the arch's ABI? <walters> oh i was looking at the i386 manual so of course <walters> the amd x86_64 one just says it's normally int but may be larger <walters> dcbw, depends on the compiler AIUI <dcbw> walters: ok, so for this issue that means... ? <walters> i think we should keep it as ffi_type_sint <dcbw> walters: do we need a pile of ifdefs when mapping GValue to ffi? <dcbw> walters: ok, so a separate section for G_TYPE_ENUM <dcbw> walters: so basically http://bigw.org/~dan/enum-ffi.patch <dcbw> walters: that seems odd though, since wont' ffi access it as an int when it's really stored in a long? <dcbw> walters: so any idea why we stuff it into v_long when g_value_set_enum() only takes a gint? <dcbw> walters: well, perhaps you don't, given that code hasn't changed in over 11 years <dcbw> I assume gint is always 32 bit <walters> gint is just int, so in C that means at least 32 bit, but on platforms we care about yes it's 32 <walters> so there is some historical stuff for the enum gvalue stuff <walters> but let me see something <dcbw> sure <walters> i think we need a good failing test case in glib <dcbw> yeah, we do <walters> writing it <dcbw> I assume that if we use a sufficiently large enum and a sufficiently small enum we can make it fail on both ppc64 and x86_64 <dcbw> walters: might be useful to expose the value/ffi transform functions internally for testcases to verify <hadess> dcbw, it failed on x86-64 as well? <dcbw> or maybe just a bunch of tests on g_cclosure_marshal_generic <dcbw> hadess: no <dcbw> hadess: passed there, but the enum value I was testing was '3' * drago01 has quit (Ping timeout: 600 seconds) <dcbw> so I assume that ppc64 just took the wrong 32 bits of 0x0000000000000003 <dcbw> since it always showed up as 0 <hadess> endianess clipped <dcbw> yep <dcbw> not really a 64/32 thing at all * nacho has quit (Abandonando) <dcbw> well, it is <dcbw> but more a be/le thing <hadess> doesn't it warn during compilation? <dcbw> hadess: it's the GValue union so probably not <hadess> the assignment should warn, certainly <hadess> hmm, given the code, maybe not <dcbw> well, on some arches gint is 64-bits, so we dont' want it to warn <dcbw> ppc64 abi is int == 32 though
Comment on attachment 197297 [details] [review] tests: Add a some torture test cases for the generic marshaller Attachment 197297 [details] pushed as 1df8160 - tests: Add a some torture test cases for the generic marshaller
So here's the issue I see. For the value we give to ffi, we need to match whatever the compiler chose for the enum. See the AMD64 SysV ABI linked from here: http://en.wikipedia.org/wiki/Executable_and_Linkable_Format It says enums are of sizeof(int) basically, with a caveat that large enums may require larger types. So if we say ffi_type_long, we're going to be passing 8 bytes down where 4 is expected - I don't think it can work. However if we say ffi_type_int, then we blow up on big endian 64 bit architectures, because we're passing a pointer to a long. I think we need to allocate some temporary space for an integer, and pass a pointer to that =/
Well, if we pass enums through ffi as longs, will they come out the same way they came in? Woodhouse suggested doing that and I was going to test. The problem is the address of v_long but parsing it in ffi as an int. Since the enum is assigned to v_long, if we passed it through as v_long it'll come out the other side of ffi as a long, shouldn't we be able to munge it from a long to an int just the same way that g_value_set_enum() does?
Created attachment 197361 [details] [review] Add a enum return value testcase for the generic marshaller Adds a testcase to check return value marshalling for enums.
Review of attachment 197361 [details] [review]: Looks good. ::: gobject/tests/signals.c @@ +230,3 @@ + g_assert_cmpint (v_int2, ==, 43); + + return (TestUnsignedEnum) GPOINTER_TO_UINT (user_data); Seems a little weird to pass the enum as user data, but OK.
(In reply to comment #7) > Review of attachment 197361 [details] [review]: > > Looks good. > > ::: gobject/tests/signals.c > @@ +230,3 @@ > + g_assert_cmpint (v_int2, ==, 43); > + > + return (TestUnsignedEnum) GPOINTER_TO_UINT (user_data); > > Seems a little weird to pass the enum as user data, but OK. It's so we can use the same signal handler to test the return value for both signed and unsigned enums. Otherwise we'd have to have two, and the only difference would be returning a different value. Either way works.
Review of attachment 197361 [details] [review]: different patch coming
For further discussion of libffi return value handling make sure you check out the original bug on RH bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=736489 especially comments #35 and later.
Ok, so the summary here is that there a few bugs in glib WRT its usage of libffi 1) marshalling enums is clearly wrong, since it's pulling from the wrong member of the GValue union; but we can't simply pass the address of v_long, because that will grab the wrong 32 bits of v_long on 64-bit BE architectures. So we need to do the stack-allocated temp variable dance 2) return value handling is wrong on 64-bit BE architectures as described by https://bugzilla.redhat.com/show_bug.cgi?id=736489#c35 so we need to treat the returned value not as void*/gpointer* but as an integral type and cast as appropriate
Created attachment 197682 [details] [review] add signal return value testcases
Created attachment 197683 [details] [review] fix generic closure tests to be Enums, not Flags Seems like a simple C&P error, but this was causing test failures for signed enums because they weren't deriving from G_TYPE_ENUM and thus weren't hitting the fixes in value_to_ffi_type()
Created attachment 197684 [details] [review] Add tests for signal return types int/uint These will be busted the same way as enums on 64-bit BE platforms that sign-extend to full 64-bits, since libffi passes back a full 64 bit value for ints. So we need to test these too.
Review of attachment 197682 [details] [review]: Looks good.
Review of attachment 197683 [details] [review]: Crap, sorry about that...looks good.
Review of attachment 197684 [details] [review]: Looks good.
Review of attachment 197682 [details] [review]: committed
Review of attachment 197683 [details] [review]: committed
Review of attachment 197684 [details] [review]: committed
Created attachment 197698 [details] [review] Fix enums and return types for generic closure marshaller From the commit message of the patch: enums are stored in v_long but need to be marshalled as signed integers. On platforms where int is 32 bits, taking the address of v_long resulted in the wrong 32 bits being marshalled. So we need to stuff the enum's int-sized value to a temporary int-sized variable and marshall that instead. Second, on return, libffi actually returns a pointer to a value that's sized according to platform conventions, not according to what the caller requested. ie if ffi_type_sint was requested, the value can still be a 64-bit sign-extended long on a 64-bit architecture like PPC64, thus the caller cannot simply cast the return value as a pointer to the desired type, but must cast as a pointer to an integral type and then cast to the desired type to remove any sign extension complications. For more information on how to correctly handle libffi return values, see the following bug, specifically comment 35: https://bugzilla.redhat.com/show_bug.cgi?id=736489 "For 64-bit ABIs that extend integral returns types to 64-bits, libffi always returns full 64-bit values that you can truncate in the calling code. It's just the way it is has always been. Please don't change libffi. I'll document this clearly for the next version (perhaps there is a mention of this, I haven't looked yet). The same is true for returning 8-bit values, for instance, on 32-bit systems. All ABIs extend those results to the full 32-bits so you need to provide a properly aligned buffer that's big enough to hold the result."
Somebody needs to review the patch from comment 21 for correctness. It builds and passes the testcases on ppc64 and on x86_64 but I haven't tested on 32-bit platforms at all.
(In reply to comment #21) > "For 64-bit ABIs that extend integral returns types to 64-bits, libffi always > returns full 64-bit values that you can truncate in the calling code. It's > just the way it is has always been. Please don't change libffi. I'll document > this clearly for the next version (perhaps there is a mention of this, I > haven't looked yet). > > The same is true for returning 8-bit values, for instance, on 32-bit systems. > All ABIs extend those results to the full 32-bits so you need to provide a > properly aligned buffer that's big enough to hold the result." From his comment then, this patch is insufficient, since the other cases are broken too: case G_TYPE_BOOLEAN: case G_TYPE_CHAR: case G_TYPE_INT: Right? These are all integral return values, so they also need a 64 bit buffer.
Review of attachment 197698 [details] [review]: See comment.
(In reply to comment #23) > (In reply to comment #21) > > > "For 64-bit ABIs that extend integral returns types to 64-bits, libffi always > > returns full 64-bit values that you can truncate in the calling code. It's > > just the way it is has always been. Please don't change libffi. I'll document > > this clearly for the next version (perhaps there is a mention of this, I > > haven't looked yet). > > > > The same is true for returning 8-bit values, for instance, on 32-bit systems. > > All ABIs extend those results to the full 32-bits so you need to provide a > > properly aligned buffer that's big enough to hold the result." > > From his comment then, this patch is insufficient, since the other cases are > broken too: > > case G_TYPE_BOOLEAN: > case G_TYPE_CHAR: > case G_TYPE_INT: > > Right? These are all integral return values, so they also need a 64 bit > buffer. Only for *return* values. When passing the value into libffi, it's fine to pass it as the native type AFAICT. But the posted patch handles this already because it assigns the return value to a pointer to an 'ffi_arg' variable like Andrew suggests, and in most cases this is just a typecast of 'long' anyway. So we get the correct result here by casting the ffi_arg.
Let me rephrase... Anthony suggests using ffi_arg in that bug in comment 36: "BTW - you can just use ffi_arg as the storage for all integral return values. Look at the libffi testsuite code, like return_sc.c. For ppc64 ffi_arg is an unsigned long." I adopted the suggested approach for return values as taken in return_sc.c from libffi. ffi_arg is almost always just "typedef unsigned long ffi_arg" (this depends on platform) and so it works out in the end, because we're assigning the returned void* to an integral value, then we're casting to the glib type, and thus the compiler handles the sign extension issues for us automatically. When passing values *into* libffi before the call, the only Glib type affected was ENUM due to the odd v_long/v_int choice for ENUM storage in GValue. That's fixed by the temp stack var stuff I posted.
Ok, I think I understand this now - your patch had two parts, and I was conflating the return size with the enum issue, but they are different. So one thing that seems like it'd make your patch a lot more obvious is to change value_from_ffi_type to take a ffi_arg *, not a gpointer *. Right now we're doing: rvalue = g_alloca (MAX (rtype->size, sizeof (ffi_arg))); But that'd only be necessary if the function returned a structure by value (i.e. not a pointer), right? And we simply don't allow that in g_cclosure_marshal_generic. I can do this patch as a followup after you commit this one if you like too.
Created attachment 198139 [details] [review] gclosure: Clean up handling of return values Pure code cleanup - since we don't support structure value returns, just use ffi_arg to hold the return value.
(In reply to comment #27) > Ok, I think I understand this now - your patch had two parts, and I was > conflating the return size with the enum issue, but they are different. > > So one thing that seems like it'd make your patch a lot more obvious is to > change > value_from_ffi_type to take a ffi_arg *, not a gpointer *. > > Right now we're doing: > > rvalue = g_alloca (MAX (rtype->size, sizeof (ffi_arg))); > > But that'd only be necessary if the function returned a structure by value > (i.e. not a pointer), right? And we simply don't allow that in > g_cclosure_marshal_generic. > > I can do this patch as a followup after you commit this one if you like too. My understanding is that it's only appropriate to use ffi_arg as the return type for integral values. ie we *shouldn't* use it for floats and structs, in that case we'd use native types. Because single precision floats are something like 80 bits, right? And ffi_arg is 'unsigned long' on most platforms, so clearly we can't expect to treat ffi_arg* as pointing to a float and always expect the right things to come out of it. I think the original code is appropriate though, since we do want to allocate a larger-than-sizeof(ffi_arg) variable for the return value in the case of floats and doubles. I think the MAX(rtype->size, sizeof(ffi_arg)) is appropriate here, and also that we want to treat the return value is a void* or an ffi_arg* based on the actual return type. In short, I think the patch I posted is the correct way to handle return types from libffi based on my understanding of Anthony Greens comments in the RH bug.
(In reply to comment #29) > > My understanding is that it's only appropriate to use ffi_arg as the return > type for integral values. ie we *shouldn't* use it for floats and structs, in > that case we'd use native types. Because single precision floats are something > like 80 bits, right? You're thinking of the X87 FPU on IA32 - but those 80 bits of precision are *internal* - it's not stored that way in RAM. See: http://en.wikipedia.org/wiki/X87 floats and doubles are stored in IEEE 754: http://en.wikipedia.org/wiki/IEEE_754-1985 > And ffi_arg is 'unsigned long' on most platforms, so > clearly we can't expect to treat ffi_arg* as pointing to a float and always > expect the right things to come out of it. Hmm. floats are an interesting question actually. On 64 bit architectures where floating point return values are always double, does libffi extend them or does it write a float into the provided buffer? > I think the original code is appropriate though, since we do want to allocate a > larger-than-sizeof(ffi_arg) variable for the return value in the case of floats > and doubles. As above, float and double never take up more space than sizeof(long) in RAM. > In short, I think the patch I posted is the > correct way to handle return types from libffi based on my understanding of > Anthony Greens comments in the RH bug. I agree your patch is correct! I also think mine is - i.e. I believe it does not regress your patch.
<dcbw> walters: I'm hearing that we cannot cast from integral to float type for your patch <walters> dcbw, hmm, can you get them to comment on the bug? <dcbw> walters: "(01:44:38 PM) bergner: dcbw: as I said above, you cannot take the address of a integral/float type and cast that pointer to float/integral type. That is against the language spec." <dcbw> (01:45:13 PM) bergner: to get the bits of a float type (or vise versa) into a integral type unmunged, you have to go through a union <dcbw> (01:45:39 PM) bergner: pointer casting will just give you bugs, especially with newer gccs <dcbw> aliasing issues I guess <walters> oh strict aliasing <walters> i don't think my patch regresses things there though <walters> carefully note i am not touching the case G_TYPE_FLOAT: case <walters> i think this is a general issue with ffi_arg though <dcbw> yeah, you're not, but the question is whether we can take a ffi_arg* and do *(gfloat*) value and get the right result <dcbw> is that operation any different than void* and doing *(gfloat*)value <dcbw> given that ffi_arg is 'unsigned long' <walters> where for clarity, on x86_64 ffi_arg is typedef unsigned long long ffi_arg;
For now let's just put in dcbw's patch and leave my patch out until we have a chance to come back to this.
Review of attachment 197698 [details] [review]: committed
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/455.