Bug 659881 – libffi marshling code may not be strict aliasing safe

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 659881 - libffi marshling code may not be strict aliasing safe


Summary:	libffi marshling code may not be strict aliasing safe


Status:	RESOLVED OBSOLETE

Product:	glib
Classification:	Platform
Component:	general
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	gtkdev
QA Contact:	gtkdev

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2011-09-22 22:04 UTC by Colin Walters
Modified:	2018-05-24 13:24 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
tests: Add a some torture test cases for the generic marshaller (4.21 KB, patch) 2011-09-22 22:04 UTC, Colin Walters	committed	Details \| Review
Add a enum return value testcase for the generic marshaller (3.42 KB, patch) 2011-09-23 16:39 UTC, Dan Williams	rejected	Details \| Review
add signal return value testcases (5.43 KB, patch) 2011-09-28 15:35 UTC, Dan Williams	committed	Details \| Review
fix generic closure tests to be Enums, not Flags (1.94 KB, patch) 2011-09-28 15:36 UTC, Dan Williams	committed	Details \| Review
Add tests for signal return types int/uint (4.39 KB, patch) 2011-09-28 15:37 UTC, Dan Williams	committed	Details \| Review
Fix enums and return types for generic closure marshaller (7.72 KB, patch) 2011-09-28 19:23 UTC, Dan Williams	committed	Details \| Review
gclosure: Clean up handling of return values (3.43 KB, patch) 2011-10-03 18:58 UTC, Colin Walters	none	Details \| Review

Description Colin Walters 2011-09-22 22:04:08 UTC

This has to do with the type of enums.

Comment 1 Colin Walters 2011-09-22 22:04:33 UTC

Created attachment 197297 [details] [review]
tests: Add a some torture test cases for the generic marshaller

In particular this has a test case for a G_TYPE_ENUM which we were
running into a failure on ppc64.

Comment 2 Colin Walters 2011-09-22 22:05:50 UTC

<dcbw> walters: check this
<dcbw> void
<dcbw> g_value_set_enum (GValue *value,
<dcbw>     gint    v_enum)
<dcbw> {
<dcbw>   g_return_if_fail (G_VALUE_HOLDS_ENUM (value));
<dcbw>   
<dcbw>   value->data[0].v_long = v_enum;
<dcbw> }
<dcbw> static ffi_type *
<dcbw> value_to_ffi_type (const GValue *gvalue, gpointer *value)
<dcbw> {
<dcbw> ...
<dcbw>     case G_TYPE_ENUM:
<dcbw>       rettype = &ffi_type_sint;
<dcbw>       *value = (gpointer)&(gvalue->data[0].v_int);
<dcbw>       break;
<dcbw> TYPE_ENUM needs to pull from v_long
<walters> ah, indeed
<dcbw> in value_to_ffi_type
<walters> so the 64 bit is at play here
<dcbw> yeah
<walters> i did suspect our mapping most
<walters> dcbw, want to do the patch?
<dcbw> sure
<walters> hmm though
<walters> so the second line definitely needs to be long, since that's how we put it in the union
<dcbw> yeah, we need to move the G_TYPE_ENUM down to the G_TYPE_LONG
<dcbw> since it's really
<dcbw>     case G_TYPE_BOOLEAN:
<dcbw>     case G_TYPE_CHAR:
<dcbw>     case G_TYPE_INT:
<dcbw>     case G_TYPE_ENUM:
<dcbw>       rettype = &ffi_type_sint;
<dcbw>       *value = (gpointer)&(gvalue->data[0].v_int);
<dcbw>       break;
<dcbw> I just elided that part
<dcbw> it to this
<dcbw>     case G_TYPE_LONG:
<dcbw>     case G_TYPE_ENUM:
<dcbw>       rettype = &ffi_type_slong;
<dcbw>       *value = (gpointer)&(gvalue->data[0].v_long);
<dcbw>       break;
<dcbw> so maybe in this case the problem is BE vs. LE instead of 32 vs. 64
<walters> but the type chosen for the enum is dependent on ABI; it looks like the SysV ABI says enum is always restricted to 32 bit?
<dcbw> walters: you mean the FFI type chosen is dependent on the arch's ABI?
<walters> oh i was looking at the i386 manual so of course
<walters> the amd x86_64 one just says it's normally int but may be larger
<walters> dcbw, depends on the compiler AIUI
<dcbw> walters: ok, so for this issue that means... ?
<walters> i think we should keep it as ffi_type_sint
<dcbw> walters: do we need a pile of ifdefs when mapping GValue to ffi?
<dcbw> walters: ok, so a separate section for G_TYPE_ENUM
<dcbw> walters: so basically http://bigw.org/~dan/enum-ffi.patch
<dcbw> walters: that seems odd though, since wont' ffi access it as an int when it's really stored in a long?
<dcbw> walters: so any idea why we stuff it into v_long when g_value_set_enum() only takes a gint?
<dcbw> walters: well, perhaps you don't, given that code hasn't changed in over 11 years
<dcbw> I assume gint is always 32 bit
<walters> gint is just int, so in C that means at least 32 bit, but on platforms we care about yes it's 32
<walters> so there is some historical stuff for the enum gvalue stuff
<walters> but let me see something
<dcbw> sure
<walters> i think we need a good failing test case in glib
<dcbw> yeah, we do
<walters> writing it
<dcbw> I assume that if we use a sufficiently large enum and a sufficiently small enum we can make it fail on both ppc64 and x86_64
<dcbw> walters: might be useful to expose the value/ffi transform functions internally for testcases to verify
<hadess> dcbw, it failed on x86-64 as well?
<dcbw> or maybe just a bunch of tests on g_cclosure_marshal_generic
<dcbw> hadess: no
<dcbw> hadess: passed there, but the enum value I was testing was '3'
* drago01 has quit (Ping timeout: 600 seconds)
<dcbw> so I assume that ppc64 just took the wrong 32 bits of 0x0000000000000003
<dcbw> since it always showed up as 0
<hadess> endianess clipped
<dcbw> yep
<dcbw> not really a 64/32 thing at all
* nacho has quit (Abandonando)
<dcbw> well, it is
<dcbw> but more a be/le thing
<hadess> doesn't it warn during compilation?
<dcbw> hadess: it's the GValue union so probably not
<hadess> the assignment should warn, certainly
<hadess> hmm, given the code, maybe not
<dcbw> well, on some arches gint is 64-bits, so we dont' want it to warn
<dcbw> ppc64 abi is int == 32 though

Comment 3 Colin Walters 2011-09-22 23:23:10 UTC

Comment on attachment 197297 [details] [review]
tests: Add a some torture test cases for the generic marshaller

Attachment 197297 [details] pushed as 1df8160 - tests: Add a some torture test cases for the generic marshaller

Comment 4 Colin Walters 2011-09-22 23:37:40 UTC

So here's the issue I see.  For the value we give to ffi, we need to match whatever the compiler chose for the enum.  See the AMD64 SysV ABI linked from here:
http://en.wikipedia.org/wiki/Executable_and_Linkable_Format

It says enums are of sizeof(int) basically, with a caveat that large enums may require larger types.  

So if we say ffi_type_long, we're going to be passing 8 bytes down where 4 is expected - I don't think it can work.

However if we say ffi_type_int, then we blow up on big endian 64 bit architectures, because we're passing a pointer to a long.

I think we need to allocate some temporary space for an integer, and pass a pointer to that =/

Comment 5 Dan Williams 2011-09-23 14:45:39 UTC

Well, if we pass enums through ffi as longs, will they come out the same way they came in?  Woodhouse suggested doing that and I was going to test.  The problem is the address of v_long but parsing it in ffi as an int.  Since the enum is assigned to v_long, if we passed it through as v_long it'll come out the other side of ffi as a long, shouldn't we be able to munge it from a long to an int just the same way that g_value_set_enum() does?

Comment 6 Dan Williams 2011-09-23 16:39:25 UTC

Created attachment 197361 [details] [review]
Add a enum return value testcase for the generic marshaller

Adds a testcase to check return value marshalling for enums.

Comment 7 Colin Walters 2011-09-23 17:18:51 UTC

Review of attachment 197361 [details] [review]:

Looks good.

::: gobject/tests/signals.c
@@ +230,3 @@
+  g_assert_cmpint (v_int2, ==, 43);
+
+  return (TestUnsignedEnum) GPOINTER_TO_UINT (user_data);

Seems a little weird to pass the enum as user data, but OK.

Comment 8 Dan Williams 2011-09-26 17:02:30 UTC

(In reply to comment #7)
> Review of attachment 197361 [details] [review]:
> 
> Looks good.
> 
> ::: gobject/tests/signals.c
> @@ +230,3 @@
> +  g_assert_cmpint (v_int2, ==, 43);
> +
> +  return (TestUnsignedEnum) GPOINTER_TO_UINT (user_data);
> 
> Seems a little weird to pass the enum as user data, but OK.

It's so we can use the same signal handler to test the return value for both signed and unsigned enums.  Otherwise we'd have to have two, and the only difference would be returning a different value.  Either way works.

Comment 9 Dan Williams 2011-09-26 23:28:23 UTC

Review of attachment 197361 [details] [review]:

different patch coming

Comment 10 Dan Williams 2011-09-27 01:53:30 UTC

For further discussion of libffi return value handling make sure you check out the original bug on RH bugzilla:

https://bugzilla.redhat.com/show_bug.cgi?id=736489

especially comments #35 and later.

Comment 11 Dan Williams 2011-09-28 15:15:17 UTC

Ok, so the summary here is that there a few bugs in glib WRT its usage of libffi

1) marshalling enums is clearly wrong, since it's pulling from the wrong member of the GValue union; but we can't simply pass the address of v_long, because that will grab the wrong 32 bits of v_long on 64-bit BE architectures.  So we need to do the stack-allocated temp variable dance

2) return value handling is wrong on 64-bit BE architectures as described by https://bugzilla.redhat.com/show_bug.cgi?id=736489#c35 so we need to treat the returned value not as void*/gpointer* but as an integral type and cast as appropriate

Comment 12 Dan Williams 2011-09-28 15:35:38 UTC

Created attachment 197682 [details] [review]
add signal return value testcases

Comment 13 Dan Williams 2011-09-28 15:36:37 UTC

Created attachment 197683 [details] [review]
fix generic closure tests to be Enums, not Flags

Seems like a simple C&P error, but this was causing test failures for signed enums because they weren't deriving from G_TYPE_ENUM and thus weren't hitting the fixes in value_to_ffi_type()

Comment 14 Dan Williams 2011-09-28 15:37:48 UTC

Created attachment 197684 [details] [review]
Add tests for signal return types int/uint

These will be busted the same way as enums on 64-bit BE platforms that sign-extend to full 64-bits, since libffi passes back a full 64 bit value for ints.  So we need to test these too.

Comment 15 Colin Walters 2011-09-28 15:40:46 UTC

Review of attachment 197682 [details] [review]:

Looks good.

Comment 16 Colin Walters 2011-09-28 15:41:08 UTC

Review of attachment 197683 [details] [review]:

Crap, sorry about that...looks good.

Comment 17 Colin Walters 2011-09-28 15:42:26 UTC

Review of attachment 197684 [details] [review]:

Looks good.

Comment 18 Dan Williams 2011-09-28 16:03:49 UTC

Review of attachment 197682 [details] [review]:

committed

Comment 19 Dan Williams 2011-09-28 16:04:10 UTC

Review of attachment 197683 [details] [review]:

committed

Comment 20 Dan Williams 2011-09-28 16:04:21 UTC

Review of attachment 197684 [details] [review]:

committed

Comment 21 Dan Williams 2011-09-28 19:23:57 UTC

Created attachment 197698 [details] [review]
Fix enums and return types for generic closure marshaller

From the commit message of the patch:

enums are stored in v_long but need to be marshalled as signed
integers.  On platforms where int is 32 bits, taking the
address of v_long resulted in the wrong 32 bits being marshalled.
So we need to stuff the enum's int-sized value to a temporary
int-sized variable and marshall that instead.

Second, on return, libffi actually returns a pointer to a value
that's sized according to platform conventions, not according to
what the caller requested.  ie if ffi_type_sint was requested, the
value can still be a 64-bit sign-extended long on a 64-bit
architecture like PPC64, thus the caller cannot simply cast
the return value as a pointer to the desired type, but must cast
as a pointer to an integral type and then cast to the desired
type to remove any sign extension complications.

For more information on how to correctly handle libffi return
values, see the following bug, specifically comment 35:

https://bugzilla.redhat.com/show_bug.cgi?id=736489

"For 64-bit ABIs that extend integral returns types to 64-bits, libffi always
returns full 64-bit values that you can truncate in the calling code.   It's
just the way it is has always been.  Please don't change libffi.  I'll document
this clearly for the next version (perhaps there is a mention of this, I
haven't looked yet).

The same is true for returning 8-bit values, for instance, on 32-bit systems.
All ABIs extend those results to the full 32-bits so you need to provide a
properly aligned buffer that's big enough to hold the result."

Comment 22 Dan Williams 2011-09-28 19:25:30 UTC

Somebody needs to review the patch from comment 21 for correctness.  It builds and passes the testcases on ppc64 and on x86_64 but I haven't tested on 32-bit platforms at all.

Comment 23 Colin Walters 2011-09-29 15:13:16 UTC

(In reply to comment #21)

> "For 64-bit ABIs that extend integral returns types to 64-bits, libffi always
> returns full 64-bit values that you can truncate in the calling code.   It's
> just the way it is has always been.  Please don't change libffi.  I'll document
> this clearly for the next version (perhaps there is a mention of this, I
> haven't looked yet).
> 
> The same is true for returning 8-bit values, for instance, on 32-bit systems.
> All ABIs extend those results to the full 32-bits so you need to provide a
> properly aligned buffer that's big enough to hold the result."

From his comment then, this patch is insufficient, since the other cases are broken too:

     case G_TYPE_BOOLEAN:
     case G_TYPE_CHAR:
     case G_TYPE_INT:

Right?  These are all integral return values, so they also need a 64 bit buffer.

Comment 24 Colin Walters 2011-09-29 15:13:40 UTC

Review of attachment 197698 [details] [review]:

See comment.

Comment 25 Dan Williams 2011-09-30 17:42:59 UTC

(In reply to comment #23)
> (In reply to comment #21)
> 
> > "For 64-bit ABIs that extend integral returns types to 64-bits, libffi always
> > returns full 64-bit values that you can truncate in the calling code.   It's
> > just the way it is has always been.  Please don't change libffi.  I'll document
> > this clearly for the next version (perhaps there is a mention of this, I
> > haven't looked yet).
> > 
> > The same is true for returning 8-bit values, for instance, on 32-bit systems.
> > All ABIs extend those results to the full 32-bits so you need to provide a
> > properly aligned buffer that's big enough to hold the result."
> 
> From his comment then, this patch is insufficient, since the other cases are
> broken too:
> 
>      case G_TYPE_BOOLEAN:
>      case G_TYPE_CHAR:
>      case G_TYPE_INT:
> 
> Right?  These are all integral return values, so they also need a 64 bit
> buffer.

Only for *return* values.  When passing the value into libffi, it's fine to pass it as the native type AFAICT.  But the posted patch handles this already because it assigns the return value to a pointer to an 'ffi_arg' variable like Andrew suggests, and in most cases this is just a typecast of 'long' anyway.  So we get the correct result here by casting the ffi_arg.

Comment 26 Dan Williams 2011-09-30 17:46:22 UTC

Let me rephrase...

Anthony suggests using ffi_arg in that bug in comment 36:

"BTW - you can just use ffi_arg as the storage for all integral return values. 
Look at the libffi testsuite code, like return_sc.c.  For ppc64 ffi_arg is an
unsigned long."

I adopted the suggested approach for return values as taken in return_sc.c from libffi.  ffi_arg is almost always just "typedef unsigned long ffi_arg" (this depends on platform) and so it works out in the end, because we're assigning the returned void* to an integral value, then we're casting to the glib type, and thus the compiler handles the sign extension issues for us automatically.

When passing values *into* libffi before the call, the only Glib type affected was ENUM due to the odd v_long/v_int choice for ENUM storage in GValue.  That's fixed by the temp stack var stuff I posted.

Comment 27 Colin Walters 2011-10-03 18:47:16 UTC

Ok, I think I understand this now - your patch had two parts, and I was conflating the return size with the enum issue, but they are different.

So one thing that seems like it'd make your patch a lot more obvious is to change
value_from_ffi_type to take a ffi_arg *, not a gpointer *.

Right now we're doing:

  rvalue = g_alloca (MAX (rtype->size, sizeof (ffi_arg)));

But that'd only be necessary if the function returned a structure by value (i.e. not a pointer), right?  And we simply don't allow that in g_cclosure_marshal_generic.

I can do this patch as a followup after you commit this one if you like too.

Comment 28 Colin Walters 2011-10-03 18:58:20 UTC

Created attachment 198139 [details] [review]
gclosure: Clean up handling of return values

Pure code cleanup - since we don't support structure value returns,
just use ffi_arg to hold the return value.

Comment 29 Dan Williams 2011-10-03 21:33:17 UTC

(In reply to comment #27)
> Ok, I think I understand this now - your patch had two parts, and I was
> conflating the return size with the enum issue, but they are different.
> 
> So one thing that seems like it'd make your patch a lot more obvious is to
> change
> value_from_ffi_type to take a ffi_arg *, not a gpointer *.
> 
> Right now we're doing:
> 
>   rvalue = g_alloca (MAX (rtype->size, sizeof (ffi_arg)));
> 
> But that'd only be necessary if the function returned a structure by value
> (i.e. not a pointer), right?  And we simply don't allow that in
> g_cclosure_marshal_generic.
> 
> I can do this patch as a followup after you commit this one if you like too.

My understanding is that it's only appropriate to use ffi_arg as the return type for integral values.  ie we *shouldn't* use it for floats and structs, in that case we'd use native types.  Because single precision floats are something like 80 bits, right?  And ffi_arg is 'unsigned long' on most platforms, so clearly we can't expect to treat ffi_arg* as pointing to a float and always expect the right things to come  out of it.

I think the original code is appropriate though, since we do want to allocate a larger-than-sizeof(ffi_arg) variable for the return value in the case of floats and doubles.  I think the MAX(rtype->size, sizeof(ffi_arg)) is appropriate here, and also that we want to treat the return value is a void* or an ffi_arg* based on the actual return type.  In short, I think the patch I posted is the correct way to handle return types from libffi based on my understanding of Anthony Greens comments in the RH bug.

Comment 30 Colin Walters 2011-10-03 21:50:42 UTC

(In reply to comment #29)
> 
> My understanding is that it's only appropriate to use ffi_arg as the return
> type for integral values.  ie we *shouldn't* use it for floats and structs, in
> that case we'd use native types.  Because single precision floats are something
> like 80 bits, right?

You're thinking of the X87 FPU on IA32 - but those 80 bits of precision are *internal* - it's not stored that way in RAM.  See:
http://en.wikipedia.org/wiki/X87
floats and doubles are stored in IEEE 754: http://en.wikipedia.org/wiki/IEEE_754-1985

>  And ffi_arg is 'unsigned long' on most platforms, so
> clearly we can't expect to treat ffi_arg* as pointing to a float and always
> expect the right things to come  out of it.

Hmm.  floats are an interesting question actually.  On 64 bit architectures where floating point return values are always double, does libffi extend them or does it write a float into the provided buffer?

> I think the original code is appropriate though, since we do want to allocate a
> larger-than-sizeof(ffi_arg) variable for the return value in the case of floats
> and doubles. 

As above, float and double never take up more space than sizeof(long) in RAM.

> In short, I think the patch I posted is the
> correct way to handle return types from libffi based on my understanding of
> Anthony Greens comments in the RH bug.

I agree your patch is correct!  I also think mine is - i.e. I believe it does not regress your patch.

Comment 31 Colin Walters 2011-10-05 18:54:39 UTC

<dcbw> walters: I'm hearing that we cannot cast from integral to float type for your patch
<walters> dcbw, hmm, can you get them to comment on the bug?
<dcbw> walters: "(01:44:38 PM) bergner: dcbw: as I said above, you cannot take the address of a integral/float type and cast that pointer to float/integral type.  That is against the language spec."
<dcbw> (01:45:13 PM) bergner: to get the bits of a float type (or vise versa) into a integral type unmunged, you have to go through a union
<dcbw> (01:45:39 PM) bergner: pointer casting will just give you bugs, especially with newer gccs
<dcbw> aliasing issues I guess
<walters> oh strict aliasing
<walters> i don't think my patch regresses things there though
<walters> carefully note i am not touching the      case G_TYPE_FLOAT: case
<walters> i think this is a general issue with ffi_arg though
<dcbw> yeah, you're not, but the question is whether we can take a ffi_arg* and do *(gfloat*) value and get the right result
<dcbw> is that operation any different than void* and doing *(gfloat*)value
<dcbw> given that ffi_arg is 'unsigned long'
<walters> where for clarity, on x86_64 ffi_arg is typedef unsigned long long     ffi_arg;

Comment 32 Colin Walters 2011-10-05 18:55:19 UTC

For now let's just put in dcbw's patch and leave my patch out until we have a chance to come back to this.

Comment 33 Dan Williams 2011-10-05 19:11:15 UTC

Review of attachment 197698 [details] [review]:

committed

Comment 34 GNOME Infrastructure Team 2018-05-24 13:24:59 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/455.