After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 795726 - Lots of undefined behavior in extensions/gggl.c
Lots of undefined behavior in extensions/gggl.c
Status: RESOLVED OBSOLETE
Product: GEGL
Classification: Other
Component: babl
unspecified
Other NetBSD
: Normal normal
: ---
Assigned To: Default Gegl Component Owner
Default Gegl Component Owner
Depends on:
Blocks:
 
 
Reported: 2018-05-01 20:14 UTC by Martin Husemann
Modified: 2018-05-22 12:23 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
(mostly) mechanical transformation to use memcpy() to access data with unclear alignement (13.05 KB, patch)
2018-05-01 20:14 UTC, Martin Husemann
reviewed Details | Review

Description Martin Husemann 2018-05-01 20:14:27 UTC
Created attachment 371584 [details] [review]
(mostly) mechanical transformation to use memcpy() to access data with unclear alignement

The conversion functions in gggl.c assume they can cast pointers to unrelated types arbitrarily, both for lhs and rhs.

While this accidentally works on x86 and newer arm, it causes bus errors on most other RISC cpus. From a C standard point of view this all is undefined behavior.

Easy work around: use memcpy() and leave all optimizations to the compiler - which will usually understand the target restrictions and create the same code on x86.

I ran into this because the Gimp build on sparc64 crashed for me (so it is not a theoretical problem but actually prevents building of configurations that previously worked fine).

Side note: the attached patch should be carefully reviewed, this kind of mostly mechanical transforms always end up with errors in my experience.
Comment 1 Debarshi Ray 2018-05-02 11:13:37 UTC
Review of attachment 371584 [details] [review]:

Thanks for the patch!

::: extensions/gggl.c.orig
@@ +60,3 @@
+      int   uval;
+
+      memcpy(&f, src, sizeof(f));

I'd think that such code is quite common in babl. eg., see extensions/grey.c.  Are those conversions broken too?

(There's a built-in tool in babl to try out different conversions or you can write a separate toy program as the one in bug 791837, and then comment out competing conversions to ensure that the one you are testing gets executed.)

The src pointer does point to a series of floats, since we are converting single precision floating point values (mostly between 0 and 1) to 8-bit unsigned integers.  The other option might be to specify src as a 'float *', like we do in extensions/CIE.c.  For example,

static void
conv_F_8 (const Babl *conversion,
          float *src,
          unsigned char *dst,
          long samples)
{
  ...

  while (n--)
    {
      int uval = lrint (src[0] * 255.0);
      ...
      dst++;
      src++;
    }
}

Will that unbreak SPARC?  I slightly prefer that because it avoids all the confusing casting back and forth.

I'd also be a bit paranoid about any performance regression because this code is inherently a bit crack-infested to extract the last drops of speed out of the hardware, but the above version seems quite close to what we have and avoids the extra copy and casting.

@@ +61,3 @@
+
+      memcpy(&f, src, sizeof(f));
+      uval = lrint (f * 255.0);

Regardless of the original bug, it might be nice to:

(a) Use lrintf and 255.0f to avoid casting back and forth from double precision floats.  Those casts are cheaper than casting to and from integers, but still, given that these are some of the hottest code paths, it would be nice to be more conservative.

(b) Mark uval as long int because that's what lrint* returns.

@@ +65,3 @@
       if (uval < 0) uval = 0;
       if (uval > 255) uval = 255;
+      *dst = uval;

Good catch!  How about adding a cast on the right hand side because we are going from multi-byte integer to a single byte one:
  *dst = (unsigned char *) uval;
Comment 2 Martin Husemann 2018-05-02 11:22:03 UTC
For the performance: I expect all modern compilers to understand memcpy() (typically it is just a define for __builtin_memcpy()) and optimize it. That is: the performance difference only happens if the target requires something special - on amd64 you will most likely not see any difference
Comment 3 Martin Husemann 2018-05-02 11:30:30 UTC
(oops) any difference in generated object code. But I haven't verified that for all of this change yet.

For coverage/other places: so far I only made sure I could build Gimp.

Changing the function signatures to use the right types will indeed help, but only if you do not have to cast at the call side then.

If you read a binary stream and have a float value at some address that is not dividable by four (that is: sizeof(float), we call that "natural alignment") and you cast the pointer to that byte address to a float*, accessing that pointer will create a bus error on alignment critical architectures. This applies to both reading (the src pointer) and writing (the dst pointer). If you do NOT know the bytes in the stream are properly aligned, memcpy() is the only way to get them into an accessible float value (or store a float value into an arbitrary byte position). If the file format (or whatever) makes sure the address of the float is properly aligned, just casting the pointer is fine (again for both reading and writing).
Comment 4 Øyvind Kolås (pippin) 2018-05-02 11:43:43 UTC
GEGL/GIMP itself tries to 16byte align starts of buffers which should also make floats be properly aligned on 4byte boundaries - though this is to appease SIMD code paths and auto vectorization in GEGL code rather than babl, since we are also integrating with third party libraries that might give use otherwise aligned data.. for instance in file loaders and other third party libraries we integrate with, babl should not crash.

The tools rishi refer to for testing are in the tools subdir primarily tools/babl-verify

Running it with the textual version of the format should benchmark the involved conversions as well as verify that the results produced continue to be correct e.g.:

pippin@yogy:~/src/babl/tools$ BABL_PATH=`pwd`/../extensions/.libs  ./babl-verify "R'G'B'A u8" 'RGBA float' 
/home/pippin/src/babl/tools/../extensions/.libs/gegl-fixups.so 0: R'G'B'A u8 to RGBA float  error:0.000000 cost:19.000000  
/home/pippin/src/babl/tools/../extensions/.libs/gimp-8bit.so 0: R'G'B'A u8 to RGBA float  error:0.000000 cost:19.000000 

will list the direct conversions between formats and if you add another argument it lists the winning conversion-path, which might be multiple smaller conversion steps.

pippin@yogy:~/src/babl/tools$ BABL_PATH=`pwd`/../extensions/.libs  ./babl-verify "R'G'B'A u8" 'RGBA float'   foo
chosen R'G'B'A u8 to RGBA float: steps: 1 error: 0.000000 cost: 18.000000
	/home/pippin/src/babl/tools/../extensions/.libs/gimp-8bit.so 0: R'G'B'A u8 to RGBA float

For an exhaustive testing of babl to smoke out - at least some - further such alignment problems it could also be useful to run tools/babl_fish_path_fitness with output redirected to a file, this tool goes through all permutations of pre-registered pixel formats and thus exercises a lot of code paths that building and running GIMP will not normally trigger.
Comment 5 Debarshi Ray 2018-05-02 13:30:35 UTC
Looks like all three options discussed above generate the same instructions on x86_64:
  (a) Current implementation: https://godbolt.org/g/3k23Js
  (b) With float *src: https://godbolt.org/g/ZL5psF
  (c) With memcpy: https://godbolt.org/g/zJVfGY

That's good!

The reason I am hesitating about the memcpy is that patterns (a) and (b) are used a lot all over GEGL and babl.  So, I am wondering if all those code paths are broken on SPARC or is something particularly interesting about babl/extensions/gggl.c that causes it to crash; because if we are going to fix things to work on SPARC, we might as well fix all of them.  Or at least iteratively work towards such a goal.

(In reply to Martin Husemann from comment #3)
> If you read a binary stream and have a float value at some address that is
> not dividable by four (that is: sizeof(float), we call that "natural
> alignment") and you cast the pointer to that byte address to a float*,
> accessing that pointer will create a bus error on alignment critical
> architectures. This applies to both reading (the src pointer) and writing
> (the dst pointer). If you do NOT know the bytes in the stream are properly
> aligned, memcpy() is the only way to get them into an accessible float value
> (or store a float value into an arbitrary byte position). If the file format
> (or whatever) makes sure the address of the float is properly aligned, just
> casting the pointer is fine (again for both reading and writing).

Umm... ultimately somebody created an array of floats that's getting passed to these functions (eg., conf_F_8).  The function is passed as a pointer to babl_conversion_new, and gets called with this array.  If somebody created a naturally aligned array, wouldn't it crash while populating it?  I don't have access to SPARC hardware so I can't verify any subtleties that might be involved.  Out of curiosity, what were the exact steps in GIMP that trigger the crash?
Comment 6 Martin Husemann 2018-05-07 13:58:45 UTC
(In reply to Debarshi Ray from comment #5)
> Umm... ultimately somebody created an array of floats that's getting passed
> to these functions (eg., conf_F_8).  The function is passed as a pointer to
> babl_conversion_new, and gets called with this array.  If somebody created a
> naturally aligned array, wouldn't it crash while populating it?  I don't
> have access to SPARC hardware so I can't verify any subtleties that might be
> involved.  Out of curiosity, what were the exact steps in GIMP that trigger
> the crash?

If the array is properly allocated (e.g. via malloc() or on the stack as an automatic variable, but that is unlikely here), everything is fine. All accesses to the array will work.

But if you mmap() for example a bitmap file and pass a pointer to some arbitrary offset in there to a function that converts 16bit values to float, the access might crash.

That is: as long as no casts are involved, nothing bad can happen.

Since you said the float array are always created internally, the changes to the float functions I did can probably be removed from the patch (instead removing the superflous casts). I only ran into a crash in one of the non-"F" conv_rgba_* functions during a Gimp build and then saw the multitude of lhs-casts and did a full pass over the file.

If you want to remove casts instead, that would be good and I'll be happy to run tests on sparc.
Comment 7 Øyvind Kolås (pippin) 2018-05-17 08:20:07 UTC
I think the specific cases where unaligned data is used/passed in should be getting fixes then - rather than obfuscating many/all code paths in babl. Some code paths in GEGL even go out of their way in attempting to align data at 16byte offsets.
Comment 8 GNOME Infrastructure Team 2018-05-22 12:23:38 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/gegl/issues/69.