After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 767219 - Unaligned access across babl source provokes crashes on sparc*
Unaligned access across babl source provokes crashes on sparc*
Status: RESOLVED OBSOLETE
Product: GEGL
Classification: Other
Component: babl
0.1.0
Other Linux
: Normal normal
: ---
Assigned To: Default Gegl Component Owner
Default Gegl Component Owner
Depends on:
Blocks:
 
 
Reported: 2016-06-03 20:51 UTC by John Paul Adrian Glaubitz
Modified: 2018-05-22 12:13 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description John Paul Adrian Glaubitz 2016-06-03 20:51:06 UTC
Hello!

We're currently shaping up Debian's sparc64 port to make it fit for release after Linux sparc64 development has been picked up by Oracle as upstream.

While working through packages which are failing to build on sparc64, I stumbled over gegl which fails to build from source with a bus error [1]. This is an indicator for unaligned access which is not allowed one some architectures like sparc64.

After some quick debugging with gdb, I was able to localize the problem in babl [2]:

Program received signal SIGBUS, Bus error.
0xfff80001009d5338 in conv_rgb8_rgba8 (
    src=0x22b7d3
"\373\327\253\276\242\\܄c\373\247\330\306\376\306\311{\347\340h\202G\374\316\323(d\246\205b\266\210\237\321\063g(\332\327\363\270\210ζ\271\261Y\333a\353\213K\246\313\320\371\374\375\\\265\350i\315\317\353\366\367\317\356\264_ĻlIW8\247k\355N\376\330\324\302\a\355\353\343Ψ\267\350ʢ\245\275\262~\271\313ױ\341ɱ\207\070\210\362\354Ӵ\354\367\322\362\016\215\314\333\351`\253\347\352ī\220ә\373}\347\374\213\351\363\264\346\332C\226\374`U\372\362\345\244yӯ\373\251\340{\334\062\300\367\321ݞ\262ӭý\274\223\221\325옍\232\325H\264幘\255\372\224\225\337\366n\265\334\373\031G\326",
<incomplete sequence \363>...,
    dst=0x22ec14 "", samples=128) at gggl.c:758
758	      *(unsigned int *) dst = (*(unsigned int *) src) | (255 << 24);

As you can see, this particular kind of pointer arithmetics provokes unaligned access. This is because both src and dst are actually declared as a pointer of char, then later cast into pointer of unsigned int.

According to the C99 specification, declarations which refer to the same object or function must have compatible types, otherwise the behavior is undefined [3] (chapter 6.2.7, paragraph 2).

So, either the declarations have to be fixed or the direct assignments have to be replaced with memcpy which guarantees aligned access.

With that in mind, the above example:

      *(unsigned int *) dst = (*(unsigned int *) src) | (255 << 24);

becomes:

      unsigned int tmp;
      memcpy(&tmp, src, sizeof(unsigned int));
      tmp = tmp | (255 << 24);
      memcpy(dst, &tmp, sizeof(unsigned int));

For some reason, lt-babl_fish_path_dhtml will then hang after making this change, so I'm not a 100% sure whether this particular change has other ramifications.

Either way, the pointer arithmetics should be replaced with memcpy to guarantee aligned access. On x86, memcpy will not produce any additional overhead while, on sparc*, it will actually fix the unaligned access.

Thanks,
Adrian

> [1] https://buildd.debian.org/status/fetch.php?pkg=gegl&arch=sparc64&ver=0.3.6-4&stamp=1464963616
> [2] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=806208
> [3] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf
Comment 1 John Paul Adrian Glaubitz 2016-06-03 21:29:19 UTC
(In reply to John Paul Adrian Glaubitz from comment #0)
> For some reason, lt-babl_fish_path_dhtml will then hang after making this
> change, so I'm not a 100% sure whether this particular change has other
> ramifications.

Ok, it wasn't actually stuck. It just takes a bit longer. So the patch does not seem to have any negative impact :). Will do more testing and report back.

Adrian
Comment 2 Massimo 2016-06-04 05:15:16 UTC
(In reply to John Paul Adrian Glaubitz from comment #0)
> Hello!
> 
> We're currently shaping up Debian's sparc64 port to make it fit for release
> after Linux sparc64 development has been picked up by Oracle as upstream.
> 
> While working through packages which are failing to build on sparc64, I
> stumbled over gegl which fails to build from source with a bus error [1].
> This is an indicator for unaligned access which is not allowed one some
> architectures like sparc64.
> 
> After some quick debugging with gdb, I was able to localize the problem in
> babl [2]:
> 
> Program received signal SIGBUS, Bus error.
> 0xfff80001009d5338 in conv_rgb8_rgba8 (
>     src=0x22b7d3
> "\373\327\253\276\242\\܄c\373\247\330\306\376\306\311{\347\340h\202G\374\316\
> 323(d\246\205b\266\210\237\321\063g(\332\327\363\270\210ζ\271\261Y\333a\353\2
> 13K\246\313\320\371\374\375\\\265\350i\315\317\353\366\367\317\356\264_ĻlIW8
> \247k\355N\376\330\324\302\a\355\353\343Ψ\267\350ʢ\245\275\262~\271\313ױ\341ɱ
> \207\070\210\362\354Ӵ\354\367\322\362\016\215\314\333\351`\253\347\352ī\220ә\
> 373}\347\374\213\351\363\264\346\332C\226\374`U\372\362\345\244yӯ\373\251\340
> {\334\062\300\367\321ݞ\262ӭý\274\223\221\325옍\232\325H\264幘\255\372\224\225\3
> 37\366n\265\334\373\031G\326",
> <incomplete sequence \363>...,
>     dst=0x22ec14 "", samples=128) at gggl.c:758
> 758	      *(unsigned int *) dst = (*(unsigned int *) src) | (255 << 24);
> 
> As you can see, this particular kind of pointer arithmetics provokes
> unaligned access. This is because both src and dst are actually declared as
> a pointer of char, then later cast into pointer of unsigned int.
> 
> According to the C99 specification, declarations which refer to the same
> object or function must have compatible types, otherwise the behavior is
> undefined [3] (chapter 6.2.7, paragraph 2).
> 
> So, either the declarations have to be fixed or the direct assignments have
> to be replaced with memcpy which guarantees aligned access.
> 
> With that in mind, the above example:
> 
>       *(unsigned int *) dst = (*(unsigned int *) src) | (255 << 24);
> 
> becomes:
> 
>       unsigned int tmp;
>       memcpy(&tmp, src, sizeof(unsigned int));
>       tmp = tmp | (255 << 24);
>       memcpy(dst, &tmp, sizeof(unsigned int));
> 

These extensions are fast paths, if they don't work they
must be conditionally compiled only where they work.

I mean probably there are many more similar issues
if you plan to use GEGL/babl besides building it.

Anyway the code is this:

https://git.gnome.org/browse/babl/tree/extensions/gggl.c#n753

if you want to make a portable C version, you can write the 
loop like (untested): 

  long n = samples;

  while (n--)
   {
     dst[0] = src[0];
     dst[1] = src[1];
     dst[2] = src[2];
     dst[3] = 255;

     src   += 3;
     dst   += 4;
   }
Comment 3 John Paul Adrian Glaubitz 2016-06-06 12:33:50 UTC
(In reply to Massimo from comment #2)

> if you want to make a portable C version, you can write the 
> loop like (untested): 
> 
>   long n = samples;
> 
>   while (n--)
>    {
>      dst[0] = src[0];
>      dst[1] = src[1];
>      dst[2] = src[2];
>      dst[3] = 255;
> 
>      src   += 3;
>      dst   += 4;
>    }

I think you have to change the code that way anyway as the current code

  *(unsigned int *) dst = (*(unsigned int *) src) | (255 << 24);

makes assumptions on the endianess, doesn't it? I would assume the code to produce incorrect results on Big-Endian systems.

Adrian
Comment 4 Øyvind Kolås (pippin) 2016-06-06 13:18:49 UTC
If endianness causes a issues for a babl extension/fast-path it isn't really an issue, since it should cause the fast path to be discarded during the runtime competition/comparison between various fast and reference paths.
Comment 5 John Paul Adrian Glaubitz 2016-06-06 13:26:01 UTC
(In reply to Øyvind Kolås from comment #4)
> If endianness causes a issues for a babl extension/fast-path it isn't really
> an issue, since it should cause the fast path to be discarded during the
> runtime competition/comparison between various fast and reference paths.

Just to understand the reasoning correctly: The above casting and assignment of the char buffer as an unsigned integer buffer was chosen to make reduce for char assignments to one unsigned int assignment with a shift to make the code faster, correct?

If yes, I wonder whether this would actually make the code faster because as far as I know, the compiler would come up with the best optimization anyway when using something like memcpy plus one assignment with dst[3] = 255.
Comment 6 Øyvind Kolås (pippin) 2016-06-06 14:59:41 UTC
babl is designed to have multiple fast paths for the same conversions, and do its own run-time profiling. You can check if it would make the code faster with the default babl compilation flags and your C compiler and libc - by adding another fast path (or a copy of the .so under a different name) and examine the report in babl-fish-paths.html in docs.

When using -O0 -no-inline, musl instead of glibc or a different compiler than gcc one might very well lose out on inlined gcc. Whether optimized code using an inlined memcpy would be faster than manual integer + byte assignments is also hard to determine by guesswork without profiling.
Comment 7 GNOME Infrastructure Team 2018-05-22 12:13:55 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/gegl/issues/36.