Bug 599546 – Faster GST_WRITE_* macros

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 599546 - Faster GST_WRITE_* macros


Summary:	Faster GST_WRITE_* macros


Status:	RESOLVED FIXED

Product:	GStreamer
Classification:	Platform
Component:	gstreamer (core)
Version:	git master
Hardware:	Other Linux

Importance:	Normal enhancement
Target Milestone:	1.11.1
Assigned To:	GStreamer Maintainers
QA Contact:	GStreamer Maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2009-10-25 12:24 UTC by Edward Hervey
Modified:	2016-11-12 12:38 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
gstutils: Faster read/write macros (10.55 KB, patch) 2009-10-25 12:25 UTC, Edward Hervey	rejected	Details \| Review
gstpluginloader: Fix code for new read/write macros (1.14 KB, patch) 2009-10-25 12:25 UTC, Edward Hervey	rejected	Details \| Review
check: Fix gdp test for new read/write macros (1.01 KB, patch) 2009-10-25 12:25 UTC, Edward Hervey	rejected	Details \| Review
gst-arch: Use target for AG_GST_UNALIGNED_ACCESS (2.70 KB, patch) 2012-06-06 17:01 UTC, Edward Hervey	rejected	Details \| Review
gstutils: Faster read macros (5.29 KB, patch) 2012-06-06 17:03 UTC, Edward Hervey	rejected	Details \| Review
gstutils: Faster read macros (5.29 KB, patch) 2012-06-07 08:41 UTC, Edward Hervey	rejected	Details \| Review
isomp4: Don't use GST_READ_ macros on arrays (4.57 KB, patch) 2012-06-08 09:20 UTC, Edward Hervey	rejected	Details \| Review
schrodec: Don't use GST_READ_ on arrays (1.87 KB, patch) 2012-06-08 09:24 UTC, Edward Hervey	rejected	Details \| Review
mxfdemux: Don't use GST_READ_ on arrays (1.51 KB, patch) 2012-06-08 09:24 UTC, Edward Hervey	rejected	Details \| Review
gstutils: Faster read macros (5.78 KB, patch) 2012-06-08 11:00 UTC, Edward Hervey	committed	Details \| Review

Description Edward Hervey 2009-10-25 12:24:30 UTC

The current GST_{READ|WRITE} can be optimized to do fast read/write on platforms that accept unaligned read/write.

Comment 1 Edward Hervey 2009-10-25 12:25:47 UTC

Created attachment 146202 [details] [review]
gstutils: Faster read/write macros

On platforms that can do unaligned read/write, we can read/write much faster
by just casting.

Comment 2 Edward Hervey 2009-10-25 12:25:51 UTC

Created attachment 146203 [details] [review]
gstpluginloader: Fix code for new read/write macros

Comment 3 Edward Hervey 2009-10-25 12:25:55 UTC

Created attachment 146204 [details] [review]
check: Fix gdp test for new read/write macros

Comment 4 Edward Hervey 2009-10-25 14:01:56 UTC

The only problem with this patch is that it will break code which doesn't pass a guint8* as target for the WRITE macros.

Comment 5 Tim-Philipp Müller 2009-10-26 10:13:12 UTC

We removed the "fast" variants a while back because they make it really easy to introduce extremely subtle bugs. For reference:

http://cgit.freedesktop.org/gstreamer/gstreamer/commit/?id=22dd04f7e037d13a10f78d060a6df8ea59d5bd70

Comment 6 Edward Hervey 2009-10-26 14:59:36 UTC

We previously had something similar but was reverted last year.

More details in bug #545714

Comment 7 Sebastian Dröge (slomo) 2009-10-27 07:01:21 UTC

Yes, on for that reason this bug should probably be closed ;)

In places where people know that everything is good and all that, local code could directly access the integers if HAVE_UNALIGNED_ACCESS is defined. If we make the GST_READ/WRITE macros do that again it's only a matter of time until something breaks again.

Or we could add something like GST_READ_UINT64_BE_UNSAFE() with a big, fat warning in the docs ;)

Comment 8 David Schleef 2009-10-27 07:21:34 UTC

It's only the (old-style) WRITE macros aliasing with another READ/WRITE macro that is unsafe.  An (old-style) READ macro aliasing with a READ macro is simply reading the same unchanged underlying memory.

I'm in favor of restoring some of the faster READ macro capability, either in the default macros or in _FAST versions.

Comment 9 Edward Hervey 2009-10-27 07:42:32 UTC

It also seems that latest gcc (since 4.4 maybe ?) will automatically error out on aliasing issues. That's the reason of those two extra commits.

If gcc 4.4 now properly errors out... can't we just fix all modules and be done with it ?

--

Just ran a comparision of the usage of gstpluginloader.c when recreating an empty registry (rm ~/.gstreamer-0.10/reg* && valgrind --tool=callgrind gst-inspect-0.10)

Before : 98 267 instructions called
After  : 88 740 instructions called

so that's 10% shaved off... for free. And the pluginloader doesn't do a *massive* usage of READ/WRITE unlike some parsers/(de)muxers/typefinders

Comment 10 Sebastian Dröge (slomo) 2009-10-27 08:34:29 UTC

(In reply to comment #9)
> It also seems that latest gcc (since 4.4 maybe ?) will automatically error out
> on aliasing issues. That's the reason of those two extra commits.
> 
> If gcc 4.4 now properly errors out... can't we just fix all modules and be done
> with it ?

gcc developers said, that there's no guarantee that gcc will warn about all aliasing issues. That was in gcc 4.2 times, not long ago. Are you sure this has changed? Does gcc 4.4 warn about the aliasing issue in bug #348114? And will gcc 4.4 on x86 warn about possible aliasing issues on other architectures (because of different possible optimizations and stuff, bug #348114 for example only happened on PPC)? If you can answer all those questions it's probably a good idea to change the normal macros, otherwise it's probably better to add new unsafe variants.

Comment 11 Tim-Philipp Müller 2009-10-27 10:31:49 UTC

How about suggestion (c) from https://bugzilla.gnome.org/show_bug.cgi?id=545714#c0 ?

I think we should try to fix the existing macros as best as we can for those platforms where unaligned access is possible and not introduce new _FAST macros.

Comment 12 Benjamin Otte (Company) 2009-10-27 12:10:05 UTC

I agree with Tim here.

I do think however that fixing the macros properly involves knowing more about how gcc works, before deciding on the proper way to do it.
Possible solutions I can come up with:
- make gcc detect the pattern and emit fast code automatically (most work, cleanest solution)
- restore the old code with some magic to work around gcc warnings (least work, very ugly)
- use memcpy() (will that even be faster? Also, it's still ugly)

Comment 13 Sebastian Dröge (slomo) 2009-11-06 14:20:21 UTC

Edward, how shall we proceed with this bug?

Comment 14 Sebastian Dröge (slomo) 2011-05-30 16:03:01 UTC

Ping?

Comment 15 Edward Hervey 2011-05-30 16:29:11 UTC

I just re-applied the patches against current git master to check if the optimisation still works and it does.

CFLAGS = -march=native -pipe -g2 -O3 -DG_DISABLE_CAST_CHECKS -DG_THREADS_MANDATORY

gcc (Gentoo 4.5.2 p1.0, pie-0.4.5) 4.5.2 // AMD Phenom(tm) II X6 1055T Processor

gstpluginloader.c:641
  magic = GST_READ_UINT32_BE(out + 8);

before:
  movzbl   0x8(%r13),%eax
  movzbl   0x9(%r13),%ecx
  shl      $0x18,%eax
  shl      $0x10,%ecx
  or       %ecx,%eax
  movzbl   0xb(%r13),%ecx
  or       %ecx,%eax
  shl      $0x8,%ecx
  or       %ecx,%eax

after:
  mov      0x8(%r13),%edx
  bswap    %edx

Comment 16 Edward Hervey 2011-05-30 16:37:12 UTC

As a side note... I could add a unit test to check for the various combinations of read/write/cast to detect at compile time if the compiler is broken...

Comment 17 David Schleef 2011-05-30 20:04:03 UTC

I'm in favor of applying the _FAST_READ part of the patch (after you change it to _GST_FAST_READ()), since I'm pretty confident of its usefulness and speed.

On the write side, mixing sizes of adjacent writes really slows down cpu write combining on current microarchitectures, so I'm uncertain whether _FAST_WRITE() would be faster when mixed with byte writing.  I could be convinced by a single data point, however.

Comment 18 Tim-Philipp Müller 2011-05-31 10:59:16 UTC

 - changing existing macros in a way that requires code
   changes elsewhere is an API break, isn't it? Don't think
   we can do that (re. the 'fix code for new macros' patches)

 - adding additional READ_FAST() macros is theoretically
   not a problem of course, but it just feels a bit fail to me.
   It feels like it means we can't be arsed to find a better
   more general solution

 - _FAST() macros are bound to be used in cases where
    they shouldn't be used (or are initially used correctly
    and then something is changed elsewhere and it's not
    correct any longer). Which is going to lead to hard to
    track down bugs (who's got 100% code path coverage?)

 - doing stuff based on HAVE_CPU_XYZ doesn't seem
   right either: this is re. the host system, isn't it?

 - I would still like to know how the inline-function + memcpy()
   approach fares with gcc + MSVC. I seem to remember that
   the memcpy() is equally efficient. Why try to be smarter
   than the compiler? Wouldn't it be better in most cases to
   just let the compiler optimise this?

Comment 19 Tim-Philipp Müller 2012-03-06 00:18:54 UTC

In addition to the above:

 - we can't use HAVE_CPU_XYZ in a public header since it
   comes from our private config.h (even if it was about the
   target system and not the host)

 - GST_HAVE_UNALIGNED_ACCESS in gstconfig.h is also
   broken because it is also derived from the host and not the
   target, so won't work right when cross-compiling.

 - as an additional note, even if GST_HAVE_UNALIGNED_ACCESS
   was fixed, one must not do
   #ifdef GST_HAVE_UNALIGNED_ACCESS
   or
   #ifndef GST_HAVE_UNALIGNED_ACCESS
   since it gets defined in any case (either to 0 or 1).

Comment 20 Edward Hervey 2012-06-05 09:33:41 UTC

We should get this in 0.11

Comment 21 Tim-Philipp Müller 2012-06-05 11:19:11 UTC

This has nothing to do with 0.10 or 0.11 really, it is not API relevant afaict. It's just that the patch is not correct and needs fixing.

Comment 22 Edward Hervey 2012-06-06 17:01:45 UTC

Created attachment 215761 [details] [review]
gst-arch: Use target for AG_GST_UNALIGNED_ACCESS

Only whitelist/blacklist architectures.

Ideally in the future we could re-enable the commented fix if
no cross-compiling is taking place

Comment 23 Edward Hervey 2012-06-06 17:03:59 UTC

Created attachment 215762 [details] [review]
gstutils: Faster read macros

On platforms that can do unaligned read/write, we can read/write much faster
by just casting.

Comment 24 Tim-Philipp Müller 2012-06-06 17:36:25 UTC

Comment on attachment 215762 [details] [review]
gstutils: Faster read macros

>+#ifdef HAVE_UNALIGNED_ACCESS

I think there are still two issues with this:

a) HAVE_XYZ is from config.h, which is not an installed header, so you can't use this in an installed header (well, you can, but it doesn't do what you want it to do). You can only use defines from gstconfig.h, like GST_HAVE_UNALIGNED_ACCESS

b) in configure.ac in core we always define GST_HAVE_UNALIGNED_ACCESS (to either 1 or 0), so doing #ifdef is always true, I think (?).

Comment 25 Edward Hervey 2012-06-07 08:41:02 UTC

Created attachment 215826 [details] [review]
gstutils: Faster read macros

On platforms that can do unaligned read/write, we can read/write much faster
by just casting.

Comment 26 Tim-Philipp Müller 2012-06-07 08:51:01 UTC

Looks good to me now.

Perhaps add an explicit #include <gst/gstconfig.h> to gstutils.h as well instead of relying on it being included indirectly.

Comment 27 Edward Hervey 2012-06-08 07:54:43 UTC

The remaining problem is the following (with the unit test I pushed yesterday, which is representative of how GST_READ_* is being used):

gst/gstutils.c: In function ‘test_read_macros’:
gst/gstutils.c:1100:3: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
gst/gstutils.c:1100:3: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
gst/gstutils.c:1108:3: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
gst/gstutils.c:1117:3: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
gst/gstutils.c:1117:3: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
gst/gstutils.c:1120:3: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
gst/gstutils.c:1178:3: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
gst/gstutils.c:1184:3: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
gst/gstutils.c:1204:3: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
gst/gstutils.c:1206:3: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
gst/gstutils.c:1209:3: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
gst/gstutils.c:1211:3: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]

It never fails with a guint8* (or gpointer)... but with a guint<whatever>[] ... it'll complain.

The check passes fine fwiw, the question is just : what do we do ?
* We don't switch to using those macros (grmbl)
* We fix the few plugins that use GST_READ on arrays to being pointers instead

there are only a few culprits (one in good, two in bad)

Comment 28 Edward Hervey 2012-06-08 09:20:34 UTC

Created attachment 215920 [details] [review]
isomp4: Don't use GST_READ_ macros on arrays

Comment 29 Edward Hervey 2012-06-08 09:24:15 UTC

Created attachment 215921 [details] [review]
schrodec: Don't use GST_READ_ on arrays

We can just peek instead

Comment 30 Edward Hervey 2012-06-08 09:24:18 UTC

Created attachment 215922 [details] [review]
mxfdemux: Don't use GST_READ_ on arrays

Comment 31 Edward Hervey 2012-06-08 11:00:05 UTC

Created attachment 215927 [details] [review]
gstutils: Faster read macros

On platforms that can do unaligned read/write, we can read/write much faster
by just casting.

Comment 32 Edward Hervey 2012-06-08 11:01:39 UTC

This last variant also handles arrays and requires no changes in any plugin

Comment 33 Edward Hervey 2012-06-08 13:15:02 UTC

Committed and pushed to both 0.10 and master branches

Comment 34 Edward Hervey 2012-06-08 13:28:01 UTC

Aaaannd re-opening because it was only the read macros that were pushed <facepalm/>

Comment 35 Sebastian Dröge (slomo) 2012-06-11 08:07:12 UTC

Causes compiler warnings in gnonlin with clang 3.1:

  CC     libgnl_la-gnl.lo
In file included from gnl.c:25:
In file included from ./gnl.h:24:
In file included from /home/slomo/Projects/gstreamer/head/gstreamer/gst/gst.h:82:
/home/slomo/Projects/gstreamer/head/gstreamer/gst/gstutils.h:120:11: error: 
      cast from 'const guint8 *' (aka 'const unsigned char *') to
      'const guint16 *' (aka 'const unsigned short *') increases required
      alignment from 1 to 2 [-Werror,-Wcast-align]
  return *(const guint16*)(v);
          ^~~~~~~~~~~~~~~~~~~
/home/slomo/Projects/gstreamer/head/gstreamer/gst/gstutils.h:123:11: error: 
      cast from 'const guint8 *' (aka 'const unsigned char *') to
      'const guint32 *' (aka 'const unsigned int *') increases required
      alignment from 1 to 4 [-Werror,-Wcast-align]
  return *(const guint32*)(v);
          ^~~~~~~~~~~~~~~~~~~
/home/slomo/Projects/gstreamer/head/gstreamer/gst/gstutils.h:126:11: error: 
      cast from 'const guint8 *' (aka 'const unsigned char *') to
      'const guint64 *' (aka 'const unsigned long *') increases required
      alignment from 1 to 8 [-Werror,-Wcast-align]
  return *(const guint64*)(v);
          ^~~~~~~~~~~~~~~~~~~
/home/slomo/Projects/gstreamer/head/gstreamer/gst/gstutils.h:129:30: error: 
      cast from 'const guint8 *' (aka 'const unsigned char *') to
      'const guint16 *' (aka 'const unsigned short *') increases required
      alignment from 1 to 2 [-Werror,-Wcast-align]
  return GUINT16_SWAP_LE_BE(*(const guint16*)(v));
         ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
/usr/include/glib-2.0/glib/gtypes.h:313:67: note: expanded from macro
      'GUINT16_SWAP_LE_BE'
#    define GUINT16_SWAP_LE_BE(val) (GUINT16_SWAP_LE_BE_CONSTANT (val))
                                                                  ^
/usr/include/glib-2.0/glib/gtypes.h:154:27: note: expanded from macro
      'GUINT16_SWAP_LE_BE_CONSTANT'
    (guint16) ((guint16) (val) >> 8) |  \
                          ^~~
In file included from gnl.c:25:
In file included from ./gnl.h:24:
In file included from /home/slomo/Projects/gstreamer/head/gstreamer/gst/gst.h:82:
/home/slomo/Projects/gstreamer/head/gstreamer/gst/gstutils.h:129:30: error: 
      cast from 'const guint8 *' (aka 'const unsigned char *') to
      'const guint16 *' (aka 'const unsigned short *') increases required
      alignment from 1 to 2 [-Werror,-Wcast-align]
  return GUINT16_SWAP_LE_BE(*(const guint16*)(v));
         ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
/usr/include/glib-2.0/glib/gtypes.h:313:67: note: expanded from macro
      'GUINT16_SWAP_LE_BE'
#    define GUINT16_SWAP_LE_BE(val) (GUINT16_SWAP_LE_BE_CONSTANT (val))
                                                                  ^
/usr/include/glib-2.0/glib/gtypes.h:155:27: note: expanded from macro
      'GUINT16_SWAP_LE_BE_CONSTANT'
    (guint16) ((guint16) (val) << 8)))
                          ^~~
In file included from gnl.c:25:
In file included from ./gnl.h:24:
In file included from /home/slomo/Projects/gstreamer/head/gstreamer/gst/gst.h:82:
/home/slomo/Projects/gstreamer/head/gstreamer/gst/gstutils.h:132:30: error: 
      cast from 'const guint8 *' (aka 'const unsigned char *') to
      'const guint32 *' (aka 'const unsigned int *') increases required
      alignment from 1 to 4 [-Werror,-Wcast-align]
  return GUINT32_SWAP_LE_BE(*(const guint32*)(v));
         ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
/usr/include/glib-2.0/glib/gtypes.h:315:67: note: expanded from macro
      'GUINT32_SWAP_LE_BE'
#      define GUINT32_SWAP_LE_BE(val) (GUINT32_SWAP_LE_BE_X86_64 (val))
                                                                  ^
/usr/include/glib-2.0/glib/gtypes.h:294:46: note: expanded from macro
      'GUINT32_SWAP_LE_BE_X86_64'
         ({ register guint32 __v, __x = ((guint32) (val));      \
                                                    ^~~
In file included from gnl.c:25:
In file included from ./gnl.h:24:
In file included from /home/slomo/Projects/gstreamer/head/gstreamer/gst/gst.h:82:
/home/slomo/Projects/gstreamer/head/gstreamer/gst/gstutils.h:135:30: error: 
      cast from 'const guint8 *' (aka 'const unsigned char *') to
      'const guint64 *' (aka 'const unsigned long *') increases required
      alignment from 1 to 8 [-Werror,-Wcast-align]
  return GUINT64_SWAP_LE_BE(*(const guint64*)(v));
         ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
/usr/include/glib-2.0/glib/gtypes.h:318:67: note: expanded from macro
      'GUINT64_SWAP_LE_BE'
#      define GUINT64_SWAP_LE_BE(val) (GUINT64_SWAP_LE_BE_X86_64 (val))
                                                                  ^
/usr/include/glib-2.0/glib/gtypes.h:304:45: note: expanded from macro
      'GUINT64_SWAP_LE_BE_X86_64'
        ({ register guint64 __v, __x = ((guint64) (val));       \
                                                   ^~~
7 errors generated.

Comment 36 Sebastian Dröge (slomo) 2013-07-24 08:26:59 UTC

Missing here is a fix for the clang compiler warnings and... the write macros :)

Comment 37 Edward Hervey 2013-07-24 08:50:55 UTC

I no longer get those warnings with clang 3.3

Comment 38 Tim-Philipp Müller 2016-11-12 12:38:56 UTC

Let's see what breaks!

commit cc5af6273f08e5614bf3a3a4ed455379f8fef22a
Author: Tim-Philipp Müller <tim@centricular.com>
Date:   Sat Nov 12 12:36:05 2016 +0000

    utils: faster GST_WRITE_* macros if unaligned access is possible
    
    https://bugzilla.gnome.org/show_bug.cgi?id=599546