GNOME Bugzilla – Bug 599546
Faster GST_WRITE_* macros
Last modified: 2016-11-12 12:38:56 UTC
The current GST_{READ|WRITE} can be optimized to do fast read/write on platforms that accept unaligned read/write.
Created attachment 146202 [details] [review] gstutils: Faster read/write macros On platforms that can do unaligned read/write, we can read/write much faster by just casting.
Created attachment 146203 [details] [review] gstpluginloader: Fix code for new read/write macros
Created attachment 146204 [details] [review] check: Fix gdp test for new read/write macros
The only problem with this patch is that it will break code which doesn't pass a guint8* as target for the WRITE macros.
We removed the "fast" variants a while back because they make it really easy to introduce extremely subtle bugs. For reference: http://cgit.freedesktop.org/gstreamer/gstreamer/commit/?id=22dd04f7e037d13a10f78d060a6df8ea59d5bd70
We previously had something similar but was reverted last year. More details in bug #545714
Yes, on for that reason this bug should probably be closed ;) In places where people know that everything is good and all that, local code could directly access the integers if HAVE_UNALIGNED_ACCESS is defined. If we make the GST_READ/WRITE macros do that again it's only a matter of time until something breaks again. Or we could add something like GST_READ_UINT64_BE_UNSAFE() with a big, fat warning in the docs ;)
It's only the (old-style) WRITE macros aliasing with another READ/WRITE macro that is unsafe. An (old-style) READ macro aliasing with a READ macro is simply reading the same unchanged underlying memory. I'm in favor of restoring some of the faster READ macro capability, either in the default macros or in _FAST versions.
It also seems that latest gcc (since 4.4 maybe ?) will automatically error out on aliasing issues. That's the reason of those two extra commits. If gcc 4.4 now properly errors out... can't we just fix all modules and be done with it ? -- Just ran a comparision of the usage of gstpluginloader.c when recreating an empty registry (rm ~/.gstreamer-0.10/reg* && valgrind --tool=callgrind gst-inspect-0.10) Before : 98 267 instructions called After : 88 740 instructions called so that's 10% shaved off... for free. And the pluginloader doesn't do a *massive* usage of READ/WRITE unlike some parsers/(de)muxers/typefinders
(In reply to comment #9) > It also seems that latest gcc (since 4.4 maybe ?) will automatically error out > on aliasing issues. That's the reason of those two extra commits. > > If gcc 4.4 now properly errors out... can't we just fix all modules and be done > with it ? gcc developers said, that there's no guarantee that gcc will warn about all aliasing issues. That was in gcc 4.2 times, not long ago. Are you sure this has changed? Does gcc 4.4 warn about the aliasing issue in bug #348114? And will gcc 4.4 on x86 warn about possible aliasing issues on other architectures (because of different possible optimizations and stuff, bug #348114 for example only happened on PPC)? If you can answer all those questions it's probably a good idea to change the normal macros, otherwise it's probably better to add new unsafe variants.
How about suggestion (c) from https://bugzilla.gnome.org/show_bug.cgi?id=545714#c0 ? I think we should try to fix the existing macros as best as we can for those platforms where unaligned access is possible and not introduce new _FAST macros.
I agree with Tim here. I do think however that fixing the macros properly involves knowing more about how gcc works, before deciding on the proper way to do it. Possible solutions I can come up with: - make gcc detect the pattern and emit fast code automatically (most work, cleanest solution) - restore the old code with some magic to work around gcc warnings (least work, very ugly) - use memcpy() (will that even be faster? Also, it's still ugly)
Edward, how shall we proceed with this bug?
Ping?
I just re-applied the patches against current git master to check if the optimisation still works and it does. CFLAGS = -march=native -pipe -g2 -O3 -DG_DISABLE_CAST_CHECKS -DG_THREADS_MANDATORY gcc (Gentoo 4.5.2 p1.0, pie-0.4.5) 4.5.2 // AMD Phenom(tm) II X6 1055T Processor gstpluginloader.c:641 magic = GST_READ_UINT32_BE(out + 8); before: movzbl 0x8(%r13),%eax movzbl 0x9(%r13),%ecx shl $0x18,%eax shl $0x10,%ecx or %ecx,%eax movzbl 0xb(%r13),%ecx or %ecx,%eax shl $0x8,%ecx or %ecx,%eax after: mov 0x8(%r13),%edx bswap %edx
As a side note... I could add a unit test to check for the various combinations of read/write/cast to detect at compile time if the compiler is broken...
I'm in favor of applying the _FAST_READ part of the patch (after you change it to _GST_FAST_READ()), since I'm pretty confident of its usefulness and speed. On the write side, mixing sizes of adjacent writes really slows down cpu write combining on current microarchitectures, so I'm uncertain whether _FAST_WRITE() would be faster when mixed with byte writing. I could be convinced by a single data point, however.
- changing existing macros in a way that requires code changes elsewhere is an API break, isn't it? Don't think we can do that (re. the 'fix code for new macros' patches) - adding additional READ_FAST() macros is theoretically not a problem of course, but it just feels a bit fail to me. It feels like it means we can't be arsed to find a better more general solution - _FAST() macros are bound to be used in cases where they shouldn't be used (or are initially used correctly and then something is changed elsewhere and it's not correct any longer). Which is going to lead to hard to track down bugs (who's got 100% code path coverage?) - doing stuff based on HAVE_CPU_XYZ doesn't seem right either: this is re. the host system, isn't it? - I would still like to know how the inline-function + memcpy() approach fares with gcc + MSVC. I seem to remember that the memcpy() is equally efficient. Why try to be smarter than the compiler? Wouldn't it be better in most cases to just let the compiler optimise this?
In addition to the above: - we can't use HAVE_CPU_XYZ in a public header since it comes from our private config.h (even if it was about the target system and not the host) - GST_HAVE_UNALIGNED_ACCESS in gstconfig.h is also broken because it is also derived from the host and not the target, so won't work right when cross-compiling. - as an additional note, even if GST_HAVE_UNALIGNED_ACCESS was fixed, one must not do #ifdef GST_HAVE_UNALIGNED_ACCESS or #ifndef GST_HAVE_UNALIGNED_ACCESS since it gets defined in any case (either to 0 or 1).
We should get this in 0.11
This has nothing to do with 0.10 or 0.11 really, it is not API relevant afaict. It's just that the patch is not correct and needs fixing.
Created attachment 215761 [details] [review] gst-arch: Use target for AG_GST_UNALIGNED_ACCESS Only whitelist/blacklist architectures. Ideally in the future we could re-enable the commented fix if no cross-compiling is taking place
Created attachment 215762 [details] [review] gstutils: Faster read macros On platforms that can do unaligned read/write, we can read/write much faster by just casting.
Comment on attachment 215762 [details] [review] gstutils: Faster read macros >+#ifdef HAVE_UNALIGNED_ACCESS I think there are still two issues with this: a) HAVE_XYZ is from config.h, which is not an installed header, so you can't use this in an installed header (well, you can, but it doesn't do what you want it to do). You can only use defines from gstconfig.h, like GST_HAVE_UNALIGNED_ACCESS b) in configure.ac in core we always define GST_HAVE_UNALIGNED_ACCESS (to either 1 or 0), so doing #ifdef is always true, I think (?).
Created attachment 215826 [details] [review] gstutils: Faster read macros On platforms that can do unaligned read/write, we can read/write much faster by just casting.
Looks good to me now. Perhaps add an explicit #include <gst/gstconfig.h> to gstutils.h as well instead of relying on it being included indirectly.
The remaining problem is the following (with the unit test I pushed yesterday, which is representative of how GST_READ_* is being used): gst/gstutils.c: In function ‘test_read_macros’: gst/gstutils.c:1100:3: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing] gst/gstutils.c:1100:3: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing] gst/gstutils.c:1108:3: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing] gst/gstutils.c:1117:3: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing] gst/gstutils.c:1117:3: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing] gst/gstutils.c:1120:3: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing] gst/gstutils.c:1178:3: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing] gst/gstutils.c:1184:3: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing] gst/gstutils.c:1204:3: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing] gst/gstutils.c:1206:3: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing] gst/gstutils.c:1209:3: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing] gst/gstutils.c:1211:3: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing] It never fails with a guint8* (or gpointer)... but with a guint<whatever>[] ... it'll complain. The check passes fine fwiw, the question is just : what do we do ? * We don't switch to using those macros (grmbl) * We fix the few plugins that use GST_READ on arrays to being pointers instead there are only a few culprits (one in good, two in bad)
Created attachment 215920 [details] [review] isomp4: Don't use GST_READ_ macros on arrays
Created attachment 215921 [details] [review] schrodec: Don't use GST_READ_ on arrays We can just peek instead
Created attachment 215922 [details] [review] mxfdemux: Don't use GST_READ_ on arrays
Created attachment 215927 [details] [review] gstutils: Faster read macros On platforms that can do unaligned read/write, we can read/write much faster by just casting.
This last variant also handles arrays and requires no changes in any plugin
Committed and pushed to both 0.10 and master branches
Aaaannd re-opening because it was only the read macros that were pushed <facepalm/>
Causes compiler warnings in gnonlin with clang 3.1: CC libgnl_la-gnl.lo In file included from gnl.c:25: In file included from ./gnl.h:24: In file included from /home/slomo/Projects/gstreamer/head/gstreamer/gst/gst.h:82: /home/slomo/Projects/gstreamer/head/gstreamer/gst/gstutils.h:120:11: error: cast from 'const guint8 *' (aka 'const unsigned char *') to 'const guint16 *' (aka 'const unsigned short *') increases required alignment from 1 to 2 [-Werror,-Wcast-align] return *(const guint16*)(v); ^~~~~~~~~~~~~~~~~~~ /home/slomo/Projects/gstreamer/head/gstreamer/gst/gstutils.h:123:11: error: cast from 'const guint8 *' (aka 'const unsigned char *') to 'const guint32 *' (aka 'const unsigned int *') increases required alignment from 1 to 4 [-Werror,-Wcast-align] return *(const guint32*)(v); ^~~~~~~~~~~~~~~~~~~ /home/slomo/Projects/gstreamer/head/gstreamer/gst/gstutils.h:126:11: error: cast from 'const guint8 *' (aka 'const unsigned char *') to 'const guint64 *' (aka 'const unsigned long *') increases required alignment from 1 to 8 [-Werror,-Wcast-align] return *(const guint64*)(v); ^~~~~~~~~~~~~~~~~~~ /home/slomo/Projects/gstreamer/head/gstreamer/gst/gstutils.h:129:30: error: cast from 'const guint8 *' (aka 'const unsigned char *') to 'const guint16 *' (aka 'const unsigned short *') increases required alignment from 1 to 2 [-Werror,-Wcast-align] return GUINT16_SWAP_LE_BE(*(const guint16*)(v)); ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~ /usr/include/glib-2.0/glib/gtypes.h:313:67: note: expanded from macro 'GUINT16_SWAP_LE_BE' # define GUINT16_SWAP_LE_BE(val) (GUINT16_SWAP_LE_BE_CONSTANT (val)) ^ /usr/include/glib-2.0/glib/gtypes.h:154:27: note: expanded from macro 'GUINT16_SWAP_LE_BE_CONSTANT' (guint16) ((guint16) (val) >> 8) | \ ^~~ In file included from gnl.c:25: In file included from ./gnl.h:24: In file included from /home/slomo/Projects/gstreamer/head/gstreamer/gst/gst.h:82: /home/slomo/Projects/gstreamer/head/gstreamer/gst/gstutils.h:129:30: error: cast from 'const guint8 *' (aka 'const unsigned char *') to 'const guint16 *' (aka 'const unsigned short *') increases required alignment from 1 to 2 [-Werror,-Wcast-align] return GUINT16_SWAP_LE_BE(*(const guint16*)(v)); ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~ /usr/include/glib-2.0/glib/gtypes.h:313:67: note: expanded from macro 'GUINT16_SWAP_LE_BE' # define GUINT16_SWAP_LE_BE(val) (GUINT16_SWAP_LE_BE_CONSTANT (val)) ^ /usr/include/glib-2.0/glib/gtypes.h:155:27: note: expanded from macro 'GUINT16_SWAP_LE_BE_CONSTANT' (guint16) ((guint16) (val) << 8))) ^~~ In file included from gnl.c:25: In file included from ./gnl.h:24: In file included from /home/slomo/Projects/gstreamer/head/gstreamer/gst/gst.h:82: /home/slomo/Projects/gstreamer/head/gstreamer/gst/gstutils.h:132:30: error: cast from 'const guint8 *' (aka 'const unsigned char *') to 'const guint32 *' (aka 'const unsigned int *') increases required alignment from 1 to 4 [-Werror,-Wcast-align] return GUINT32_SWAP_LE_BE(*(const guint32*)(v)); ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~ /usr/include/glib-2.0/glib/gtypes.h:315:67: note: expanded from macro 'GUINT32_SWAP_LE_BE' # define GUINT32_SWAP_LE_BE(val) (GUINT32_SWAP_LE_BE_X86_64 (val)) ^ /usr/include/glib-2.0/glib/gtypes.h:294:46: note: expanded from macro 'GUINT32_SWAP_LE_BE_X86_64' ({ register guint32 __v, __x = ((guint32) (val)); \ ^~~ In file included from gnl.c:25: In file included from ./gnl.h:24: In file included from /home/slomo/Projects/gstreamer/head/gstreamer/gst/gst.h:82: /home/slomo/Projects/gstreamer/head/gstreamer/gst/gstutils.h:135:30: error: cast from 'const guint8 *' (aka 'const unsigned char *') to 'const guint64 *' (aka 'const unsigned long *') increases required alignment from 1 to 8 [-Werror,-Wcast-align] return GUINT64_SWAP_LE_BE(*(const guint64*)(v)); ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~ /usr/include/glib-2.0/glib/gtypes.h:318:67: note: expanded from macro 'GUINT64_SWAP_LE_BE' # define GUINT64_SWAP_LE_BE(val) (GUINT64_SWAP_LE_BE_X86_64 (val)) ^ /usr/include/glib-2.0/glib/gtypes.h:304:45: note: expanded from macro 'GUINT64_SWAP_LE_BE_X86_64' ({ register guint64 __v, __x = ((guint64) (val)); \ ^~~ 7 errors generated.
Missing here is a fix for the clang compiler warnings and... the write macros :)
I no longer get those warnings with clang 3.3
Let's see what breaks! commit cc5af6273f08e5614bf3a3a4ed455379f8fef22a Author: Tim-Philipp Müller <tim@centricular.com> Date: Sat Nov 12 12:36:05 2016 +0000 utils: faster GST_WRITE_* macros if unaligned access is possible https://bugzilla.gnome.org/show_bug.cgi?id=599546