GNOME Bugzilla – Bug 775437
bayer2rgb: vectorized horiz_upsample cannot be compiled on arm7a
Last modified: 2018-11-03 14:01:02 UTC
Created attachment 341096 [details] Log of running a pipeline with bayer2rgb and ORC_DEBUG=10

I've filed this against v1.4.5 (the version I'm currently using), but I also checked upstream and the relevant code is unchanged. The liborc version used is 0.4.23; again, there does not seem to be any upstream commit that would affect this.

When running a pipeline with bayer2rgb and ORC_DEBUG=10 on an embedded platform (ARMv7a), bayer_orc_horiz_upsample() cannot be compiled, so orc uses the fallback C version (see orc-log.txt).

I've patched gstbayer2rgb.c so that the alternative bayer_orc_horiz_upsample_unaligned() is used on ARMv7a (see attached patch). This JIT-compiles fine. But frankly, I do not understand the difference between the two variants or why this did the trick. There are no code comments and no explanations. Perhaps one of the maintainers could elaborate on this.

To my even bigger surprise, the vectorized bayer_orc_horiz_upsample_unaligned() seems to bring no significant speed increase over the fallback C version of bayer_orc_horiz_upsample(). With about 90 MB/s of debayered buffers being produced by my pipeline, I might have run into a memory bandwidth bottleneck (although that would be an implausibly low limit) or into some other bottleneck. In general, the liborc vectorizations speed things up by a factor of at least 1.7.
Created attachment 341097 [details] [review] Proposed patch, enabling bayer_orc_horiz_upsample_unaligned() on ARMv7a
Created attachment 341100 [details] [review] Proposed patch, enabling bayer_orc_horiz_upsample_unaligned() on ARM

Enables bayer_orc_horiz_upsample_unaligned() on all ARM targets, assuming that the underlying issue is ARM-related.
ARM does not allow unaligned memory accesses though
(In reply to Sebastian Dröge (slomo) from comment #3)
> ARM does not allow unaligned memory accesses though

Hmm. How come it does not crash at runtime and the image looks OK, then? Do you even have to take alignment into account when working with orc opcodes? Perhaps the code generator already works around unaligned memory accesses on the relevant targets.
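To make the alignment question concrete, here is a minimal C sketch of what "aligned" and "unaligned" access mean at the pointer level. Both helpers (is_aligned and load_u32_unaligned) are hypothetical names made up for illustration; they are not part of orc or GStreamer, and the sketch says nothing about what the orc code generator actually emits on ARM.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if ptr is a multiple of `alignment` bytes.
 * `alignment` is assumed to be a power of two (e.g. 4, 8, 16). */
static int is_aligned(const void *ptr, size_t alignment)
{
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}

/* Hypothetical helper: reads a 32-bit value from an address that may not be
 * 4-byte aligned. Copying through memcpy leaves it to the compiler to pick
 * a load sequence that is valid on the target, instead of dereferencing a
 * possibly misaligned uint32_t pointer directly. */
static uint32_t load_u32_unaligned(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

For example, a row that starts one byte into a buffer (such as a source pointer offset by one pixel) would fail is_aligned(row, 4) even when the buffer itself is aligned. Whether that matters for a given orc opcode on a given target is exactly the open question in this comment.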
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/488.