GNOME Bugzilla – Bug 727464
ORC does not preserve NEON/VFP registers according to ARM PCS
Last modified: 2014-12-03 10:29:12 UTC
Symptoms: I'm working (on ARM hard float) with several SIMD-optimized subroutines including some written in ORC. Each gets passed through a QA check where one of the values sitting in VFP registers is a tolerance to compare results as a sanity check. This value seems to get over written in some cases where ORC is one of the kernels that gets run. This only happens when we run ORC generated code on ARM hard float platforms. Problem: ORC does not define a method to push (vpush) and pop (vpop) VFP and NEON registers to the stack. The ARM PCS (http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042e/IHI0042E_aapcs.pdf) section 5.1.2.1 states that registers s16-s31 should be preserved across subroutines. Related Code: ORC code: https://github.com/gnuradio/gnuradio/blob/master/volk/orc/volk_32fc_s32f_magnitude_16i_a_orc_impl.orc ASM output: http://pastebin.com/ErSqsXYf As you can see the ASM output by orcc uses registers d8-d10 (s16-s21) which should be preserved by a vpush in the prologue and vpop in the epilogue. I tested with orc 0.4.17 and 0.4.18.1 from git master (today).
This should probably be handled just like the emms for usage of MMX instructions (MMX registers shadow the FPU registers).
Created attachment 289408 [details] [review] Save d8-d15 when generating NEON code
Patch attached. I got bitten by this as well; the patch works for me.
Review of attachment 289408 [details] [review]: ::: orc/orcarm.c @@ +99,3 @@ + ORC_ASM_CODE(compiler,"}\n"); + + orc_arm_emit (compiler, 0xe92d0000 | regs); Previously this still emitted this even if regs==0. Does this make any functional difference? ::: orc/orcprogram-neon.c @@ +55,3 @@ + } + + if (regs || vregs) orc_arm_emit_push (compiler, regs, vregs); This check shouldn't be needed as you already do that inside push() @@ +98,3 @@ + } + + if (regs) orc_arm_emit_pop (compiler, regs, vregs); And here, also this check seems wrong and should be (regs || vregs) too AFAIU?
1. Actually it's better. The spec says that the behavior is undefined if all those bits are 0... 2. and 3. right; update coming up
Created attachment 291541 [details] [review] Save NEON registers if needed
Thanks for updating the patch :) commit acdaac31c648fd10f1bd0a49c4c33b483bb9c35c Author: Jerome Laheurte <jlaheurte@quividi.com> Date: Wed Nov 26 13:29:08 2014 +0100 Preserve NEON/VFP registers across subroutines according to ARM PCS (5.1.2.1) https://bugzilla.gnome.org/show_bug.cgi?id=727464
Coverity is not very happy about this change :) >>> CID 1256553: Wrong operator used (CONSTANT_EXPRESSION_RESULT) >>> "0xed2d0b00U | ((first & 0x10) << 22)" is always 0xed2d0b00 regardless of the values of its operands. This occurs as the bitwise first operand of '|'. 121 orc_arm_emit (compiler, 0xed2d0b00 | ((first & 0x10) << 22) | ((first & 0x0f) << 12) | (nregs << 1)); >>> CID 1256554: Wrong operator used (CONSTANT_EXPRESSION_RESULT) >>> "0xecbd0b00U | ((first & 0x10) << 22)" is always 0xecbd0b00 regardless of the values of its operands. This occurs as the bitwise first operand of '|'. 147 orc_arm_emit (compiler, 0xecbd0b00 | ((first & 0x10) << 22) | ((first & 0x0f) << 12) | (nregs << 1));
Created attachment 292057 [details] [review] Fix high-order bit My bad, 0x10 is already 4 bits to the left of 1 :)
Thanks for the fast fix :) commit 1291f79b81cf212ff14e7caaa29f6742e76f6c5e Author: Jerome Laheurte <jlaheurte@quividi.com> Date: Wed Dec 3 11:03:52 2014 +0100 Fix high-order bit of first register in VPUSH/VPOP generation CID 1256553 CID 1256554 https://bugzilla.gnome.org/show_bug.cgi?id=727464