After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 727464 - ORC does not preserve NEON/VFP registers according to ARM PCS
ORC does not preserve NEON/VFP registers according to ARM PCS
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: orc
unspecified
Other Linux
: Normal critical
: 0.4.23
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2014-04-01 21:56 UTC by Nathan West
Modified: 2014-12-03 10:29 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Save d8-d15 when generating NEON code (7.06 KB, patch)
2014-10-27 13:21 UTC, Jérôme Laheurte
reviewed Details | Review
Save NEON registers if needed (7.05 KB, patch)
2014-11-26 12:34 UTC, Jérôme Laheurte
committed Details | Review
Fix high-order bit (1.22 KB, patch)
2014-12-03 10:07 UTC, Jérôme Laheurte
committed Details | Review

Description Nathan West 2014-04-01 21:56:03 UTC
Symptoms:
I'm working (on ARM hard float) with several SIMD-optimized subroutines including some written in ORC. Each gets passed through a QA check where one of the values sitting in VFP registers is a tolerance to compare results as a sanity check. This value seems to get over written in some cases where ORC is one of the kernels that gets run.

This only happens when we run ORC generated code on ARM hard float platforms.

Problem:
ORC does not define a method to push (vpush) and pop (vpop) VFP and NEON registers to the stack. The ARM PCS (http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042e/IHI0042E_aapcs.pdf) section 5.1.2.1 states that registers s16-s31 should be preserved across subroutines.

Related Code:
ORC code: https://github.com/gnuradio/gnuradio/blob/master/volk/orc/volk_32fc_s32f_magnitude_16i_a_orc_impl.orc
ASM output: http://pastebin.com/ErSqsXYf

As you can see the ASM output by orcc uses registers d8-d10 (s16-s21) which should be preserved by a vpush in the prologue and vpop in the epilogue. I tested with orc 0.4.17 and 0.4.18.1 from git master (today).
Comment 1 Sebastian Dröge (slomo) 2014-04-04 08:21:21 UTC
This should probably be handled just like the emms for usage of MMX instructions (MMX registers shadow the FPU registers).
Comment 2 Jérôme Laheurte 2014-10-27 13:21:51 UTC
Created attachment 289408 [details] [review]
Save d8-d15 when generating NEON code
Comment 3 Jérôme Laheurte 2014-10-27 13:22:27 UTC
Patch attached. I got bitten by this as well; the patch works for me.
Comment 4 Sebastian Dröge (slomo) 2014-11-26 11:27:56 UTC
Review of attachment 289408 [details] [review]:

::: orc/orcarm.c
@@ +99,3 @@
+    ORC_ASM_CODE(compiler,"}\n");
+
+    orc_arm_emit (compiler, 0xe92d0000 | regs);

Previously this still emitted this even if regs==0. Does this make any functional difference?

::: orc/orcprogram-neon.c
@@ +55,3 @@
+  }
+
+  if (regs || vregs) orc_arm_emit_push (compiler, regs, vregs);

This check shouldn't be needed as you already do that inside push()

@@ +98,3 @@
+  }
+
+  if (regs) orc_arm_emit_pop (compiler, regs, vregs);

And here, also this check seems wrong and should be (regs || vregs) too AFAIU?
Comment 5 Jérôme Laheurte 2014-11-26 12:33:49 UTC
 1. Actually it's better. The spec says that the behavior is undefined if all those bits are 0...

 2. and 3. right; update coming up
Comment 6 Jérôme Laheurte 2014-11-26 12:34:45 UTC
Created attachment 291541 [details] [review]
Save NEON registers if needed
Comment 7 Sebastian Dröge (slomo) 2014-11-27 11:14:40 UTC
Thanks for updating the patch :)

commit acdaac31c648fd10f1bd0a49c4c33b483bb9c35c
Author: Jerome Laheurte <jlaheurte@quividi.com>
Date:   Wed Nov 26 13:29:08 2014 +0100

    Preserve NEON/VFP registers across subroutines according to ARM PCS (5.1.2.1)
    
    https://bugzilla.gnome.org/show_bug.cgi?id=727464
Comment 8 Sebastian Dröge (slomo) 2014-12-03 09:34:36 UTC
Coverity is not very happy about this change :)

>>>     CID 1256553:  Wrong operator used  (CONSTANT_EXPRESSION_RESULT)
>>>     "0xed2d0b00U | ((first & 0x10) << 22)" is always 0xed2d0b00 regardless of the values of its operands. This occurs as the bitwise first operand of '|'.
121         orc_arm_emit (compiler, 0xed2d0b00 | ((first & 0x10) << 22) | ((first & 0x0f) << 12) | (nregs << 1));


>>>     CID 1256554:  Wrong operator used  (CONSTANT_EXPRESSION_RESULT)
>>>     "0xecbd0b00U | ((first & 0x10) << 22)" is always 0xecbd0b00 regardless of the values of its operands. This occurs as the bitwise first operand of '|'.
147         orc_arm_emit (compiler, 0xecbd0b00 | ((first & 0x10) << 22) | ((first & 0x0f) << 12) | (nregs << 1));
Comment 9 Jérôme Laheurte 2014-12-03 10:07:39 UTC
Created attachment 292057 [details] [review]
Fix high-order bit

My bad, 0x10 is already 4 bits to the left of 1 :)
Comment 10 Sebastian Dröge (slomo) 2014-12-03 10:29:10 UTC
Thanks for the fast fix :)

commit 1291f79b81cf212ff14e7caaa29f6742e76f6c5e
Author: Jerome Laheurte <jlaheurte@quividi.com>
Date:   Wed Dec 3 11:03:52 2014 +0100

    Fix high-order bit of first register in VPUSH/VPOP generation
    
    CID 1256553
    CID 1256554
    
    https://bugzilla.gnome.org/show_bug.cgi?id=727464