GNOME Bugzilla – Bug 704420
Perform more sophisticated escape analysis
Last modified: 2018-05-22 14:53:20 UTC
Consider a simple program: int main() { uint64 f = 0; uint64 g = 0; Gee.List<int> list = new Gee.ArrayList<int> (); for (int i = 0; i < 0xFFFFF; i++) { int j = i*i; list.foreach (() => { g += j; return true; }); } for (int i = 0; i < 0xFFFFF; i++) { Test.dummy (); f += g; } Test.devnull (f); return 0; } Even though the closure is unowned and cannot cause any escaping both Vala (in highier-level respect) as well as optimizer treats the code as if the closure was owned. To be more exact (at least) the following optimal cases happens: - The structure is allocated every time in the first loop despite it is not needed. As no variable escapes the method the data may be allocated entirely on stack - As the outer scope block pointer escapes the C compiler cannot assume that Test.devnull does not have pointer to g and it is not using it. Therefore The second loop involves loading and storing the data to stack instead of keeping in register. Sufficiently smart compiler could even simply add to f a value of g * 0xFFFFF and call the Test.dummy () a 0xFFFFF times. The usage of data on stack have had a large impact on timing: % time ./test # The original Vala code ./test 0.27s user 0.00s system 99% cpu 0.275 total % time ./test # The modified code ./test 0.16s user 0.00s system 98% cpu 0.165 total In the test code the usage of restrict on pointers did not improved the code running time probably because gcc-4.8.1 -O2 -march=native was not smart enough to deal with Vala for loop. Lack of restrict (so only the first part is solved) simplifies the code generation at the cost of avoiding potentially beneficial further optimizations. The case where it might matter might look like int main() { uint64 f = 0; uint64 g = 0; bool b = false; Gee.List<int> list = new Gee.ArrayList<int> (); for (int i = 0; i < 0xFFFFF; i++) { int j = i*i; b = list.foreach (() => { g += j; return true; }); } for (int i = 0; i < 0xFFFFF; i++) { if (!b) Test.dummy (); f += g; } Test.devnull (f); return 0; } The second loop could be transformed into (using restrict and 'sufficiently smart C compiler'): for (int i = 0; i < 0xFFFFF; i++) { if (!b) Test.dummy (); } for (int i = 0; i < 0xFFFFF; i++) { f += g; } And further: f += g * 0xFFFFF; if (!b) { for (int i = 0; i < 0xFFFFF; i++) Test.dummy (); } Avoiding the need to execute loop at all.
Created attachment 249435 [details] The modified code (with restrict)
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/vala/issues/396.