GNOME Bugzilla – Bug 645850
Vala produces incorrect code for foreach loop on list and map
Last modified: 2017-02-20 18:24:10 UTC
Code: foreach (G *ptr in list) { free (ptr); } Produced: _ptr_list = _tmp0_; _tmp1_ = gee_collection_get_size ((GeeCollection*) _ptr_list); _ptr_size = _tmp1_; _ptr_index = -1; while (TRUE) { gpointer _tmp2_ = NULL; gconstpointer* ptr; _ptr_index = _ptr_index + 1; if (!(_ptr_index < _ptr_size)) { break; } _tmp2_ = gee_list_get (_ptr_list, _ptr_index); ptr = _tmp2_; self->free (ptr); } _g_object_unref0 (_ptr_list); Why it is not optimal: Assume list is linked list. Then the iteration is done in O (N^2) instead of O (N).
It should have used the iterator() method, looks like a regression.
Commit 3439cb38316dc1f1834 uses get() explicitly for performance, but obviously doesn't work for linked lists. Workaround is to manually use the iterator instead of foreach.
(In reply to comment #2) > Commit 3439cb38316dc1f1834 uses get() explicitly for performance, but obviously > doesn't work for linked lists. Hmm. From my tests for small lists it gives performance to linked lists. However new foreach function seems to really shine when it is specialised. The build is from git with custom foreach for ArrayList. +-----------------------+-------+-------+-------+--------+ | | 15 | 255 | 511 | 1023 | +------------+----------+-------+-------+-------+--------+ | ArrayList | Get | 0.013 | 0.118 | 0.236 | 0.559 | | | Array | 0.461 | 2.873 | 5.662 | 10.882 | | | Foreach | 0.298 | 0.342 | 0.430 | 0.589 | | | Iterator | 0.466 | 3.070 | 5.640 | 11.765 | +------------+----------+-------+-------+-------+--------+ | LinkedList | Get | 0.039 | 1.911 | 7.281 | 26.730 | | | Array | 0.623 | 2.905 | 5.898 | 11.108 | | | Foreach | 0.428 | 3.144 | 5.739 | 11.028 | | | Iterator | 0.478 | 3.115 | 5.608 | 10.678 | +------------+----------+-------+-------+-------+--------+ using Gee; const int array_size = 255; struct DevNull { int devnull; } int main () { #if AL var list = new ArrayList<int> (); #else var list = new LinkedList<int> (); #endif for (int i = 0; i < array_size; i++) list.add (i); DevNull dn = DevNull(); for (int i = 0; i < 0xffff; i++) { #if GET int size = list.size; for (int j = 0; j < size; j++) dn.devnull = list[j]; #elif ARRAY var array = list.to_array (); foreach (var j in array) dn.devnull = j; #elif FOREACH list.iterator ().foreach ((i) => { dn.devnull = i; }); #else for (var iter = list.iterator (); iter.next ();) dn.devnull = iter.get (); #endif } return 0; }
After specialising foreach for LinkedList: (In reply to comment #3) > +------------------------+-------+-------+-------+--------+ > | | 15 | 255 | 511 | 1023 | > +------------+-----------+-------+-------+-------+--------+ > | ArrayList | Get | 0.013 | 0.118 | 0.236 | 0.559 | > | | Array | 0.461 | 2.873 | 5.662 | 10.882 | > | | Foreach | 0.298 | 0.342 | 0.430 | 0.589 | > | | Iterator | 0.466 | 3.070 | 5.640 | 11.765 | > +------------+-----------+-------+-------+-------+--------+ > | LinkedList | Get | 0.039 | 1.911 | 7.281 | 26.730 | > | | Array | 0.623 | 2.905 | 5.898 | 11.108 | > | | Foreach | 0.428 | 3.144 | 5.739 | 11.028 | | | Foreach' | 0.296 | 0.335 | 0.437 | 1.120 | > | | Iterator | 0.478 | 3.115 | 5.608 | 10.678 | > +------------+-----------+-------+-------+-------+--------+ Conclusion: - I belive that performance is gained by ommiting the virtual calls - The creation of single object have impact on performance on very small lists - It will/may behave correctly if the collection underlaying iteration changes while it cannot do this when using get () - foreach seems to be a good balance. May be it is time to finish functional iterators, release 0.7.0 and ask Vala devs to transform (I'll talk with Jürg and Ptitjes about it)? foreach (var item in collection) { ... } collection.iterator ().foreach ((tem => { .... }); Specs: CFLAGS (for program): -O2 -march=native -pipe -gdb CFLAGS (for system): unspoken horrors gcc: 4.6.0 Processor: Core 2 Duo T9600 2.8 GHz (underclocked to 2.6) Host/Build/Target: x86_64-pc-linux-gnu
Also with not-so hypothetical ConcurrentLinkedList the following code change meaning: list.add (0); list.add (1); list.add (2); Thread.create (() => { foreach (uint x in list) { stdout.printf ("%d\n", x); } }, false); Thread.create (() => { for (int i = 0;; i++) list.insert (0, i); }, false); - If get is used first thread may never terminate and will print some numbers in some order - If any other method is used the first method will first print numbers 0..N (for some N) in descending order and then 0, 1 and 2
Libgee 0.8.0 was released. Currently, according to benchmarks, calling foreach is the most optimal method of iterating over list (on master foreach is twice as fast as current iterating over list using get).
Updated for current 0.8 branch (and different system so please don't compare it with previous results: +-------------------------------+-------+-------+-------+--------+ | | 15 | 255 | 511 | 1023 | +------------+------------------+-------+-------+-------+--------+ | ArrayList | Get | 0.016 | 0.101 | 0.214 | 0.393 | | | Array | 0.489 | 2.846 | 5.298 | 10.564 | | | Foreach | 0.014 | 0.186 | 0.188 | 0.375 | | | Iterator+Foreach | 0.207 | 0.293 | 0.382 | 0.593 | | | Iterator | 0.334 | 2.807 | 5.419 | 10.743 | +------------+------------------+-------+-------+-------+--------+ | LinkedList | Get | 0.024 | 1.116 | 4.507 | 18.465 | | | Array | 0.398 | 2.801 | 5.411 | 10.484 | | | Foreach | 0.011 | 0.208 | 0.209 | 0.434 | | | Iterator+Foreach | 0.179 | 0.274 | 0.412 | 0.583 | | | Iterator | 0.329 | 2.787 | 5.335 | 10.541 | +------------+------------------+-------+-------+-------+--------+ Code: using Gee; const int array_size = 1023; struct DevNull { int devnull; } int main () { #if AL var list = new ArrayList<int> (); #else var list = new LinkedList<int> (); #endif for (int i = 0; i < array_size; i++) list.add (i); DevNull dn = DevNull(); for (int i = 0; i < 0xffff; i++) { #if GET int size = list.size; for (int j = 0; j < size; j++) dn.devnull = list[j]; #elif ARRAY var array = list.to_array (); foreach (var j in array) dn.devnull = j; #elif FOREACH list.foreach ((i) => { dn.devnull = i; return true; }); #elif FOREACH_ITERATOR list.iterator ().foreach ((i) => { dn.devnull = i; return true; }); #else for (var iter = list.iterator (); iter.next ();) dn.devnull = iter.get (); #endif } return 0; }
Should the title of the bug be changed? Technically the behaviour is documented now but it doesn't change that iteration on lists is inefficient by default.
*** Bug 664630 has been marked as a duplicate of this bug. ***
It looks like this bug breaks iterations on sets as well as there is get method there (see bug #664630 for details).
Can I ask what's the major overhead in using the iterator? Because, before fixing this in vala, it means that iterator is slow also in C... it must not be slow in any case, regardless of vala. For example, what happens if you create a straight concrete ArrayList or LinkedList without any abstraction/inheritance?
(In reply to comment #11) > Can I ask what's the major overhead in using the iterator? Because, before > fixing this in vala, it means that iterator is slow also in C... it must not be > slow in any case, regardless of vala. > Compared to get/set_size? The only possible difference is the creation of an object IMHO which is not avoidable. Currently the fastest method is using .foreach (see benchmarks). Both of them would improve compared to foreach by the v-call optimizations I wrote about. EDIT: Hmm. Looking once again on benchmarks it looks like the difference doesn't fully explain the difference. My second guess would be additional if put into .next which cannot be optimized out by compiler (and shouldn't be in general as it is important for multi-thread collections). > For example, what happens if you create a straight concrete ArrayList or > LinkedList without any abstraction/inheritance? As this depends only on existence of size property and get method it is even triggered by maps (bug #664630) which subsequently crash/produce strange results as Vala passes int instead of proper key. using Gee; int main() { Map<string, string> map = new HashMap<string, string>(); map.set("a", "b"); foreach (Map.Entry<string, string> e in map) { stdout.printf ("map[%s] = %s\n", e.key, e.value); } } EDIT. It looks like it is 'fixed' before 0.20 producing following error: test.vala:6.5-6.48: error: Argument 1: Cannot convert from `int' to `string' foreach (Map.Entry<string, string> e in map) { ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Compilation failed: 1 error(s), 0 warning(s)
(In reply to comment #12) > (In reply to comment #11) > > Can I ask what's the major overhead in using the iterator? Because, before > > fixing this in vala, it means that iterator is slow also in C... it must not be > > slow in any case, regardless of vala. > > > > Compared to get/set_size? The only possible difference is the creation of an > object IMHO which is not avoidable. Currently the fastest method is using > .foreach (see benchmarks). Both of them would improve compared to foreach by > the v-call optimizations I wrote about. > > EDIT: Hmm. Looking once again on benchmarks it looks like the difference > doesn't fully explain the difference. My second guess would be additional if > put into .next which cannot be optimized out by compiler (and shouldn't be in > general as it is important for multi-thread collections). > > > For example, what happens if you create a straight concrete ArrayList or > > LinkedList without any abstraction/inheritance? > > As this depends only on existence of size property and get method it is even > triggered by maps (bug #664630) which subsequently crash/produce strange > results as Vala passes int instead of proper key. I meant, what's the performance without the virtual method indirections? I'd like to understand why using the iterator is slow.
> I meant, what's the performance without the virtual method indirections? > I'd like to understand why using the iterator is slow. I tested and disprove that it is due to asserts - they did not have measurable impact. My current guess is that Iterator have following disadvantages: a) memory allocation which contributed ~0.2 s to the total time (compared with ~0.4 for total iteration through the whole array using foreach or get) b) many indirections. The ArrayList.Iterator.get getting item is _list._items[_index]; which is equivalent to self->priv->_list->priv->_items[self->priv->_index] which, counting in the optimization is 6 pointer dereferencing per loop iteration. Moving it to _list means that simply 2 indirection per loop iteration is performed. (Random idea - should we emit the Private struct if the class is internal or private? It would save a 1 dereferencing) c) virtual methods overhead. The Vala, similarly as C++ compilers, knows a lot about layout. However exploiting it is non-trivial. We can get the vpointer out of class and use it as mentioned in my blog post[1] and it would be efficient BUT we need to ensure that the backend knows that the method there does not change. This have 2 subproblems: - We need to ensure that it really does not change. I.e. the vtable is the same as during the previous time - We need to signal it to the backend. The simplest way is to store it to the variable - We need to ensure that the pointer is really what we want to call (It would be technically possible in C to have inside the method a prolog and epilog before calling the pointer) Please note that currently the virtual calls are 2-3 indirections from memory (PLT, interface and optionally abstract). d) Less of the problem in ArrayList but iterator also keeps a bit of bookkeeping. Foreach greatly simplifies it since we know the previous state and we can ensure that the main loop is in very simple form (no .next after .remove). Oh - and we haven't exploited the possibilities fully yet in ArrayList so it might improve further. I can dig into the performance without indirection but I'll take a bit to get true results (with PLT & everything). My guess is that it might be simpler (in short term) just to use .foreach if it exists and generate method on-the-fly then to try to fix all of the problems. [1] http://blog.piechotka.com.pl/2013/05/18/vala-abi-and-branch-prediction/
How would you break out of the loop with the foreach() method? How do you handle errors? How do you handle possible async calls?
(In reply to comment #15) > How would you break out of the loop with the foreach() method? By returned value ('generated' code in pseudo Vala/C + comments): foreach (string f in col) { if (f == "SKIP") { continue; } stdout.printf ("%s\n", f); if (f == "END") { break; } } ===> col.foreach (() => { if (f == "SKIP") { return true; } stdout.printf ("%s\n", f); if (f == "END") { // False means - do not continue return false; } return true; }); > How do you handle errors? I guess simplest way is: foreach (string f in col) { stdout.printf ("%s\n", f); if (f == "ERROR") { throw Some.Exception ("SomeException"); } } ===> GLib.Error? error = null' col.foreach ((f) => { stdout.printf ("%s\n", f); if (f == "ERROR") { error = Some.Exception ("SomeException"); return false; } return true; }); if (error != null) { throw error; } > How do you handle possible async calls? That one's in plans for some time (~ when the libgee 0.8 planning started IIRC) but it would not be possible to solve without async delegates (bug #604827). Arguably if there is yield in the body then the code overhead might be larger (I/O, indirect calls anyways etc.) and fallback to iterators would seems sensible. Alternatively the code can be transformed by allocating iterator and using it in following fashion: foreach (string f in col) { stdout.printf ("%s\n", f); if (f == "DO SOMETHING") { yield do_something (f); } do_something_else (f); } ===> bool closure (IterData data) { goto(data.state); state_0: stdout.printf ("%s\n", data.f); if (f == "DO SOMETHING") { data.state = 1; do_something.begin (() => { if (closure (data)) { // Iteration finished - call back to helper helper (data); } else { // Another yield - do nothing } }); return false; state_1: } do_something_else (f); return true; } // Should not really have finish in idle or similar async void helper (IterData data) { // The iteration: // - Did not begun so this moves to the first value (so no value is skipped) // - Was broken by yield so the iterator pointed to last ('old') value. Now it moved to first unprocessed value if (data.iter.next ()) { bool res = data.iter.foreach ((f) => { data.f = f; return closure (data); }); if (res) { // True means that the iteration was not broken by returned value data.callback (); } } else { // Iteration finished data.callback (); } } IterData data = IterData (); data.iter = col.iterator (); data.callback = self.callback(); helper (); yield; With following note: - The above is probably an inefficient emulation of async delegates and at least as hard to implement as they are - In code that yields the cost of the I/O, threading etc. probably dominates so the iteration efficiency with yields is probably not the most important part. I guess it might be not as trivial to implement as current foreach iteration but it should be easier then tracking all of the Iterator inefficiencies.
> My current guess is that Iterator have following disadvantages: > > a) memory allocation which contributed ~0.2 s to the total time (compared > with ~0.4 for total iteration through the whole array using foreach or get) > b) many indirections. The ArrayList.Iterator.get getting item is > _list._items[_index]; which is equivalent to > self->priv->_list->priv->_items[self->priv->_index] which, counting in the > optimization is 6 pointer dereferencing per loop iteration. Moving it to _list > means that simply 2 indirection per loop iteration is performed. (Random idea - > should we emit the Private struct if the class is internal or private? It would > save a 1 dereferencing) > c) virtual methods overhead. The Vala, similarly as C++ compilers, knows a > lot about layout. However exploiting it is non-trivial. We can get the vpointer > out of class and use it as mentioned in my blog post[1] and it would be > efficient BUT we need to ensure that the backend knows that the method there > does not change. This have 2 subproblems: > - We need to ensure that it really does not change. I.e. the vtable is > the same as during the previous time > - We need to signal it to the backend. The simplest way is to store it to > the variable > - We need to ensure that the pointer is really what we want to call (It > would be technically possible in C to have inside the method a prolog and > epilog before calling the pointer) > Please note that currently the virtual calls are 2-3 indirections from > memory (PLT, interface and optionally abstract). > d) Less of the problem in ArrayList but iterator also keeps a bit of > bookkeeping. Foreach greatly simplifies it since we know the previous state and > we can ensure that the main loop is in very simple form (no .next after > .remove). Oh - and we haven't exploited the possibilities fully yet in > ArrayList so it might improve further. > e) The .get method (both in interator and collection) copy the data. At best it is comparation against null per iteration (if the value is unowned) which cannot be optimized out. In most cases it is atomic increment (and corresponding decrement) and at worst it may be full memory allocation and copying of 'real' data and then freeing.
I don't see any bug here. Closing. If you need to discuss further, please re-open and keep tags for 1.2 or 2.0 milestones.