Bug 645850 – Vala produces incorrect code for foreach loop on list and map

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 645850 - Vala produces incorrect code for foreach loop on list and map


Summary:	Vala produces incorrect code for foreach loop on list and map


Status:	RESOLVED FIXED

Product:	vala
Classification:	Core
Component:	Control Flow Statements
Version:	0.18.x
Hardware:	Other Linux

Importance:	Normal enhancement
Target Milestone:	1.2
Assigned To:	Vala maintainers
QA Contact:	Vala maintainers

URL:
Whiteboard:

Duplicates:	664630 (view as bug list)
Depends on:
Blocks:

Reported:	2011-03-27 18:37 UTC by Maciej (Matthew) Piechotka
Modified:	2017-02-20 18:24 UTC

See Also:
GNOME target:	---
GNOME version:	---

Description Maciej (Matthew) Piechotka 2011-03-27 18:37:41 UTC

Code:

			foreach (G *ptr in list) {
				free (ptr);
			}

Produced:

                _ptr_list = _tmp0_;
                _tmp1_ = gee_collection_get_size ((GeeCollection*) _ptr_list);
                _ptr_size = _tmp1_;
                _ptr_index = -1;
                while (TRUE) {
                        gpointer _tmp2_ = NULL;
                        gconstpointer* ptr;
                        _ptr_index = _ptr_index + 1;
                        if (!(_ptr_index < _ptr_size)) {
                                break;
                        }
                        _tmp2_ = gee_list_get (_ptr_list, _ptr_index);
                        ptr = _tmp2_;
                        self->free (ptr);
                }
                _g_object_unref0 (_ptr_list);

Why it is not optimal:

Assume list is linked list. Then the iteration is done in O (N^2) instead of O (N).

Comment 1 Luca Bruno 2011-04-19 14:46:22 UTC

It should have used the iterator() method, looks like a regression.

Comment 2 Luca Bruno 2011-04-19 20:03:00 UTC

Commit 3439cb38316dc1f1834 uses get() explicitly for performance, but obviously doesn't work for linked lists.
Workaround is to manually use the iterator instead of foreach.

Comment 3 Maciej (Matthew) Piechotka 2011-04-19 21:52:55 UTC

(In reply to comment #2)
> Commit 3439cb38316dc1f1834 uses get() explicitly for performance, but obviously
> doesn't work for linked lists.

Hmm. From my tests for small lists it gives performance to linked lists. However new foreach function seems to really shine when it is specialised. 

The build is from git with custom foreach for ArrayList.

+-----------------------+-------+-------+-------+--------+
|                       |   15  |  255  |  511  |  1023  |
+------------+----------+-------+-------+-------+--------+
| ArrayList  | Get      | 0.013 | 0.118 | 0.236 |  0.559 |
|            | Array    | 0.461 | 2.873 | 5.662 | 10.882 |
|            | Foreach  | 0.298 | 0.342 | 0.430 |  0.589 |
|            | Iterator | 0.466 | 3.070 | 5.640 | 11.765 |
+------------+----------+-------+-------+-------+--------+
| LinkedList | Get      | 0.039 | 1.911 | 7.281 | 26.730 |
|            | Array    | 0.623 | 2.905 | 5.898 | 11.108 |
|            | Foreach  | 0.428 | 3.144 | 5.739 | 11.028 |
|            | Iterator | 0.478 | 3.115 | 5.608 | 10.678 |
+------------+----------+-------+-------+-------+--------+

using Gee;

const int array_size = 255;

struct DevNull {
	int devnull;
}

int main () {
#if AL
	var list = new ArrayList<int> ();
#else
	var list = new LinkedList<int> ();
#endif
	for (int i = 0; i < array_size; i++)
		list.add (i);
	DevNull dn = DevNull();
	for (int i = 0; i < 0xffff; i++) {
#if GET
		int size = list.size;
		for (int j = 0; j < size; j++)
			dn.devnull = list[j];
#elif ARRAY
		var array = list.to_array ();
		foreach (var j in array)
			dn.devnull = j;
#elif FOREACH
		list.iterator ().foreach ((i) => {
			dn.devnull = i;
		});
#else
		for (var iter = list.iterator (); iter.next ();)
			dn.devnull = iter.get ();
#endif
	}
	return 0;
}

Comment 4 Maciej (Matthew) Piechotka 2011-04-20 06:39:33 UTC

After specialising foreach for LinkedList:

(In reply to comment #3)
> +------------------------+-------+-------+-------+--------+
> |                        |   15  |  255  |  511  |  1023  |
> +------------+-----------+-------+-------+-------+--------+
> | ArrayList  | Get       | 0.013 | 0.118 | 0.236 |  0.559 |
> |            | Array     | 0.461 | 2.873 | 5.662 | 10.882 |
> |            | Foreach   | 0.298 | 0.342 | 0.430 |  0.589 |
> |            | Iterator  | 0.466 | 3.070 | 5.640 | 11.765 |
> +------------+-----------+-------+-------+-------+--------+
> | LinkedList | Get       | 0.039 | 1.911 | 7.281 | 26.730 |
> |            | Array     | 0.623 | 2.905 | 5.898 | 11.108 |
> |            | Foreach   | 0.428 | 3.144 | 5.739 | 11.028 |
  |            | Foreach'  | 0.296 | 0.335 | 0.437 |  1.120 |
> |            | Iterator  | 0.478 | 3.115 | 5.608 | 10.678 |
> +------------+-----------+-------+-------+-------+--------+

Conclusion:
 - I belive that performance is gained by ommiting the virtual calls
 - The creation of single object have impact on performance on very small lists
 - It will/may behave correctly if the collection underlaying iteration changes while it cannot do this when using get ()
 - foreach seems to be a good balance. May be it is time to finish functional iterators, release 0.7.0 and ask Vala devs to transform (I'll talk with Jürg and Ptitjes about it)?

foreach (var item in collection) {
    ...
}

collection.iterator ().foreach ((tem => {
    ....
});

Specs:

CFLAGS (for program): -O2 -march=native -pipe -gdb
CFLAGS (for system): unspoken horrors
gcc: 4.6.0
Processor: Core 2 Duo T9600 2.8 GHz (underclocked to 2.6)
Host/Build/Target: x86_64-pc-linux-gnu

Comment 5 Maciej (Matthew) Piechotka 2011-04-27 15:53:36 UTC

Also with not-so hypothetical ConcurrentLinkedList the following code change meaning:

list.add (0);
list.add (1);
list.add (2);

Thread.create (() => {
    foreach (uint x in list) {
         stdout.printf ("%d\n", x);
    }
}, false);

Thread.create (() => {
    for (int i = 0;; i++)
        list.insert (0, i);
}, false);

- If get is used first thread may never terminate and will print some numbers in some order
- If any other method is used the first method will first print numbers 0..N (for some N) in descending order and then 0, 1 and 2

Comment 6 Maciej (Matthew) Piechotka 2012-09-30 06:46:21 UTC

Libgee 0.8.0 was released. Currently, according to benchmarks, calling foreach is the most optimal method of iterating over list (on master foreach is twice as fast as current iterating over list using get).

Comment 7 Maciej (Matthew) Piechotka 2012-10-02 09:01:56 UTC

Updated for current 0.8 branch (and different system so please don't compare it with previous results:

+-------------------------------+-------+-------+-------+--------+
|                               |   15  |  255  |  511  |  1023  |
+------------+------------------+-------+-------+-------+--------+
| ArrayList  | Get              | 0.016 | 0.101 | 0.214 |  0.393 |
|            | Array            | 0.489 | 2.846 | 5.298 | 10.564 |
|            | Foreach          | 0.014 | 0.186 | 0.188 |  0.375 |
|            | Iterator+Foreach | 0.207 | 0.293 | 0.382 |  0.593 |
|            | Iterator         | 0.334 | 2.807 | 5.419 | 10.743 |
+------------+------------------+-------+-------+-------+--------+
| LinkedList | Get              | 0.024 | 1.116 | 4.507 | 18.465 |
|            | Array            | 0.398 | 2.801 | 5.411 | 10.484 |
|            | Foreach          | 0.011 | 0.208 | 0.209 |  0.434 |
|            | Iterator+Foreach | 0.179 | 0.274 | 0.412 |  0.583 |
|            | Iterator         | 0.329 | 2.787 | 5.335 | 10.541 |
+------------+------------------+-------+-------+-------+--------+

Code:

using Gee;

const int array_size = 1023;

struct DevNull {
    int devnull;
}

int main () {
#if AL
    var list = new ArrayList<int> ();
#else
    var list = new LinkedList<int> ();
#endif
    for (int i = 0; i < array_size; i++)
        list.add (i);
    DevNull dn = DevNull();
    for (int i = 0; i < 0xffff; i++) {
#if GET
        int size = list.size;
        for (int j = 0; j < size; j++)
            dn.devnull = list[j];
#elif ARRAY
        var array = list.to_array ();
        foreach (var j in array)
            dn.devnull = j;
#elif FOREACH
	list.foreach ((i) => {
	    dn.devnull = i;
	    return true;
	});
#elif FOREACH_ITERATOR
        list.iterator ().foreach ((i) => {
            dn.devnull = i;
	    return true;
        });
#else
        for (var iter = list.iterator (); iter.next ();)
            dn.devnull = iter.get ();
#endif
    }
    return 0;
}

Comment 8 Maciej (Matthew) Piechotka 2013-03-22 14:34:03 UTC

Should the title of the bug be changed? Technically the behaviour is documented now but it doesn't change that iteration on lists is inefficient by default.

Comment 9 Maciej (Matthew) Piechotka 2013-05-11 19:18:35 UTC

*** Bug 664630 has been marked as a duplicate of this bug. ***

Comment 10 Maciej (Matthew) Piechotka 2013-05-11 19:20:04 UTC

It looks like this bug breaks iterations on sets as well as there is get method there (see bug #664630 for details).

Comment 11 Luca Bruno 2013-05-15 07:45:06 UTC

Can I ask what's the major overhead in using the iterator? Because, before fixing this in vala, it means that iterator is slow also in C... it must not be slow in any case, regardless of vala.

For example, what happens if you create a straight concrete ArrayList or LinkedList without any abstraction/inheritance?

Comment 12 Maciej (Matthew) Piechotka 2013-05-15 09:20:03 UTC

(In reply to comment #11)
> Can I ask what's the major overhead in using the iterator? Because, before
> fixing this in vala, it means that iterator is slow also in C... it must not be
> slow in any case, regardless of vala.
> 

Compared to get/set_size? The only possible difference is the creation of an object IMHO which is not avoidable. Currently the fastest method is using .foreach (see benchmarks). Both of them would improve compared to foreach by the v-call optimizations I wrote about.

EDIT: Hmm. Looking once again on benchmarks it looks like the difference doesn't fully explain the difference. My second guess would be additional if put into .next which cannot be optimized out by compiler (and shouldn't be in general as it is important for multi-thread collections).

> For example, what happens if you create a straight concrete ArrayList or
> LinkedList without any abstraction/inheritance?

As this depends only on existence of size property and get method it is even triggered by maps (bug #664630) which subsequently crash/produce strange results as Vala passes int instead of proper key.

using Gee;

int main() {
    Map<string, string> map = new HashMap<string, string>();
    map.set("a", "b");
    foreach (Map.Entry<string, string> e in map) {
        stdout.printf ("map[%s] = %s\n", e.key, e.value);
    }
}

EDIT. It looks like it is 'fixed' before 0.20 producing following error:

test.vala:6.5-6.48: error: Argument 1: Cannot convert from `int' to `string'
    foreach (Map.Entry<string, string> e in map) {
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Compilation failed: 1 error(s), 0 warning(s)

Comment 13 Luca Bruno 2013-05-15 09:59:56 UTC

(In reply to comment #12)
> (In reply to comment #11)
> > Can I ask what's the major overhead in using the iterator? Because, before
> > fixing this in vala, it means that iterator is slow also in C... it must not be
> > slow in any case, regardless of vala.
> > 
> 
> Compared to get/set_size? The only possible difference is the creation of an
> object IMHO which is not avoidable. Currently the fastest method is using
> .foreach (see benchmarks). Both of them would improve compared to foreach by
> the v-call optimizations I wrote about.
> 
> EDIT: Hmm. Looking once again on benchmarks it looks like the difference
> doesn't fully explain the difference. My second guess would be additional if
> put into .next which cannot be optimized out by compiler (and shouldn't be in
> general as it is important for multi-thread collections).
> 
> > For example, what happens if you create a straight concrete ArrayList or
> > LinkedList without any abstraction/inheritance?
> 
> As this depends only on existence of size property and get method it is even
> triggered by maps (bug #664630) which subsequently crash/produce strange
> results as Vala passes int instead of proper key.

I meant, what's the performance without the virtual method indirections?
I'd like to understand why using the iterator is slow.

Comment 14 Maciej (Matthew) Piechotka 2013-07-17 20:33:15 UTC

> I meant, what's the performance without the virtual method indirections?
> I'd like to understand why using the iterator is slow.

I tested and disprove that it is due to asserts - they did not have measurable impact.

My current guess is that Iterator have following disadvantages:

   a) memory allocation which contributed ~0.2 s to the total time (compared with ~0.4 for total iteration through the whole array using foreach or get)
   b) many indirections. The ArrayList.Iterator.get getting item is _list._items[_index]; which is equivalent to self->priv->_list->priv->_items[self->priv->_index] which, counting in the optimization is 6 pointer dereferencing per loop iteration. Moving it to _list means that simply 2 indirection per loop iteration is performed. (Random idea - should we emit the Private struct if the class is internal or private? It would save a 1 dereferencing)
   c) virtual methods overhead. The Vala, similarly as C++ compilers, knows a lot about layout. However exploiting it is non-trivial. We can get the vpointer out of class and use it as mentioned in my blog post[1] and it would be efficient BUT we need to ensure that the backend knows that the method there does not change. This have 2 subproblems:
      - We need to ensure that it really does not change. I.e. the vtable is the same as during the previous time
      - We need to signal it to the backend. The simplest way is to store it to the variable
      - We need to ensure that the pointer is really what we want to call (It would be technically possible in C to have inside the method a prolog and epilog before calling the pointer)
      Please note that currently the virtual calls are 2-3 indirections from memory (PLT, interface and optionally abstract).
  d) Less of the problem in ArrayList but iterator also keeps a bit of bookkeeping. Foreach greatly simplifies it since we know the previous state and we can ensure that the main loop is in very simple form (no .next after .remove). Oh - and we haven't exploited the possibilities fully yet in ArrayList so it might improve further.

I can dig into the performance without indirection but I'll take a bit to get true results (with PLT & everything). My guess is that it might be simpler (in short term) just to use .foreach if it exists and generate method on-the-fly then to try to fix all of the problems.

[1] http://blog.piechotka.com.pl/2013/05/18/vala-abi-and-branch-prediction/

Comment 15 Luca Bruno 2013-07-18 07:28:29 UTC

How would you break out of the loop with the foreach() method? How do you handle errors? How do you handle possible async calls?

Comment 16 Maciej (Matthew) Piechotka 2013-07-18 19:41:59 UTC

(In reply to comment #15)
> How would you break out of the loop with the foreach() method?

By returned value ('generated' code in  pseudo Vala/C + comments):

foreach (string f in col) {
    if (f == "SKIP") {
        continue;
    }
    stdout.printf ("%s\n", f);
    if (f == "END") {
        break;
    }
}

===>

col.foreach (() => {
    if (f == "SKIP") {
        return true;
    }
    stdout.printf ("%s\n", f);
     if (f == "END") {
        // False means - do not continue
        return false;
    }
    return true;
});

> How do you handle errors?

I guess simplest way is:

foreach (string f in col) {
     stdout.printf ("%s\n", f);
     if (f == "ERROR") {
         throw Some.Exception ("SomeException");
     }
}

===>

GLib.Error? error = null'
col.foreach ((f) => {
    stdout.printf ("%s\n", f);
    if (f == "ERROR") {
        error = Some.Exception ("SomeException");
        return false;
    }
    return true;
});
if (error != null) {
    throw error;
}

> How do you handle possible async calls?

That one's in plans for some time (~ when the libgee 0.8 planning started IIRC) but it would not be possible to solve without async delegates (bug #604827). Arguably if there is yield in the body then the code overhead might be larger (I/O, indirect calls anyways etc.) and fallback to iterators would seems sensible.

Alternatively the code can be transformed by allocating iterator and using it in following fashion:

foreach (string f in col) {
     stdout.printf ("%s\n", f);
     if (f == "DO SOMETHING") {
         yield do_something (f);
     }
     do_something_else (f);
}

===>

bool closure (IterData data) {
    goto(data.state);
state_0:
    stdout.printf ("%s\n", data.f);
    if (f == "DO SOMETHING") {
        data.state = 1;
        do_something.begin (() => {
            if (closure (data)) {
                // Iteration finished - call back to helper
                helper (data);
            } else {
                // Another yield - do nothing
            }
        });
        return false;
state_1:
    }
    do_something_else (f);
    return true;
}

// Should not really have finish in idle or similar
async void helper (IterData data) {
    // The iteration:
    //   - Did not begun so this moves to the first value (so no value is skipped)
    //   - Was broken by yield so the iterator pointed to last ('old') value. Now it moved to first unprocessed value
    if (data.iter.next ()) {
        bool res = data.iter.foreach ((f) => {
            data.f = f;
            return closure (data);
        });
        if (res) {
            // True means that the iteration was not broken by returned value
            data.callback ();
        }
    } else {
        // Iteration finished
        data.callback ();
    }
}

IterData data = IterData ();
data.iter = col.iterator ();
data.callback = self.callback();
helper ();
yield;

With following note:
 - The above is probably an inefficient emulation of async delegates and at least as hard to implement as they are
 - In code that yields the cost of the I/O, threading etc. probably dominates so the iteration efficiency with yields is probably not the most important part. 

I guess it might be not as trivial to implement as current foreach iteration but it should be easier then tracking all of the Iterator inefficiencies.

Comment 17 Maciej (Matthew) Piechotka 2013-08-01 11:51:21 UTC

> My current guess is that Iterator have following disadvantages:
> 
>    a) memory allocation which contributed ~0.2 s to the total time (compared
> with ~0.4 for total iteration through the whole array using foreach or get)
>    b) many indirections. The ArrayList.Iterator.get getting item is
> _list._items[_index]; which is equivalent to
> self->priv->_list->priv->_items[self->priv->_index] which, counting in the
> optimization is 6 pointer dereferencing per loop iteration. Moving it to _list
> means that simply 2 indirection per loop iteration is performed. (Random idea -
> should we emit the Private struct if the class is internal or private? It would
> save a 1 dereferencing)
>    c) virtual methods overhead. The Vala, similarly as C++ compilers, knows a
> lot about layout. However exploiting it is non-trivial. We can get the vpointer
> out of class and use it as mentioned in my blog post[1] and it would be
> efficient BUT we need to ensure that the backend knows that the method there
> does not change. This have 2 subproblems:
>       - We need to ensure that it really does not change. I.e. the vtable is
> the same as during the previous time
>       - We need to signal it to the backend. The simplest way is to store it to
> the variable
>       - We need to ensure that the pointer is really what we want to call (It
> would be technically possible in C to have inside the method a prolog and
> epilog before calling the pointer)
>       Please note that currently the virtual calls are 2-3 indirections from
> memory (PLT, interface and optionally abstract).
>   d) Less of the problem in ArrayList but iterator also keeps a bit of
> bookkeeping. Foreach greatly simplifies it since we know the previous state and
> we can ensure that the main loop is in very simple form (no .next after
> .remove). Oh - and we haven't exploited the possibilities fully yet in
> ArrayList so it might improve further.
> 

 e) The .get method (both in interator and collection) copy the data. At best it is comparation against null per iteration  (if the value is unowned) which cannot be optimized out. In most cases it is atomic increment (and corresponding decrement) and at worst it may be full memory allocation and copying of 'real' data and then freeing.

Comment 18 Daniel Espinosa 2017-02-20 18:24:10 UTC

I don't see any bug here. Closing.

If you need to discuss further, please re-open and keep tags for 1.2 or 2.0 milestones.