After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 653239 - Use a more iteration-friendly implementation of Set internally
Use a more iteration-friendly implementation of Set internally
Status: RESOLVED FIXED
Product: folks
Classification: Platform
Component: general
git master
Other All
: Normal enhancement
: Unset
Assigned To: folks-maint
folks-maint
Depends on: 673112
Blocks:
 
 
Reported: 2011-06-23 15:14 UTC by Philip Withnall
Modified: 2013-04-05 22:23 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Philip Withnall 2011-06-23 15:14:24 UTC
Since an operation performed frequently on the Gee.Sets of things exposed by folks is to iterate over the entire set, we might want to write an implementation of the Set interface which optimises iteration (and reduces memory overhead) over speed of checking for duplicates. This should be OK in many cases, since the sets we're dealing with are smaller.

e.g. Implement some kind of ArraySet.
Comment 1 Travis Reitter 2013-02-06 23:38:05 UTC
Didn't we go down this path and eventually revert it?

In light of the dependency in libgee being rejected, does this bug still make sense?
Comment 2 Philip Withnall 2013-02-07 00:02:53 UTC
(In reply to comment #1)
> Didn't we go down this path and eventually revert it?

Not as far as I remember. I have a poor memory though.

> In light of the dependency in libgee being rejected, does this bug still make
> sense?

I think so. I’d still like to investigate this further, when I manage to find some time.
Comment 3 Simon McVittie 2013-03-12 20:32:10 UTC
I tried this and it doesn't really help - I think it's probably because the overhead of creating a GObject for the iterator swamps the overhead of iteration.

However, I also tried a local copy of ArraySet that offers direct access via integer indexes, adding a peek(guint index) method). Using that for the Individual's Personas, and using that mode of iteration in Individual._update_single_valued_property(), cuts 3% off my benchmark[1] (that method is a bottleneck in my benchmark). I haven't yet tried maintaining this set in sorted order so it's binary-searchable; it might help or hinder.

I don't think Folks' use of Gee object'y data structures is helping our performance: if we wrapped GPtrArray, GList or GHashTable in a Gee-compatible object for external consumption, but used the "plain GLib" interface internally (at least on hot paths, as determined by benchmarking), we could probably speed it up substantially.

I also wonder whether we could benefit from having a FolksSmallMultiMap which is really a (sorted, binary-searchable?) GArray<struct<K, V>> for pointers K, V - this seems like an an appropriate data structure for things like e-d-s vCards.

[1] which is basically "connect to a Google account with 2049 locally-cached contacts, let the aggregator quiesce"
Comment 4 Philip Withnall 2013-03-13 09:53:05 UTC
(In reply to comment #3)
> I don't think Folks' use of Gee object'y data structures is helping our
> performance: if we wrapped GPtrArray, GList or GHashTable in a Gee-compatible
> object for external consumption, but used the "plain GLib" interface internally
> (at least on hot paths, as determined by benchmarking), we could probably speed
> it up substantially.

I agree, but I’m wary of increasing code complexity in folks unnecessarily.

If you take GPtrArray (or whatever) and wrap it in the same Gee interfaces as HashSet has (so we don’t break API), but also add a @get(index) method and a size property, it will have syntax support for foreach(){} iteration (see: https://live.gnome.org/Vala/Tutorial#Methods_With_Syntax_Support). We could then switch to it without significantly decreasing the readability of code in folks.

To get the best performance, you’ll probably want to use inline methods or macros for @get and size, in which case you’ll probably end up writing the GPtrArray (etc.) wrapper in C, then binding it for Vala.

> I also wonder whether we could benefit from having a FolksSmallMultiMap which
> is really a (sorted, binary-searchable?) GArray<struct<K, V>> for pointers K, V
> - this seems like an an appropriate data structure for things like e-d-s
> vCards.

If you have a few hours, read bug #673112, in which I assert that folks needs data structures optimised for being mostly empty, but iterated over a lot. If possible, optimising for reduced memory consumption would also be nice.
Comment 5 Simon McVittie 2013-03-14 12:18:23 UTC
(In reply to comment #4)
> If you take GPtrArray (or whatever) and wrap it in the same Gee interfaces as
> HashSet has (so we don’t break API) [...]

Good idea. I think it might be worth trying:

* "small set" (set optimized for few items):
  * GPtrArray<item>
  * GList<item>

* "small multi-map" (multi-map optimized for few keys and few values per key):
  * GArray<struct { key, value }>

* IndividualAggregator's { string => [Individual, ...] } map
  (many keys but few values per key):
  * GHashTable<string, GList<item>>
  * GHashTable<string, GPtrArray<item>>

> To get the best performance, you’ll probably want to use inline methods or
> macros for @get and size, in which case you’ll probably end up writing the
> GPtrArray (etc.) wrapper in C, then binding it for Vala.

Sure. "Use C for fast-paths" sounds good to me.

> If you have a few hours, read bug #673112, in which I assert that folks needs
> data structures optimised for being mostly empty, but iterated over a lot.

Yes, I'm inclined to agree. Unfortunately, data structures where iterating over an empty collection must construct a GObject, and getting a nonexistent item from a multimap must construct an empty-set GObject, are not ideal for this situation.
Comment 6 Philip Withnall 2013-03-15 00:23:38 UTC
(In reply to comment #5)
> * "small set" (set optimized for few items):
>   * GPtrArray<item>
>   * GList<item>

Where “few items” specifically includes 0 items. Most address books are sparsely populated by properties, so most of the properties of most personas and individuals are empty.

> > If you have a few hours, read bug #673112, in which I assert that folks needs
> > data structures optimised for being mostly empty, but iterated over a lot.
> 
> Yes, I'm inclined to agree. Unfortunately, data structures where iterating over
> an empty collection must construct a GObject,

Hence we implement @get and size and can use foreach(){} syntax rather than an iterator, to get the efficiency without uglifying the code.

> getting a nonexistent item
> from a multimap must construct an empty-set GObject, are not ideal for this
> situation.

Perhaps it would be worthwhile to add internal helper methods to the data structures for this use case.
Comment 7 Simon McVittie 2013-04-05 17:06:31 UTC
I'm going to consider this done, because Folks.SmallSet has been merged.
Comment 8 Philip Withnall 2013-04-05 22:23:42 UTC
Woo! \o/