GNOME Bugzilla – Bug 645685
Add copy methods to LinkedHashSet
Last modified: 2011-04-23 21:07:56 UTC
(This bug was originally filed as number 645691, but then Bugzilla went and lost data, so I'm refiling.) In my work on bug #640092 I need to copy LinkedHashSets, and it would also be generally useful to be able to copy LinkedHashSets, duplicating the containing data structure and incrementing the reference count of each of the items. Branch coming which will add three new methods: • copy_list(): copies a LinkedHashSet into a new LinkedList • copy_set(): copies a LinkedHashSet into a new HashSet • copy(): copies a LinkedHashSet into a new LinkedHashSet
Created attachment 184244 [details] [review] Squashed diff of the 645685-linked-hash-copy branch http://git.collabora.co.uk/?p=user/pwith/folks;a=shortlog;h=refs/heads/645685-linked-hash-set-copy Adds the three methods and test cases.
Review of attachment 184244 [details] [review]: This looks good except for these bits: + * Returns a copy of the linked hash set. The order of the items in the + * returned copy is undefined. + * Returns a copy of the set as a {@link Gee.LinkedList}. The order of the + * items in the returned list is undefined. It seems fairly important that the ordering be preserved, since that's half of the point of the LinkedHashSet. Why not preserve this (and check for it in the test)?
(In reply to comment #2) > Review of attachment 184244 [details] [review]: > > This looks good except for these bits: > > + * Returns a copy of the linked hash set. The order of the items in the > + * returned copy is undefined. > > + * Returns a copy of the set as a {@link Gee.LinkedList}. The order of the > + * items in the returned list is undefined. > > It seems fairly important that the ordering be preserved, since that's half of > the point of the LinkedHashSet. Why not preserve this (and check for it in the > test)? While working on bug #640092, I developed the idea that LinkedHashSet shouldn't actually be ordered; instead, the linked list part of it should be treated entirely as a method of making iteration (in some arbitrary but consistent order) over the items in the set more efficient. Most of the uses of LinkedHashSet in folks (e.g. Individual.personas, IndividualAggregator.individuals_changed, PersonaStore.personas_changed) are in returning sets which shouldn't be considered to be ordered. In the few cases (ImDetails.im_addresses, PostalAddress.types) where we want the converse (a linked list which has a hash set tacked on to make direct lookups more efficient; i.e. an ordered set), I wonder if it might make more sense to have a separately named data structure (e.g. OrderedSet)? Let me know if I'm not barking up completely the wrong tree and I'll file a bug and get a patch sorted to do this (create a separate OrderedSet data structure).
Review of attachment 184244 [details] [review]: Another thing I just noticed - your NEWS entries refer to a mystical LinkedHashMap, not LinkedHashSet :)
(In reply to comment #3) > (In reply to comment #2) > > Review of attachment 184244 [details] [review] [details]: > > > > This looks good except for these bits: > > > > + * Returns a copy of the linked hash set. The order of the items in the > > + * returned copy is undefined. > > > > + * Returns a copy of the set as a {@link Gee.LinkedList}. The order of the > > + * items in the returned list is undefined. > > > > It seems fairly important that the ordering be preserved, since that's half of > > the point of the LinkedHashSet. Why not preserve this (and check for it in the > > test)? > > While working on bug #640092, I developed the idea that LinkedHashSet shouldn't > actually be ordered; instead, the linked list part of it should be treated > entirely as a method of making iteration (in some arbitrary but consistent > order) over the items in the set more efficient. Most of the uses of > LinkedHashSet in folks (e.g. Individual.personas, > IndividualAggregator.individuals_changed, PersonaStore.personas_changed) are in > returning sets which shouldn't be considered to be ordered. > > In the few cases (ImDetails.im_addresses, PostalAddress.types) where we want > the converse (a linked list which has a hash set tacked on to make direct > lookups more efficient; i.e. an ordered set), I wonder if it might make more > sense to have a separately named data structure (e.g. OrderedSet)? > > Let me know if I'm not barking up completely the wrong tree and I'll file a bug > and get a patch sorted to do this (create a separate OrderedSet data > structure). We discussed this a little on IRC, but it clearly got lost in the ether. Why would we want to make LinkedHashSet not care about its order and assuming we preserve this behavior, why would we need the OrderedSet type? I don't see why the subtle distinction between these types you describe above would warrant two different types (that we have to maintain ourselves).
(In reply to comment #5) > We discussed this a little on IRC, but it clearly got lost in the ether. > > Why would we want to make LinkedHashSet not care about its order and assuming > we preserve this behavior, why would we need the OrderedSet type? We would want to make LinkedHashSet not care about its order because it was originally conceived as a way of speeding up iteration over the elements of the set, compared to a normal HashSet. Consequently, it should behave just like a normal HashSet (just faster when iterating). Most of the sets of elements in folks are unordered by design (e.g. personas in individuals[1]), and so should be using an unordered set — such as a HashSet or a LinkedHashSet. There are a few places in folks where the sets of elements should be ordered, such as ImDetails.im_addresses and PostalAddress.types. Note that these should still be sets, since we don't want to list the same IM address/protocol tuple multiple times for one persona or individual (and the same for postal address types). For these, I was suggesting a new OrderedSet data type, but upon doing a bit of research, I found that libgee already provides a SortedSet interface. This is implemented by TreeSet, which we could use. Consequently, I suggest the following: • Remove the AbstractList interface implementation from LinkedHashSet and re-base it on AbstractSet (so it becomes public class Folks.LinkedHashSet<G> : AbstractSet<G>). The linked list then becomes an implementation detail which is just used to speed up iteration. • Fix bug #640092 using the newly-changed LinkedHashSet. • Either wrap libgee's TreeSet as Folks.OrderedSet (so that we can change the implementation later if we want), or use TreeSet directly in the few cases where we want an ordered set. > I don't see why the subtle distinction between these types you describe above > would warrant two different types (that we have to maintain ourselves). The distinction isn't subtle: one data type is ordered, the other is unordered. Thankfully, since libgee has TreeSet, we don't need to maintain OrderedSet ourselves (either we use TreeSet directly, or just have a thin wrapper around it which needs little maintenance). The only downside I see to using TreeSet is that it's a proper red–black tree, so it's quite a heavyweight data structure (5 pointers plus a colour per element). That said, a LinkedList + HashSet is pretty much the same in terms of memory usage (3 pointers per LinkedList element, 2 pointers plus a hash value per HashSet element). [1]: http://telepathy.freedesktop.org/wiki/Folks#Concepts
(In reply to comment #6) > (In reply to comment #5) > > We discussed this a little on IRC, but it clearly got lost in the ether. > > > > Why would we want to make LinkedHashSet not care about its order and assuming > > we preserve this behavior, why would we need the OrderedSet type? > > We would want to make LinkedHashSet not care about its order because it was > originally conceived as a way of speeding up iteration over the elements of the > set, compared to a normal HashSet. Consequently, it should behave just like a > normal HashSet (just faster when iterating). As discussed on IRC, the only speedup here should be saving a little time calculating hashes and skipping some empty hash table buckets. Considering nearly all of our structures end up being very small (eg, Indiviudal.personas.size is generally 1 to 3, and almost never as big as 10), I'm not sure it's worth hanging on to LinkedHashSet. Let's just rip this out and replace it with Gee.HashSet. > Most of the sets of elements in folks are unordered by design (e.g. personas in > individuals[1]), and so should be using an unordered set — such as a HashSet or > a LinkedHashSet. > > There are a few places in folks where the sets of elements should be ordered, > such as ImDetails.im_addresses and PostalAddress.types. Note that these should > still be sets, since we don't want to list the same IM address/protocol tuple > multiple times for one persona or individual (and the same for postal address > types). For these, I was suggesting a new OrderedSet data type, but upon doing > a bit of research, I found that libgee already provides a SortedSet interface. > This is implemented by TreeSet, which we could use. As discussed, I don't think either of these data need to be ordered. If we do run into anything that truly should be ordered, let's just use Gee.TreeSet directly. > Consequently, I suggest the following: > • Remove the AbstractList interface implementation from LinkedHashSet and > re-base it on AbstractSet (so it becomes public class Folks.LinkedHashSet<G> : > AbstractSet<G>). The linked list then becomes an implementation detail which is > just used to speed up iteration. > • Fix bug #640092 using the newly-changed LinkedHashSet. > • Either wrap libgee's TreeSet as Folks.OrderedSet (so that we can change the > implementation later if we want), or use TreeSet directly in the few cases > where we want an ordered set. See above. > Thankfully, since libgee has TreeSet, we don't need to maintain OrderedSet > ourselves (either we use TreeSet directly, or just have a thin wrapper around > it which needs little maintenance). > > The only downside I see to using TreeSet is that it's a proper red–black tree, > so it's quite a heavyweight data structure (5 pointers plus a colour per > element). That said, a LinkedList + HashSet is pretty much the same in terms of > memory usage (3 pointers per LinkedList element, 2 pointers plus a hash value > per HashSet element). That makes me lean toward TreeSet (in the cases that we need an ordered set). I don't think it's worth abstracting this type.
As discussed above and elsewhere, these changes are unnecessary. The need for them has been removed with the fix for bug #640092.