GNOME Bugzilla – Bug 736444
Allow for SortedSet to be notified of element mutation
Last modified: 2019-03-20 10:39:54 UTC
If an element in the SortedSet is mutated in such a way that it affects the comparator's operation, then various problems will result, including the inability to remove that object from the SortedSet. For example, this will assert: class Xyzzy : Object, Gee.Comparable<Xyzzy> { public int value { get; set; } public Xyzzy(int value) { this.value = value; } public int compare_to(Xyzzy other) { if (this == other) return 0; return value - other.value; } } void main() { Xyzzy x1 = new Xyzzy(1); Xyzzy x2 = new Xyzzy(2); Xyzzy x3 = new Xyzzy(3); Gee.TreeSet<Xyzzy> redblack = new Gee.TreeSet<Xyzzy>(); redblack.add(x1); redblack.add(x2); redblack.add(x3); x1.value = 4; assert(redblack.contains(x1)); } Note that this fails with or without the identity test in compare_to() because the new value leads the internal iterator down the wrong tree branch. Worse, if this code is used after x1.value is set to 4: Gee.Iterator<Xyzzy> iter = redblack.iterator(); while (iter.next()) { if (iter.get() == x1) iter.remove(); } Gee will assert with: ERROR:treeset.c:3258:gee_tree_set_iterator_real_remove: assertion failed: (success) So, the user can't even manually remove and re-add the element to re-sort it into position. I realize this is not a bug per se, but in a more complicated program this issue may be indeterminate in reproducibility and even more difficult to track down. In the least, this situation should be documented so the caller understands the limitations of TreeSet (and probably most SortedSet implementations). What I'm proposing is a new method for SortedSet: public abstract void mutated(G element, Gee.EqualDataFunc? equal_func = null); The implementation would use the equal_func (which is obtained from Functions if null) instead of CompareDataFunc to locate the element, remove it, then re-add it using the CompareDataFunc. If mutated() seems too specific to the problem, it would at least be helpful if there was some way to remove the element using a supplied EqualDataFunc rather than rely solely on the CompareDataFunc for traversal.
(In reply to comment #0) > If an element in the SortedSet is mutated in such a way that it affects the > comparator's operation, then various problems will result, including the > inability to remove that object from the SortedSet. For example, this will > assert: > > (...) Yes, the invariants of tree are broken. The same would apply for hashes when the hash changes. > > Gee will assert with: > > ERROR:treeset.c:3258:gee_tree_set_iterator_real_remove: assertion failed: > (success) > Yes, naming a variable 'success' might provide a rather confusing assert ;) > So, the user can't even manually remove and re-add the element to re-sort it > into position. > > I realize this is not a bug per se, but in a more complicated program this > issue may be indeterminate in reproducibility and even more difficult to track > down. > Well. The easiest way of answering it is - don't mutate the data. If you really do need to modify data, you do you need to remove/modify/add on your own. Just handling such cases will make the performance 'terrible' (as in lowering all operations to O(n)). Alternatively provide one layer of indirection depending on what you need. > In the least, this situation should be documented so the caller understands the > limitations of TreeSet (and probably most SortedSet implementations). > That might be a good idea. I'm keeping the bug open for it. > What I'm proposing is a new method for SortedSet: > > public abstract void mutated(G element, Gee.EqualDataFunc? equal_func = null); > > The implementation would use the equal_func (which is obtained from Functions > if null) instead of CompareDataFunc to locate the element, remove it, then > re-add it using the CompareDataFunc. > > If mutated() seems too specific to the problem, it would at least be helpful if > there was some way to remove the element using a supplied EqualDataFunc rather > than rely solely on the CompareDataFunc for traversal. Once the invariants are off it's quite hard to do something about it. The mutated function would have O(n log n) performance (I'd need to calculcate) - and any other function for that matter (including ones that are currently O(1)). Because of the problems with distinguishing elements it would be best case O(n). Consider: x1.value = 3; Then you have 2 elements equal to each other - what should be done in such case - we can't have both but we need to pick one. ConcurrentSortedSet have a further problems with it as by design other threads might access the data. In short breaking the assumption that hashes and relationships are immutable are a can of worm I don't intend to open (nor do I know any library that do).
(In reply to comment #1) > If you really do need to modify data, you do you need to remove/modify/add on > your own. I would be happy to, but the problem is that the remove() has to happen prior to modification. Pre-notification of changes is rare in GObject-land. Most signals are fired after the change, including "notify". > Just handling such cases will make the performance 'terrible' (as in > lowering all operations to O(n)). The only operation that would be O(n) is mutated(). This change doesn't affect the other operations. I'm not suggesting that mutated() would be (or should be) performant. I see it as a way to keep the data structure cogent and the application sound. As things stand, it's incredibly easy for a programmer to corrupt the data structure. I say, document the performance issue w/ mutated() in the API; let the programmer weigh the trade-offs. > Alternatively provide one layer of indirection depending on what you need. That's not always feasible. Consider the case where the user is storing mutable objects originating from a separate library, or the design of the class is such that immutability is simply unworkable. > > In the least, this situation should be documented > That might be a good idea. I'm keeping the bug open for it. Thank you. > Because of the problems with distinguishing elements it would be best case > O(n). Consider: > > x1.value = 3; > > Then you have 2 elements equal to each other - what should be done in such case What would happen in the alternative you're suggesting, i.e. remove()/modify/add()? One is dropped. I would do the same for mutated(). I don't see a problem. mutated() is a specific solution to a general problem: if an element of TreeSet mutates, it's possible for it to become unreachable by the caller and for the tree to become incoherent. If TreeSet offered an interface to reach all elements without relying on the comparator, I would be fine with that too. There was a discussion some time back about adding SortedList to the library, but the response was "use a SortedSet" and WONTFIX. In fact, SortedList would solve this mutation problem: all elements reachable all the time, and elements inserted in sorted order (rather than having to re-sort the entire list after each insert).
I think I misspoke. A mutated element is unreachable via contains() or remove(). However, TreeSet iterator will walk all elements of the (incoherent) list. However, Iterator.remove() *cannot* be called, as that will cause an assertion. So if mutated() is unavailable, a solution could be to walk the entire TreeSet adding each element to another TreeSet, then replace the old one with the new one. I haven't tested this thoroughly, but so far it appears it will work. I'm not satisfied with this, however. I feel like this solution is relying on side-effects (i.e. the Iterator isn't checking the coherency of the tree). This is even less performant than my suggested in-place mutated() call, but just wanted to document it.
(In reply to comment #2) > (In reply to comment #1) > > If you really do need to modify data, you do you need to remove/modify/add on > > your own. > > I would be happy to, but the problem is that the remove() has to happen prior > to modification. Pre-notification of changes is rare in GObject-land. Most > signals are fired after the change, including "notify". > I see now what's the use case is better now. What about adding proxy object 'caching' the comparation and updated by notification as short term solution? Or caching the property in qdata? [Edit. Or use it with combination of Map] I see your point but my feeling is that 'uniqueness'/'comparation' should not be depending on such value then - the uniqueness is given by the pointer (or other unique data). What you might intent to use is some sort of priority queue which should work with such case (albeit not in efficient way). I'd be much more willing to extend this interface to handle that though I might need to think exactly how - probably by scrapping the stamps, which I intended for some time now, and adding function to iterator so you can have: queue.add(data); var iter = queue.find(data); data.notify["property"].connect(() => { iter.update_priority(); }); If this is not the case - could you describe your limitations/requirements on collections.
This is why my mutated() proposal includes an EqualDataFunc in the call, to allow the caller to declare how to locate the mutated element. I would suggest allowing for your find() call to accept one as well. But I know what you're saying, reference identity is almost always going to be what the user wants. Hence, default EqualDataFunc to null and let Functions work its magic. The reason I proposed adding this to SortedSet is because I don't think this issue is specific to TreeSet. I see where you're going with PriorityQueue, but I feel like that's overloading the utility of that class. Why do I need to put my objects in a queue, which is a FIFO, when I want to add and remove them randomly? TreeSet is more or less exactly what I want, save for this issue. I had a few other ideas, but they all were either too complicated or too hole-ridden. That's why I suggested mutated(). It's simple, straightforward, and it allows the user to define what "mutated" means, since they're the one supplying the comparator. (I considered proposing a Mutable interface, but that introduces a whole host of issues.) Re: your suggested pattern of attaching a signal to call iter.update_priority(), I see where you're going with this, but it seems kind of roundabout (although I do see how it would improve performance). My concern there is if the element's lifetime outlasts the lifetime of the iter, you'll have a dangling signal handler. I guess that's the caller's problem, though. They could store the iter in a Map (as I think you're suggesting) and use that. I don't know, though, it feels like that approach "leaks" implementation details, in a way. If the SortedSet implementation discards its internal wrappers when rebalancing its tree, now it's on the hook to keep multiple levels of indirection (the iter it returned to the user, which must be preserved as long as the element is present, referencing the disposable internal wrapper it uses for each element, and the element itself). Starting to sound complicated.
The problem is that the complexity needs to go somewhere. I'm still not convinced why the following is not possible: TreeMultiMap<int, Klass> map = .... // On add map.set(klass.value, klass); { int value = klass.value klass.notify["value"].connect(() => { map.remove(value, klass); map.set(klass.value, klass); }); } // On remove Deregister signal map.remove(klass.value, klass); It could be wrapped in the class to avoid code duplication. Maybe it should be added to libgee as separate class. ------------------------------------------------------------- I guess at the end I'm not sure if what you are trying to do is correct in a sense of using proper tools for the problem. The collections assumes that there is persistent, immutable sorting order or equality - a "natural" one. If there isn't then an artificial one should be used (like pointer) - similarly to 'id' fields in database. It makes the code of collection library much simpler and as far as I know those are assumptions in all languages. What I guess you are trying to do is to have a sorted collection by some key - not using one to determine the equality or 'natural' ordering. This is separate problem which I don't consider as belonging to 'primitive' operations like SortedMap - though probably important one and interesting nonetheless. Probably something like 'CollectionWhichIsKeptSorted' implementing MultiSet might be a better way to go. ------------------------------------------------------------- Also - looking again into the source code the problem is that currently it's a very deep assumption that objects are kept being sorted and removing it would required to rewrite whole code of remove. Possibly moving from LLRB to RB would be helpful in that matter but it's much more then just 'purity argument'. In short we need to descent from the root to the element to remove it so the easiest way is to do it by just 'finding' it. If the value is changed then the code cannot find the element.
> I guess at the end I'm not sure if what you are trying to do is correct in a > sense of using proper tools for the problem. Yeah, I'm less and less convinced that changing SortedSet/TreeSet is the way to go here. They were designed for a specific task. Still, I remain convinced this is a problem worth at least considering. Some inter-related thoughts: * If Gee offers a sorted set that allows for its elements' sorting order to mutate, then it should be a separate class with its own contract and terms of use. * That said, this assumption of invariants in SortedSet/TreeSet should be documented. It's obvious in hindsight, but can be overlooked if not careful. We've run into this problem before in our other projects, but it slipped my mind. * The mutated() method I proposed was not intended to be performant. However, the moment it's offered by Gee there will be users who design their systems around it, and they will complain about performance. * My core issue/problem/concern is not the mutation of the element, but that if it does mutate in a way that affects comparison, the red-black tree must be considered corrupt. Since mutation is trivial, this needs to be documented, but it also suggests (to me at least) that this is a problem looking for a solution. * I can't say this is universal, but in most (maybe all) of my designs the Hashable interface is implemented in an immutable manner. i.e. by identity or a system-wide unique identifier like a rowid or UUID. Two results of these thoughts: First, I agree with you, something like CollectionKeptSorted (MutatableTreeSet?) seems like a better fit. It could return an object/token of some kind that can be used by the caller to invalidate the sort order of a particular element. The storage of that token could be up to the caller, or a helper class is offered. But I keep going back to using an EqualDataFunc (i.e. Hashable) to locate the item instead of using token management: public MutableTreeSet(CompareDataFunc c, HashDataFunc h, EqualDataFunc e); where the HashDataFunc and EqualDataFunc are specifically *not* used for sorting but only for storing and locating an element via in an internal map of elements to tokens: public void mutated(G element) { Node node = map[element]; remove_mutated_node(node); /* i.e., don't check for coherency, assume leaves and parents are coherent */ add_node(node); } There's a one-time performance hit for each insert and remove, as the nodes must be stored in a map as well as a tree, but that's the price for this flexibility. Mutated elements are the cost of a tree add and remove. Second, while mutated() might not be the right fit for SortedSet, I'm now pondering if an invalidate_sort() would be. This method would re-sort the entire tree. This is definitely not a performance win but would (at least) offer the caller a way to maintain coherency in a standard set. Without this, the caller must maintain two sets -- a HashSet and a TreeSet -- and use the HashSet to re-build the TreeSet, since iterating TreeSet is iffy after mutation. I would have been fine using this in my particular use-case. I know invalidate_sort() sounds like beating a dead horse, but if TreeSet can only be reliably and safely used by immutable elements -- if mutable elements can cause corruption and assertions -- its utility narrows quickly. Maybe this all is getting too complicated, but these are my thoughts after a night's sleep.
(In reply to comment #7) > > I guess at the end I'm not sure if what you are trying to do is correct in a > > sense of using proper tools for the problem. > > Yeah, I'm less and less convinced that changing SortedSet/TreeSet is the way to > go here. They were designed for a specific task. Still, I remain convinced > this is a problem worth at least considering. Some inter-related thoughts: > > * If Gee offers a sorted set that allows for its elements' sorting order to > mutate, then it should be a separate class with its own contract and terms of > use. > > * That said, this assumption of invariants in SortedSet/TreeSet should be > documented. It's obvious in hindsight, but can be overlooked if not careful. > We've run into this problem before in our other projects, but it slipped my > mind. > Yes, sure - consider it on my pre 0.15.92 TODO list. > * The mutated() method I proposed was not intended to be performant. However, > the moment it's offered by Gee there will be users who design their systems > around it, and they will complain about performance. > > * My core issue/problem/concern is not the mutation of the element, but that if > it does mutate in a way that affects comparison, the red-black tree must be > considered corrupt. Since mutation is trivial, this needs to be documented, > but it also suggests (to me at least) that this is a problem looking for a > solution. > > * I can't say this is universal, but in most (maybe all) of my designs the > Hashable interface is implemented in an immutable manner. i.e. by identity or a > system-wide unique identifier like a rowid or UUID. > Other possibility is just use the pointer if you don't copy it (which is default). > Two results of these thoughts: > > First, I agree with you, something like CollectionKeptSorted > (MutatableTreeSet?) seems like a better fit. It could return an object/token > of some kind that can be used by the caller to invalidate the sort order of a > particular element. The storage of that token could be up to the caller, or a > helper class is offered. But I keep going back to using an EqualDataFunc (i.e. > Hashable) to locate the item instead of using token management: > > public MutableTreeSet(CompareDataFunc c, HashDataFunc h, EqualDataFunc e); > > where the HashDataFunc and EqualDataFunc are specifically *not* used for > sorting but only for storing and locating an element via in an internal map of > elements to tokens: > > public void mutated(G element) { > Node node = map[element]; > remove_mutated_node(node); /* i.e., don't check for coherency, assume leaves > and parents are coherent */ > add_node(node); > } > > There's a one-time performance hit for each insert and remove, as the nodes > must be stored in a map as well as a tree, but that's the price for this > flexibility. Mutated elements are the cost of a tree add and remove. > I thought, though I had still not do so, to add function to SortedSet public delegate int FindFunc<G>(G item); virtual Iterator<G>? find(FindFunc<G> f); The functions finds the value not by the item itself but allows to 'home' on it. Maybe extending the idea would be: Quark q = Quark.from_string("%p-saved-proprty".printf(set)); V value = e.value; e.notify["value"].connect(() => { // Second argument 'finds' the old element set.mutated(e, (elem) => { return elem.value - value; }); value = e.value; }); Or possibly even better: public delegate K GetSortKeyFunc<K, V>(V value); public class SetSortedByKey : MultiMap<K, V> { public SetSortedByKey(GetSortKeyFunc<K, V> get_sort_key_func, CompareDataFunc<K> key_compare_func, HashDataFunc<V> hash_value_func, EqualDataFunc<V> value_equal_func) {...} public void key_changed(V value) { K old_key = old_keys[value]; K new_key = get_sort_key_func(value); data.remove(old_key, value); data.set(new_key, value); old_keys[value] = new_key; } private HashMap<V, K> old_keys; private TreeMultiMap<K, V> data; } If you get a signal you can update it in place in O(log n) by: set.key_changed(element); [Everything including the names is just a thought so far] > Second, while mutated() might not be the right fit for SortedSet, I'm now > pondering if an invalidate_sort() would be. This method would re-sort the > entire tree. This is definitely not a performance win but would (at least) > offer the caller a way to maintain coherency in a standard set. Without this, > the caller must maintain two sets -- a HashSet and a TreeSet -- and use the > HashSet to re-build the TreeSet, since iterating TreeSet is iffy after > mutation. I would have been fine using this in my particular use-case. > Up to 0.16 the iteration should work 'fine' as long as you don't try to modify the tree or depend on it being in order. That said once you deal with mutable data all bets are off - libgee reserves full right of invoking nasal demons for such cases (unless, of course you use the discussed class). > I know invalidate_sort() sounds like beating a dead horse, but if TreeSet can > only be reliably and safely used by immutable elements -- if mutable elements > can cause corruption and assertions -- its utility narrows quickly. > To be honest - I could not find any library which do allows such things. As far as I could tell fix with remove would work in Java though I'd assume that's 'by accident' not 'by design' and C# interface does not allow even that. I'm not sure about C++ and Rust but it does not provide official method for it also (neither in STL nor in boost). For that reason I'd doubt that utility 'narrows quickly'. > Maybe this all is getting too complicated, but these are my thoughts after a > night's sleep.
Documentation updated as of b12c25 except for CompareDataFunc which needs the changes in glib (if you wish please fill a separate bug against glib).
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/libgee/issues/17.