GNOME Bugzilla – Bug 695082
g_hash_table_remove_all is not save against a call to g_hash_table_remove
Last modified: 2014-10-17 12:30:59 UTC
Created attachment 237905 [details] Testcase which never returns Long story short. GtkIconInfo is stored in a hash table, and this hash table is cleaned out with g_hash_table_remove_all. This is an issue, because when a GtkIconInfo for an emblemed icon is removed from a hash table, it also frees and removes other GtkIconInfo objects from the same hash table. This means that we get a g_hash_table_remove call nested inside g_hash_table_remove_all, and that causes an infinite loop under some circumstances. I am attaching a small test case which triggers the issue (program never exists) and a patch that fixes it. The patch works by setting all hash values to TOMBSTONE (and existing tombstones to UNUSED) and then destroying all objects where TOMBSTONE is set. By doing this, any call to g_hash_table_remove is prevented from doing anything. The backtrace from the testcase is: (gdb) bt
+ Trace 231585
Find closes shift never returns because size is negative. So right shifting it never clears the bits.
Created attachment 237907 [details] [review] Patch that fixes it for me All hash entries are first marked for removal, and then only removed in a second iteration. This means that any lookup will fail, and g_hash_table_remove, etc. think the item does not exist anymore.
Mmmm...I'm not sure we should guarantee anything about GHashTable reentrancy. Even if we did now, applications would have to have the new version of GLib to depend on it. Better to fix the application (or gtk+ in this case).
The "this will only help users of new GLib" argument is pretty weak. By that logic we should never fix any bugs... I'm not sure I like the approach in the patch here, though -- the way it uses tombstones and unused as magic values is pretty worrying -- what happens if other (non-remove) operations are done on the hash table during the remove_all? Tombstones could be overwritten. The table may even end up being resized... Then we're leaking memory. Either we need a really robust way to deal with this (which would basically look like storing a list of things to free in a separate linked list) or we need to document that it's just not supported... I'm not sure the overhead of the "really robust" way (or at least my non-creative proposal for how to implement it) is worth being able to re-enter during remove_all().
The question is which (if any) operations should be allowed in destructors. The hack in the patch only fixes the issue for _remove and by letting _lookup fail. Any _insert operation is not (and never was) supported by the patch. If one really wants to support more operations during _remove_all (eg. insert) then this could be done by replacing the hashes/keys/values arrays with new ones. At that point the hash table looks empty, and the function can simply call the destroy notifier for all elements. Any _lookup and _remove operation would fail (element does not exist), any _insert would go into the new table.
I can see the remove_all vs unref->remove complication when using a hash table as a cache. We've dealt with reentrancy issues before, see bug 650459
Created attachment 237952 [details] [review] Patch which first resets the complete GHashTable state Right, so this patch even lets _insert work fine (note that the last patch has a bug, in the first loop marking the nodes). This works fine, but creates the weird corner case where a destroy notifier adds a new node. That way it could happen that a non-empty GHashTable is destroyed, i.e.: 1. g_hash_table_unref 2. g_hash_table_remove_all_nodes 3. Destroy notifier adds a new node 4. g_hash_table_unref simply frees the node list Obviously that is a rather extreme case ...
Created attachment 238295 [details] [review] Patch that also handles foreach_remove This patch should be a lot saner. It is cleaned up a bit, but the main change is that it also delays destroy notifiers in g_hash_table_foreach_remove, so that function becomes save in the same way.
Created attachment 238296 [details] More complete testcase
The changes to remove_all() are actually fairly nice, except for the extra memory allocation. I wonder if you could try to avoid this for the last-unref() case, which is probably much more common than people calling destroy() and wanting to keep the table around after. I'm less thrilled with the changes to foreach_remove()...
I agree that the memory allocation is unfortunate, but I don't think it is trivial to remove it. It could work by setting the pointers to NULL and adding a guard to all functions that may be called recursively. Not sure about the foreach_remove. It just seemed sensible to also allow a recursive _remove from _foreach_remove if it is valid from _remove and _remove_all.
Created attachment 250714 [details] [review] new patch OK, here is a new patch which only changes the behaviour for remove_all. The patch does the optimization to not require a new allocation before destroying the hash table. Note that GTK+ does not need to be changed as it is using g_hash_table_destroy, which first runs g_hash_table_remove_all before unrefing the table. If it simply unref'ed the table (which implicitly destroys everything), then GTK+ would be crashing/asserting after the change. The patch also contains code to test the codepath. I have not done any performance tests. Ryan, I hope that you are happy with this :-)
Thanks for waiting :)