GNOME Bugzilla – Bug 692815
Using g_hash_table_insert() when using a hash table as a set can corrupt the set
Last modified: 2013-02-02 05:34:50 UTC
If you have: GHashTable *set = g_hash_table_new_full (g_str_hash, g_str_equal, g_free, NULL); gchar *a = g_strdup ("foo"); gchar *b = g_strdup ("foo"); And (erroneously) use g_hash_table_insert() rather than g_hash_table_add() or g_hash_table_replace() to add items to the set: g_hash_table_insert (set, a, a); g_hash_table_insert (set, b, b); Then, 'b' is freed (as clearly specified in the documentation for _insert()), but the set now contains 'b', not 'a'. So your set contains a free-d element, and 'a' has been leaked. This is because the optimisation is implemented by setting values = keys, and g_hash_table_insert_node() always sets: hash_table->values[node_index] = value; even if key == value, hash_table->values == hash_table->keys and it is about to free key.
Created attachment 234763 [details] [review] Test for g_hash_table_insert() corrupting sets
Making g_hash_table_insert() not replace the value in this case would make this case not crash, but also would violate the other half of its documented behaviour (to replace the old value with the new value, and free the old value). Plus, I'm loathe to add another special case. Perhaps this is just a matter for documentation – when writing the code where this bit me, I refreshed my memory of the difference between _insert() and _replace() but this nuance didn't occur to me.
(In reply to comment #0) > This is because the optimisation is implemented by > setting values = keys (To be clear, "the optimisation" here is that GHashTable does not allocate separate storage for values until you insert a value which differs from its key, which saves space if you are using a GHashTable as a set by storing only key-value pairs where key == value.)
I'm working on a patch...
Created attachment 234786 [details] [review] hashtable: properly handle insert() de-set-ifying GHashTable remains a set for as long as all of the keys are exactly equal (in pointer value) to all of the values. We check this by comparing keys to values when we do inserts. Unfortunately, when doing g_hash_table_insert() when a key is already in the table, the old key pointer value is kept, but the new value pointer is used. Now we have a situation where a key pointer is unequal to a value pointer, but we were not treating this case properly. Fix that.
Review of attachment 234786 [details] [review]: This patch looks correct. g_hash_table_insert() probably still doesn't do what the API user expected, but with this patch it does exactly what it is documented to do. I'll update the comment in my test case.
Created attachment 234819 [details] [review] Test for g_hash_table_insert() corrupting sets
Actually the issue arises from adding fragile API like g_hash_table_add() with questionable semantics of "key == value", instead of introducing a dedicated g_hash_set() API (that wraps, or even partly aliases) GHashTable API. So how about doing the right thing (in terms of API design) and deprecating fragile hacks like g_hash_table_add() in favor of a proper API?
Hmm... Read the patch and noticed the "hash_table->keys == hash_table->values" optimization. So technically this issue was a bug in the implementation of the set-aware GHashTable. Still think this is bad API, but see the optimization's benefit.
The API was a compromise...
Attachment 234786 [details] pushed as bb1df4d - hashtable: properly handle insert() de-set-ifying Attachment 234819 [details] pushed as a809650 - Test for g_hash_table_insert() corrupting sets