Bug 692815 – Using g_hash_table_insert() when using a hash table as a set can corrupt the set

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 692815 - Using g_hash_table_insert() when using a hash table as a set can corrupt the set


Summary:	Using g_hash_table_insert() when using a hash table as a set can corrupt the set


Status:	RESOLVED FIXED

Product:	glib
Classification:	Platform
Component:	general
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	gtkdev
QA Contact:	gtkdev

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2013-01-29 16:08 UTC by Will Thompson
Modified:	2013-02-02 05:34 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Test for g_hash_table_insert() corrupting sets (1.94 KB, patch) 2013-01-29 16:10 UTC, Will Thompson	none	Details \| Review
hashtable: properly handle insert() de-set-ifying (4.46 KB, patch) 2013-01-30 00:26 UTC, Allison Karlitskaya (desrt)	committed	Details \| Review
Test for g_hash_table_insert() corrupting sets (2.28 KB, patch) 2013-01-30 11:18 UTC, Will Thompson	committed	Details \| Review

Description Will Thompson 2013-01-29 16:08:39 UTC

If you have:

  GHashTable *set = g_hash_table_new_full (g_str_hash, g_str_equal, g_free, NULL);
  gchar *a = g_strdup ("foo");
  gchar *b = g_strdup ("foo");

And (erroneously) use g_hash_table_insert() rather than g_hash_table_add() or g_hash_table_replace() to add items to the set:

  g_hash_table_insert (set, a, a);
  g_hash_table_insert (set, b, b);

Then, 'b' is freed (as clearly specified in the documentation for _insert()), but the set now contains 'b', not 'a'. So your set contains a free-d element, and 'a' has been leaked. This is because the optimisation is implemented by setting values = keys, and g_hash_table_insert_node() always sets:

      hash_table->values[node_index] = value;

even if key == value, hash_table->values == hash_table->keys and it is about to free key.

Comment 1 Will Thompson 2013-01-29 16:10:06 UTC

Created attachment 234763 [details] [review]
Test for g_hash_table_insert() corrupting sets

Comment 2 Will Thompson 2013-01-29 16:24:20 UTC

Making g_hash_table_insert() not replace the value in this case would make this case not crash, but also would violate the other half of its documented behaviour (to replace the old value with the new value, and free the old value). Plus, I'm loathe to add another special case.

Perhaps this is just a matter for documentation – when writing the code where this bit me, I refreshed my memory of the difference between _insert() and _replace() but this nuance didn't occur to me.

Comment 3 Will Thompson 2013-01-29 16:45:31 UTC

(In reply to comment #0)
> This is because the optimisation is implemented by
> setting values = keys

(To be clear, "the optimisation" here is that GHashTable does not allocate separate storage for values until you insert a value which differs from its key, which saves space if you are using a GHashTable as a set by storing only key-value pairs where key == value.)

Comment 4 Allison Karlitskaya (desrt) 2013-01-29 18:52:18 UTC

I'm working on a patch...

Comment 5 Allison Karlitskaya (desrt) 2013-01-30 00:26:19 UTC

Created attachment 234786 [details] [review]
hashtable: properly handle insert() de-set-ifying

GHashTable remains a set for as long as all of the keys are exactly
equal (in pointer value) to all of the values.  We check this by
comparing keys to values when we do inserts.

Unfortunately, when doing g_hash_table_insert() when a key is already in
the table, the old key pointer value is kept, but the new value pointer
is used.  Now we have a situation where a key pointer is unequal to a
value pointer, but we were not treating this case properly.

Fix that.

Comment 6 Will Thompson 2013-01-30 11:18:40 UTC

Review of attachment 234786 [details] [review]:

This patch looks correct. g_hash_table_insert() probably still doesn't do what the API user expected, but with this patch it does exactly what it is documented to do. I'll update the comment in my test case.

Comment 7 Will Thompson 2013-01-30 11:18:58 UTC

Created attachment 234819 [details] [review]
Test for g_hash_table_insert() corrupting sets

Comment 8 Mathias Hasselmann (IRC: tbf) 2013-01-31 13:40:40 UTC

Actually the issue arises from adding fragile API like g_hash_table_add() with questionable semantics of "key == value", instead of introducing a dedicated g_hash_set() API (that wraps, or even partly aliases) GHashTable API. So how about doing the right thing (in terms of API design) and deprecating fragile hacks like g_hash_table_add() in favor of a proper API?

Comment 9 Mathias Hasselmann (IRC: tbf) 2013-01-31 13:52:29 UTC

Hmm... Read the patch and noticed the "hash_table->keys == hash_table->values" optimization. So technically this issue was a bug in the implementation of the set-aware GHashTable. Still think this is bad API, but see the optimization's benefit.

Comment 10 Allison Karlitskaya (desrt) 2013-02-01 04:53:29 UTC

The API was a compromise...

Comment 11 Matthias Clasen 2013-02-02 05:34:44 UTC

Attachment 234786 [details] pushed as bb1df4d - hashtable: properly handle insert() de-set-ifying
Attachment 234819 [details] pushed as a809650 - Test for g_hash_table_insert() corrupting sets