After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 695082 - g_hash_table_remove_all is not save against a call to g_hash_table_remove
g_hash_table_remove_all is not save against a call to g_hash_table_remove
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: general
unspecified
Other Linux
: Normal major
: ---
Assigned To: gtkdev
gtkdev
Depends on:
Blocks:
 
 
Reported: 2013-03-03 21:15 UTC by Benjamin Berg
Modified: 2014-10-17 12:30 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Testcase which never returns (596 bytes, application/octet-stream)
2013-03-03 21:15 UTC, Benjamin Berg
  Details
Patch that fixes it for me (2.73 KB, patch)
2013-03-03 21:23 UTC, Benjamin Berg
none Details | Review
Patch which first resets the complete GHashTable state (3.27 KB, patch)
2013-03-04 11:36 UTC, Benjamin Berg
none Details | Review
Patch that also handles foreach_remove (6.06 KB, patch)
2013-03-07 14:06 UTC, Benjamin Berg
none Details | Review
More complete testcase (2.62 KB, application/octet-stream)
2013-03-07 14:08 UTC, Benjamin Berg
  Details
new patch (8.74 KB, patch)
2013-08-02 14:59 UTC, Benjamin Berg
committed Details | Review

Description Benjamin Berg 2013-03-03 21:15:26 UTC
Created attachment 237905 [details]
Testcase which never returns

Long story short. GtkIconInfo is stored in a hash table, and this hash table is cleaned out with g_hash_table_remove_all. This is an issue, because when a GtkIconInfo for an emblemed icon is removed from a hash table, it also frees and removes other GtkIconInfo objects from the same hash table.

This means that we get a g_hash_table_remove call nested inside g_hash_table_remove_all, and that causes an infinite loop under some circumstances.

I am attaching a small test case which triggers the issue (program never exists) and a patch that fixes it.

The patch works by setting all hash values to TOMBSTONE (and existing tombstones to UNUSED) and then destroying all objects where TOMBSTONE is set. By doing this, any call to g_hash_table_remove is prevented from doing anything.

The backtrace from the testcase is:

(gdb) bt
  • #0 g_hash_table_find_closest_shift
    at /build/buildd-glib2.0_2.35.8-1-amd64-lGLf9n/glib2.0-2.35.8/./glib/ghash.c line 313
  • #1 g_hash_table_set_shift_from_size
    at /build/buildd-glib2.0_2.35.8-1-amd64-lGLf9n/glib2.0-2.35.8/./glib/ghash.c line 324
  • #2 g_hash_table_resize
    at /build/buildd-glib2.0_2.35.8-1-amd64-lGLf9n/glib2.0-2.35.8/./glib/ghash.c line 532
  • #3 g_hash_table_maybe_resize
    at /build/buildd-glib2.0_2.35.8-1-amd64-lGLf9n/glib2.0-2.35.8/./glib/ghash.c line 594
  • #4 g_hash_table_remove_internal
    at /build/buildd-glib2.0_2.35.8-1-amd64-lGLf9n/glib2.0-2.35.8/./glib/ghash.c line 1277
  • #5 value_destroy_notify
  • #6 g_hash_table_remove_all_nodes
    at /build/buildd-glib2.0_2.35.8-1-amd64-lGLf9n/glib2.0-2.35.8/./glib/ghash.c line 500
  • #7 g_hash_table_remove_all
    at /build/buildd-glib2.0_2.35.8-1-amd64-lGLf9n/glib2.0-2.35.8/./glib/ghash.c line 1347
  • #8 main

Find closes shift never returns because size is negative. So right shifting it never clears the bits.
Comment 1 Benjamin Berg 2013-03-03 21:23:42 UTC
Created attachment 237907 [details] [review]
Patch that fixes it for me

All hash entries are first marked for removal, and then only removed in a second iteration. This means that any lookup will fail, and g_hash_table_remove, etc. think the item does not exist anymore.
Comment 2 Colin Walters 2013-03-03 23:14:33 UTC
Mmmm...I'm not sure we should guarantee anything about GHashTable reentrancy.  Even if we did now, applications would have to have the new version of GLib to depend on it.

Better to fix the application (or gtk+ in this case).
Comment 3 Allison Karlitskaya (desrt) 2013-03-04 04:36:09 UTC
The "this will only help users of new GLib" argument is pretty weak.  By that logic we should never fix any bugs...

I'm not sure I like the approach in the patch here, though -- the way it uses tombstones and unused as magic values is pretty worrying -- what happens if other (non-remove) operations are done on the hash table during the remove_all?  Tombstones could be overwritten.  The table may even end up being resized...  Then we're leaking memory.

Either we need a really robust way to deal with this (which would basically look like storing a list of things to free in a separate linked list) or we need to document that it's just not supported...  I'm not sure the overhead of the "really robust" way (or at least my non-creative proposal for how to implement it) is worth being able to re-enter during remove_all().
Comment 4 Benjamin Berg 2013-03-04 09:44:25 UTC
The question is which (if any) operations should be allowed in destructors. The hack in the patch only fixes the issue for _remove and by letting _lookup fail. Any _insert operation is not (and never was) supported by the patch.

If one really wants to support more operations during _remove_all (eg. insert) then this could be done by replacing the hashes/keys/values arrays with new ones. At that point the hash table looks empty, and the function can simply call the destroy notifier for all elements.
Any _lookup and _remove operation would fail (element does not exist), any _insert would go into the new table.
Comment 5 Matthias Clasen 2013-03-04 10:47:57 UTC
I can see the remove_all vs unref->remove complication when using a hash table as a cache.

We've dealt with reentrancy issues before, see bug 650459
Comment 6 Benjamin Berg 2013-03-04 11:36:03 UTC
Created attachment 237952 [details] [review]
Patch which first resets the complete GHashTable state

Right, so this patch even lets _insert work fine (note that the last patch has a bug, in the first loop marking the nodes).

This works fine, but creates the weird corner case where a destroy notifier adds a new node. That way it could happen that a non-empty GHashTable is destroyed, i.e.:

 1. g_hash_table_unref
 2. g_hash_table_remove_all_nodes
 3. Destroy notifier adds a new node
 4. g_hash_table_unref simply frees the node list

Obviously that is a rather extreme case ...
Comment 7 Benjamin Berg 2013-03-07 14:06:56 UTC
Created attachment 238295 [details] [review]
Patch that also handles foreach_remove

This patch should be a lot saner. It is cleaned up a bit, but the main change is that it also delays destroy notifiers in g_hash_table_foreach_remove, so that function becomes save in the same way.
Comment 8 Benjamin Berg 2013-03-07 14:08:35 UTC
Created attachment 238296 [details]
More complete testcase
Comment 9 Allison Karlitskaya (desrt) 2013-03-19 13:55:45 UTC
The changes to remove_all() are actually fairly nice, except for the extra memory allocation.  I wonder if you could try to avoid this for the last-unref() case, which is probably much more common than people calling destroy() and wanting to keep the table around after.

I'm less thrilled with the changes to foreach_remove()...
Comment 10 Benjamin Berg 2013-03-19 14:34:12 UTC
I agree that the memory allocation is unfortunate, but I don't think it is trivial to remove it. It could work by setting the pointers to NULL and adding a guard to all functions that may be called recursively.

Not sure about the foreach_remove. It just seemed sensible to also allow a recursive _remove from _foreach_remove if it is valid from _remove and _remove_all.
Comment 11 Benjamin Berg 2013-08-02 14:59:01 UTC
Created attachment 250714 [details] [review]
new patch

OK, here is a new patch which only changes the behaviour for remove_all. The patch does the optimization to not require a new allocation before destroying the hash table.

Note that GTK+ does not need to be changed as it is using g_hash_table_destroy, which first runs g_hash_table_remove_all before unrefing the table. If it simply unref'ed the table (which implicitly destroys everything), then GTK+ would be crashing/asserting after the change.

The patch also contains code to test the codepath. I have not done any performance tests.

Ryan, I hope that you are happy with this :-)
Comment 12 Allison Karlitskaya (desrt) 2014-10-17 12:30:55 UTC
Thanks for waiting :)