GNOME Bugzilla – Bug 526456
Open addressing in GHashTable
Last modified: 2016-08-30 22:49:25 UTC
I think GHashTable would be more memory efficient if it were to use open addressing. Memory fragmentation would be non-existent, and locality would improve too (consider swap pages). I have a patch against glib trunk that implements this using quadratic probing. It seems to perform better than the old implementation in every respect, but it could use some more testing.
Created attachment 108702 [details] [review] GHashTable using quadratic probing.
I wrote some junk about this at http://hpjansson.org/blag/2008/04/06/its-not-a-picnic-table/ in case anyone's interested.
Created attachment 108751 [details] [review] GHashTable using quadratic probing, take 2. I realized I could drop a variable from the GHashTable struct.
Hi, this works great. I was studying the overhead in g_object_{s,g}et, and before, oprofile gave me:

opreport -l ~ensonic/debug/lib/libglib-2.0.so | head -n5
11000 24.2125 g_hash_table_lookup
 5008 11.0233 g_atomic_pointer_compare_and_exchange
 4907 10.8010 g_datalist_id_set_data_full
 3509  7.7238 g_atomic_int_compare_and_exchange
 3169  6.9754 g_slice_alloc
 2934  6.4581 g_slice_free1

With the patch applied it goes down:

opreport -l ~ensonic/debug/lib/libglib-2.0.so | head -n5
14137 13.7216 g_hash_table_lookup
13296 12.9054 g_slice_free1
12233 11.8736 g_slice_alloc
11068 10.7428 g_datalist_id_set_data_full
 8189  7.9484 g_atomic_pointer_compare_and_exchange
 6201  6.0188 g_slice_free_chain_with_offset

On the actual test (2097151 loops) it's just ~0.2 sec, but still nice to have.
Is anyone available to review this, please?
(In reply to comment #5)
> Is anyone available to review this, please?

Making the call to exchange our current hash table implementation for your approach is non-trivial. Here are a few comments after reading your explanation: http://hpjansson.org/blag/2008/04/06/its-not-a-picnic-table/

The biggest problem I see is the wide variety of current hash table applications. I.e. hash tables can be used at various different sizes, and with hash functions of fairly poor quality. Testing this wide variety to eliminate possible worst cases upon an implementation switch is hard to impossible. E.g. it's perfectly OK currently to use a direct hash, like return (uint) pointer; which for most heap-allocated data will have an alignment of 16 or bigger. Thus, I'd say the prime division we currently use is vital to keep around.

Without diving into the exact reasons for the 2^n table sizes in your current implementation, a possible solution could be to come up with a new table of primes which are very close to 2^n sizes and still divide by those when using your implementation. That way we could preserve adequate distribution at the cost of a few free slots per table.
Thanks, Tim. That's not a bad idea. What I can do is let the initial hash be % the first prime after 2^n, then if the slot is taken and it's a miss, probe modulo the 2^n first slots of the table. The probing algorithm must work modulo 2^n slots, or we can't guarantee more than 50% load (the probe sequence would end up always skipping over some slots).

Effectively, this means that the last P-(2^n) slots will be "wasted" - that is, they will only be populated if the initial modulo lands on that slot, never as a result of probing. I think that's acceptable given the following table of primes and wastage relative to the corresponding 2^n:

        5 ( 1 slot wasted)
       11 ( 3 slots wasted)
       17 ( 1 slot wasted)
       37 ( 5 slots wasted)
       67 ( 3 slots wasted)
      131 ( 3 slots wasted)
      257 ( 1 slot wasted)
      521 ( 9 slots wasted)
     1031 ( 7 slots wasted)
     2053 ( 5 slots wasted)
     4099 ( 3 slots wasted)
     8209 (17 slots wasted)
    16411 (27 slots wasted)
    32771 ( 3 slots wasted)
    65537 ( 1 slot wasted)
   131101 (29 slots wasted)
   262147 ( 3 slots wasted)
   524309 (21 slots wasted)
  1048583 ( 7 slots wasted)
  2097169 (17 slots wasted)
  4194319 (15 slots wasted)
  8388617 ( 9 slots wasted)
 16777259 (43 slots wasted)
 33554467 (35 slots wasted)
 67108879 (15 slots wasted)
134217757 (29 slots wasted)
268435459 ( 3 slots wasted)
536870923 (11 slots wasted)
Created attachment 112353 [details] [review] GHashTable using quadratic probing, take 3. I ended up doing it the other way around - i.e. making the initial prime modulo smaller than the 2^n table size. We get full utilization of the table this way (the probe can find any bucket), and it makes the code simpler too.
Tim, did you want to consider this for 2.18?
(In reply to comment #9) > Tim, did you want to consider this for 2.18 ? We decided during Guadec to get this into 2.19 as early as possible, so it gets a long testing period.
Created attachment 117015 [details] [review] GHashTable using quadratic probing, take 3 (rediffed). Rediffed the patch against trunk of today. One hunk was no longer applying.
Just noting that with the latest patch, building (only) on 64-bit systems fails with the "slice-threadinit" test:

...
** ERROR **: failed to reallocate slice from magazine (after g_thread_init): size=77
aborting...
/bin/sh: line 4: 24829 Trace/breakpoint trap srcdir=. LIBCHARSET_ALIAS_DIR=../glib/libcharset MALLOC_CHECK_=2 MALLOC_PERTURB_=$((${RANDOM:-256} % 256)) ${dir}$tst
FAIL: slice-threadinit
...
(In reply to comment #12) > Just noting that with the latest patch, building (only) on 64bit systems fails > with the "slice-threadinit" test: That test is failing for me on a clean, unpatched checkout. So it's almost certainly unrelated to the patch.
This patch looks really good. Can it be improved even more? I'm looking at the 'dictionary' implementation in Python, and I see the ingenious probing strategy that is used there. Pseudocode:

slot = hash % size    # (by bit-masking)
perturb = hash
while slot not empty:    # collision
    slot = (5*slot) + 1 + perturb
    perturb = perturb >> 5

This probing pattern takes 'arbitrarily big' steps on the first collisions, and then reduces to slot = (5*slot) + 1 once all the bits in perturb have been shifted out. The table is resized at a load factor of 2/3. The size is then doubled or quadrupled depending on size.

This scheme works great in Python, and I think it could easily be used in glib with this patch. However, I must admit that I have not done any tests to compare these two schemes. What kind of benchmark test should be used to test performance of hashing schemes?
We've branched now, so now would be a good time to get this patch in. We can certainly look at further improvements.
Committed to trunk.

2008-09-19  Hans Petter Jansson  <hpj@novell.com>

	Rewrite most of GHashTable to use open addressing with quadratic
	probing instead of chaining. This has the potential to reduce memory
	fragmentation significantly, while being slightly faster due to
	better locality and no need to call alloc/free functions for nodes.
	Benchmarks suggest it also uses less memory overall.

	* glib/ghash.c (prime_mod): Table of suitable primes for
	initial-probe distribution.
	(g_hash_table_set_shift): New function.
	(g_hash_table_find_closest_shift): New function.
	(g_hash_table_set_shift_from_size): New function.
	(g_hash_table_lookup_node_for_insertion): New function.
	(g_hash_table_lookup_node): Rewritten to return node index instead
	of pointer, use quadratic probe on flat table, and not return
	insertion data. The latter saves some computation for read-only
	lookups.
	(g_hash_table_remove_node): Rewrite to take a pointer directly to
	the node structure to remove, and clear that. Remove unlinking code.
	(g_hash_table_remove_all_nodes): Rewrite to not clear nodes
	individually, but en masse using memset () after potentially calling
	notify functions.
	(iter_remove_or_steal): Use new data structure and algorithm. Vastly
	simplified - now just a call to g_hash_table_remove_node ().
	(g_hash_table_resize): New resize code, re-indexing with new prime
	and cleaning up tombstones.
	(g_hash_table_maybe_resize): Table may hold 8 buckets minimum, no
	less than 1/4 load excluding tombstones, and no more than 15/16 load
	including tombstones. These numbers are the results of a lot of
	benchmarking with multiple complex applications, and should not be
	changed lightly.
	(g_hash_table_iter_next)
	(g_hash_table_lookup)
	(g_hash_table_lookup_extended)
	(g_hash_table_insert_internal)
	(g_hash_table_remove_internal)
	(g_hash_table_foreach_remove_or_steal)
	(g_hash_table_foreach)
	(g_hash_table_find)
	(g_hash_table_get_keys)
	(g_hash_table_get_values): Use new data structure and algorithm,
	fairly trivial changes.
Øystein: I don't think the approach you outline would improve the performance beyond perhaps trading off memory for CPU consumption. The quadratic probing also has the benefit of better locality since the initial probes take short steps. You are free to prove me wrong, though: I think the best benchmarks come from real-world applications that make heavy use of GHashTable. For my tests, I took the CPU time spent by Evolution starting up in offline mode, indexing a set of mail folders with ~2GB of mail in them, and then checked the process' [heap] memory consumption as reported by pmap immediately after the indexing finished. I also tested with a couple of smaller apps (one of them being gcalctool which uses GHashTables indirectly through GTK+ and Glade) to ensure the memory footprint of small apps did not increase. I was able to cut [heap] consumption by 0-5% in these cases, relative to the old GHashTable code.
Created attachment 128263 [details] [review] optimize function g_hash_table_set_shift(). I think with this small patch, function g_hash_table_set_shift() would be faster and clearer.
The above patch is a trivial work, but I'd like to know if it is acceptable. Thanks.
Oops, sorry, forget the patch. It's an unsafe optimization in general, since "1 << shift" is undefined behaviour when "shift" reaches the bit width of int (though this wouldn't happen with the current ghash.c code, since prime_mod has only 32 entries). The old code is safer; maybe there's no need to change it.
Did anyone ever look at using robin hood hashing for GHashTable: http://www.sebastiansylvan.com/post/robin-hood-hashing-should-be-your-default-hash-table-implementation/
(In reply to Alexander Larsson from comment #21)
> Did anyone ever look at using robin hood hashing for GHashTable:

I looked at it briefly a couple of years ago, but wasn't fully convinced. The issue is the reliance on linear probing for insertion -- as part of the open addressing rewrite, I tried linear probing and found that the primary clustering became intolerable with largish tables, i.e. thousands of consecutive filled buckets separated by thousands of consecutive empty ones. Quadratic probing solves that. It looks like quadratic probing can't be applied with Robin Hood insertions, though, so you'll get the clustering, and each insertion will require iterating over (and reordering) half a cluster on average. https://en.wikipedia.org/wiki/Primary_clustering

Obviously the performance is going to vary with load factor, quality of the hash function, ratio of insertions to lookups, typical hash table size and life span (removals/tombstone buildup), number of hash tables employed, etc. -- but unless we can solve the clustering issue, I doubt it'd be a win for most applications. I'd love to be proven wrong though :)
So I actually had an old Robin Hood-enabled GHashTable kicking around. I've cleaned it up a little and posted it as a patch in bug 770616. Benchmarking help welcome!