GNOME Bugzilla – Bug 526456
Open addressing in GHashTable
Last modified: 2016-08-30 22:49:25 UTC
I think GHashTable would be more memory efficient if it were to use open addressing. Memory fragmentation would be non-existent, and locality would improve too (consider swap pages). I have a patch against glib trunk that implements this using quadratic probing. It seems to perform better than the old implementation in every respect, but it could use some more testing.
Created attachment 108702 [details] [review] GHashTable using quadratic probing.
I wrote some junk about this at http://hpjansson.org/blag/2008/04/06/its-not-a-picnic-table/ in case anyone's interested.
Created attachment 108751 [details] [review] GHashTable using quadratic probing, take 2. I realized I could drop a variable from the GHashTable struct.
Hi, this works great. I was studying the overhead in g_object_{s,g}et, and before, oprofile gave me:

opreport -l ~ensonic/debug/lib/libglib-2.0.so | head -n5
11000 24.2125 g_hash_table_lookup
 5008 11.0233 g_atomic_pointer_compare_and_exchange
 4907 10.8010 g_datalist_id_set_data_full
 3509  7.7238 g_atomic_int_compare_and_exchange
 3169  6.9754 g_slice_alloc
 2934  6.4581 g_slice_free1

With the patch applied it goes down:

opreport -l ~ensonic/debug/lib/libglib-2.0.so | head -n5
14137 13.7216 g_hash_table_lookup
13296 12.9054 g_slice_free1
12233 11.8736 g_slice_alloc
11068 10.7428 g_datalist_id_set_data_full
 8189  7.9484 g_atomic_pointer_compare_and_exchange
 6201  6.0188 g_slice_free_chain_with_offset

On the actual test (2097151 loops) it's just ~0.2 sec, but still nice to have.
Is anyone available to review this, please?
(In reply to comment #5)
> Is anyone available to review this, please?

Making the call to exchange our current hash table implementation for your approach is non-trivial. Here are a few comments after reading your explanation: http://hpjansson.org/blag/2008/04/06/its-not-a-picnic-table/

The biggest problem I see is the wide variety of current hash table applications. I.e. hash tables can be used at various different sizes, and with hash functions of fairly poor quality. Testing this wide variety to eliminate possible worst cases upon an implementation switch is hard to impossible. E.g. it's perfectly OK currently to use a direct hash, like return (uint) pointer; which for most heap-allocated data will have an alignment of 16 or bigger. Thus, I'd say the prime division we currently use is vital to keep around.

Without diving into the exact reasons for the 2^n table sizes in your current implementation, a possible solution could be to come up with a new table of primes which are very close to 2^n sizes and still divide by those when using your implementation. That way we could preserve adequate distribution at the cost of a few free slots per table.
Thanks, Tim. That's not a bad idea. What I can do is let the initial hash be % the first prime after 2^n, then if the slot is taken and it's a miss, probe modulo the 2^n first slots of the table. The probing algorithm must work modulo 2^n slots, or we can't guarantee more than 50% load (the probe sequence would end up always skipping over some slots).

Effectively, this means that the last P-(2^n) slots will be "wasted" - that is, they will only be populated if the initial modulo lands on that slot, never as a result of probing. I think that's acceptable given the following table of primes and wastage relative to the corresponding 2^n:

        5 ( 1 slot wasted)
       11 ( 3 slots wasted)
       17 ( 1 slot wasted)
       37 ( 5 slots wasted)
       67 ( 3 slots wasted)
      131 ( 3 slots wasted)
      257 ( 1 slot wasted)
      521 ( 9 slots wasted)
     1031 ( 7 slots wasted)
     2053 ( 5 slots wasted)
     4099 ( 3 slots wasted)
     8209 (17 slots wasted)
    16411 (27 slots wasted)
    32771 ( 3 slots wasted)
    65537 ( 1 slot wasted)
   131101 (29 slots wasted)
   262147 ( 3 slots wasted)
   524309 (21 slots wasted)
  1048583 ( 7 slots wasted)
  2097169 (17 slots wasted)
  4194319 (15 slots wasted)
  8388617 ( 9 slots wasted)
 16777259 (43 slots wasted)
 33554467 (35 slots wasted)
 67108879 (15 slots wasted)
134217757 (29 slots wasted)
268435459 ( 3 slots wasted)
536870923 (11 slots wasted)
Created attachment 112353 [details] [review] GHashTable using quadratic probing, take 3. I ended up doing it the other way around - i.e. making the initial prime modulo smaller than the 2^n table size. We get full utilization of the table this way (the probe can find any bucket), and it makes the code simpler too.
Tim, did you want to consider this for 2.18?
(In reply to comment #9) > Tim, did you want to consider this for 2.18 ? We decided during Guadec to get this into 2.19 as early as possible, so it gets a long testing period.
Created attachment 117015 [details] [review] GHashTable using quadratic probing, take 3 (rediffed). Rediffed the patch against trunk of today. One hunk was no longer applying.
Just noting that with the latest patch, building (only) on 64-bit systems fails with the "slice-threadinit" test:

...
** ERROR **: failed to reallocate slice from magazine (after g_thread_init): size=77
aborting...
/bin/sh: line 4: 24829 Trace/breakpoint trap srcdir=. LIBCHARSET_ALIAS_DIR=../glib/libcharset MALLOC_CHECK_=2 MALLOC_PERTURB_=$((${RANDOM:-256} % 256)) ${dir}$tst
FAIL: slice-threadinit
...
(In reply to comment #12) > Just noting that with the latest patch, building (only) on 64bit systems fails > with the "slice-threadinit" test: That test is failing for me on a clean, unpatched checkout. So it's almost certainly unrelated to the patch.
This patch looks really good. Can it be improved even more? I'm looking at the 'dictionary' implementation in Python, and I see the ingenious probing strategy that is used there. Pseudocode:

slot = hash % size    # (by bit-masking)
perturb = hash
while slot not empty:    # collision
    slot = (5*slot) + 1 + perturb
    perturb = perturb >> 5

This probing pattern takes 'arbitrarily big' steps on the first collisions, and then reduces to slot = (5*slot) + 1 once all the bits in perturb have been shifted out. The table is resized at a load factor of 2/3. The size is then doubled or quadrupled depending on size.

This scheme works great in Python, and I think it could easily be used in glib with this patch. However, I must admit that I have not done any tests to compare these two schemes. What kind of benchmark test should be used to test performance of hashing schemes?
We've branched now, so now would be a good time to get this patch in. We can certainly look at further improvements.
Committed to trunk.

2008-09-19  Hans Petter Jansson  <hpj@novell.com>

	Rewrite most of GHashTable to use open addressing with quadratic
	probing instead of chaining. This has the potential to reduce memory
	fragmentation significantly, while being slightly faster due to
	better locality and no need to call alloc/free functions for nodes.
	Benchmarks suggest it also uses less memory overall.

	* glib/ghash.c (prime_mod): Table of suitable primes for
	initial-probe distribution.
	(g_hash_table_set_shift): New function.
	(g_hash_table_find_closest_shift): New function.
	(g_hash_table_set_shift_from_size): New function.
	(g_hash_table_lookup_node_for_insertion): New function.
	(g_hash_table_lookup_node): Rewritten to return node index instead
	of pointer, use quadratic probe on flat table, and not return
	insertion data. The latter saves some computation for read-only
	lookups.
	(g_hash_table_remove_node): Rewrite to take a pointer directly to
	the node structure to remove, and clear that. Remove unlinking code.
	(g_hash_table_remove_all_nodes): Rewrite to not clear nodes
	individually, but en masse using memset () after potentially calling
	notify functions.
	(iter_remove_or_steal): Use new data structure and algorithm. Vastly
	simplified - now just a call to g_hash_table_remove_node ().
	(g_hash_table_resize): New resize code, re-indexing with new prime
	and cleaning up tombstones.
	(g_hash_table_maybe_resize): Table may hold 8 buckets minimum, no
	less than 1/4 load excluding tombstones, and no more than 15/16 load
	including tombstones. These numbers are the results of a lot of
	benchmarking with multiple complex applications, and should not be
	changed lightly.
	(g_hash_table_iter_next)
	(g_hash_table_lookup)
	(g_hash_table_lookup_extended)
	(g_hash_table_insert_internal)
	(g_hash_table_remove_internal)
	(g_hash_table_foreach_remove_or_steal)
	(g_hash_table_foreach)
	(g_hash_table_find)
	(g_hash_table_get_keys)
	(g_hash_table_get_values): Use new data structure and algorithm,
	fairly trivial changes.
Øystein: I don't think the approach you outline would improve the performance beyond perhaps trading off memory for CPU consumption. The quadratic probing also has the benefit of better locality since the initial probes take short steps. You are free to prove me wrong, though: I think the best benchmarks come from real-world applications that make heavy use of GHashTable. For my tests, I took the CPU time spent by Evolution starting up in offline mode, indexing a set of mail folders with ~2GB of mail in them, and then checked the process' [heap] memory consumption as reported by pmap immediately after the indexing finished. I also tested with a couple of smaller apps (one of them being gcalctool which uses GHashTables indirectly through GTK+ and Glade) to ensure the memory footprint of small apps did not increase. I was able to cut [heap] consumption by 0-5% in these cases, relative to the old GHashTable code.
Created attachment 128263 [details] [review] optimize function g_hash_table_set_shift(). I think with this small patch, function g_hash_table_set_shift() would be faster and clearer.
The above patch is a trivial work, but I'd like to know if it is acceptable. Thanks.
Oops, sorry, forget the patch. It's an unsafe optimization in general, since "1 << shift" is undefined behaviour when "shift" reaches the bit width of int (though this wouldn't happen with the current ghash.c code, since prime_mod has only 32 entries). The old code is safer; maybe there's no need to change it.
Did anyone ever look at using robin hood hashing for GHashTable: http://www.sebastiansylvan.com/post/robin-hood-hashing-should-be-your-default-hash-table-implementation/
(In reply to Alexander Larsson from comment #21)
> Did anyone ever look at using robin hood hashing for GHashTable:

I looked at it briefly a couple of years ago, but wasn't fully convinced. The issue is the reliance on linear probing for insertion -- as part of the open addressing rewrite, I tried linear probing and found that the primary clustering became intolerable with largish tables, i.e. thousands of consecutive filled buckets separated by thousands of consecutive empty ones. Quadratic probing solves that. It looks like quadratic probing can't be applied with Robin Hood insertions, though, so you'll get the clustering, and each insertion will require iterating over (and reordering) half a cluster on average. https://en.wikipedia.org/wiki/Primary_clustering

Obviously the performance is going to vary with load factor, quality of the hash function, ratio of insertions to lookups, typical hash table size and life span (removals/tombstone buildup), number of hash tables employed, etc. -- but unless we can solve the clustering issue, I doubt it'd be a win for most applications. I'd love to be proven wrong though :)
So I actually had an old Robin Hood-enabled GHashTable kicking around. I've cleaned it up a little and posted it as a patch in bug 770616. Benchmarking help welcome!