Bug 500507 – GHashTableIter API

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 500507 - GHashTableIter API


Summary:	GHashTableIter API


Status:	RESOLVED FIXED

Product:	glib
Classification:	Platform
Component:	general
Version:	unspecified
Hardware:	Other All

Importance:	Normal enhancement
Target Milestone:	---
Assigned To:	gtkdev
QA Contact:	gtkdev

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2007-11-29 20:31 UTC by Jean-Yves Lefort
Modified:	2007-12-15 05:01 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
GHashTableIter (1.89 KB, patch) 2007-11-29 20:32 UTC, Jean-Yves Lefort	none	Details \| Review
updated patch (2.33 KB, patch) 2007-11-30 08:49 UTC, Jean-Yves Lefort	none	Details \| Review
updated following Owen Taylor's comments (3.93 KB, patch) 2007-11-30 21:11 UTC, Jean-Yves Lefort	none	Details \| Review
Oops. Forgot updating hash_table->nnodes after removing a node. (3.96 KB, patch) 2007-11-30 21:23 UTC, Jean-Yves Lefort	none	Details \| Review
fixed remove() and steal(); added docs, tests and updated glib.symbols (9.14 KB, patch) 2007-12-03 12:21 UTC, Jean-Yves Lefort	accepted-commit_now	Details \| Review
added one pointer of padding; added new funcs to glib-sections.txt (9.53 KB, patch) 2007-12-03 19:18 UTC, Jean-Yves Lefort	none	Details \| Review
added g_hash_table_iter_get_hash_table() (10.05 KB, patch) 2007-12-04 14:50 UTC, Jean-Yves Lefort	accepted-commit_now	Details \| Review
update to SVN trunk; add versioning; mention that modifying the hash table invalidates the iter (13.18 KB, patch) 2007-12-13 06:04 UTC, Jean-Yves Lefort	none	Details \| Review
removed the flags business (12.30 KB, patch) 2007-12-14 10:32 UTC, Jean-Yves Lefort	committed	Details \| Review

Description Jean-Yves Lefort 2007-11-29 20:31:24 UTC

g_hash_table_foreach() is painful to use since C does not support closures.

The attached patch implements a GHashTableIter API, which can be used as follows:

  GHashTableIter i;
  gpointer key;
  gpointer value;

  g_hash_table_iter_init(table, &i);
  while (g_hash_table_iter_next(&i, &key, &value)) {
    /* do something with key and value */
  }

Comment 1 Jean-Yves Lefort 2007-11-29 20:32:04 UTC

Created attachment 99865 [details] [review]
GHashTableIter

Comment 2 Jean-Yves Lefort 2007-11-29 20:34:11 UTC

Comment on attachment 99865 [details] [review]
GHashTableIter

>--- glib/ghash.c.orig	2007-09-16 18:28:33.000000000 +0200
>+++ glib/ghash.c	2007-11-29 21:27:45.000000000 +0100
>@@ -877,6 +877,38 @@
>     }
> }
> 
>+void
>+g_hash_table_iter_init (GHashTable     *hash_table,
>+			GHashTableIter *iter)
>+{
>+  iter->hash_table = hash_table;
>+  iter->node = NULL;
>+  iter->position = 0;
>+}
>+
>+gboolean
>+g_hash_table_iter_next (GHashTableIter *iter,
>+			gpointer       *key,
>+			gpointer       *value)
>+{
>+  g_return_val_if_fail (iter != NULL, NULL);
>+
>+  if (iter->node != NULL)
>+    iter->node = iter->node->next;
>+
>+  while (iter->node == NULL)
>+    {
>+      if (iter->position >= iter->hash_table->size)
>+	return FALSE;
>+      iter->node = iter->hash_table->nodes[iter->position++];
>+    }
>+
>+  if (key != NULL)
>+    *key = iter->node->key;
>+  if (value != NULL)
>+    *value = iter->node->value;
>+
>+  return TRUE;
>+}
>+
> 
> #define __G_HASH_C__
> #include "galiasdef.c"
>--- glib/ghash.h.orig	2007-09-16 18:28:33.000000000 +0200
>+++ glib/ghash.h	2007-11-29 21:24:45.000000000 +0100
>@@ -38,6 +38,16 @@
>                                gpointer  value,
>                                gpointer  user_data);
> 
>+typedef struct _GHashTableIter GHashTableIter;
>+
>+struct _GHashTableIter
>+{
>+  /* private */
>+  GHashTable	*hash_table;
>+  GHashNode	*node;
>+  int		position;
>+};
>+
> /* Hash tables
>  */
> GHashTable* g_hash_table_new		   (GHashFunc	    hash_func,
>@@ -81,6 +91,12 @@
> GList *     g_hash_table_get_keys          (GHashTable     *hash_table);
> GList *     g_hash_table_get_values        (GHashTable     *hash_table);
> 
>+void        g_hash_table_iter_init         (GHashTable     *hash_table,
>+					    GHashTableIter *iter);
>+gboolean    g_hash_table_iter_next         (GHashTableIter *iter,
>+					    gpointer       *key,
>+					    gpointer       *value);
>+
> /* keeping hash tables alive */
> GHashTable* g_hash_table_ref   		   (GHashTable 	   *hash_table);
> void        g_hash_table_unref             (GHashTable     *hash_table);

Comment 3 Behdad Esfahbod 2007-11-30 01:38:10 UTC

Not a huge fan.  Hash tables are not designed primarily for iteration, and there are so many other callbacks in glib that can use an iterator too.

In any case, the iter should be made private.  No more opaque types please.

Comment 4 Havoc Pennington 2007-11-30 02:15:24 UTC

I think this is a nice feature... lists don't need iters since you can just do the "for (l = list; l; l = l->next)" loop, and trees are rarely used, so this is the only commonly-used place in GLib that I end up repeatedly using a foreach func.

Some suggestions:
 - it needs to be documented when the iter becomes valid; e.g. is it OK 
   to remove items while iterating (in the current implementation 
   it looks like it is not)
 - optionally, perhaps you could do a change serial to detect and 
   warn about changes to the hash table while iterating; but this may
   add too much overhead I suppose
 - I would do the iter like GtkTextIter, with "dummy1" or "__dummy1"
   field names. I don't think it can be truly opaque, 
   since having to malloc it would be too inconvenient. But 
   giving nice names and types to the public fields means that people 
   will try to use them.

Comment 5 Jean-Yves Lefort 2007-11-30 08:48:28 UTC

(In reply to comment #4)
>  - it needs to be documented when the iter becomes valid; e.g. is it OK 
>    to remove items while iterating (in the current implementation 
>    it looks like it is not)

Modifying the hash table while iterating yields undefined behaviour, like g_hash_table_foreach().

>  - optionally, perhaps you could do a change serial to detect and 
>    warn about changes to the hash table while iterating; but this may
>    add too much overhead I suppose

The overhead is small, but I decided not to add this feature since g_hash_table_foreach() does not bother with it either.

>  - I would do the iter like GtkTextIter, with "dummy1" or "__dummy1"
>    field names. I don't think it can be truly opaque, 
>    since having to malloc it would be too inconvenient. But 
>    giving nice names and types to the public fields means that people 
>    will try to use them.

Good point. I did this (and also added the customary entry assertions that I had forgotten in g_hash_table_iter_init()) in the updated patch.

Comment 6 Jean-Yves Lefort 2007-11-30 08:49:36 UTC

Created attachment 99892 [details] [review]
updated patch

Comment 7 Owen Taylor 2007-11-30 14:18:55 UTC

A) Unlike Behdad, I'd be a huge fan of this feature; I've written a *lot* of little closure structs over the years.

B) g_hash_table_iter_init should either be init_iter() or have it's arguments
reversed. (I'd probably do the second to keep it grouped with iter_next()

C) 
+      ri->node = ri->hash_table->nodes[ri->position++];

 Would be distinctly clearer with ri->position++ on a separate line.
 Also clearer if 'position' was 'next_position'

D) Would it make sense to add

   g_hash_table_iter_remove/steal()

   that remove/steal the *last* node? (Means adding an extra field to the
   iter to store the last value of ri->node) foreach_remove() is certainly
   less common than foreach(), but does come up reasonably often.

   One thing you couldn't do with that is the  final 
   G_HASH_TABLE_RESIZE (hash_table)
   but that isn't really a big deal.... there's not a whole lot of gain
   from resizing hash tables smaller.

Comment 8 Jean-Yves Lefort 2007-11-30 21:11:49 UTC

Created attachment 99925 [details] [review]
updated following Owen Taylor's comments

Comment 9 Jean-Yves Lefort 2007-11-30 21:20:56 UTC

(In reply to comment #7)
> A) Unlike Behdad, I'd be a huge fan of this feature; I've written a *lot* of
> little closure structs over the years.

Who hasn't!

> B) g_hash_table_iter_init should either be init_iter() or have it's arguments
> reversed. (I'd probably do the second to keep it grouped with iter_next()

Fixed.

> +      ri->node = ri->hash_table->nodes[ri->position++];
> 
>  Would be distinctly clearer with ri->position++ on a separate line.
>  Also clearer if 'position' was 'next_position'

Fixed.

> D) Would it make sense to add
> 
>    g_hash_table_iter_remove/steal()
> 
>    that remove/steal the *last* node? (Means adding an extra field to the
>    iter to store the last value of ri->node) foreach_remove() is certainly
>    less common than foreach(), but does come up reasonably often.

I think you're right. I checked in my projects and found 3 occurrences in one.
I've added these functions. Note that it also required another extra bool in the iter, but it's not like it matters a lot.

>    One thing you couldn't do with that is the  final 
>    G_HASH_TABLE_RESIZE (hash_table)
>    but that isn't really a big deal.... there's not a whole lot of gain
>    from resizing hash tables smaller.

I could actually do it when _iter_next() returns FALSE (with yet another extra field "node_removed" in the iter), but that would assume that the user finishes walking the hash table after removing an entry. I'm not sure about the usefulness.

Comment 10 Jean-Yves Lefort 2007-11-30 21:23:46 UTC

Created attachment 99926 [details] [review]
Oops. Forgot updating hash_table->nnodes after removing a node.

Comment 11 Matthias Clasen 2007-12-03 03:41:58 UTC

I'd welcome this feature, too. 

Things that are still missing in the patch:

- Docs. We could probably handle this after the initial commit.

- What happens if you call remove() twice in a row ? Considering the
  pre-advancing, I guess it removes the next node. The docs should make
  that clear, too.

- Symbols. Need to add the new functions to glib.symbols and docs/reference/glib/glib-symbols.txt

Comment 12 Jean-Yves Lefort 2007-12-03 12:21:16 UTC

Created attachment 100109 [details] [review]
fixed remove() and steal(); added docs, tests and updated glib.symbols

Comment 13 Jean-Yves Lefort 2007-12-03 12:24:45 UTC

(In reply to comment #11)
> Things that are still missing in the patch:
> 
> - Docs. We could probably handle this after the initial commit.
> 
> - What happens if you call remove() twice in a row ? Considering the
>   pre-advancing, I guess it removes the next node. The docs should make
>   that clear, too.
> 
> - Symbols. Need to add the new functions to glib.symbols and
> docs/reference/glib/glib-symbols.txt

I've addressed these points (I cannot find glib-symbols.txt though) and also added regression tests. They pointed out a bug in the remove code, which corrupted the hash table when the removed node was a head node (I did not update hash_table->nodes).

Comment 14 Matthias Clasen 2007-12-03 14:10:40 UTC

Looks great, please commit. Cool that you added tests too.
The file I meant is actually called docs/reference/glib/glib-sections.txt,
but I can handle that myself, too.

Comment 15 Behdad Esfahbod 2007-12-03 18:29:46 UTC

+struct _GHashTableIter
+{
+  /*< private >*/
+  gpointer	dummy1;
+  gpointer	dummy2;
+  gpointer	dummy3;
+  int		dummy4;
+  gboolean	dummy5;
+};

I still prefer an opaque struct and gslice, but if you're doing this, at least add one or two paddings for future use.

Comment 16 Jean-Yves Lefort 2007-12-03 19:18:06 UTC

Created attachment 100136 [details] [review]
added one pointer of padding; added new funcs to glib-sections.txt

Comment 17 Jean-Yves Lefort 2007-12-03 19:39:00 UTC

(In reply to comment #15)
> +struct _GHashTableIter
> +{
> +  /*< private >*/
> +  gpointer     dummy1;
> +  gpointer     dummy2;
> +  gpointer     dummy3;
> +  int          dummy4;
> +  gboolean     dummy5;
> +};
> 
> I still prefer an opaque struct and gslice, but if you're doing this, at least
> add one or two paddings for future use.

I've added one pointer. Future extension can also use several bits of the position int, since I see that HASH_TABLE_MAX_SIZE is rather small. And the pre_advanced bool can also be moved into a bit of the position int.

Comment 18 Behdad Esfahbod 2007-12-03 22:24:14 UTC

(In reply to comment #17)
> (In reply to comment #15)
> > +struct _GHashTableIter
> > +{
> > +  /*< private >*/
> > +  gpointer     dummy1;
> > +  gpointer     dummy2;
> > +  gpointer     dummy3;
> > +  int          dummy4;
> > +  gboolean     dummy5;
> > +};
> > 
> > I still prefer an opaque struct and gslice, but if you're doing this, at least
> > add one or two paddings for future use.
> 
> I've added one pointer. Future extension can also use several bits of the
> position int, since I see that HASH_TABLE_MAX_SIZE is rather small. And the
> pre_advanced bool can also be moved into a bit of the position int.

Sure, there are many hacks one can use to work around such issues.  My question is still unanswered though: why should we introduce a new public struct when its publicness is not needed?  It's even easier to use a pointer as you don't have to make pointer (&iter) all the time.  The allocation is shortlived and small, perfect for gslice.

Comment 19 Jean-Yves Lefort 2007-12-03 22:51:56 UTC

(In reply to comment #18)
> Sure, there are many hacks one can use to work around such issues.  My question
> is still unanswered though: why should we introduce a new public struct when
> its publicness is not needed?  It's even easier to use a pointer as you don't
> have to make pointer (&iter) all the time.  The allocation is shortlived and
> small, perfect for gslice.

It is needed in order to avoid the burden (need to free after use) and performance penalty of heap allocation. Iterators have value semantics and allocating them on the stack is a commonly accepted idiom (see for instance GTK+ or C#).

Comment 20 Behdad Esfahbod 2007-12-03 22:57:38 UTC

(In reply to comment #19)
> (In reply to comment #18)
> > Sure, there are many hacks one can use to work around such issues.  My question
> > is still unanswered though: why should we introduce a new public struct when
> > its publicness is not needed?  It's even easier to use a pointer as you don't
> > have to make pointer (&iter) all the time.  The allocation is shortlived and
> > small, perfect for gslice.
> 
> It is needed in order to avoid the burden (need to free after use) and

On the other hand, _new/_free is a very common glib idiom.


> performance penalty of heap allocation.

Nonsense.  gslice is faster than you can pick on it.


> Iterators have value semantics and allocating them on the stack is a
> commonly accepted idiom (see for instance GTK+ or C#).

I'm ok if it was a documented public struct like GList that users could use directly instead of calling functions on to access key/value.  But I see no point in what you are doing.  Is new in glib for the least.  What C# does is irrelevant too btw.  Abstract talking doesn't help.

Comment 21 Owen Taylor 2007-12-03 23:03:21 UTC

The thing that makes me prefer stack allocation here is because of the
usage pattern .. you are always going to use this scoped to a single function
and a loop. Having to free() is just annoying and easy to get wrong. 

This API is being added to be convenient. I see no reason for it to be
annoying, even if the annoyance is slight.

While something like PangoLayoutIter there's a big win to opacity,
GHashTable and iteration over it is really simple and straightforward, and
unlikely to ever change. (Sure, GHashTable *could* be reimplemented in a different
way, but that hasn't happened in a decade...)

Comment 22 Jean-Yves Lefort 2007-12-03 23:28:12 UTC

(In reply to comment #20)
> (In reply to comment #19)
> > (In reply to comment #18)
> > > Sure, there are many hacks one can use to work around such issues.  My question
> > > is still unanswered though: why should we introduce a new public struct when
> > > its publicness is not needed?  It's even easier to use a pointer as you don't
> > > have to make pointer (&iter) all the time.  The allocation is shortlived and
> > > small, perfect for gslice.
> > 
> > It is needed in order to avoid the burden (need to free after use) and
> 
> On the other hand, _new/_free is a very common glib idiom.

Sure, they are often needed. But in this case they are not.

> > performance penalty of heap allocation.
> 
> Nonsense.  gslice is faster than you can pick on it.

It obviously cannot be faster than static stack allocation. It wouldn't even beat dynamic stack allocation. I don't see where the nonsense is.

> > Iterators have value semantics and allocating them on the stack is a
> > commonly accepted idiom (see for instance GTK+ or C#).
> 
> I'm ok if it was a documented public struct like GList that users could use
> directly instead of calling functions on to access key/value.  But I see no
> point in what you are doing.  Is new in glib for the least.  What C# does is
> irrelevant too btw.  Abstract talking doesn't help.

I have already stated the two main reasons (no burden, no performance penalty) for choosing stack allocation in my previous message.

Regarding C#, I mentioned it and GTK+ to suggest that I did not wake up one morning with the odd idea of allocating iterators on the stack. Well-designed systems do it.

Comment 23 Behdad Esfahbod 2007-12-04 00:03:11 UTC

> > > performance penalty of heap allocation.
> > 
> > Nonsense.  gslice is faster than you can pick on it.
> 
> It obviously cannot be faster than static stack allocation. It wouldn't even
> beat dynamic stack allocation. I don't see where the nonsense is.

If you can't measure it, it's not optimization/penalty.

 
> > > Iterators have value semantics and allocating them on the stack is a
> > > commonly accepted idiom (see for instance GTK+ or C#).
> > 
> > I'm ok if it was a documented public struct like GList that users could use
> > directly instead of calling functions on to access key/value.  But I see no
> > point in what you are doing.  Is new in glib for the least.  What C# does is
> > irrelevant too btw.  Abstract talking doesn't help.
> 
> I have already stated the two main reasons (no burden, no performance penalty)
> for choosing stack allocation in my previous message.
> 
> Regarding C#, I mentioned it and GTK+ to suggest that I did not wake up one
> morning with the odd idea of allocating iterators on the stack. Well-designed
> systems do it.

Last time I had to add a member to a public struct in Pango (PangoAnalysis), I ended up relying on the padding the compiler added in to align a bitfield.  Worse, we then found that gcc and Visual Studio pack bitfields differently, so had to get rid of the bitfield completely.  This is the kind of issues I'm trying to avoid.  Not all well-designed systems have the constraints we do.

My job is to be critical.  If Owen says it won't be a problem, who am I to disagree.  Just don't act like it doesn't exist.

Comment 24 Matthias Clasen 2007-12-04 00:32:42 UTC

I have to agree with Owen here. Freeing iterators is just annoying. 
Do we need to bring this up for vote at tomorrow team meeting, Behdad ?

Comment 25 Behdad Esfahbod 2007-12-04 00:43:36 UTC

Nah, go for it if you are comfortable.  As for freeing being annoying, well, this is not the first case :).

Comment 26 Jean-Yves Lefort 2007-12-04 00:51:06 UTC

(In reply to comment #23)
> > > > performance penalty of heap allocation.
> > > 
> > > Nonsense.  gslice is faster than you can pick on it.
> > 
> > It obviously cannot be faster than static stack allocation. It wouldn't even
> > beat dynamic stack allocation. I don't see where the nonsense is.
> 
> If you can't measure it, it's not optimization/penalty.

This mentality is not good. If you push it too much you end up rewriting glib in python or some other toy. Seriously, I know that the performance penalty is perfectly neglectible in the common case (just one alloc), but what about iterating over a small hash table inside a few large loops?

> > > > Iterators have value semantics and allocating them on the stack is a
> > > > commonly accepted idiom (see for instance GTK+ or C#).
> > > 
> > > I'm ok if it was a documented public struct like GList that users could use
> > > directly instead of calling functions on to access key/value.  But I see no
> > > point in what you are doing.  Is new in glib for the least.  What C# does is
> > > irrelevant too btw.  Abstract talking doesn't help.
> > 
> > I have already stated the two main reasons (no burden, no performance penalty)
> > for choosing stack allocation in my previous message.
> > 
> > Regarding C#, I mentioned it and GTK+ to suggest that I did not wake up one
> > morning with the odd idea of allocating iterators on the stack. Well-designed
> > systems do it.
> 
> Last time I had to add a member to a public struct in Pango (PangoAnalysis), I
> ended up relying on the padding the compiler added in to align a bitfield. 
> Worse, we then found that gcc and Visual Studio pack bitfields differently, so
> had to get rid of the bitfield completely.  This is the kind of issues I'm
> trying to avoid.  Not all well-designed systems have the constraints we do.
> 
> My job is to be critical.  If Owen says it won't be a problem, who am I to
> disagree.  Just don't act like it doesn't exist.

We all got bitten by compiler layout issues one day or another. It doesn't mean that public structs should be considered obscene regardless of the context. My little public struct is well-behaved and deserves to live.

Comment 27 Matthias Clasen 2007-12-04 02:50:06 UTC

Discussed this some more with behdad on irc, and he pointed at gsequence for an
example of iterators in glib that are not stack-allocated, yet don't require freeing. 

I believe we can do pretty much the same for hash tables. 
In fact, Behdad may have a patch almost ready.

Comment 28 Behdad Esfahbod 2007-12-04 08:40:18 UTC

While it's technically possible, it's very hard to come up with maintainable code for that without enlarging GHashNode.  The hardest part is implementing remove/steal, but even without that it involves black pointer magic that spreads all over the code (not only iter).

Also, the hash table needs to be passed to all iter functions.  And iter_next returns the new iterator that should be saved (ala g_list_append, etc).

Seems like it's not worth the extra effort and inconvenience.  I'm giving up.

Comment 29 Jean-Yves Lefort 2007-12-04 14:50:11 UTC

Created attachment 100181 [details] [review]
added g_hash_table_iter_get_hash_table()

I saw g_sequence_iter_get_sequence() and realized I should also add a getter here. Hopefully there will be no objections.

Comment 30 Matthias Clasen 2007-12-04 17:21:41 UTC

> it involves black pointer magic that spreads all over the code

hmm, didn't look that hard to me at first sight. I'll give it a try tonight.
If it doesn't work out I'll concur and go with the stack-allocated iter approach.

Comment 31 Havoc Pennington 2007-12-04 19:25:06 UTC

I think you guys might be overthinking it. There is just no way we'd need to extend this struct, and if there ever is, having an extra pointer of padding should allow it.

Proof: the existing GHashTable API is fine, this is just convenience API, and so by definition there can't be a truly _necessary_ extension to it, at best an extension would be some kind of small additional convenience. So, your worst case for the non-opaque struct is you want to add some small additional convenience and can't. There's no way that contingency is worth jumping through hoops to plan for.

Especially when nobody has even a speculative example of a possible extension...
I mean, it's an iterator. This is pretty well-understood.

I say keep it simple.

Comment 32 Matthias Clasen 2007-12-04 19:47:42 UTC

Probably :-)

Comment 33 Matthias Clasen 2007-12-07 06:44:12 UTC

Yeah, lets go with the current approach.

It might be worthwhile to add a small not to the g_hash_table_foreach docs that 
points out the iter api.

Comment 34 Morten Welinder 2007-12-07 14:38:54 UTC

I don't see the rules of the game spelled out anywhere.

Can you delete hash elements without invalidating the iterator?  Insert?
Have two iterators for the same hash?

Also, a large fraction of hash iterations are for the special cases of
getting a list of the domain or range.  We really ought to have special
functions for that.

Comment 35 Matthias Clasen 2007-12-07 15:49:37 UTC

> Also, a large fraction of hash iterations are for the special cases of
> getting a list of the domain or range.  We really ought to have special
> functions for that.

You mean like ?

GList *     g_hash_table_get_keys          (GHashTable     *hash_table);
GList *     g_hash_table_get_values        (GHashTable     *hash_table);

I believe deletion of hash elements is covered reasonably well with 
the g_hash_table_iter_remove()/steal() functions. We should probably 
document somewhere that calling g_hash_table_insert/_remove() makes you
lose any guarantees about ongoing iterators. And we should probably have some
timestamp mechanism to prevent hard-to-find bugs due to this. Has this
been discussed before ?

Comment 36 Morten Welinder 2007-12-07 16:04:23 UTC

> GList *     g_hash_table_get_keys          (GHashTable     *hash_table);
> GList *     g_hash_table_get_values        (GHashTable     *hash_table);

Hmm...  I am so behind the times, it seems.  But GList was the wrong
choice -- there is no ordering so going backwards makes no sense.
Oh well, too late.

The above have some impact on bug 60198.

Comment 37 Jean-Yves Lefort 2007-12-08 00:52:42 UTC

(In reply to comment #35)
> I believe deletion of hash elements is covered reasonably well with 
> the g_hash_table_iter_remove()/steal() functions. We should probably 
> document somewhere that calling g_hash_table_insert/_remove() makes you
> lose any guarantees about ongoing iterators.

Maybe, yes. I just thought it was so obvious that it did not need to be mentioned. 

> And we should probably have some
> timestamp mechanism to prevent hard-to-find bugs due to this. Has this
> been discussed before ?

I haven't added timestamping because g_hash_table_foreach() does not have it either. I imagine that if deemed desirable, it should be added inside !G_DISABLE_ASSERT sections?

Comment 38 Morten Welinder 2007-12-08 01:01:59 UTC

Obvious?  Let's check the C++ equivalent:

"Map has the important property that inserting a new element into a map does not invalidate iterators that point to existing elements. Erasing an element from a map also does not invalidate any iterators, except, of course, for iterators that actually point to the element that is being erased."
(http://www.sgi.com/tech/stl/Map.html)

So if you want to be obvious, you should perhaps make it work.  At the very
least it should be documented.

Comment 39 Mathias Hasselmann (IRC: tbf) 2007-12-08 09:35:48 UTC

Morton: I seriously doubt, that C++ is useful for measuruing obviousness... ;-)

Also Jean-Yves is very right in claiming obviousness, as g_hash_table_foreach also forbids modifications to the iterated hash table.

Additionally in Java, C# and Python for instance, modifications to hash tables invalidate iterators.

Java (http://java.sun.com/j2se/1.4.2/docs/api/java/util/HashMap.html):

»The iterators returned by all of this class's "collection view methods" are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove or add methods, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.

Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.«

C# (http://msdn2.microsoft.com/en-us/library/ms132090.aspx):
Dictionary.System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<TKey,TValue>>.GetEnumerator Method

»An enumerator remains valid as long as the collection remains unchanged. If changes are made to the collection, such as adding, modifying, or deleting elements, the enumerator is irrecoverably invalidated and its behavior is undefined.

The enumerator does not have exclusive access to the collection; therefore, enumerating through a collection is intrinsically not a thread-safe procedure. To guarantee thread safety during enumeration, you can lock the collection during the entire enumeration. To allow the collection to be accessed by multiple threads for reading and writing, you must implement your own synchronization.«

Python (http://www.python.org/dev/peps/pep-0234/):

»Dictionaries implement a tp_iter slot that returns an efficient
iterator that iterates over the keys of the dictionary. During
such an iteration, the dictionary should not be modified, except
that setting the value for an existing key is allowed (deletions
or additions are not, nor is the update() method). This means
that we can write«

Comment 40 Matthias Clasen 2007-12-08 14:40:32 UTC

Right.

So I guess my question is: do we want to have similar best-effort warnings ?

I think we probably do.

Comment 41 Behdad Esfahbod 2007-12-08 22:52:51 UTC

(In reply to comment #37)

> > And we should probably have some
> > timestamp mechanism to prevent hard-to-find bugs due to this. Has this
> > been discussed before ?
> 
> I haven't added timestamping because g_hash_table_foreach() does not have it
> either. I imagine that if deemed desirable, it should be added inside
> !G_DISABLE_ASSERT sections?

g_hash_table_foreach() is much harder to misuse.

Comment 42 Jean-Yves Lefort 2007-12-13 02:45:41 UTC

Will someone commit this as-is, or should I add timestamping? I don't know when 2.16 will be released, but I'd like to make sure this will be part of it.

Comment 43 Matthias Clasen 2007-12-13 02:56:57 UTC

If you are willing to work on the timestamping, that would be great.

Comment 44 Jean-Yves Lefort 2007-12-13 06:04:36 UTC

Created attachment 100866 [details] [review]
update to SVN trunk; add versioning; mention that modifying the hash table invalidates the iter

Comment 45 Matthias Clasen 2007-12-14 00:27:43 UTC

Looks fine to me; I somewhat dislike the amount of
#ifndef G_DISABLE_ASSERT
but I'm not sure there is much we can do about it.

Is it really worth bothering with the combined 
flags_and_version field ? It would probably simplify
the code a bit to just alway allocate the version field.

Comment 46 Jean-Yves Lefort 2007-12-14 10:32:46 UTC

Created attachment 100940 [details] [review]
removed the flags business

You're right, sometimes I just like to toy too much. For now the version will live in the padding pointer. If more space is needed later on, it'll always be time to compact things a bit to recover the pointer.

Comment 47 Matthias Clasen 2007-12-15 05:01:18 UTC

2007-12-14  Matthias Clasen  <mclasen@redhat.com>

        * glib/glib.symbols:
        * glib/ghash.[hc]: Add hash table iterators.  (#500507,
        Jean-Yves Lefort)

        * tests/hash-test.c: Test iterators.