Bug 560061 – GLib could use a collection API

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 560061 - GLib could use a collection API


Summary:	GLib could use a collection API


Status:	RESOLVED FIXED

Product:	glib
Classification:	Platform
Component:	gobject
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal enhancement
Target Milestone:	---
Assigned To:	gtkdev
QA Contact:	gtkdev

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2008-11-09 20:08 UTC by Philip Van Hoof
Modified:	2015-03-11 14:10 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
A first patch that introduces the API (25.12 KB, patch) 2008-11-09 20:09 UTC, Philip Van Hoof	none	Details \| Review
Round one, the convenience API and the removal of the padding (25.19 KB, patch) 2008-12-22 22:17 UTC, Philip Van Hoof	none	Details \| Review

Description Philip Van Hoof 2008-11-09 20:08:23 UTC

These GLib based libraries are inventing there own hot water and wheels trying to create collection APIs:

* Camel (CamelIterator)
* libanjuta (IAnjutaIterable)
* libedataserver (EIterator)
* libgda (GdaDataModelIter)
* libvala (using libgee)
* Tinymail (TnyList and ?TnyIterator)
* GVariant (g_variant_iter*, etc)
* GIO (gfileenumerator.h)
* Gtk+ (GtkTreeModel is an iterable)
* Clutter and Tidy with ClutterModelIter
* GStreamer with GstIterator 

To stop this trend of incompatible collection APIs, and to aid the GObject introspection work with making it easy for people to write language bindings, in particular for collections and lists, I propose that we add a generic collection API to GLib.

I will attach a patch that introduces GSeq, GCollection, GIterator and GIterable as first step towards a useful collection API.

Several face-to-face meetings (at the Berlin hackfest) and IRC meetings have happened around this subject already. Among the concerns were boxing of non-boxed and non-object C types to be fit as instances to go into the collections.

We (Jürg and me) and planning to further extend this proposal and patch so that the API offers convenient ways for this.

Meanwhile I already propose this as a patch so that GLib maintainers can inform us about nuances that they don't like in the API already.

This API is heavily based on Vala's libgee, with the exception that Gee.List was renamed to GLib.Seq, else after namespace translation the type would end up being called GList, which is obviously already in-use for GLib's doubly linked list structure.

A better analysis can be found here: http://live.gnome.org/IteratorsAPI

I urge people who want to comment on this to in any case read that wiki page too. That page explains vitally important things about the design and will serve as home for future work on this.

Comment 1 Philip Van Hoof 2008-11-09 20:09:01 UTC

Created attachment 122279 [details] [review]
A first patch that introduces the API

Comment 2 David Zeuthen (not reading bugmail) 2008-11-09 21:45:16 UTC

You don't need padding for interfaces...

Comment 3 Philip Van Hoof 2008-11-10 15:34:30 UTC

Agree that adding a method to an interface breaks the API anyway (unlike with abstract classes, where you can still provide an abstract implementation of it, for old implementers who don't yet implement the new method).

I can of course remove the padding. Let me know.

Comment 4 Wouter Bolsterlee (uws) 2008-11-10 23:48:49 UTC

Is there a reason this code goes into gio/ instead of glib/ itself?

Comment 5 Tomaz Vajngerl 2008-11-24 18:13:14 UTC

Is is possible to rename GSeq as GSequence. IMHO there is no point to cut it off like that - especially when collection, iterator, iterable and others are written in full. 

Thanks

Comment 6 Philip Van Hoof 2008-11-24 18:39:14 UTC

@Tomaz:

Sure, can be done. It's just a s/Seq/Sequence, s/seq/sequence and s/SEQ/SEQUENCE.

Does the person deciding about this patch want a new patch for this request? It's relatively easy to rename so I plan to do it as soon as there's some sort of approval.

@Wouter:

Because gio already depends on gobject/, and gio is the only component atm that does (in glib). Otherwise we would have to make yet another separate .so (with all the tricks that come with gio for this, if any) that depends on gobject/.

Comment 7 Matthias Clasen 2008-11-24 18:49:25 UTC

GLib already has a GSequence...

Comment 8 Wouter Bolsterlee (uws) 2008-11-24 18:51:42 UTC

(In reply to comment #6)
> Because gio already depends on gobject/, and gio is the only component atm that
> does (in glib). Otherwise we would have to make yet another separate .so (with
> all the tricks that come with gio for this, if any) that depends on gobject/.

This is not a convincing argument, in my opinion. A collection API is not inherently IO related, so GIO is not really the place for it. Why not directly in gobject itself?

Comment 9 Philip Van Hoof 2008-11-24 23:15:57 UTC

@Wouter:
I'd be fine with putting the components directly in gobject/, or in a new gio style subdirectory and .so file. Let us let the GLib maintainers decide about this? I'll refactor and recode stuff, just let me know.

@Matthias:
Bummer, then I guess we better keep naming it GSeq instead of GSequence. Note that the difference between a GCollection and a GSeq is that a GSeq is a collection in a well defined order.

Which means that we have g_seq_insert (at index), g_seq_qet (at index), g_seq_set (at index), g_seq_index_of, g_seq_remove_at (at index) and non-index based g_collection_add, g_collection_remove and g_collection_clear.

A GSeq implementer requires implementing GCollection (where g_collection_add simply means appending for the GSeq, g_collection_remove(s, x) means the same as g_seq_remove_at (s, g_seq_index_of (s, x))).

If somebody has an interesting name for GSeq, let me know.

GCollectionAtWellDefinedOrder ? :-)

Comment 10 Philip Van Hoof 2008-11-24 23:19:24 UTC

Note that for the Java and .NET ppl: GSeq is the equivalent of List in Java and IList in .NET, GCollection is the equivalent of ICollection in .NET and Collection in Java. Of course was GList already taken by GLib's doubly linked list.

Comment 11 Wouter Bolsterlee (uws) 2008-11-24 23:24:42 UTC

(In reply to comment #9)
> If somebody has an interesting name for GSeq, let me know.
> 
> GCollectionAtWellDefinedOrder ? :-)

GOrderedCollection?

Comment 12 David Zeuthen (not reading bugmail) 2008-11-27 20:07:01 UTC

Hi Philip,

Why only collections and not the full shebang that is in e.g. libgee?

I mean, either the GLib maintainers decide that GObject should ship with support for modern container interfaces or they don't. Only doing collections and not maps seems like a weird backwards decision to me.

Also, I think it would be useful with constructors that wrap existing GLib datatypes in these modern containers so I can do e.g.

 static void
 some_function_that_takes_a_g_list (GList *list)
 {
   GSeq *seq;

   seq = g_list_wrap_in_collection (list, GTK_TYPE_WIDGET);

   call_new_and_shiny_function_that_takes_a_g_seq (seq);

   g_object_unref (seq);
 }

or

 static void
 some_function_that_takes_a_g_hash_table (GHashTable *hash_table)
 {
   GMap *map;

   map = g_hash_table_wrap_in_map (hash_table, G_TYPE_STRING, G_TYPE_STRV);

   call_new_and_shiny_function_that_takes_a_g_map (map);

   g_object_unref (map);
 }

to make this new and shiny stuff easier to use from existing code bases.

And, FWIW, I'm hoping the GLib maintainers decide this is a Good Thing(tm) and that they merge something like libgee. It would be good to get some actual feedback from the GLib maintainers on this.

Comment 13 Philip Van Hoof 2008-11-27 20:32:52 UTC

@David: I agree with you that we eventually want the full shebang. But let's first wait for the feedback from the actual GLib maintainers and do this step by step.

Those wrappers (for GList, GSList, GPtrArray, GHashTable, etc) are indeed possibilities. Especially for bringing a list to a outside API, where the wrapper can take the responsibility of adding boxing to the elements within the container and the container itself.

We could add types like Map too, but this would more or less have the same functionality as GHashTable has. Right now I'm not proposing any actual container classes yet, just the interfaces and abstract classes that would be needed as the basis for a collection API.

Also note that actual class types could still be implemented in a separate library like gee.

Comment 14 David Zeuthen (not reading bugmail) 2008-11-27 21:58:57 UTC

(In reply to comment #13) 
> Those wrappers (for GList, GSList, GPtrArray, GHashTable, etc) are indeed
> possibilities. Especially for bringing a list to a outside API, where the
> wrapper can take the responsibility of adding boxing to the elements within the
> container and the container itself.
> 
> We could add types like Map too, but this would more or less have the same
> functionality as GHashTable has.

One important property about good collection APIs is that it should be possible to infer the type of the contained elements. This is needed for e.g. serializing stuff which is important for e.g. IPC. 

You can't do that in any sane way (storing it in a lookside structure isn't sane) with any of the GLib data types, specifically not GHashTable.

(Ouch, it also turns out you can't do this with at least libgee 1.3.6 or maybe I'm not looking hard enough.)

(Also understand I'm not picking on the classic GLib data types here, they're good API, they're efficient and have their place. They're just not a silver bullet for programs that are heavily object oriented.)

> Also note that actual class types could still be implemented in a separate
> library like gee.

Maybe. But I don't think it's particularly helpful to just provide interfaces in GObject if we want people to be using this kind of API. 

For example, one very common usage is to return a list of objects. Since we don't currently have something like this we end up with API with contracts like this

http://library.gnome.org/devel/gio/unstable/GVolumeMonitor.html#g-volume-monitor-get-volumes

  "The returned list should be freed with g_list_free(), after its
   elements have been unreffed with g_object_unref()."

which is just asking for trouble. And that's if you're in luck, some APIs just assume the programmer will "get it". Then there's language bindings but I digress.

With a modern collection API (including base implementations) in the core stack this problem goes the way of the Dodo.

Comment 15 Colin Walters 2008-11-27 22:01:25 UTC

Re comment #14, have you seen the GObject introspection annotations?

That function should be annotated thus:

Returns: (transfer container) (element-type GVolume): A list of #GVolume

If you need that information at runtime, the introspection system can give it to you.

Comment 16 Colin Walters 2008-11-27 22:02:11 UTC

Actually should be (transfer full).

Comment 17 David Zeuthen (not reading bugmail) 2008-11-27 22:06:15 UTC

(In reply to comment #15)
> Re comment #14, have you seen the GObject introspection annotations?
> 
> That function should be annotated thus:
> 
> Returns: (transfer container) (element-type GVolume): A list of #GVolume
> 
> If you need that information at runtime, the introspection system can give it
> to you.

Sure that's nice for APIs using the classic GLib data types (and we have a lot of useful API like that), but wouldn't you agree it would be much more safer to just use a real collection API instead of relying on the programmer getting the gtk-doc annotation right? [1]

[1] : not to pick on you Colin, but the fact you got it wrong the first time pretty much proves my point ;-)

Comment 18 Colin Walters 2008-11-27 22:13:41 UTC

We need the type information at compile time.  So if the suggestion is that the function would look like this:

GCollection *
g_volume_monitor_get_volumes (GVolumeMonitor *mon)
{
  GCollection *coll = g_list_collection_new (G_TYPE_VOLUME);

  ...
  
  return coll;
}

That's not good enough; we still need to have the (element-type GVolume) annotation so the information is available for static language bindings.  You could remove the (transfer) annotation though, under the assumption that a GCollection always holds a reference.

Comment 19 Jürg Billeter 2008-11-27 22:19:27 UTC

(In reply to comment #14)
> One important property about good collection APIs is that it should be possible
> to infer the type of the contained elements. This is needed for e.g.
> serializing stuff which is important for e.g. IPC. 
> 
> You can't do that in any sane way (storing it in a lookside structure isn't
> sane) with any of the GLib data types, specifically not GHashTable.
> 
> (Ouch, it also turns out you can't do this with at least libgee 1.3.6 or maybe
> I'm not looking hard enough.)

Do you mean something like gee_iterable_get_element_type? It's in libgee since version 0.1.1.

I'd also like to add the full libgee API to GLib, however, we should first all agree that we want an interface-based collection API in GLib.

(In reply to comment #18)
Yes, we certainly have to keep the element-type annotations as well. Neither one replaces the other.

Comment 20 David Zeuthen (not reading bugmail) 2008-11-27 22:24:47 UTC

(In reply to comment #18)
> We need the type information at compile time.  So if the suggestion is that the
> function would look like this:
> 
> GCollection *
> g_volume_monitor_get_volumes (GVolumeMonitor *mon)
> {
>   GCollection *coll = g_list_collection_new (G_TYPE_VOLUME);
> 
>   ...
> 
>   return coll;
> }
> 
> That's not good enough; we still need to have the (element-type GVolume)
> annotation so the information is available for static language bindings.  You
> could remove the (transfer) annotation though, under the assumption that a
> GCollection always holds a reference.
> 

The proposal would be to have a GObjectList class that implements GCollection (the whole reason I started talking about this was that it makes sense to provide more than just the interfaces in the core stack) and g_volume_monitor_get_volumes() would then return "GObjectList *" (then users can cast down as appropriate).

For more fancy stuff you *might* need annotations; depends. If it's a GObjectList then the runtime you bind to certainly knows enough information about how to free / copy the passed instance. Which probably is all it needs, I don't know.

Comment 21 David Zeuthen (not reading bugmail) 2008-11-27 22:27:19 UTC

(In reply to comment #19)
> > (Ouch, it also turns out you can't do this with at least libgee 1.3.6 or maybe
> > I'm not looking hard enough.)
> 
> Do you mean something like gee_iterable_get_element_type? It's in libgee since
> version 0.1.1.

Ah, nice, it seems I wasn't looking hard enough.

> I'd also like to add the full libgee API to GLib, however, we should first all
> agree that we want an interface-based collection API in GLib.

Indeed.

Comment 22 Philip Van Hoof 2008-11-27 22:47:31 UTC

Agreeing with David here. I see the introspection work and its annotations as a helpful *but* ad-hoc solution for many of the problems.

How would a language binding iterate over a standard GLib container type, for example? 

Assuming we have these ad-hoc annotations (finally) in place (which is 40 times more work than this patch), and assuming the annotations would specify how iteration should take place: the only way would either be by generating glue code and compile it natively or by using g_function_info_invoke that uses a cached GIFunctionInfo somehow.

I understand that some people don't think that method invocation needs to be fast .. (actually, I don't understand, but anyway). The problem is that the iteration of a container type is precisely usually going to happen in tight loops. A lot of people don't even use the C -> macro <- g_list_next because they are not aware (because it's not written in capitals) that it's a macro (and therefore compiled inline as list?list->next:NULL).

So the ad-hoc gtk-doc annotations plus GObject-introspection solution would end up (if we don't want glue code to be generated, like GJS and PyBank want, but what also GBus and other over-DBus experimental projects will want to start using at some point) like this:

I'm writing it in C, but this would more or less be what the higher language would have to execute too (if it doesn't want to compile any native glue C code):

GList *list = ...
GIFunctionInfo *next = type_factory_lookup_next_method_for_container ("GList");
GIFunctionInfo *getter = type_factory_lookup_get_current_method_for_container ("GList");
gboolean caller_owns = is_caller_owns (getter);

while (list) {
	GArgument next_args[1];
	GArgument next_retval;
	GArgument get_args[1];
	GArgument get_retval;
	GError *next_error = NULL;
	GError *get_error = NULL;

	next_args[0].v_pointer = list;


	g_function_info_invoke (getter, get_args, 1, 
				NULL, 0, &get_retval, 
				&get_error);
		

	if (!get_error) {
		GObject *value = get_retval.v_pointer;

		/* Work with value */ 

		if (caller_owns)
			g_object_unref (value);

		g_error_free (get_error);
	}

	g_function_info_invoke (info, next_args, 1, 
				NULL, 0, &next_retval, 
				&next_error);

	if (next_error) {
		g_error_free (next_error);
		list = NULL;
	} else
		list = next_retval.v_pointer;
}

g_base_info_unref (next);
g_base_info_unref (getter);


With a collection API and the native C library developer providing a GIterable, or if we'd have a simple wrap for the type GList that implements GIterable, it would be much better in performance and more easy for implementation:

GList *list = ...
WrapperForList *wrapper = wrapper_for_list_new (list);
GIterator *iter = g_iterable_iterator (G_ITERABLE (wrapper));

while (g_iterator_next (iter)) {
	GObject *value = g_iterator_current (iter);

	/* Work with value */ 

	g_object_unref (value);
}

g_object_unref (wrapper);

It's also my opinion that with a collection API we are doing sane things with our stack, with only GObject-introspection's infrastructure we are not changing the "mindset" of our library writers: they will keep exposing C-specific collection & container types like GList, GSList, GPtrArray, GHashTable, GBTree, GEtcEteraList. And I think that eventually that isolates our libraries from modern application development.

But I don't want to start doom scenarios by mentioning "isolation". It's just an opinion :)

Comment 23 Philip Van Hoof 2008-11-27 22:58:03 UTC

Comment #18:

> That's not good enough; we still need to have the (element-type GVolume)
> annotation so the information is available for static language bindings.  You
> could remove the (transfer) annotation though, under the assumption that a
> GCollection always holds a reference.

This is the case, GIterator's g_iterator_current is atm always caller-owns. It depends on the container's element-type (equivalent to the G in many 'generics' examples) to know how to free it if it's not a boxed type.

Boxing is indeed the hard part of this collection API.

Comment 24 Philip Van Hoof 2008-11-27 23:26:46 UTC

Comment #18:

> GCollection *
> g_volume_monitor_get_volumes (GVolumeMonitor *mon)
> {
>   GCollection *coll = g_list_collection_new (G_TYPE_VOLUME);
>   ...
>  return coll;
> }


The ideal native C API would however be like this (in my opinion):

static inline void
foreachvol (GVolume *volume, GCollection *collection)
{
   g_collection_add (volume, g_object_ref (collection));
}

void
g_volume_monitor_get_volumes (GVolumeMonitor *mon, GCollection *collection)
{
  g_list_foreach (mon->priv->volumes, (GFunc) foreachvol, collection);
}


This would enable you to do this in the higher language:

public class VolumeModel : Gtk.TreeModel, GLib.Iterable {
  GLib.VolumeMonitor monitor;


  private void insert (GLib.Volume volume) {
       this.insert (volume ...)
       this.row_inserted (...);
  }

  private void delete (GLib.Volume volume) {
       this.delete (volume ...)
       this.row_deleted (...);
  }

  public VolumeModel (GLib.VolumeMonitor monitor) {
      this.monitor = monitor;
      this.monitor.added += add (GLib.Volume volume) => {
         this.insert (volume);
      }
  }

  public void sync () {
     monitor.get_volumes (model);
  }

}

public class VolumeView : Gtk.TreeView {

  private void setup_columns () {
	/* ... */
  }

  public VolumeView (VolumeModel model) {
	this.setup_columns ();
	this.model = model
  }
}

public class MyClass {
  public void MyMethod () {
     GLib.VolumeMonitor monitor = new GLib.VolumeMonitor ();
     VolumeModel model = new VolumeModel (monitor);
     VolumeView view = new VolumeView (model);

     model.sync ();         
  }
}

Comment 25 Philip Van Hoof 2008-11-27 23:28:39 UTC

Sorry, like this I meant:

public class VolumeModel : Gtk.TreeModel, GLib.Collection {
  GLib.VolumeMonitor monitor;


  private void add (GLib.Object volume) {
       this.insert (volume ...)
       this.row_inserted (...);
  }

  private void remove (GLib.Object volume) {
       this.delete (volume ...)
       this.row_deleted (...);
  }

  public VolumeModel (GLib.VolumeMonitor monitor) {
      this.monitor = monitor;
      this.monitor.added += add (GLib.Volume volume) => {
         this.insert (volume);
      }
  }

  public void sync () {
     monitor.get_volumes (this);
  }

}

Comment 26 Matthias Clasen 2008-11-28 04:30:30 UTC

Relevant prior discussion of this proposal and some alternatives:

 http://mail.gnome.org/archives/gtk-devel-list/2008-July/msg00144.html

Comment 27 Dan Winship 2008-11-28 21:13:35 UTC

(In reply to comment #20)
> > That's not good enough; we still need to have the (element-type GVolume)
> > annotation so the information is available for static language bindings.
> 
> The proposal would be to have a GObjectList class that implements GCollection
> (the whole reason I started talking about this was that it makes sense to
> provide more than just the interfaces in the core stack) and
> g_volume_monitor_get_volumes() would then return "GObjectList *" (then users
> can cast down as appropriate).
> 
> For more fancy stuff you *might* need annotations; depends. If it's a
> GObjectList then the runtime you bind to certainly knows enough information
> about how to free / copy the passed instance. Which probably is all it needs, I
> don't know.

Statically-typed languages with generics will always want the annotation, so that rather than wrapping the method as, eg, "IEnumerable GetVolumes();", they can wrap it as "IEnumerable<GIO.Volume> GetVolumes();" so the user doesn't need to do any extra casting.

Comment 28 Philip Van Hoof 2008-12-22 12:01:37 UTC

Ping, do we have a decision from the GLib maintainer already?

Comment 29 Matthias Clasen 2008-12-22 15:20:14 UTC

Iterators-as-objects is pretty much a no-go in my opinion. 
Not sure if you want to count that as a decision...

Comment 30 Philip Van Hoof 2008-12-22 15:57:54 UTC

What is the exact problem with iterators being objects?

Comment 31 Philip Van Hoof 2008-12-22 16:22:16 UTC

If "performance" is that problem, some background information on GObject construction's performance (when you don't have properties on the class) and once Bug #557100's patch is applied.

With the patch ten million instances are created in 2 seconds. You create one iterator per iteration. I fail to see a use-case where both ...

- you care about the two seconds
- you need ten million simultaneous iterations

An example ...

For user interfaces, an hypothetic use-case could be to have an iteration for each Red-value, each Blue-value and each Green-value of each pixel of a very big LCD screen. That still wouldn't be even one million iterations. The creation of iterators for that use-case would be under 0.2 seconds. So a screensaver that operates on each pixel of your screen three times, would take 0.2 seconds to setup.

Another example ...

A GtkTreeModel would only need one iterator. The way GtkTreeIter works you indeed need a lot of iterators for a typical use. This is not how a model based on Iterable would work. GtkTreeModel would be a normal "iterable". You actually need one iterator per GtkTreeView. You could have one iterator per parent node, if you implement it that way. If you do then those iterators can be created at-the-time a parent-node is unfolded.

Can you elaborate if performance is indeed the issue?

And if not, what is the other issue?

An iterator needs reference counting for comfortable use. We would have to make a complete new type, which would require language binding developers to take special care. Which is why I'm keen to know what the actual issue is.

Comment 32 Matthias Clasen 2008-12-22 16:34:52 UTC

http://mail.gnome.org/archives/gtk-devel-list/2008-July/msg00148.html

Comment 33 David Zeuthen (not reading bugmail) 2008-12-22 17:06:36 UTC

So it turns out I needed something like this for EggDBus that I just announced last night

http://mail.gnome.org/archives/gtk-devel-list/2008-December/msg00059.html

I that mail I promised to follow up here with some thoughts, so here goes

o I like the general idea of libgee a lot; I 100% support that it
would be useful with a few standard interfaces for collections
in the core library stack. The fact we don't have this means that
people reinvent iterators over and over again. This leads to confusion,
confusion leads to anger....

o Attempting to get a thing like this into GLib without genuinely
caring about whether it's going to make life simpler or harder
for C programmers probably isn't going to help your cause. I
think you would need to show there's tangible benefits for C
programmers in adding this code.

Specifically, please realize that *requiring* the user to deal with allocators allocated on the heap is probably a no go. The annoying thing about this is that you have to free them. Having to do this just to iterate a list or an array is completely unreasonable in C.

In EggDBus I have these two classes

http://people.freedesktop.org/~david/eggdbus-0.1-docs/EggDBusArraySeq.html
http://people.freedesktop.org/~david/eggdbus-0.1-docs/EggDBusHashMap.html

that I use. While these classes don't yet implement any collection interfaces (such as GSeq or GeeMap) they are designed to be able to do so.

These classes are designed with C programming in mind. In fact, thanks to added syntactic sugar, these classes make it much simpler to deal with edge-cases like handling fixed-size types (ints, floats) even when compared with the classic GLib data types (GPtrArray, GArray, GList, GHashTable etc.).

The test suite shows how simple it is to use

http://cgit.freedesktop.org/~david/eggdbus/tree/src/tests/testclient.c?id=0.1#n3332

http://cgit.freedesktop.org/~david/eggdbus/tree/src/tests/testclient.c?id=0.1#n3888

I'd argue that these types actually are safer/better/more enjoyable to use from C than the corresponding GLib data types (of course YMMV). In addition, things like size / members etc are available in the class instance structure so iterating over an array is as efficient as when using raw C arrays.

So if we had something like collection interfaces in GLib, I would say that the implementations in GLib should look somewhat like EggDBusArraySeq and EggDBusHashMap. We'd also want some more convenience API on the actual collection interfaces; things like GSeq.get_copy() sounds like an useful thing especially for multi-threaded code.

About the iterator thing: I think this can largely be addressed by a simple convention. I think all we need to do is declare that instead of returning the abstract type (GSeq), most API should return a more concrete type such as ArraySeq or ListSeq.

The upside is that C programmers won't have to worry about iterators at all; they can access the ArraySeq or ListSeq using mechanisms like I employ in EggDBusArraySeq.

The downside is that API can't readily change from returning an ArraySeq to returning a ListSeq. So if e.g. GIO were to use this, you'd have

GArraySeq *g_volume_monitor_get_volumes()

instead of just

GSeq *g_volume_monitor_get_volumes()

I think this is probably *fine*, most API won't change like that. And it's the price we have to pay for caring about C programmers. It's a price I think we should be willing to pay.

Of course some API, such as g_file_enumerate_children() will have to return a GSeq and C programmers have to deal with iterators allocated on the heap. But this is not any different from the situation we already have (GFileEnumerator).

Anyway, these are just some thoughts. Hope it's useful.

Matthias: if we did collections like this (caring about C programmers, the convention to avoid iterators for usual things), would it be possible to consider something like this for GLib?

Comment 34 Philip Van Hoof 2008-12-22 17:25:38 UTC

#Comment 32: What about the response(s) to Havoc's mail (x)? As a result of that mailing-list discussion we created a wiki page (y) that explains the proposal further. This bug and the code that got implemented are all responses of the E-mail that you are referring to (x).

#x http://mail.gnome.org/archives/gtk-devel-list/2008-July/msg00161.html
#y http://live.gnome.org/IteratorsAPI

Also note that Havoc's proposal doesn't solve everything for language bindings. It also doesn't help C developer writing libraries that will have APIs that are easily usable in the higher languages. For example Python, JavaScript, Java and .NET all have a collection APIs that are similar to the one being proposed. On the wiki page (z) you can find a complete example how both a binding would look and how application code would that uses it ends up.

#z http://live.gnome.org/IteratorsAPI#head-1ed354babe15e717a1b0d2600933e2d53e302dd1

The problem with Havoc's introspection proposal is that it doesn't allow an application developer to provide *his* collection types to the libraries that work with them. It doesn't make it possible for the application developer to implement a list in *his* language. For a model-view-controller it is not uncommon that the model of your views implement "being a list".

The GObject-introspection proposal of Havoc also isn't incompatible with this. In fact, the two are complementary and one way or another would you have to make a List or Collection API for the introspection proposal too. The Collection API being proposed would be ideal for that purpose.

*feels like repeating the complete discussion, recursiveness *

Comment 35 Jürg Billeter 2008-12-22 20:36:23 UTC

David, if the main issue is that the proposed API currently requires two lines more in C per iteration loop, we should certainly be able to come up with a solution. I'd prefer to not expose a concrete subclass in the API just for C convenience, but there are other possibilites. For example, we could provide a function that takes a collection, creates an iterator, unrefs the collection, and then returns the iterator:

GIterator *g_collection_iterator (GCollection *collection);

That way the API would be quite convenient to iterate in C:

GIterator *children;
children = g_collection_iterator (gtk_container_get_children (container));
while (g_iterator_next (children)) {
        GtkWidget *child = g_iterator_get (children);
        /* operate on child */
}
g_object_unref (children);

Same example with current API:

GList *children, *l;
children = gtk_container_get_children (container);
for (l = children; l != NULL; l = l->next) {
        GtkWidget *child = l->data;
        /* operate on child */
}
g_list_free (children);

In the current libgee API, g_iterator_get returns a strong reference to the child, which would require an additional g_object_unref in the loop. However, we could make the iterator a bit smarter and keep a reference to the current item in the iterator object if necessary, if that makes the API more acceptable.

Comment 36 Wouter Bolsterlee (uws) 2008-12-22 20:46:19 UTC

(In reply to comment #35)
>         GtkWidget *child = g_iterator_get (children);

IMHO g_iterator_current() is a saner name.

Comment 37 Philip Van Hoof 2008-12-22 22:17:51 UTC

Created attachment 125156 [details] [review]
Round one, the convenience API and the removal of the padding

This convenience API was too easy not to do it, so I added it immediately to the proposed patch. As said in Comment #2 don't interfaces need padding, so this patch removes those from the .h files.

Please review and let us know what is wrong about this proposal and patch, and if your comments make sense to us and other people we'll asap adapt it.

Comment #36: I discussed this with Jürg on IRC (feel free to join #vala if you want to join us on those discussions) and although I was okay with calling it "current", like how you proposed, Jürg prefered the method-name "get" better. Claiming that "short is nice". I have no strong feelings about "current" vs. "get" but if you have a good argument for "current", let us know.

By the way, we did all the tests for "g_iterator_get" vs. "g_object_get" to see if nothing (that generates code and tries to detect a getter and setter, or something) clashes, and so far nothing did clash or stopped functioning because we call a method-name "get". That was my own argument for Jürg to go with your proposal to call it "current", by the way.


Here's the implementation of the convenience API, it's implemented in giterator.c in the patch:

/**
 * g_iterable_iterate:
 * @iterable: a #GIterable that will be unreferenced
 *
 * Create an iterator for this iterable. The returned value must be unreferenced
 * after use using @g_object_unref. The @iterable being passed will be unreferenced
 * as a result of this function.
 *
 * Example:
 * <informalexample><programlisting>
 * GIterator *children;
 * children = g_iterable_iterate (egg_container_get_children (container));
 * while (g_iterator_next (children)) {
 *       GtkWidget *child = g_iterator_get (children);
 * }
 * g_object_unref (children);
 * </programlisting></informalexample>
 *
 * Returns: a new iterator and unreferences the passed value
 **/
GIterator*
g_iterable_iterate (GIterable *iterable)
{
  GIterator *iterator = G_ITERABLE_GET_INTERFACE (iterable)->iterator (iterable);
  g_object_unref (iterable);
  return iterator;
}

Comment 38 David Zeuthen (not reading bugmail) 2008-12-22 22:24:40 UTC

(In reply to comment #35)
> David, if the main issue 

(Note that with regards to my criticism of "iterators allocated on the heap", this is just my impression of reading the thread on gtk-devel-list and chatting to Matthias and others. I'm in no way in any position to decide what goes into the GLib stack.)

> is that the proposed API currently requires two lines
> more in C per iteration loop, we should certainly be able to come up with a
> solution. I'd prefer to not expose a concrete subclass in the API just for C
> convenience, but there are other possibilites. For example, we could provide a
> function that takes a collection, creates an iterator, unrefs the collection,
> and then returns the iterator:

I think the problem with this approach is that you still need to free the iterator which is extremely inconvenient as it's not uncommon to break out of a loop. Maybe I wasn't clear but I think that's the main issue with this proposal.

My proposal nicely sidesteps this issue completely simply by saying that most API that already return lists or arrays simply should use a concrete data type provided by GLib (something similar EggDBusArraySeq) rather than casting to the more abstract interface type (something similar to GeeList).

If your response "I'd prefer to not expose a concrete subclass in the API just for C convenience" means you disagree with this proposal, perhaps you can elaborate why?

     David

Comment 39 Jürg Billeter 2008-12-22 22:36:39 UTC

(In reply to comment #38)
> I think the problem with this approach is that you still need to free the
> iterator which is extremely inconvenient as it's not uncommon to break out of a
> loop. Maybe I wasn't clear but I think that's the main issue with this
> proposal.

Yes, you still have to free the iterator, however, you don't have to free the collection anymore. The GTK+ functions returning GLists usually return a newly allocated GList, so you also have to call g_list_free with the current API that is considered convenient in C. You just have to call g_object_unref instead of g_list_free with the collection API. I thought my example shows pretty clearly that my proposal would be more or less equally convenient to use as the current GList based API. Or am I misunderstanding something?

> If your response "I'd prefer to not expose a concrete subclass in the API just
> for C convenience" means you disagree with this proposal, perhaps you can
> elaborate why?

GTK+ 3 will hide many implementation details that are currently exposed and that's certainly a good thing as it allows more flexibility in future changes. We should try to follow this direction and not expose unnecessary implementation details. It will likely not be a problem in many cases, but there are certainly cases where exposing a subclass will hinder later changes.

Comment 40 David Zeuthen (not reading bugmail) 2008-12-22 23:01:35 UTC

(In reply to comment #39)
> (In reply to comment #38)
> > I think the problem with this approach is that you still need to free the
> > iterator which is extremely inconvenient as it's not uncommon to break out of a
> > loop. Maybe I wasn't clear but I think that's the main issue with this
> > proposal.
> 
> Yes, you still have to free the iterator, however, you don't have to free the
> collection anymore. The GTK+ functions returning GLists usually return a newly
> allocated GList, so you also have to call g_list_free with the current API that
> is considered convenient in C. You just have to call g_object_unref instead of
> g_list_free with the collection API. I thought my example shows pretty clearly
> that my proposal would be more or less equally convenient to use as the current
> GList based API. Or am I misunderstanding something?

You're assuming at least two things here

 a. The user owns the collection
    - not always the case; especially not with non-reffing API's

 b. The user wants to free the collection after having iterated over it
    - maybe he stores the collection somewhere

FWIW, I'm being the devil's advocate here; personally I can live with this API (and I concede that a. and b. are typically edge cases). 

(IOW, I'm not the person you need to convince...)

> > If your response "I'd prefer to not expose a concrete subclass in the API just
> > for C convenience" means you disagree with this proposal, perhaps you can
> > elaborate why?
> 
> GTK+ 3 will hide many implementation details that are currently exposed and
> that's certainly a good thing as it allows more flexibility in future changes.
> We should try to follow this direction and not expose unnecessary
> implementation details. It will likely not be a problem in many cases, but
> there are certainly cases where exposing a subclass will hinder later changes.

Sure, things like these are trade-offs and in some cases you may want to deviate from the convention to give yourself enough freedom to change things in the future. But for example, for things like VolumeMonitor, I think it's safe to assume we'd always want to return a ListSeq or ArraySeq. 

(FWIW, I do agree it's cleaner to return a GSeq. It just makes things a bit more complicated in C but maybe your latest changes are enough to convince the GLib maintainers that it's worth it. But these guys are a tough crowd...)

Comment 41 Philip Van Hoof 2008-12-26 15:20:23 UTC

a1. The user doesn't own the collection. For a onion_container_get_children that is a transfer-none (no ownership transfer or callee-owns) that returns a GCollection:

static void
my_type_operate_on_children (MyType *self) {
  GIterator *children;
  children = g_collection_iterator (g_object_ref (onion_container_get_children (container)));
  while (g_iterator_next (children)) {
        GtkWidget *child = g_iterator_get (children);
        /* operate on child */
        if (!child)
             break;
  }
  g_object_unref (children);
}


a2. The user doesn't own the collection. For a onion_container_get_children that is a transfer-all (ownership transfer or caller-owns) that returns a GCollection:

static void
my_type_operate_on_children (MyType *self) {
  GIterator *children;
  children = g_collection_iterator (onion_container_get_children (container));
  while (g_iterator_next (children)) {
        GtkWidget *child = g_iterator_get (children);
        /* operate on child */
        if (!child)
             break;
  }
  g_object_unref (children);
}


b1. The user doesn't want to free the collection, he instead wants to store it somewhere. For a onion_container_get_children that is a transfer-all (ownership transfer or caller-owns) that returns a GCollection:

static void
my_type_operate_on_children (MyType *self) {
  MyTypePriv *priv = MY_TYPE_GET_PRIV (self);
  GIterator *children;

  priv->storage = onion_container_get_children (container);
  children = g_collection_iterator (g_object_ref (priv->storage));
  while (g_iterator_next (children)) {
        GtkWidget *child = g_iterator_get (children);
        /* operate on child */
        if (!child)
             break;
  }
  g_object_unref (children);
}

static void
my_type_finalize (MyType *self) {
  MyTypePriv *priv = MY_TYPE_GET_PRIV (self);
  if (priv->storage)
    g_object_unref (priv->storage);
}


b2. The user doesn't want to free the collection, he instead wants to store it somewhere. For a onion_container_get_children that is a transfer-none (no ownership transfer or callee-owns) that returns a GCollection:

static void
my_type_operate_on_children (MyType *self) {
  MyTypePriv *priv = MY_TYPE_GET_PRIV (self);
  GIterator *children;

  priv->storage = g_object_ref (onion_container_get_children (container));
  children = g_collection_iterator (g_object_ref (priv->storage));
  while (g_iterator_next (children)) {
        GtkWidget *child = g_iterator_get (children);
        /* operate on child */
        if (!child)
             break;
  }
  g_object_unref (children);
}

static void
my_type_finalize (MyType *self) {
  MyTypePriv *priv = MY_TYPE_GET_PRIV (self);
  if (priv->storage)
    g_object_unref (priv->storage);
}


I used examples where I'm breaking the loop, just like you mention in Comment #38. Can you illustrate with an example where the API ain't convenient for the application developer developing in C?

Comment 42 Philip Van Hoof 2008-12-26 15:31:02 UTC

A higher language will of course always use this:

a1. For a onion_container_get_children that is a transfer-all (ownership transfer or caller-owns) that returns a GCollection:

static void
my_type_operate_on_children (MyType *self) {
  GCollection *collection = onion_container_get_children (container);
  GIterator *iterator;

  iterator = g_iterable_iterate (collection);

  while (g_iterator_next (iterator)) {
        GtkWidget *child = g_iterator_get (iterator);
        /* operate on child */
        if (!child)
             break;
  }

  g_object_unref (iterator);
  g_object_unref (collection);

}

a2. For a onion_container_get_children that is a transfer-none (no ownership transfer or callee-owns) that returns a GCollection:

static void
my_type_operate_on_children (MyType *self) {
  GCollection *collection = onion_container_get_children (container);
  GIterator *iterator;

  iterator = g_iterable_iterate (collection);

  while (g_iterator_next (iterator)) {
        GtkWidget *child = g_iterator_get (iterator);
        /* operate on child */
        if (!child)
             break;
  }

  g_object_unref (iterator);
}

I again used examples where I'm breaking the loop, just in case people would think that breaking the loop is problematic in some way (I don't really see how, but okay).

Whether or not g_iterator_get should be caller-owns or callee-owns is something to discuss for the "breaking a loop" case. I'm in favor of having caller-owns only (no matter the breaking-the-loop inconvenience, but Jürg is in disagreement with me on that). If that's what would block this proposal, then I'll go with the callee-owns version easily.

Although I wont agree, not with a single convenience-for-C-please advocate, that it's the right thing to do. Caller-owns is the right thing for g_iterator_get, but callee-owns would indeed be more convenient for the C developer who wants to break his loops. *but anyway* - grmbl, nasty C people -.

Comment 43 Philip Van Hoof 2008-12-26 15:53:43 UTC

Note for the people who want to know why g_iterator_get should be caller-owns: 

If you make it callee-owns then iterators that aren't an iterator for containers that already have the items that you are iterating (you create them during next() or during get()) require g_iterator_next and the destructor to destroy the previously given item.

For example for a callee-owns g_iterator_get:

public class Spit { }

public class SpitSource: Iterable {
 private Spit last_spit = null;

 public void next () {
    if (next_spit)
      delete next_spit;
    next_spit = new Spit ();
 }

 public Object get () {
   return next_spit;
 }  

 public Spitter () {
   next_spit = new Spit();
 }

 public ~Spitter() {
    if (next_spit)
      delete next_spit;  
 }
}

public class Noise { }

public class Mouth {
  public SpitSource SpitSource;

  private void Eject (Spit spit) {
  }

  private void Eject (Noise noise) {
  }

  public Spit (int times) {
    Iterable iter = this.SpitSource;
    int i;
    for (i = 0; i < times && iter.next(); i++) {
       this.Eject (iter.get());
    }
    delete iter;
  }
}

Instead of just using this as SpitSource (which is indeed a caller-owns g_iterator_get):

public class SpitSource: Iterable {
 public void next () {
 }
 public Object get () {
   return new Spit();
 }  
}

Yeh, I didn't come up with a better example than something "that spits". A gun didn't work because the bullets are in the magazine already, so they are not 'created' during iteration by the gun. A machine in a factory creating candy or something could also be used as an example. - hey, it's just an example :p -

Comment 44 Philip Van Hoof 2009-02-03 12:54:33 UTC

A month has passed, what is the decision on this?

Comment 45 Philip Van Hoof 2009-02-12 09:05:08 UTC

News surrounding this bug:

Mailing list discussion:
http://mail.gnome.org/archives/gtk-devel-list/2009-February/msg00039.html
http://mail.gnome.org/archives/gtk-devel-list/2009-February/msg00044.html

Git branch implementing this:
http://git.codethink.co.uk/?p=glib;a=shortlog;h=collections

Comment 46 zarevucky.jiri 2009-06-27 14:58:30 UTC

(In reply to comment #43)
> Note for the people who want to know why g_iterator_get should be caller-owns: 
> 
> If you make it callee-owns then iterators that aren't an iterator for
> containers that already have the items that you are iterating (you create them
> during next() or during get()) require g_iterator_next and the destructor to
> destroy the previously given item.
> 
> ...
> 

What is wrong with caller referencing it himself only if he needs to?

Comment 47 Philip Van Hoof 2009-06-27 15:34:21 UTC

@Comment #46: the example in Comment #43 explains the problem:

You can't easily make this callee-owns (what if making a new instance each get is precisely what you want to do? Then the callee-owns API becomes harder to implement - as you need to clean instances up in finalize() and the next next() call)

public Object get () {
   return new Spit();
}  

Note that a decision was made to let g_iterator_get be callee-owns nonetheless, Comment #46 is just my opinion on why it should be caller-owns.

Comment 48 Allison Karlitskaya (desrt) 2015-03-11 14:10:23 UTC

I think GListModel represents all we're going to be willing to do for a while.