GNOME Bugzilla – Bug 652587
Add an embedded list implementation
Last modified: 2018-05-24 13:11:15 UTC
I think it would be a really useful feature if GLib provided an embedded linked-list implementation in the style popularized by the Linux kernel. For some applications dealing with large lists GList isn't ideal because it requires a separate slice allocation for the list node. The Linux list implementation uses macros to store the list node directly in the struct being listed. This also has the major advantage that removing from a doubly-linked list is O(1) because it can directly get a pointer to the list node given a pointer to the node data. g_list_remove is O(n) because it has to walk the entire list to find the right node. There was some discussion about this in 2007 here: http://mail.gnome.org/archives/gtk-devel-list/2007-November/msg00218.html I guess it's not possible to take the Linux kernel implementation directly because that is licensed under GPL. However I think it should be possible to take the FreeBSD implementation instead which is largely equivalent except it also provides singly-linked lists whereas the Linux version only implements doubly-linked lists. It also doesn't appear to use any GCC-isms or C99-isms. The new list implementation wouldn't be a replacement for the existing lists in GLib because both are useful in different circumstances. The discussion around adding this to glib arose from Cogl bug 652514.
Created attachment 189932 [details] [review] Add a header containing the FreeBSD embedded list implementation This adds gelist.h as a public header which is a copy of the FreeBSD embedded list implementation from here: http://svn.freebsd.org/base/head/sys/sys/queue.h This implements four different types of list which comprises of all combinations of choosing either singly- or doubly- linked and whether to store a pointer to the end of the list in the list header. The header is modified to so that it is all in a G_* namespace. The macros have been renamed from SLIST_*, LIST_*, TAILQ_*, STAILQ_* to G_ESLIST_*, G_ELIST_*, G_ETAILQ_* and G_ESTAILQ_* respectively. The 'E' stands for 'embedded'. The header has also been modified so that it assumes there is a typedef for all of the structures instead of having to call the structures 'struct WhateverType' because this is more common in GLib-style applications. The lists are entirely implemented as macros that can define data to be embedded inside another struct. As opposed to using GList, GSList or GQueue, this avoids a separate g_slice allocation for the list node. It also means that it's possible to get a pointer to the list node given a pointer to the structure which means that removals can be done in O(1). g_list_remove would have to be O(n) because it needs to walk the entire list to find the right node to remove for the given data. A test case is included which just sanity checks most of the operations.
Hmmm. Do you have any performance numbers? Seems like this would be tricky for introspection, but possibly doable.
(In reply to comment #2) > Hmmm. Do you have any performance numbers? I don't have any performance numbers but I think it just makes sense that an application dealing with additions and removals in a very long list should aim for an O(1) algorithm over an O(n) algorithm when possible especially as in this case there is no trade-off with space. I could make a convoluted example with timing if it helps. > Seems like this would be tricky for introspection, but possibly doable. I don't think it would make sense to provide any introspection support for this. Any language binding would have its own list equivalent so there's no need to expose these macros directly. In C the lists can only be used embedded as part of another struct so presumably there would already be API to manipulate that outer struct instead. I don't think any library would want to expose the list manipulation macros here as part of its external API but it instead it would use these lists as an internal optimisation detail.
I don't think it makes sense to add to glib, tbh. GList works great as a general-purpose list; and providing multiple incarnations of essentially the same structure with slightly different performance characteristics is not really glib's mission. If an application has special needs, it is much better off doing its own datatype.
Created attachment 189983 [details] Example showing the timing for gdk_window_raise Here is a real-world example showing the timing for calling gdk_window_raise. The list of children of a GdkWindow is stored in a GList. When a child is raised it is removed from the list with g_list_remove and prepended to the beginning of the list. The list of children is internal to GdkWindow and pretty much every window has a parent so there is no reason not to embed the list node directly in the GdkWindow. The test creates a toplevel window with 4096 children and then raises each one in turn. The time taken to call gdk_window_raise is printed. Using GList the time taken is 5.794615 seconds Using embedded lists the time taken is only 0.004779 seconds.
Created attachment 189984 [details] [review] Half-baked patch to use embedded lists for the children of a GdkWindow This is an incomplete patch to use gelist.h for the list of children of a GdkWindow. The idea is that then it can avoid an O(n) algorithm when unparenting a window. The patch is incomplete because some parts have just been #ifdef'd out. https://bugzilla.gnome.org/show_bug.cgi?id=65287
(In reply to comment #4) > GList works great as a general-purpose list; and providing multiple > incarnations of essentially the same structure with slightly different > performance characteristics is not really glib's mission. The difference between an O(n) algorithm and an O(1) algorithm can be the difference between ‘completely unusable’ and ‘works fine’ in some circumstances so I wouldn't say it's ‘slightly different’. It's the same as saying you could use a GList instead of a GHashTable and just iterate the list every time you want to find a key! There is already a precedent for this in GLib because GSList is effectively the same thing as a GList but with slightly different performance characteristics. > If an application has special needs, it is much better off doing its own > datatype. I think the macros are general purpose enough to useful for a wide range of applications. It's quite fiddly to implement linked lists manually and it's very easy to make a mistake with updating the pointers which can cause confusing buried bugs. It seems like a good idea to avoid having lots of linked-list implementations spread throughout Gnome.
Without having looked at your embedded list implementation, I'll state that if its O(1) its not a linked list. A linked list is pretty much by definition O(n)...
(In reply to comment #8) > Without having looked at your embedded list implementation, I'll state that if > its O(1) its not a linked list. A linked list is pretty much by definition > O(n)... It most definitely is a linked list. Please take a look at the implementation. Note that it's not 'my' implementation - I stole it from the FreeBSD kernel. I don't think it makes sense to say a data type is O(n) but I'm mostly just talking about the algorithm for removing a node (although there are other operations that benefit too). With g_list_remove it is O(n) because it needs to walk the list to find the node that corresponds to the data pointer. If you already know the node you could use g_list_remove_link but most uses of GList don't know the node so they can't use this. With an embedded list implementation you can always work out where the node is for free given a pointer to the data so it is always O(1) to remove a node.
(In reply to comment #3) > (In reply to comment #2) > > Hmmm. Do you have any performance numbers? > > I don't have any performance numbers but I think it just makes sense that an > application dealing with additions and removals in a very long list should aim > for an O(1) algorithm over an O(n) algorithm when possible especially as in > this case there is no trade-off with space. I could make a convoluted example > with timing if it helps. I'm not so interested in artifical benchmarks; from reading your linked cogl bug ( bug 652514 ), I can totally sympathize with the statement "so when a pipeline is unparented we have to walk the entire list of children to find the node." But how often are pipelines unparented? Did it show up in any performance measurements? We should probably gather whether there are any other potential users too. And have some sort of answer for "when should I use GEList over G[S]List?". > > Seems like this would be tricky for introspection, but possibly doable. > > I don't think it would make sense to provide any introspection support for > this. I meant if it was used in a public API; like as a return value from a function such as IntSList foo_get_all_ids() or something. But maybe we just tell people "don't do that".
(In reply to comment #10) > But how often are pipelines unparented? Did it show up in any performance > measurements? To test the Cogl example I tried instrumenting cogl_pipeline_layer_unparent so that it would print the number of children in the parent every time. I then ran it against Media Explorer and it flooded the output by creating and unparenting many many pipelines per frame. (Ideally it shouldn't be doing this, but that's another issue). The average number of children in the parent was 24 and the maximum was 475. It's hard to get precise figures for the performance difference here but it intuitively seems that repeatedly walking a list that is 475 entries long should be avoided. In this particular case there is no cost at all to using an embedded list apart from arguably making the code look awkward. The reason the discussion started in the first place was because Owen Taylor found a bug in Cogl when used with Mutter where we would accidentally create a material with 1000s of children every frame and then try to remove them in the wrong order. With an O(n) removal this is never going to finish in a reasonable time but with O(1) it might still work. > We should probably gather whether there are any other potential users too. And > have some sort of answer for "when should I use GEList over G[S]List?". Yes, ideally we'd need to make some complete documentation. There is quite a good man page for the BSD implementation so we could use that as a starting point. > I meant if it was used in a public API; like as a return value from a function > such as IntSList foo_get_all_ids() or something. But maybe we just tell people > "don't do that". Yes, I think we should just say “don't do that”.
I'm all for cogl using datastructures that work best for its use. I'm still not convinced that a macro-only implementation of 4 more list variants is a good fit for GLib next to the existing GLib data structures.
As a compromise I think an alternative idea could be to encourage people to embed GS?List nodes directly in their structures where appropriate. Of course the downside of this is that the data pointer in the node is then redundant. Also you wouldn't get the convenience of being able to pass pointers to your structure rather than the list node to the list API and it's a bit fiddly to cast a list node pointer back to your structure node. To make this work better we could add the following missing API to GList: GList * g_list_append_link (GList *list, GList *link); GList * g_list_prepend_link (GList *list, GList *link); GList * g_list_insert_link (GList *list, GList *link, gint position); GList * g_list_insert_link_sorted (GList *list, GList *link, GCompareFunc func); GList * g_list_insert_link_sorted_with_data (GList *list, GList *link, GCompareDataFunc func, gpointer user_data); GList * g_list_insert_link_before (GList *list, GList *sibling, GList *link); These would be equivalent to the non-link variants but they would use an existing node instead of allocating a new one.
I think that would be very confusing. Currently, list notes are slice-allocated by glib; with this approach, the application would allocate the nodes, and you have to be extremely careful not to mix apis that allocate/free any list nodes on such a list.
(In reply to comment #0) > I guess it's not possible to take the Linux kernel implementation directly > because that is licensed under GPL. However I think it should be possible to > take the FreeBSD implementation instead which is largely equivalent except it > also provides singly-linked lists whereas the Linux version only implements > doubly-linked lists. It also doesn't appear to use any GCC-isms or C99-isms. It is possible to take the Linux implementation, it's in ccan, and there at least it's LGPL 2.1 or later. Here it is in the ccan git repo http://git.ozlabs.org/?p=ccan;a=tree;f=ccan/list (The _info file is where it says "2.1 or later")
If an embedded list is in CCAN, then maybe the answer is simply that projects can use [1] it from there? That way we don't have to carry it in GLib [1] : the way CCAN works, IIRC, is that it's libegg-style - e.g. you just copy-paste. Extra credit for having a script that automatically pulls a new copy from CCAN
The only issue is that ccan stuff has internal discrepancies on other bits of ccan, I think there's some tools in the ccan repo for pulling in the discrepancies of what you want though, so that's not much a problem (for example the ccan/list.h needs at macros from at least three other ccan packages). However a more annoying issue is that ccan stuff expects there to be a "config.h" but ccan is not autoconf based, and in fact it's config.h has definitions in it that have the same purpose as an existing autoconf equivalent, but they are different. This makes it REALLY awkward to use ccan stuff in an autotools built project. Also the utility that generates the ccan config.h is written in C and I think it does some execution test, so it's not terribly friendly to cross compiling. Basically if you're writing an app that uses glib in C and you use autotools you have to either do some really awkward mucking about with the config.h stuff in your configure.ac so that the ccan packages get all the macros they want, or you do a bunch of work to make the packages use macros from autoconf tests or glib.h (as appropriate).
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/417.