GNOME Bugzilla – Bug 553794
Support for light-weight objects
Last modified: 2009-02-10 14:50:02 UTC
I'll attach the code of a simple app I wrote to get an estimate of time it takes to creates 1Mi instances of different kinds of classes available in Vala. Most of them translate pretty much directly to C. Here are the results of a single run (it's pretty much the same results on my machine each time i run it) of that app excluding the first two lines that are irreleven here: $ ./test-perf ... 0.329435 seconds taken in creating 1048576 instances. 0.290740 seconds taken in creating 1048576 instances (GstMiniObject). 0.813593 seconds taken in creating 1048576 instances (GObject). $ As you can see, it takes almost thrice as much time to create 1Mi instances of GObject-deriving classes as it does to create same number of instances of the non-gobject deriving classes. The non-gobject deriving classes in Vala are very similar to GstMiniObject so that is the reason the speed of creation is very similar. Please note that I am running the app on my laptop (800 MHz CPU + 2GB RAM) so the different might be in terms of seconds on embedded systems. In most of the cases, one doesn't need to create millions or even thousands of instances within seconds but there are times when you do. There, you usually want light-weight objects that only support inheritance and ref counting but not signals and properties. E.g Gstreamer has such needs for some of it's core objects and that is the reason they invented GstMiniObject and in GUPnP AV we would have used such a class(es) to represent media objects if it had been available in glib rather than providing wrapper functions to deal with xmlNode etc. I originally only intended to propose the move of GstMiniObject to gobject library as "GMiniObject" but ebassi had something even better: "a possible solution would be adding a GBaseObject that just has refcounting and then rebase GObject on top of it - for GObject 3.0." This will break API pretty hard but I think in this case, it is worth it and we are talking of GObject 3.0 anyways.
Created attachment 119376 [details] Test app to estimate time it takes to create instances of diff kinds of classes
FWIW we can use that in Pango too.
Sounds like something we could use in gEDA for a plethora of CAD objects.
For those who don't know. MiniObject is a GObject without signals and properties. If I recall right I does not support interfaces either. Would be interesting to see whats highest in a oprofile run of the testcode for GObject.
It seems like a foundational question is "what is it that's making GObject slow, and can that thing simply be fixed, especially if breaking ABI is allowed in 3.0?"
(In reply to comment #5) > It seems like a foundational question is "what is it that's making GObject > slow, and can that thing simply be fixed, The oprofile results of a modified version of the test app (that only does gobjects) looks like this: $ opreport -l /usr/lib/libgobject-2.0.so |head -n20 CPU: PIII, speed 800 MHz (estimated) Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask) count 100000 samples % symbol name 7 14.0000 g_signal_emit_valist 4 8.0000 g_type_check_instance_is_a 3 6.0000 g_type_value_table_peek 3 6.0000 signal_emit_unlocked_R 2 4.0000 g_closure_invoke 2 4.0000 g_type_check_instance_cast 2 4.0000 g_type_check_is_value_type 2 4.0000 g_type_check_value 2 4.0000 g_type_is_a 2 4.0000 g_value_unset 1 2.0000 __i686.get_pc_thunk.bx 1 2.0000 boxed_proxy_collect_value 1 2.0000 boxed_proxy_value_init 1 2.0000 g_closure_unref 1 2.0000 g_object_add_weak_pointer 1 2.0000 g_object_freeze_notify 1 2.0000 g_object_get_valist Hmm.. I might want to disable type checks.
Also note that I am not using gobject props in the OClass in the test app. If I do that, the creation of same objects takes a LOT (>4 seconds) more.
I wonder where g_signal_emit_valist() is called from. I was thinking first about the g_object_notify() from the g_object_set_parameter() in g_object_newv(), but if you don't have params it can't be that. Besides I was wondering, if the g_object_notify() shouldn't be skipped totally when initializing a object.
It's likely the thawing of the notify queue at the end of the constructor that's causing the signal emission. Plus, an oprofile with roughly 50 data points is not very representative.
(In reply to comment #9) > It's likely the thawing of the notify queue at the end of the constructor > that's causing the signal emission. except that he claims to be using no gobject properties, so there shouldn't be any notifications, right? The test program doesn't compile with the version of vala currently shipping in Fedora 9, but after fiddling with it a bit, I got it to run, and I get similar timings, but g_signal_emit_valist() never gets called. So I don't know what's up with that profile.
(In reply to comment #10) > The test program doesn't compile with the version of vala currently shipping in > Fedora 9, I think I should keep vala trunk to my jhbuild env only. :) > but after fiddling with it a bit, I got it to run, and I get similar > timings, but g_signal_emit_valist() never gets called. So I don't know what's > up with that profile. Hmm.. It could be that oprofile is broken on my machine (or worse, distro) for some reason so it would be very nice if you can provide the correct one.
Created attachment 119470 [details] Generated C code of test app I've attached the generated C code of the test app so people can test it without installing Vala.
Just trying to get the picture... Is what you are proposing adding an intermediate class in between GTypeInstance and GObject adding ref-counting to the GTypeInstance? This in fact does seem to be exactly what the _Class struct in Jürg's attachment to comment 12 does. I've been thinking along those lines for Xesam GLib as well. I need some light ref-counted objects to build queries in a DOM-like manner, and I am not really keen on using full GObjects for each node.
(In reply to comment #13) > Just trying to get the picture... > > Is what you are proposing adding an intermediate class in between GTypeInstance > and GObject adding ref-counting to the GTypeInstance? Yes! > This in fact does seem to be exactly what the _Class struct in Jürg's > attachment to comment 12 does. Yeah that is the translation of the Vala's non-gobject-deriving class I mentioned, to C. > I've been thinking along those lines for Xesam GLib as well. I need some light > ref-counted objects to build queries in a DOM-like manner, and I am not really > keen on using full GObjects for each node. Good to know. The more people show interest in it, the greater the possibility of it actually happening. :)
It still seems like if you create a GObject that has no properties or signals, there's no reason it should be slower than a MiniObject type that has no support for those things. Surely "fix GObject" is the default action, until proven impossible? There's still no complete analysis here of why GObject is slow.
Created attachment 119619 [details] Simple comparison in C Here is a simpler version of the test for people like me who are scared of Vala. Running the program on my computer gives: 10000000 GObjects in 14.142550 seconds = 707086.062980 objects per second 10000000 GstMiniObjects in 4.561623 seconds = 2192202.205224 objects per second I think the OProfile report from comment #6 might be broken. If I run either of the tests through GDB, g_signal_emit_valist doesn't get called at all. I tried running OProfile myself with the new test as well. The report from just the GObject version is here: http://pastebin.com/f7231c8de and just the GstMiniObject version is here: http://pastebin.com/f3da2aa50 Comparing the two outputs, the suspicious entries seem to be g_data_set_internal and g_hash_table_lookup_node. These seem to be caused by changing the qdata when freezing and thawing notifications during init as well as freeing the list of signal handlers during dispose. I tried bodging gobject.c in the following ways: - Put this at the top to disable freezing notifications: #define g_object_notify_queue_thaw(a, b) ((void *) 0) #define g_object_notify_queue_freeze(a, b) ((void *) 0) - Put a 'return' in g_object_dispose to skip out the three function calls which just destroy the signal handlers, the closure array and the weak refs. - Comment out the three similar calls in g_object_unref Then running the program again I get: 10000000 GObjects in 6.891258 seconds = 1451113.860488 objects per second 10000000 GstMiniObjects in 4.570091 seconds = 2188140.236157 objects per second which means the GObjects would be twice as fast.
I looked into this previously too. The bottleneck is looking up in the global paramspecpool to find the paramspec for the signal. Those kind of things. I'm sure it can be improved by crafting some custom data structures.
Neil, really helpful info. Looks like it's a time-space tradeoff where GObject uses a data list (or the hash table in gsignal.c) to avoid making the GObject struct larger. However, it's slow. Maybe some of the padding in GObjectClass could be used to flag whether the class has any properties at all. Toggle the flag when the first property is installed. If no properties are ever installed, then bypass the whole notification queue machinery. Theoretically if GObject itself has properties this breaks, but right now GObject doesn't have properties. The stuff in g_object_real_dispose() is harder to deal with because you'd need a per-instance flag to know whether to skip it, and there's no padding in the instance, only in the class. Given an ABI break, or some type of wacky hacks, it might be possible to add flags for HAS_ANY_SIGNAL_HANDLERS, HAS_ANY_WEAK_REFS, HAS_ANY_CLOSURE_ARRAY - don't know. I wonder if an extra 32 bits for flags in GObject would be worth it, if all "unnecessary" hash table lookups could be bypassed. Then again, the existing class pointer, refcount, GData*, and then a flags might already make an object too large to be "mini" so maybe the MiniObject really is the simplest solution. Another concept could be to track the HAS_ANY_* flags by _class_, so if for example you connect to a signal on any instance of a class, all instances of that class forevermore will have signal-hash lookup overhead. But until you use a signal, the flag would be unset and signal-hash lookups bypassed. Similarly, you could say that as soon as any instance of a class has a 'notify' connection they all get notify queue overhead, but until then, none of them do, etc. This makes each class automatically match its usage and only have the overhead it needs. But yeah it's sort of hacky. Just brainstorming...
Havoc, we already use the two lower bits of the data list pointer as flags, for toggle ref, and for floating flag. Ryan pointed out to me before that since gslice allocates multiples of 8, there's another bit there to harvest... The ref count also can be totally rehauled and used, as I don't think GObject declared that public. Ryan already has a proposal to use a bit in data pointer or the ref count to do per-object locking. Note that any mini object at least needs a class pointer and a ref count. So we are talking about 2 vs 3 pointers. Don't think it makes much sense to have a mini object if we make GObject fast in the first place.
This bug seems to be wandering between two rather different things: #1 The amount of memory a GObject uses #2 The amount of time it takes to allocate a GObject Which is the optimization goal? What problems are real-world programs having? The original bug seemed to mostly be about #2. Personally I think if you're having problems with #1, you're probably doing something wrong in your app, like having a treeview with 100,000 rows. Just don't do things like that - use SQLite and search and only display a subset of the rows, etc.
Colin, The original report was about #2. Havoc suggested that we may be able to solve #2 in GObject proper and not need a GMiniObject, unless #1 is also a problem. I suggested that #1 can't be a problem as GMiniObject can't do substantially better either. The corollary is that if #2 can be fixed, the bug is fixed.
Improvement: from 1.59s to 1.36s. Compilation flags: -ggdb, -O0 I don't consider this to be a great improvement, tbf Code (note the DOPROPS ifdef): gpointer g_object_newv (GType object_type, guint n_parameters, GParameter *parameters) { GObjectConstructParam *cparams, *oparams; GObjectNotifyQueue *nqueue = NULL; /* shouldn't be initialized, just to silence compiler */ GObject *object; GObjectClass *class, *unref_class = NULL; GSList *slist; guint n_total_cparams = 0, n_cparams = 0, n_oparams = 0, n_cvalues; GValue *cvalues; GList *clist = NULL; gboolean newly_constructed; guint i; g_return_val_if_fail (G_TYPE_IS_OBJECT (object_type), NULL); class = g_type_class_peek_static (object_type); if (!class) class = unref_class = g_type_class_ref (object_type); #ifdef DOPROPS for (slist = class->construct_properties; slist; slist = slist->next) { clist = g_list_prepend (clist, slist->data); n_total_cparams += 1; } /* collect parameters, sort into construction and normal ones */ oparams = g_new (GObjectConstructParam, n_parameters); cparams = g_new (GObjectConstructParam, n_total_cparams); for (i = 0; i < n_parameters; i++) { GValue *value = ¶meters[i].value; GParamSpec *pspec = g_param_spec_pool_lookup (pspec_pool, parameters[i].name, object_type, TRUE); if (!pspec) { g_warning ("%s: object class `%s' has no property named `%s'", G_STRFUNC, g_type_name (object_type), parameters[i].name); continue; } if (!(pspec->flags & G_PARAM_WRITABLE)) { g_warning ("%s: property `%s' of object class `%s' is not writable", G_STRFUNC, pspec->name, g_type_name (object_type)); continue; } if (pspec->flags & (G_PARAM_CONSTRUCT | G_PARAM_CONSTRUCT_ONLY)) { GList *list = g_list_find (clist, pspec); if (!list) { g_warning ("%s: construct property \"%s\" for object `%s' can't be set twice", G_STRFUNC, pspec->name, g_type_name (object_type)); continue; } cparams[n_cparams].pspec = pspec; cparams[n_cparams].value = value; n_cparams++; if (!list->prev) clist = list->next; else list->prev->next = list->next; if (list->next) list->next->prev = list->prev; g_list_free_1 (list); } else { oparams[n_oparams].pspec = pspec; oparams[n_oparams].value = value; n_oparams++; } } /* set remaining construction properties to default values */ n_cvalues = n_total_cparams - n_cparams; cvalues = g_new (GValue, n_cvalues); while (clist) { GList *tmp = clist->next; GParamSpec *pspec = clist->data; GValue *value = cvalues + n_total_cparams - n_cparams - 1; value->g_type = 0; g_value_init (value, G_PARAM_SPEC_VALUE_TYPE (pspec)); g_param_value_set_default (pspec, value); cparams[n_cparams].pspec = pspec; cparams[n_cparams].value = value; n_cparams++; g_list_free_1 (clist); clist = tmp; } #endif /* construct object from construction parameters */ object = class->constructor (object_type, n_total_cparams, cparams); #ifdef DOPROPS /* free construction values */ g_free (cparams); while (n_cvalues--) g_value_unset (cvalues + n_cvalues); g_free (cvalues); #endif /* adjust freeze_count according to g_object_init() and remaining properties */ G_LOCK (construction_mutex); newly_constructed = slist_maybe_remove (&construction_objects, object); G_UNLOCK (construction_mutex); #ifdef DOPROPS if (newly_constructed || n_oparams) nqueue = g_object_notify_queue_freeze (object, &property_notify_context); if (newly_constructed) g_object_notify_queue_thaw (object, nqueue); #endif /* run 'constructed' handler if there is one */ if (newly_constructed && class->constructed) class->constructed (object); #ifdef DOPROPS /* set remaining properties */ for (i = 0; i < n_oparams; i++) object_set_property (object, oparams[i].pspec, oparams[i].value, nqueue); g_free (oparams); /* release our own freeze count and handle notifications */ if (newly_constructed || n_oparams) g_object_notify_queue_thaw (object, nqueue); #endif if (unref_class) g_type_class_unref (unref_class); return object; }
I've been doing some callgrinding, and I noticed a lot of memset() happening with -ggdb -O0 (note that with better compilation flags, the issue might be less bad. I don't know how g_slice_alloc0 behaves with proper compilation optimization flags, don't know if anything differs). I followed the code a little bit and I didn't find my g_slice_alloc0 is used over g_slice_alloc. But perhaps can the original developer elaborate? In case the reason is found, I think just setting that field of the struct to NULL will be better than implicitly calling for memset by using g_slice_alloc0 instead of g_slice_alloc. I also found a lot of similar g_malloc0 calls. All of those could be replaced with g_malloc instead (the field that is the reason for the set-to-0 allocation must be manually set to 0, of course). pvanhoof@tinc:~/repos/gnome/glib$ svn diff gobject/gtype.c Index: gobject/gtype.c =================================================================== --- gobject/gtype.c (revision 7570) +++ gobject/gtype.c (working copy) @@ -1651,7 +1651,7 @@ class = g_type_class_ref (type); total_size = type_total_instance_size_I (node); - instance = g_slice_alloc0 (total_size); + instance = g_slice_alloc (total_size); if (node->data->instance.private_size) instance_real_class_set (instance, class); pvanhoof@tinc:~/repos/gnome/glib$ Without this patch: pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 1.302047 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 1.283837 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 1.304417 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 1.255168 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 1.272710 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 1.319576 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 1.264704 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 1.364176 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 1.262055 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 1.281513 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 1.254322 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ With this patch: pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 1.275148 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 1.250664 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 1.262090 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 1.268409 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 1.197464 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 1.220995 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 1.229558 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 1.269562 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 1.203025 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 1.253002 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 1.217687 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 1.207490 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 1.226927 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$
gmrbl. s/I didn't find my g_slice_alloc0/I didn't find why g_slice_alloc0
Created attachment 120036 [details] [review] Some tests in the format of ifdefs disabling the properties code of gobject This patch shows what I so far changed in gobject.c and gtype.c. If you replace the "#ifdef NOPROPS/#endif" with a "if (G_UNLIKELY (class->flags & NOPROPS)) {}" you might have something that looks like hp's proposal already. Here are some timings with the patch applied: pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 0.618516 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 0.599759 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 0.583069 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 0.598884 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 0.596710 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 0.593085 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 0.582992 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 0.617058 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$
Patch from #25 with g_slice_alloc0 gtype.c: pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 0.715261 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 0.696097 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 0.733293 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 0.709191 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 0.689464 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 0.729449 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 0.730101 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 0.707337 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 0.734465 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 0.686283 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 0.764338 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 0.729503 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 0.699860 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ Patch from #25 with g_slice_alloc instead of g_slice_alloc0 gtype.c: pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 0.632869 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 0.677825 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 0.686501 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 0.641746 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 0.644655 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 0.654203 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 0.718963 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 0.638586 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 0.650788 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 0.697349 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 0.659109 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 0.671999 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 0.620188 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 0.668381 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$
pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 0.387407 seconds taken in creating 1048576 instances (GstMiniObject). 0.520342 seconds taken in creating 1048576 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ yw
Functions that are still "doing things" (not saying they are replaceable): g_type_class_peek_static: -> static-rec-lock 30% -> static-rec-unlock 30% -> lookup_type_node_I 30% g_object_init: -> g_datalist_init 20% (gobject qdata, not all types require support for this) g_type_fundamental: -> lookup_type_node_I 40% g_object_newv: -> caller for g_type_class_peek_static -> slist_maybe_remove I think if we make both the properties and the qdata a feature that only gets enabled as soon as the type's class requires it, GObject should be almost as fast as GstMiniObject (with just the properties's code disabled, it's already almost as fast - 0.38s vs. 0.52s, both code paths sharing the same gtype.c and LD_LIBRARY_PATH -).
(In reply to comment #23) > --- gobject/gtype.c (revision 7570) > +++ gobject/gtype.c (working copy) > @@ -1651,7 +1651,7 @@ > class = g_type_class_ref (type); > total_size = type_total_instance_size_I (node); > > - instance = g_slice_alloc0 (total_size); > + instance = g_slice_alloc (total_size); > > if (node->data->instance.private_size) > instance_real_class_set (instance, class); That's a huge ABI break though. Currently it's guaranteed that all fields in all GObject-derived types are initialized to 0, and using g_slice_alloc instead would break that. You *might* be able to shave off a few cycles safely by doing: instance = g_slice_alloc (total_size); memset (instance + sizeof (GObject), 0, total_size - sizeof (GObject)); and then making sure that GObject properly initializes all of its own fields. (Or maybe have a generic opt-out-of-0-initialization class flag and then only initialize the parts of the struct that need it.)
Created attachment 120046 [details] [review] Not working patch, don't try if you expect things to work The tests/gobject are working, but most applications aren't ;), I'm going to give up for now and post the patch I have so far. Perhaps somebody can pick up from here? I will of course continue trying tomorrow or the day after. pvanhoof@tinc:~/repos/gnome/glib/tests/gobject$ ./accumulator pvanhoof@tinc:~/repos/gnome/glib/tests/gobject$ ./defaultiface pvanhoof@tinc:~/repos/gnome/glib/tests/gobject$ ./deftype pvanhoof@tinc:~/repos/gnome/glib/tests/gobject$ ./dynamictype pvanhoof@tinc:~/repos/gnome/glib/tests/gobject$ ./gvalue-test pvanhoof@tinc:~/repos/gnome/glib/tests/gobject$ ./ifacecheck pvanhoof@tinc:~/repos/gnome/glib/tests/gobject$ ./ifacein bash: ./ifacein: No such file or directory pvanhoof@tinc:~/repos/gnome/glib/tests/gobject$ ./ifaceinherit pvanhoof@tinc:~/repos/gnome/glib/tests/gobject$ ./ifaceinit pvanhoof@tinc:~/repos/gnome/glib/tests/gobject$ ./ifaceproperties pvanhoof@tinc:~/repos/gnome/glib/tests/gobject$ ./override pvanhoof@tinc:~/repos/gnome/glib/tests/gobject$ ./paramspec-test pvanhoof@tinc:~/repos/gnome/glib/tests/gobject$ ./references pvanhoof@tinc:~/repos/gnome/glib/tests/gobject$ ./singleton GLib-GObject-Message: stale GObjects: 0 pvanhoof@tinc:~/repos/gnome/glib/tests/gobject$
ps. While trying out things I noticed that g_type_class_ref is relatively slow too. That's why the match uses G_OBJECT_GET_CLASS(object) instead of class = g_type_class_ref(type) + g_type_class_unref(class)
Note that the GObjectClass's struct size with this experimental patch is not x64 safe anymore. That's because sizeof(void*) != sizeof(guint32) everywhere. So that flags field could become a gsize instead, perhaps? Just pointing out. ps. Why didn't glib use "guint32 dummy[7]" instead of "gpointer dummy[7]"? I'm guessing padding, but this of course makes adding fields a bit more difficult each time.
Note that the GObjectClass's struct size with this experimental patch is not x64 safe anymore. That's because sizeof(void*) != sizeof(guint32) everywhere. So that flags field could become a gsize instead, perhaps? Just pointing out.
Created attachment 120053 [details] [review] Slightly better, still not working
Created attachment 120071 [details] [review] Working patch I've fixed the bug that prevented most applications from working (G_OBJECT_GET_CLASS doesn't work in g_object_init, see comments in g_type_create_instance) and switched to gsize instead of guint32. Without patch: 4.066497 seconds taken in creating 10000000 instances (GObject). With patch: 1.931577 seconds taken in creating 10000000 instances (GObject). GstMiniObject: 0.938353 seconds taken in creating 10000000 instances (GstMiniObject).
Wouldn't it be better if people shared their results in number of instances per second ? It would avoid people choosing random numbers of instances just to adapt the test to the speed of their computers...
@Jurg: nice! thanks for taking a look at spotting the problem. We are still at a second more than GstMiniObject. I'm guessing this will be mostly related to g_datalist_init.
Created attachment 120093 [details] Makes a inline g_type_class_peek_static Here's a bit of crack (really not for serious use, really an experiment) that makes a static inline g_type_class_peek_static_i out of g_type_class_peek_static and share it with gtype.c. It shaves of one ~ < 0.1s from my test with Jürg's fix of my patch.
Hey Jurg, if we move the construction_objects = g_slist_prepend (construction_objects, object) Into our "if the class has properties", then I almost bit the GstMiniObject border: pvanhoof@tinc:~/repos/gnome/glib$ ./a.out 3.374144 seconds taken in creating 10000000 instances (GstMiniObject). 3.827715 seconds taken in creating 10000000 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ Regretfully does this invalidate the determining of "newly_constructed" in g_object_newv (as you pointed out on IRC). If we find another solution for that, and cut-out the GSList usage, then I think we have our patch. Any ideas? --- gobject/gobject.c (revision 7574) +++ gobject/gobject.c (working copy) @@ -678,11 +678,14 @@ } static void -g_object_init (GObject *object) +g_object_init (GObject *object, + GObjectClass *class) { object->ref_count = 1; g_datalist_init (&object->qdata); + if (CLASS_HAS_PROPS (class)) + { /* freeze object's notification queue, g_object_newv() preserves pairedness */ g_object_notify_queue_freeze (object, &property_notify_context); /* enter construction list for notify_queue_thaw() and to allow construct-only properties */ @@ -690,6 +693,9 @@ construction_objects = g_slist_prepend (construction_objects, object); G_UNLOCK (construction_mutex); + } + + #ifdef G_ENABLE_DEBUG IF_DEBUG (OBJECTS) {
Note that the above copy/paste of the diff must be applied above Jürg's patch. And s/bit/hit too.
Going to attach and comment a bunch of stuff related to these results: pvanhoof@tinc:~/repos/gnome/glib$ export LD_LIBRARY_PATH=/opt/glib-perform/lib/ pvanhoof@tinc:~/repos/gnome/glib$ ./test-perf Don't count these: 2.225854 seconds taken in creating 10000000 instances (GstMiniObject). 2.583421 seconds taken in creating 10000000 instances (GObject). Don't count these: 1.505127 seconds taken in creating 10000000 instances (GstMiniObject). 2.583628 seconds taken in creating 10000000 instances (GObject). Useful results: 1.507492 seconds taken in creating 10000000 instances (GstMiniObject). 2.599355 seconds taken in creating 10000000 instances (GObject). Useful results: 1.514446 seconds taken in creating 10000000 instances (GstMiniObject). 2.609111 seconds taken in creating 10000000 instances (GObject). Useful results: 1.503256 seconds taken in creating 10000000 instances (GstMiniObject). 2.582921 seconds taken in creating 10000000 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$ export LD_LIBRARY_PATH= pvanhoof@tinc:~/repos/gnome/glib$ ./test-perf Don't count these: 2.180473 seconds taken in creating 10000000 instances (GstMiniObject). 6.583364 seconds taken in creating 10000000 instances (GObject). Don't count these: 1.453439 seconds taken in creating 10000000 instances (GstMiniObject). 6.608013 seconds taken in creating 10000000 instances (GObject). Useful results: 1.454491 seconds taken in creating 10000000 instances (GstMiniObject). 6.597693 seconds taken in creating 10000000 instances (GObject). Useful results: 1.459345 seconds taken in creating 10000000 instances (GstMiniObject). 6.571601 seconds taken in creating 10000000 instances (GObject). Useful results: 1.471676 seconds taken in creating 10000000 instances (GstMiniObject). 6.613062 seconds taken in creating 10000000 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$
Created attachment 120283 [details] [review] Patch that causes this improvement This patch has proper indentation and everything. It's the one causing the performance improvement. I tested this patch with the Vala compiler valac which excessively uses GObjects. I also ran a few typical GNOME applications with it. But that's it that I did for testing.
Created attachment 120285 [details] Original test application /opt/vala/bin/valac test-perf.vala --pkg=gstreamer-0.10
Ok so what got changed: We (Jürg and me, but especially me, gne gne gne) noticed a g_slist_prepend and a GSList construction_objects GSList being used solely to detect whether or not an instance is being constructed. For large amounts of instances GSList will indeed use the g_slice_alloc magazines, but it'll still be relatively slow. Perhaps I misunderstood the real purpose of this GSList of course, but what I have done in the patch is to replace that with a gboolean in the qdata of the instance and only set it in case the class has properties. I noticed that the GSList was only being read in case of property related methods are being called. If the class has no properties, then those methods return prematurely anyway. Jürg was discussing with me that perhaps if we'd find a bit somewhere in the instance's struct, then we could avoid adding the four bytes for the gboolean to the qdata. Perhaps we could make it atomic too, and that way get rid of the Mutex? I don't think it would further improve performance for property-less class instances, but it might improve performance and memory consumption (per instance) for property-having class instances. We think the remaining performance related issue, if we truly want to beat GstMiniObject, which we do, is the qdata. So if some genius appears who'll invent a better solution for this ... he might just beat GstMiniObject's performance. For our purposes is this performance (near GstMiniObject's performance, but not .. yet arg arg arg) good enough. BUT, Jürg and me both have a flight of > 8 hours to Boston tomorrow. You never know what will happen while we are bored and locked up in a plane with a laptop that has an extra battery pack.
Philip, there is one more bit available per instance, first come first serve. See my comment 19.
For in_construction checking, you can do something like setting reference_count to 0x80000000 and check for refcount >= 0x80000000. No locking, no glist.
Created attachment 120362 [details] [review] Some fixes This is not yet using the bit that Behdad discovered in comment #19, just some fixes. Note that I tried using g_datalist_set_flags/unset_flags but that this was quite a bit slower than storing and removing a full boolean due to the atomic integer stuff.
Created attachment 120363 [details] [review] Oeps this is the right one I think So, the version with g_datalist_id_get_data and g_datalist_id_remove_no_notify
Created attachment 120366 [details] [review] And a version with flags And a version that uses g_datalist_unset_flags and get_flags. This is slower than the one above that uses get_data and remove_no_notify. I also have callgrind .out's showing it clearly why the one with flags is slower than the one with a normal bool.
Some pointers: http://pvanhoof.be/files/unsetflags.png http://pvanhoof.be/files/bool.png http://pvanhoof.be/files/callgrind.bool.out http://pvanhoof.be/files/callgrind.unset.out Use the callgrind.*.out files with kcachegrind
Created attachment 120515 [details] [review] Using the ref_count field, as proposed by Behdad This one is also faster. pvanhoof@tinc:~/repos/gnome/glib$ ./test-perf Don't count these: 2.180973 seconds taken in creating 10000000 instances (GstMiniObject). 2.420649 seconds taken in creating 10000000 instances (GObject). Don't count these: 1.481022 seconds taken in creating 10000000 instances (GstMiniObject). 2.389999 seconds taken in creating 10000000 instances (GObject). Useful results: 1.482898 seconds taken in creating 10000000 instances (GstMiniObject). 2.377750 seconds taken in creating 10000000 instances (GObject). Useful results: 1.515348 seconds taken in creating 10000000 instances (GstMiniObject). 2.392910 seconds taken in creating 10000000 instances (GObject). Useful results: 1.455294 seconds taken in creating 10000000 instances (GstMiniObject). 2.353915 seconds taken in creating 10000000 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$
Created attachment 120517 [details] [review] This one decreases the amounts of lock/unlock in gtype.c This is related to Attachment #120515 [details], if you apply this patch on gtype.c you improve the performance a little bit more (because this way you do less lock/unlocks in g_type_class_ref in gtype.c: (The actual amounts of seconds are higher than above, but this small difference fluctuated depending on the load of my computer. What matters more is that the difference between GstMiniObject and GObject is consistently smaller). pvanhoof@tinc:~/repos/gnome/glib$ ./test-perf Don't count these: 2.374864 seconds taken in creating 10000000 instances (GstMiniObject). 2.486572 seconds taken in creating 10000000 instances (GObject). Don't count these: 1.704863 seconds taken in creating 10000000 instances (GstMiniObject). 2.464646 seconds taken in creating 10000000 instances (GObject). Useful results: 1.717197 seconds taken in creating 10000000 instances (GstMiniObject). 2.469642 seconds taken in creating 10000000 instances (GObject). Useful results: 1.708875 seconds taken in creating 10000000 instances (GstMiniObject). 2.480056 seconds taken in creating 10000000 instances (GObject). Useful results: 1.717147 seconds taken in creating 10000000 instances (GstMiniObject). 2.472500 seconds taken in creating 10000000 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$
Created attachment 120520 [details] [review] Same as Attachment #120515 [details], but using unset-flag #31 I just realized that just setting obj->ref_count to 1 in g_object_newv is wrong, because the 'user-code' constructor could have done g_object_ref and g_object_unref calls. So setting it would overwrite these (which is obviously wrong). This new patch is the same as Attachment #120515 [details], with the difference that I just unset bit 0x80000000 instead of equalizing ref_count with 1.
pvanhoof@tinc:~/repos/gnome/glib$ svn diff > /home/pvanhoof/gobjectonsteroids4.diff pvanhoof@tinc:~/repos/gnome/glib$ ./test-perf Don't count these: 2.236350 seconds taken in creating 10000000 instances (GstMiniObject). 2.346191 seconds taken in creating 10000000 instances (GObject). Don't count these: 1.501287 seconds taken in creating 10000000 instances (GstMiniObject). 2.344975 seconds taken in creating 10000000 instances (GObject). Useful results: 1.502752 seconds taken in creating 10000000 instances (GstMiniObject). 2.350401 seconds taken in creating 10000000 instances (GObject). Useful results: 1.501459 seconds taken in creating 10000000 instances (GstMiniObject). 2.356480 seconds taken in creating 10000000 instances (GObject). Useful results: 1.498863 seconds taken in creating 10000000 instances (GstMiniObject). 2.363237 seconds taken in creating 10000000 instances (GObject). pvanhoof@tinc:~/repos/gnome/glib$
(In reply to comment #52) > Created an attachment (id=120517) [edit] > This one decreases the amounts of lock/unlock in gtype.c > > This is related to Attachment #120515 [details] [edit], if you apply this patch on gtype.c you > improve the performance a little bit more (because this way you do less > lock/unlocks in g_type_class_ref in gtype.c: > > (The actual amounts of seconds are higher than above, but this small difference > fluctuated depending on the load of my computer. What matters more is that the > difference between GstMiniObject and GObject is consistently smaller). First, please create patches with diff -up, so the patched functions are aparent without having to apply a patch for review. Second, this patch actually *worsens* performance. The reason for that is that g_type_class_ref() is already optimized for the common case, i.e. referencing an already created class, rather than creating a non existing class. This btw is the reason that the function contains some "extra" locking calls, this keeps some of the rarely needed locks out of the common code path. The optimization is the if-condition commented as "/* optimize for common code path */", and it is in this exact code path that your patch is actually *adding* the acquisition of a mutex. Thus we flush CPU pipelines and caches twice instead of once and the common case becomes significantly slower. (I admit that type_class_ref isn't the easiest understandable code portion we have, but the quoted comment should have given a hint ;)
This bug has been cloned/split into Bug #557151 and Bug #557100. I'm marking this bug as obsolete for that reason. If that ain't the right decision (for example if there still is a reason to have a GstMiniObject in glib, which was the original request in this bug), please feel free to revert this action.
NOTE Comment #55: note that this patch is obsoleted because that patch is indeed wrong. Bug# 557100 and bugs #557151 have different patches and Comment #55 is not related to these patches.