GNOME Bugzilla – Bug 557100
Performance improvements for GObjectClasses that don't have properties
Last modified: 2010-05-31 14:15:14 UTC
+++ This bug was initially created as a clone of Bug #553794 +++ The original bug resulted in a large amount of comments and experimental patches. This patch developed by me and Jürg was a first one that Tim discussed with us on IRC. We concluded to isolate it and start a new bug about it. The patch improves the performance of GObject instance creation in case the GObjectClass has no properties defined. I can't immediately attach patches using this version of Bugzilla, so I will instantly after adding this bug attach the isolated patch.
Created attachment 120939 [details] [review] Jürg Billeter: Working patch Jürg Billeter wrote in original bug (Comment num. 35): I've fixed the bug that prevented most applications from working (G_OBJECT_GET_CLASS doesn't work in g_object_init, see comments in g_type_create_instance) and switched to gsize instead of guint32. Without patch: 4.066497 seconds taken in creating 10000000 instances (GObject). With patch: 1.931577 seconds taken in creating 10000000 instances (GObject). GstMiniObject: 0.938353 seconds taken in creating 10000000 instances (GstMiniObject).
Note The other part of the bigger patch in Attachment #120520 [details] (which originated from Bug #553794) is available in Bug #557151 as Attachment #120959 [details].
Are there any notes on this patch, or can I go ahead and commit this? Please review Attachment #120939 [details] (this is the last and most likely final patch)
(In reply to comment #3) > Are there any notes on this patch, or can I go ahead and commit this? > > Please review Attachment #120939 [details] [edit] (this is the last and most likely final patch) > Well, to me the patch looks good, except that I would add a private struct and put this 'flags' field in there instead. FWIW, it will be really nice to have this patch in so that I can happily derive my MediaContainer and MediaItem classes in Rygel from GObject without compromising (significantly) on performance. Otherwise, the class-hierarchy will have to be a bit of spaghetti. :(
Ping, what is the decision of the GLib maintainer on this one?
Marking as needinfo, awaiting decision
Hmm, according to Company it's better to leave the bug unconfirmed
ping
Maybe it's wise to propose this patch in the "GLib plans for the next cycle" thread on the gtk-devel-list? All I can tell is that I've applied this patch to my local glib for a few months now without any problems.
Phillip, for the most part, I just see this patch adding new if()s around existing if() statements that are already false. E.g. this part: + if (CLASS_HAS_PROPS (class)) + { for (slist = class->construct_properties; slist; slist = slist->next) It doesn't help to add an if here, because a class without properties will always also have class->construct_properties == NULL. The only possible improvement I see from this patch is to conditionalize the two calls to freeze/thaw the notify queue. Do you agree here, or can you provide reproducable measurements for any further improvements? (If not, I'd apprechiate a reworked patch, produced with "diff -up" this time.)
Created attachment 138346 [details] [review] Reworked patch This patch is reapplied and reworked on GLib git master. For the measurements which you requested, check comment #1 For the test application being used: Attachment #120285 [details] which you can compile to a .c and .h file using valac -C
About the placement of the if statement, commented on in Comment #13: given that the if() needs to happen anyway and given that the for loop isn't required to run anyway, it's slightly more efficient to move the if over it as well. Surely it's a bit pointless but so is it equally pointless to move the if down (and less clear in my opinion).
Created attachment 138347 [details] Attached compiled .c and source .vala program tar xvf test-perf.tar ; gcc test-perf.c `pkg-config glib-2.0 --cflags --libs` -o test-perf
Created attachment 138350 [details] [review] Using diff -up For your viewing pleasures
(In reply to comment #17) > Created an attachment (id=138350) [edit] > Using diff -up > > For your viewing pleasures > Why don't you just use `git format-patch` ?
The latest patch seems reversed ?
+#define CLASS_HAS_PROPS(class) \ + (G_UNLIKELY(class->flags & CLASS_HAS_PROPS_FLAG)) G_UNLIKELY doesn't seem appropriate here. Both cases happen reasonably often.
@Havoc comment #20: This one can easily be removed of course. However: When writing the patch, we wanted to optimize for the situation of the class not having properties. When the class had properties, the performance wasn't very good (in original code, in case of a class with properties). This make me conclude that it wouldn't be a serious performance regression to do incorrect branch prediction in the class-has-properties case, and that it would be a wanted performance improvement to do correct branch prediction in the class-has-no-properties case. I don't think you can actually measure the difference, though. So I think it would be equally fine to remove said branch-prediction.
Created attachment 140479 [details] [review] Minimal patch blocking only notification freezes As tim noted the only real optimization is avoiding the notification freezing, here is a minimal patch that does that only. It gives approximately the same performance as the full one, performance goes from: 2.619281 seconds taken in creating 10000000 instances (GObject). to 1.443047 seconds taken in creating 10000000 instances (GObject). a result which is mostly stable over time. However, is this patch actually right? Could not one of the class parents have properties that is not detected by this simple check?
You'd inherit the flag as construction of a new class memcpy()s all values before calling class_init, so the only thing that'd hit you is creation of a property after the subclass object is already created. So your patch should just work.
Its kinda weird that notification is such a slow operation though, at this points the ->qdata dataset should be empty so the lookups should be fast. Sysprof says a lot of time is spent in g_slice_free1, g_datalist_id_set_data_full and g_slice_alloc. If we can make this faster in general that may be a better approach.
RE Comment 24: The g_object_notify_queue_thaw's g_new could be a g_slice_new. A question here is whether nqueue->n_pspecs > 16 is going to happen often enough to have a gslice magazine instead? Other than that it sounds to me that thaw and freeze are wraps for g_datalist_id_set_data. The g_datalist_id_set_data(_full) stuff uses a mutex. These aren't known for being very fast. I wonder if g_datalist_id* can be made lockfree + MT safe? Or perhaps replace its use in gobject.c with something that is or can be lockfree?
I did some work on this, and will attach a well-factored series of patches containing the changes I felt actually made a difference (i tried various other optimizations, but they were not obviously good). It contains as the first patch a pretty reliable (see commit) performance measuring tool, and further "optimization" patches contain performance imporvements from last commit generated with this tool. The only thing i'm unsure of is the flag propagation to child classes. Is is allowed to add properties to a class after another class has derived from the class? I assume not, but want to make sure.
Created attachment 142229 [details] [review] Add performance tests for GObject primitives These are basic performance test for a couple of basic gobject primitives: * construction of simple objects. Simple is a bare gobject derived class with no properties, signals or interfaces. * construction of complex objects. Complex is a gobject subclass with construct properties, normal properties, signals, and implements an interface. * run-time type check of complex objects * signal emissions Lots of care is taken to try to make the results reproducible. Each test is run for multible "rounds", where we try to make each round be "not too short" in order to be significant wrt timer accuracy, but also "not to long" to make the probability of some other random event happening on the system (interrupts, other process scheduled, etc) during the round less likely. The current target round time is 10msecs, which was picked without rigour, but seems small wrt e.g. scheduler time. For each test we then run the calculated round size for 60 seconds, and then report the performance based on the minimal time of one round. The model here is that any random stuff that happens during a round can only slow it down, there is nothing that can make it go faster, so the minimal time is the best estimate of how fast one round goes. The result is not ideal, even on a "idle" system the results vary from round to round, but the variation seems to be less than 1%. So, any performance difference reported by this test over 1% is probably statistically significant. Additionally the tests can be run with or without threads being initialized. The script tests/gobject/run-performance.sh makes it easy to produce a performance report for the current checkout.
Created attachment 142230 [details] [review] Allocate GObjectNotifyQueue with g_slice instead of abusing g_list This is both cleaner and faster (it avoids function calls and zeroing the memory twice). Object construction performance improvement: Non-Threaded Threaded Simple: 11% 1.3% Complex: 8% 6% Other tests stable.
Created attachment 142231 [details] [review] Add flags member for GObjectClass
Created attachment 142232 [details] [review] Add GObjectClass flag CLASS_HAS_PROPS_FLAG This is set if a class or any of its parents have installed any properties.
Created attachment 142233 [details] [review] Don't freeze/thaw notification during construction if no properties If the class has no properties there could be no notification anyway. This is an important optimization for construction of simple objects. Object construction performance improvement: Non-Threaded Threaded Simple: 84% 91% Complex: -1.4% -0.6% Other tests stable.
Created attachment 142234 [details] [review] Add fast path for construction with no params This avoids a bunch of code and makes construction of simple objects faster. Object construction performance improvement: Non-Threaded Threaded Simple: 14% 5% Complex: -1.1% -2.2% Other tests stable.
This looks pretty good to me. Thanks for writing the tests, Alex. I'd say we should get this in.
Created attachment 142720 [details] [review] Fix type check test to actually test things g_type_check_instance_is_a is marked as G_GNUC_PURE, so was optimized away in the test loops. This fixes that by doing an indirect call through a global function pointer.
All these patches are now in the gobject-performance branch in git
I wonder how we can make this process go faster next time: There have been a rather large amount of positive reactions about this subject, giving, at least me, the impression that it is important. Alex's rework of the patch doesn't really change much to the core of the first patch. It splits it up in three parts (and adds a, very nice, unit test). Yet this took almost a year to materialize and about five monthly pings that didn't result in any clear instructions for the original patch. Note that me and Jürg worked on this during Boston Summit 2008 and had already discussed it with some developers back then. If that doesn't discourage a developer to write patches for GLib, then I don't know what does. It sure discouraged me (and I bet Jürg, too). Not much of a victory for 'community' style development. Sorry.
With the recent patches in bug 585375, we gain 2% on non-threaded and 40% on threaded tests for simple-construction on top of Alex' patches: BEFORE: $ ./performance -s 5 Running test simple-construction Number of constructed objects per second: 1923231 ^C $ ./performance -t -s 5 Running test simple-construction Number of constructed objects per second: 1332559 ^C AFTER: lvs@macbook:~/cvs/glib/tests/gobject$ ./performance -s 5 Running test simple-construction Number of constructed objects per second: 1860503 ^C lvs@macbook:~/cvs/glib/tests/gobject$ ./performance -t -s 5 Running test simple-construction Number of constructed objects per second: 899226 ^C
And if you wanna test them: Those patches are pushed to glib's gobject-performance branch.
(In reply to comment #37) > With the recent patches in bug 585375, we gain 2% on non-threaded and 40% on > threaded tests for simple-construction on top of Alex' patches: > > BEFORE: > > $ ./performance -s 5 > Running test simple-construction > Number of constructed objects per second: 1923231 > ^C > $ ./performance -t -s 5 > Running test simple-construction > Number of constructed objects per second: 1332559 > ^C > > AFTER: > > lvs@macbook:~/cvs/glib/tests/gobject$ ./performance -s 5 > Running test simple-construction > Number of constructed objects per second: 1860503 > ^C > lvs@macbook:~/cvs/glib/tests/gobject$ ./performance -t -s 5 > Running test simple-construction > Number of constructed objects per second: 899226 > ^C Don't these numbers suggest slowdown? Or am I too sleepy?
Sorry, I mixed the BEFORE and AFTER dataset. It is definitely faster now. I blame lack of coffee. Also even more numbers, from Edward, comparing master and gobject-performance branch are at http://pastebin.com/m2c7b0222 The gist of it: SIMPLE COMPLEX TYPECHECK EMIT --------------------------------------------------------------- master 2635119 890288 31.95 2192462 gobject-performance 5926007 851837 57.07 2325357 threaded master 1328971 586592 12.40 1790377 threaded gobject-perf 4264636 682392 57.06 1949348 And I made sure to copy the numbers properly this time. :)
(In reply to comment #28) > Created an attachment (id=142230) [details] > Allocate GObjectNotifyQueue with g_slice instead of abusing g_list > > This is both cleaner and faster (it avoids function calls and > zeroing the memory twice). YOu should get rid of the (void*) casts here, the g_slice* macros implement type chaging magic on purpose and the void* casts blinden the compiler from mismatches.
(In reply to comment #30) > Created an attachment (id=142232) [details] > Add GObjectClass flag CLASS_HAS_PROPS_FLAG > > This is set if a class or any of its parents have installed any > properties. Ok, I think having this makes a lot of sense (actually discussed this in the past with P.vHoof already). However, considering the wide use of GObject, it's not 100% certain that nobody introduces new properties after class derivation, and I'd at the very least like to elaboratelly catch this case. I suggest we do something like the following (warning pseudo patch ahead): @@ object_init() { + class->flags |= CLASS_HAS_INSTANCES_FLAG; // 0x2 @@ g_object_class_install_property (GObjectClass *class, { g_return_if_fail (G_IS_OBJECT_CLASS (class)); g_return_if_fail (G_IS_PARAM_SPEC (pspec)); class->flags |= CLASS_HAS_PROPS_FLAG; + if (CLASS_HAS_INSTANCES (class)) + g_error ("Attempt to add property %s::%s to class after object creation", + G_OBJECT_CLASS_NAME (class), pspec->name);
Hmm, i added that, and it triggered this on this: #0-#3 - the g_error above
+ Trace 218004
Basically, the button class when initialized adds properties to the GSettings class, at a time when the class is initialized. Maybe we should instead ensure that there are no subclasses by the time a property is installed?
This seems to work, does it seem ok to you tim: @@ -117,6 +117,10 @@ #define CLASS_HAS_PROPS(class) \ ((class)->flags & CLASS_HAS_PROPS_FLAG) +#define CLASS_HAS_DERIVED_CLASS_FLAG 0x2 +#define CLASS_HAS_DERIVED_CLASS(class) \ + ((class)->flags & CLASS_HAS_DERIVED_CLASS_FLAG) + /* --- signals --- */ enum { NOTIFY, @@ -281,6 +285,12 @@ g_object_base_class_init (GObjectClass *class) { GObjectClass *pclass = g_type_class_peek_parent (class); + /* Don't inherit HAS_DERIVED_CLASS flag from parent class */ + class->flags &= ~CLASS_HAS_DERIVED_CLASS_FLAG; + + if (pclass) + pclass->flags |= CLASS_HAS_DERIVED_CLASS_FLAG; + /* reset instance specific fields and methods that don't get inherited */ class->construct_properties = pclass ? g_slist_copy (pclass->construct_properties) : NULL; class->get_property = NULL; @@ -414,6 +424,10 @@ g_object_class_install_property (GObjectClass *class, g_return_if_fail (G_IS_OBJECT_CLASS (class)); g_return_if_fail (G_IS_PARAM_SPEC (pspec)); + if (CLASS_HAS_DERIVED_CLASS (class)) + g_error ("Attempt to add property %s::%s to class after it was derived", + G_OBJECT_CLASS_NAME (class), pspec->name); + class->flags |= CLASS_HAS_PROPS_FLAG; if (pspec->flags & G_PARAM_WRITABLE)
Yes, that seems to work with everything, and is the minimal check required for the HAS_PROPS propagation to work. I rebased the gobject-performance patch to current master, cleaning up some things (adding comment, reordering history, made the GAtomicArray freelist not use GSList) and pushed this as a gobject-performance-2 branch. Then i merged all the trivial bits to master: * performance tests * HAS_PROPS & co (includeing HAS_DERIVED_CLASS check from above) * construction fast path) to master The rest of the branch needs a bit more review imho, and i'll try to schedule that (and hopefully other will too). Then we can land that after we branch for unstable.
Now that gobject-performance-2 branch was merged, I think we can close this as fixed, alex?
+ if (CLASS_HAS_DERIVED_CLASS (class)) + g_error ("Attempt to add property %s::%s to class after it was derived", + G_OBJECT_CLASS_NAME (class), pspec->name); + Maybe this should check for (class->flags & CLASS_HAS_PROPS_FLAG) to be FALSE? What's the point on restricting the addition of a property to a class already derived, if it already has properties and CLASS_HAS_PROPS_FLAGS has been already set to TRUE? The inconsistency can only happen if CLASS_HAS_PROPS_FLAGS is unset.