Bug 661140 – GSignal emission optimization

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 661140 - GSignal emission optimization


Summary:	GSignal emission optimization


Status:	RESOLVED FIXED

Product:	glib
Classification:	Platform
Component:	gobject
Version:	2.30.x
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	gtkdev
QA Contact:	gtkdev

URL:
Whiteboard:

Depends on:
Blocks:	536939

Reported:	2011-10-06 23:32 UTC by Lionel Landwerlin
Modified:	2012-03-05 11:40 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
patch1 (47.41 KB, patch) 2011-10-06 23:33 UTC, Lionel Landwerlin	none	Details \| Review
patch2 (5.05 KB, patch) 2011-10-06 23:34 UTC, Lionel Landwerlin	none	Details \| Review
patch3 (4.28 KB, patch) 2011-10-06 23:34 UTC, Lionel Landwerlin	none	Details \| Review
patch4 (10.33 KB, patch) 2011-10-06 23:35 UTC, Lionel Landwerlin	none	Details \| Review
patch5 (15.98 KB, patch) 2011-10-06 23:35 UTC, Lionel Landwerlin	none	Details \| Review
patch6 (49.05 KB, patch) 2011-10-06 23:36 UTC, Lionel Landwerlin	none	Details \| Review
patch7 (2.63 KB, patch) 2011-10-06 23:37 UTC, Lionel Landwerlin	none	Details \| Review
patch8 (5.47 KB, patch) 2011-10-06 23:38 UTC, Lionel Landwerlin	rejected	Details \| Review
patch9 (4.40 KB, patch) 2011-10-06 23:38 UTC, Lionel Landwerlin	none	Details \| Review
signal connection tracking (4.28 KB, patch) 2012-02-16 02:03 UTC, Lionel Landwerlin	none	Details \| Review
glib-genmarshal modification (58.13 KB, patch) 2012-02-16 02:03 UTC, Lionel Landwerlin	none	Details \| Review
gsignal modification (69.35 KB, patch) 2012-02-16 02:04 UTC, Lionel Landwerlin	none	Details \| Review
tests additions 1 (5.47 KB, patch) 2012-02-16 02:04 UTC, Lionel Landwerlin	none	Details \| Review
tests additions 2 (3.98 KB, patch) 2012-02-16 02:05 UTC, Lionel Landwerlin	none	Details \| Review
tests additions 3 (979 bytes, patch) 2012-02-16 02:05 UTC, Lionel Landwerlin	none	Details \| Review
gsignal whitespaces cleanup (31.75 KB, patch) 2012-02-17 01:05 UTC, Lionel Landwerlin	none	Details \| Review
gsignal whitespaces cleanup (31.75 KB, patch) 2012-02-17 01:05 UTC, Lionel Landwerlin	none	Details \| Review
gsignal emission optimization (37.77 KB, patch) 2012-02-17 01:06 UTC, Lionel Landwerlin	reviewed	Details \| Review
glib-genmarshal modification (56.74 KB, patch) 2012-02-22 09:04 UTC, Lionel Landwerlin	none	Details \| Review
gsignal emission optimization (39.75 KB, patch) 2012-02-22 09:05 UTC, Lionel Landwerlin	none	Details \| Review
tests: Add generic and empty signal emission performace tests (6.36 KB, patch) 2012-02-23 13:00 UTC, Alexander Larsson	none	Details \| Review
Add GRealClosure and move meta_marshallers there (10.71 KB, patch) 2012-02-23 13:00 UTC, Alexander Larsson	none	Details \| Review
Add optional support for varargs marshallers to GClosure (12.83 KB, patch) 2012-02-23 13:00 UTC, Alexander Larsson	none	Details \| Review
Support generating va marshallers in glib-genmarshal (62.56 KB, patch) 2012-02-23 13:01 UTC, Alexander Larsson	none	Details \| Review
Add g_signal_set_va_marshaller (5.79 KB, patch) 2012-02-23 13:01 UTC, Alexander Larsson	none	Details \| Review
Optimize single-handler va_marshaller case (9.25 KB, patch) 2012-02-23 13:01 UTC, Alexander Larsson	none	Details \| Review
Use the generated va_marshallers in the performance tests (1.29 KB, patch) 2012-02-23 13:01 UTC, Alexander Larsson	none	Details \| Review
Add _g_closure_is_void to check for NULL vfuncs (2.33 KB, patch) 2012-02-28 16:04 UTC, Alexander Larsson	none	Details \| Review
Precalculate as much as possible for signal emission fast path (13.08 KB, patch) 2012-02-28 16:04 UTC, Alexander Larsson	none	Details \| Review
generic va_marshaller: Mask out G_SIGNAL_TYPE_STATIC_SCOPE from type (902 bytes, patch) 2012-02-28 16:04 UTC, Alexander Larsson	none	Details \| Review

Description Lionel Landwerlin 2011-10-06 23:32:21 UTC

Hi all,

These days I'm mostly working with frameworks Mx and Clutter and
contributing to projects like this Media Explorer :

https://github.com/media-explorer/media-explorer

A few months ago we started to profile our application to get better
performances on devices with a more "embedded" profile (like Intel
CE4100 chipsets).

Our profiling showed up that we spend a lot of time at painting our
Clutter scene. And a considerable amount of this time is spent on
emitting paint signal in clutter actors, that are, most of the time not
connected. At the end, most of the paint signals emitted end up calling
the virtual function associated to the signal.

We do quite a lot of things when calling the g_signal_emit() function.
One of them is to transform the variable arguments of the function into
an array of GValue which can be considered as negligible when several
function are connected to a signal but is quite costly when you don't
have anything to call.

To avoid as much computation as possible during the signal emission,
Robert Bragg came up with the idea of trying to get rid of the GValue
boxing of the variable arguments of the g_signal_emit() function and
instead transfer a argument va_list directly to the marshallers. On top
of that I did some testing & improvements (around PLT symbols) with our
application.

The optimization can be deactivated at run time. And the results using
the performance test show that we double the performances of the signal
emission in the case nobody is listening to the emission.

$ G_SIGNAL_ENABLE_VAR_ARGS=0 ./performance emit-unhandled
Running test emit-unhandled
Emissions per second: 1050115
$ G_SIGNAL_ENABLE_VAR_ARGS=1 ./performance emit-unhandled
Running test emit-unhandled
Emissions per second: 2112535

It also improve performances of connected signals :

$ G_SIGNAL_ENABLE_VAR_ARGS=0 ./performance emit-handled
Running test emit-handled
Emissions per second: 804398
$ G_SIGNAL_ENABLE_VAR_ARGS=1 ./performance emit-handled
Running test emit-handled
Emissions per second: 1368504

This work started on top of the glib-2-28 branch and has been rebased on
glib-2-30. I noticed that a few things have changed on the marshallers
generation. It seems quite a few signal moved on using a generic
marshaller using ffi. I haven't had the time to look at it yet. Any
information on what it does differently from old marshaller would be
appreciated.
I could like to know whether we can rework our optimization on top of it
and investigate performances as well.

Thanks

Comment 1 Lionel Landwerlin 2011-10-06 23:33:44 UTC

Created attachment 198484 [details] [review]
patch1

Comment 2 Lionel Landwerlin 2011-10-06 23:34:05 UTC

Created attachment 198485 [details] [review]
patch2

Comment 3 Lionel Landwerlin 2011-10-06 23:34:51 UTC

Created attachment 198486 [details] [review]
patch3

Comment 4 Lionel Landwerlin 2011-10-06 23:35:26 UTC

Created attachment 198487 [details] [review]
patch4

Comment 5 Lionel Landwerlin 2011-10-06 23:35:47 UTC

Created attachment 198488 [details] [review]
patch5

Comment 6 Lionel Landwerlin 2011-10-06 23:36:14 UTC

Created attachment 198489 [details] [review]
patch6

Comment 7 Lionel Landwerlin 2011-10-06 23:37:46 UTC

Created attachment 198490 [details] [review]
patch7

Comment 8 Lionel Landwerlin 2011-10-06 23:38:03 UTC

Created attachment 198491 [details] [review]
patch8

Comment 9 Lionel Landwerlin 2011-10-06 23:38:38 UTC

Created attachment 198492 [details] [review]
patch9

Comment 10 Lionel Landwerlin 2011-10-06 23:59:00 UTC

I wanted to do the same thing for the newly introduced generic closure marshaller, but that would require va_list support from libffi, which might be quite a pain in the ass to implement for every ABI...

Considering the minimal number of people who manually use the closure API to connect signal handlers, it might could be interesting to create a fast path within gsignal to optimize the 90% cases where we emit signal that nobody really listens to.

Comment 11 Colin Walters 2011-10-09 13:32:08 UTC

(In reply to comment #10)

> Considering the minimal number of people who manually use the closure API to
> connect signal handlers, it might could be interesting to create a fast path
> within gsignal to optimize the 90% cases where we emit signal that nobody
> really listens to.

As I said on the mailing list, I'd like to see this first - it might not be very hard at all to do.

Comment 12 Lionel Landwerlin 2011-10-09 15:41:55 UTC

I had a look at that yesterday and it doesn't look straight forward...
Mostly because of these functions :


void    g_signal_override_class_closure       (guint              signal_id,
                                               GType              instance_type,
                                               GClosure          *class_closure);
void    g_signal_override_class_handler       (const gchar       *signal_name,
                                               GType              instance_type,
                                               GCallback          class_handler);
void    g_signal_chain_from_overridden        (const GValue      *instance_and_params,
                                               GValue            *return_value);
void   g_signal_chain_from_overridden_handler (gpointer           instance,
                                               ...);

I don't know who's using these functions, but that makes the fast-path much more complicated to implement.

Comment 13 Emmanuele Bassi (:ebassi) 2011-10-09 17:03:51 UTC

the _override_class_* functions are meant to be used by classes when installing signals or by their subclasses; they are the function equivalent of "klass->signal = class_closure" - i.e. they should be treated exactly as if there is a class handler when a signal is about to be emitted.

signal_chain_from_overridden_handler() is admittedly more complex to handle; I believe is used by language bindings as well, for subclass override + chain up across the language barrier. this sadly can happen *during* the emission as opposed to before the emission.

Comment 14 Lionel Landwerlin 2011-10-18 23:54:24 UTC

Review of attachment 198491 [details] [review]:

Moving the performances tests patchs to https://bugzilla.gnome.org/show_bug.cgi?id=662154

Comment 15 Lionel Landwerlin 2012-01-18 20:43:47 UTC

Updating the bug, since I saw a few new people following this.

As discussed on #gtk+, I tried an another approach to specifically target the class closures on signal (i.e. the vfunc called when you emit a signal).
This is used a lot in Clutter for instance, when painting actors and also queuing redraw/relayout. It seems to be used in GTK+ as well, but there are other specific stuff that might not benefit from this approach.

So basically I now have a signal emission which is about 4 times faster than what we currently have, but only if you have no signal connected but have a class closure on the signal. Just with this, I already have a few Clutter apps that perform 10 fps faster than previously, I guess apps like gnome-shell could benefit from this.

I'm currently in the process of cleaning that a bit, get rid of the first approach (patches attached here a few months ago), and also trying to fast past the signal emission with handlers, by squeezing completely GClosure.

Of course that optimization won't work if you're using weird signals like a signal registered with g_signal_newv() or if you're using a non standard marshaller (like GtkWidget does on the "draw" signal), but both cases are very specific and don't apply to 90% of your signals.

Comment 16 Lionel Landwerlin 2012-01-18 20:44:49 UTC

And this would also speed up signals using the generic marshaller introduced in glib 2.30. :)

Comment 17 Lionel Landwerlin 2012-02-16 02:03:16 UTC

Created attachment 207716 [details] [review]
signal connection tracking

Comment 18 Lionel Landwerlin 2012-02-16 02:03:47 UTC

Created attachment 207717 [details] [review]
glib-genmarshal modification

Comment 19 Lionel Landwerlin 2012-02-16 02:04:08 UTC

Created attachment 207718 [details] [review]
gsignal modification

Comment 20 Lionel Landwerlin 2012-02-16 02:04:41 UTC

Created attachment 207719 [details] [review]
tests additions 1

Comment 21 Lionel Landwerlin 2012-02-16 02:05:02 UTC

Created attachment 207720 [details] [review]
tests additions 2

Comment 22 Lionel Landwerlin 2012-02-16 02:05:21 UTC

Created attachment 207721 [details] [review]
tests additions 3

Comment 23 Lionel Landwerlin 2012-02-16 02:24:00 UTC

So this is the 3rd iteration of the patch for gsignal.

As explained above the approach is to avoid passing through gclosure as much as possible, so this adds a bit of code inside gsignal.

With this modification, you end up having backtraces like that look more this :

#6 clutter_stage_paint
#5 g_cclosure_marshal_VOID__VOIDv
#4 g_signal_emit_valist
#3 g_signal_emit
#2 clutter_actor_continue_paint
#1 clutter_actor_paint

instead of this (currently) :

#9 clutter_stage_paint
#8 g_cclosure_marshal_VOID__VOID
#7 g_type_class_meta_marshal
#6 g_closure_invoke
#5 signal_emit_unlocked_R
#4 g_signal_emit_valist
#3 g_signal_emit
#2 clutter_actor_continue_paint
#1 clutter_actor_paint

It's pretty obvious that we gain on the number of function call, but also something that you don't see here is that we don't collect the arguments into GValues as well. That avoids taking references on objects and copying strings etc...

Of course we preserve the current behavior for signal emissions that need it.
If you look at the "gsignal modification" patch, you'll appreciate the complexity of cases handled by gsignal.

So with this I manage to run pretty complex applications like evolution (even though I found a small bug in GTK+ https://bugzilla.gnome.org/show_bug.cgi?id=670176) and clutter apps as well.

I'll probably give try on a whole distro later, but in the meantime if people could give feedback, that would be greatly appreciated.

Comment 24 Matthias Clasen 2012-02-16 02:43:48 UTC

Review of attachment 207718 [details] [review]:

No detailed review atm, but a free nitpick: 'vaarg' really hurts my sensibility. It should either be 'vararg', or 'va' (as shorthand for 'variable argument')

Comment 25 Lionel Landwerlin 2012-02-17 01:05:22 UTC

Created attachment 207826 [details] [review]
gsignal whitespaces cleanup

Comment 26 Lionel Landwerlin 2012-02-17 01:05:23 UTC

Created attachment 207827 [details] [review]
gsignal whitespaces cleanup

Comment 27 Lionel Landwerlin 2012-02-17 01:06:40 UTC

Created attachment 207828 [details] [review]
gsignal emission optimization

Comment 28 Lionel Landwerlin 2012-02-17 01:25:51 UTC

As requested on IRC, split the whitespace cleanup from the actual optimization.

Comment 29 Emmanuele Bassi (:ebassi) 2012-02-20 12:31:56 UTC

Review of attachment 207828 [details] [review]:

pre-review, with a couple of points raised during the hackfest.

::: gobject/gsignal.c
@@ +410,3 @@
+
+static gpointer
+{

this is pretty evil.

how does this work on windows? how does it work for statically linked symbols?

@@ +431,3 @@
+
+void
+  if (types1[i] == types2[i])

registering marshallers is also pretty evil for compatibility.

personally, I'd rather see generating marshallers go away, and just switch to libffi.

Comment 30 Lionel Landwerlin 2012-02-20 13:22:21 UTC

I am happy to flag the PLT part as linux specific, but I don't have much knowledge about windows. Any input about what happens on this platform?

I could give a try to use libffi directly, but I already know we will loose the gain this patch tries to bring.

The idea here is to optimize the common and heavily used marshallers, and leave the rest to libffi. Ideally if this patch lands, we could tell (most) developers to not use/generate marshallers and just give a NULL argument to g_signal_new().

Comment 31 Lionel Landwerlin 2012-02-20 13:25:33 UTC

Actually there is a problem _resolve_plt_marshal, I can fix that to make it work with static symbols (which aren't subject to the PLT problem).

Comment 32 Alexander Larsson 2012-02-20 16:00:14 UTC

Review of attachment 207716 [details] [review]:

::: gobject/gsignal.c
@@ +138,3 @@
 							 gpointer	  instance);
 static inline Handler*		handler_new		(gboolean	  after);
+static	      void		signal_handler_insert	(SignalNode      *node,

Why are you renaming this function? It seems unnecessary and confusing with all the other handler_foo_bar functions

@@ +590,3 @@
+{
+  Handler *handler;
+

Shouldn't this initialize all_c_closures to TRUE here?

Comment 33 Alexander Larsson 2012-02-20 16:00:14 UTC

Review of attachment 207716 [details] [review]:

::: gobject/gsignal.c
@@ +138,3 @@
 							 gpointer	  instance);
 static inline Handler*		handler_new		(gboolean	  after);
+static	      void		signal_handler_insert	(SignalNode      *node,

Why are you renaming this function? It seems unnecessary and confusing with all the other handler_foo_bar functions

@@ +590,3 @@
+{
+  Handler *handler;
+

Shouldn't this initialize all_c_closures to TRUE here?

Comment 34 Alexander Larsson 2012-02-20 16:00:23 UTC

Review of attachment 207716 [details] [review]:

Some feedback

::: gobject/gsignal.c
@@ +138,3 @@
 							 gpointer	  instance);
 static inline Handler*		handler_new		(gboolean	  after);
+static	      void		signal_handler_insert	(SignalNode      *node,

Why are you renaming this function? It seems unnecessary and confusing with all the other handler_foo_bar functions

@@ +590,3 @@
+{
+  Handler *handler;
+

Shouldn't this initialize all_c_closures to TRUE here?

Comment 35 Alexander Larsson 2012-02-20 16:19:30 UTC

Review of attachment 207717 [details] [review]:

The generated marshallers should use G_GNUC_UNUSED for e.g. return values that are not used, just like the normal marshallers.

For ease of read we should probably generate the g_cclosure_marshal_VOID__BOOLEANv after the g_cclosure_marshal_VOID__BOOLEAN in the code.

You need to handle the G_CCLOSURE_SWAP_DATA case too, as you can e.g. g_signal_override_class_handler with a g_cclosure_new_object_swap.

Comment 36 Alexander Larsson 2012-02-20 17:07:15 UTC

Review of attachment 207828 [details] [review]:

All this weird magic to reverse map the ordinary marshaller to the varargs one just is too ugly to live.
I think we should just drop it all and add a new g_signal_new which lets you specify the varargs marshaller.

I tried to think of some preprocessor hackery to rewrite existing g_signal_new calls to the new call, but I don't think that is possible.

Anyway, this is free software, changing all the core/importants signal registrations to use a new function is easy, and maintaining the ugly stuff forever is much worse.

Comment 37 Alexander Larsson 2012-02-21 15:32:59 UTC

I'm working on another version of this, but i noticed a real problem with the whole approach. We currently rely a lot on the fact that GObject keeps a ref of all data over the lifetime of the signal emission. For instance, if the first signal handler unrefs some argument it still continues to live in that handler and in the following ones. This is guaranteed by the boxing of the arguments in the GValues.

We can still sort of solve this by having the va_list marshaller ref its arguments, but we can then only handle the case of a single signal handler.

Comment 38 Lionel Landwerlin 2012-02-21 15:45:29 UTC

Just one question : What the point of rewriting this if it ends up as slow as it was before?

Actually if I understand correctly, what you're proposing would be even slower than what we currently have (you would ref/unref for each call to a marshaller instead of only once before calling marshallers). Besides, if you look at how things are done today, if 2 handlers unref one of the argument, you're already screwed.

Also there is flag on gsignal : G_SIGNAL_MUST_COLLECT, you should use that if you REALLY want gsignal to take a ref on your arguments. But to me that's already fairly wrong.

Comment 39 Lionel Landwerlin 2012-02-22 09:02:50 UTC

Here is an update of the patches.

I handled :
  * comment 29 :
    - PLT symbols specific code has been flagged

  * comment 32/33/34 :
    - renamed to handler_insert
    - all_c_closures is now set to TRUE at first

  * comment 35 :
    - marshallers use G_GNUC_UNUSED
    - G_CCLOSURE_SWAP_DATA is actually already handled in signal_emit_valist_unlocked_R

Comment 40 Lionel Landwerlin 2012-02-22 09:04:39 UTC

Created attachment 208183 [details] [review]
glib-genmarshal modification

Comment 41 Lionel Landwerlin 2012-02-22 09:05:24 UTC

Created attachment 208184 [details] [review]
gsignal emission optimization

Comment 42 Alexander Larsson 2012-02-22 10:49:18 UTC

If things are slower there is obviously no point in this whole rewrite. However, if it breaks existing code there is also no point in it. It not that I *like* the fact that the optimization might not be possible.

There are hundreds of thousands ancient lines of code that depend on the details of GObject, we can't just change the behaviour of things and claim that they should have been using some flag all along (especially ALWAYS_COLLECT which was introduced less than a year ago).

I don't think your "two handlers unreffing the argument" argument is quite right, the only bug-free way a handler can unref an argument is if it owns a ref to it, it should never unref the boxed signal arguments ref. Take this example for instance:

 Class Foo { GObject *prop; signal ping(GObject *obj) }
 foo emits ping with prop as the argument
 a signal handler on foo destroys the foo object (it was the only owner of it)
 a second signal handler was connected to the ping signal, it now receives a
    reference to the prop object that has been finalized, wheras before the
    object would be kept alive by the GValue until all handlers had run.

So, we can't handle the case where there is multiple signal handlers if we're ref:ing things in each marshaller, which is why I said we can only handle the case with a single closure.

Comment 43 Lionel Landwerlin 2012-02-22 11:07:20 UTC

Ok, in that case we better only care about the vfunc class closure. It should just be a matter of removing signal_emit_valist_unlocked_R and fallback to the usual signal_emit_unlocked_R instead.

I still think the case you're describing is so convoluted that it's basically a programming error if you end up with an unreffed argument in your handler. You should be aware of the signal semantic : it's going to unref stuff, therefore you should not connect with G_CONNECT_AFTER and define the signal with G_SIGNAL_RUN_CLEANUP.

Comment 44 Alexander Larsson 2012-02-22 11:23:38 UTC

Also, I don't think the return value collection is right, you can't assume
everything thats not one of the primary types is a pointer, you need to do the
full G_VALUE_COLLECT_INIT loop, handling collect_format and multiple arguments per value (although you obviously don't need to do the actual collection).

Comment 45 Alexander Larsson 2012-02-23 12:57:11 UTC

I have a new set of patches to handle this that is imho a bit cleaner. It introduces the concept of va_marshallers properly into GClosure rather than trying to keep the GClosure construction information on the side in gsignal.c. It only optimizes the case of a single handler with no or NULL vfunc class closure or the default class closure (assuming the corresponding closure supports va_marshallers). There is also a generic (ffi based) va_marshaller that goes with the normal generic marshaller.

The patch series is mostly independent (although based on Lionels patches) but does rely on the existing "test additions 1" patch.

I can run a the gobject tests with this and i've tried various apps a bit, but its not all that well tested. It needs review, more testcases and real life testing. 

The va_marshallers in the patches don't yet ref/copy objects/boxed arguments, so its not quite conformant. It does however ref the instance.

Perormance before (not by disabling at runtime, but just running the tests before applying the optimizations, as we otherwise count the optimization bookkeeping overhead in the before case) is like:

Running test emit-unhandled
Emissions per second: 1877347
Running test emit-unhandled-empty
Emissions per second: 23058544
Running test emit-unhandled-generic
Emissions per second: 1686556
Running test emit-unhandled-generic-empty
Emissions per second: 23041250
Running test emit-handled
Emissions per second: 1411691
Running test emit-handled-empty
Emissions per second: 1453611
Running test emit-handled-generic
Emissions per second: 1195489
Running test emit-handled-generic-empty
Emissions per second: 1333166

(empty means class closure set but vfunc is NULL)

And after the optimizations:

Running test emit-unhandled
Emissions per second: 4107661
Running test emit-unhandled-empty
Emissions per second: 22277069
Running test emit-unhandled-generic
Emissions per second: 3114286
Running test emit-unhandled-generic-empty
Emissions per second: 22271500
Running test emit-handled
Emissions per second: 1367398
Running test emit-handled-empty
Emissions per second: 3628843
Running test emit-handled-generic
Emissions per second: 1141061
Running test emit-handled-generic-empty
Emissions per second: 2839890

Its noteworthy that a lot of the cost in the fast path is the ref/unref of the
instance. Without that we get:

Running test emit-unhandled
Emissions per second: 5951305
Running test emit-unhandled-empty
Emissions per second: 22422150
Running test emit-unhandled-generic
Emissions per second: 4074260
Running test emit-unhandled-generic-empty
Emissions per second: 22421699
Running test emit-handled
Emissions per second: 1326608
Running test emit-handled-empty
Emissions per second: 4984977
Running test emit-handled-generic
Emissions per second: 1129769
Running test emit-handled-generic-empty
Emissions per second: 3591398

Which is significantly different. So, maybe we should look at making ref/unref faster.

Comment 46 Alexander Larsson 2012-02-23 13:00:32 UTC

Created attachment 208254 [details] [review]
tests: Add generic and empty signal emission performace tests

generic means it uses the generic marshaller
empty means the vfunc pointer is NULL

Comment 47 Alexander Larsson 2012-02-23 13:00:42 UTC

Created attachment 208255 [details] [review]
Add GRealClosure and move meta_marshallers there

This means we're not abusing the notifiers for meta_marshallres,
and we're able to later cleanly add other fields to GClosure.

We still have to leave the ABI intact for the GClosure->meta_marshal
bit, as old G_CLOSURE_N_NOTIFIERS macro instances still accesses it.
However, we always set it to zero to keep those macros working.

Comment 48 Alexander Larsson 2012-02-23 13:00:51 UTC

Created attachment 208256 [details] [review]
Add optional support for varargs marshallers to GClosure

These closures support being invoked on a va_args which can
be useful as you can then avoid boxing the va_args into
GValues in certain cases.

Comment 49 Alexander Larsson 2012-02-23 13:01:02 UTC

Created attachment 208257 [details] [review]
Support generating va marshallers in glib-genmarshal

Comment 50 Alexander Larsson 2012-02-23 13:01:11 UTC

Created attachment 208258 [details] [review]
Add g_signal_set_va_marshaller

This lets you set a va_marshaller on your signal which will be
propagated to all closures for the signal. Also, automatically
uses the generica va_marshaller if you specify a NULL c_marshaller.

Comment 51 Alexander Larsson 2012-02-23 13:01:20 UTC

Created attachment 208259 [details] [review]
Optimize single-handler va_marshaller case

When there is only one closure handling a signal emission and
it doesn't have a bunch of complicated features enabled we
can short circuit the va_args collection into GValues and call the
callback via the va_marshaller directly.

Note, this isn't yet 100% correct. The marshallers should also
be extended to ref all non-primitive arguments.

Comment 52 Alexander Larsson 2012-02-23 13:01:30 UTC

Created attachment 208260 [details] [review]
Use the generated va_marshallers in the performance tests

Comment 53 Alexander Larsson 2012-02-23 18:52:01 UTC

Review of attachment 208256 [details] [review]:

::: gobject/gclosure.c
@@ +1476,3 @@
+  for (i = 0; i < n_params; i++)
+    {
+      atypes[i+1] = va_to_ffi_type (param_types[i],

This needs to mask out G_SIGNAL_TYPE_STATIC_SCOPE

Comment 54 Alexander Larsson 2012-02-28 16:04:21 UTC

Created attachment 208599 [details] [review]
Add _g_closure_is_void to check for NULL vfuncs

Comment 55 Alexander Larsson 2012-02-28 16:04:42 UTC

Created attachment 208600 [details] [review]
Precalculate as much as possible for signal emission fast path

Comment 56 Alexander Larsson 2012-02-28 16:04:50 UTC

Created attachment 208601 [details] [review]
generic va_marshaller: Mask out G_SIGNAL_TYPE_STATIC_SCOPE from type

Comment 57 Alexander Larsson 2012-02-28 16:07:55 UTC

The last three patches try to avoid as much work as possible per-emission in the fastpath code by storing the precalculated state and invalidating it when needed. It also subsumes the existing signal emission NOP checks.

This gives us another 50% or so performance increase compared to my old patcheset.

Known issues: Still not reffing signal arguments as needed.

Also, generally not tested very well.

Comment 58 Alexander Larsson 2012-03-05 11:40:34 UTC

I just pushed the signal-performance branch to master.