After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 723642 - Use g_function_info_prep_invoker and ffi_call directly
Use g_function_info_prep_invoker and ffi_call directly
Status: RESOLVED FIXED
Product: pygobject
Classification: Bindings
Component: introspection
unspecified
Other Linux
: Normal normal
: ---
Assigned To: Nobody's working on this now (help wanted and appreciated)
Python bindings maintainers
Depends on:
Blocks:
 
 
Reported: 2014-02-05 00:48 UTC by Simon Feltman
Modified: 2014-03-04 09:39 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Use ffi_call directly instead of g_callable_info_invoke (35.80 KB, patch)
2014-02-08 11:21 UTC, Simon Feltman
committed Details | Review

Description Simon Feltman 2014-02-05 00:48:58 UTC
PyGObject functions are called using "g_function_info_invoke" which does a whole lot of work each time it is called. This includes the bindings splitting up "in" and "out" argument lists only to have this function join them back together. Additionally it creates a new ffi_cif each call.

The bindings should be updated to cache the ffi_cif via g_function_info_prep_invoker then use ffi_call directly. This might also allow us to simplify argument marshaling logic.

An initial test reveals almost a 2x performance boost with a micro benchmark for simple functions:


Before:
>>> %timeit GLib.hostname_is_ip_address('192.168.1.1')
100000 loops, best of 3: 5.32 us per loop


After:
>>> %timeit GLib.hostname_is_ip_address('192.168.1.1')
100000 loops, best of 3: 2.82 us per loop
Comment 1 Simon Feltman 2014-02-08 11:21:10 UTC
Created attachment 268480 [details] [review]
Use ffi_call directly instead of g_callable_info_invoke

Cleanup internal callable cache and state tracking by removing multiple
counting schemes for differently sized "in" and "out" argument arrays.
Use a single count based on the total number of arguments passed to C
(inclusive of instance argument and GError exception where applicable).
Size all state tracking arrays to the same size and ensure argument cache
indices always line up with these arrays. This cleans up logic which was
required by g_callable_info_invoke for splitting "in" and "out" arguments
up.

Cleanup array marshaling which can now rely on the new scheme which ensures
the "arg_values" array always points to the correct location for length
argument values.

Cache the ffi_cif struct in PyGICallableCache via GIFunctionInvoker and
related GI methods. Overall, these changes can give a performance boost of
up to 2x for function calls.
Comment 2 Simon Feltman 2014-02-09 02:53:39 UTC
Notes:

valgrind has been run before and after this patch using: test_everything, test_gi, test_gobject, and test_overrides_gtk with no differences.

Performance is most notably improved for functions which require less arguments of lower complexity. A function like Gtk.ListStore.insert_with_valuesv which marshals an array of Python values to GValues is not going to see as much of an improvement (although there is some). Some updated micro benchmarks:


Before:
  >>> %timeit Gtk.get_major_version()
  100000 loops, best of 3: 3.48 us per loop

After:
  >>> %timeit Gtk.get_major_version()
  100000 loops, best of 3: 1.85 us per loop



Before:
  >>> %timeit GLib.hostname_is_ip_address('192.168.1.1')
  100000 loops, best of 3: 5.22 us per loop

After:
  >>> %timeit GLib.hostname_is_ip_address('192.168.1.1')
  100000 loops, best of 3: 2.82 us per loop



Before:
  >>> adjust = Gtk.Adjustment()
  >>> %timeit adjust.configure(0, 0, 10, 0.1, 1.0, 1.0)
  100000 loops, best of 3: 15.2 us per loop

After:
  >>> adjust = Gtk.Adjustment()
  >>> %timeit adjust.configure(0, 0, 10, 0.1, 1.0, 1.0)
  100000 loops, best of 3: 10.2 us per loop



Before:
  >>> model = Gtk.ListStore(str, str, str, int, int, int)
  >>> columns = [0, 1, 2, 3, 4, 5]
  >>> row = ['a'*16, 'b'*16, 'c'*16, 1, 2, 3]
  >>> model.clear()
  >>> %timeit model.insert_with_valuesv(-1, columns, row)
  100000 loops, best of 3: 19.6 us per loop

After:
  ...
  >>> %timeit model.insert_with_valuesv(-1, columns, row)
  100000 loops, best of 3: 14.9 us per loop
Comment 3 Simon Feltman 2014-03-03 13:50:27 UTC
Notes:
More testing has been completed, including armv6 architecture.

Attachment 268480 [details] pushed as 5798f94 - Use ffi_call directly instead of g_callable_info_invoke
Comment 4 Martin Pitt 2014-03-04 08:12:47 UTC
Wow, nice work Simon!

With that, should we drop the --without-ffi configure option? It's pretty much mandatory now, isn't it?
Comment 5 Simon Feltman 2014-03-04 09:39:24 UTC
(In reply to comment #4)
> Wow, nice work Simon!
> 
> With that, should we drop the --without-ffi configure option? It's pretty much
> mandatory now, isn't it?

Yep, I meant to check on that but missed it. We've always had a hard dependency on the ffi header files, so I'm not sure what that option was actually supposed to do?