GNOME Bugzilla – Bug 723642
Use g_function_info_prep_invoker and ffi_call directly
Last modified: 2014-03-04 09:39:24 UTC
PyGObject functions are called using "g_function_info_invoke", which does a lot of work each time it is called. This includes the bindings splitting up "in" and "out" argument lists only to have this function join them back together. Additionally, it creates a new ffi_cif on every call.

The bindings should be updated to cache the ffi_cif via g_function_info_prep_invoker and then use ffi_call directly. This might also allow us to simplify the argument marshaling logic.

An initial test with a micro benchmark shows almost a 2x performance boost for simple functions:

Before:
>>> %timeit GLib.hostname_is_ip_address('192.168.1.1')
100000 loops, best of 3: 5.32 us per loop

After:
>>> %timeit GLib.hostname_is_ip_address('192.168.1.1')
100000 loops, best of 3: 2.82 us per loop
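To illustrate why caching helps, here is a toy Python model of "prepare on every call" versus "prepare once, call many times". It is only an analogy: ToySignature, invoke_uncached, CachedInvoker, and add are hypothetical names made up for this sketch, and nothing here touches libffi or GObject Introspection.

```python
# Toy model only: none of these names come from PyGObject or GObject Introspection.

class ToySignature:
    """Stand-in for a per-function type description (the role an ffi_cif plays)."""

    prepare_count = 0  # how many signatures have been built so far

    def __init__(self, arg_types):
        ToySignature.prepare_count += 1
        self.arg_types = tuple(arg_types)

    def call(self, func, args):
        # Convert each argument with its declared type, then call the target.
        converted = [convert(arg) for convert, arg in zip(self.arg_types, args)]
        return func(*converted)


def invoke_uncached(func, arg_types, args):
    """Rebuild the signature on every call (models the current code path)."""
    return ToySignature(arg_types).call(func, args)


class CachedInvoker:
    """Build the signature once and reuse it (models the proposed code path)."""

    def __init__(self, func, arg_types):
        self.func = func
        self.signature = ToySignature(arg_types)

    def __call__(self, *args):
        return self.signature.call(self.func, args)


def add(a, b):
    return a + b
```

Calling invoke_uncached(add, (int, int), ("1", 2)) three times builds three signatures, while a single CachedInvoker(add, (int, int)) builds one no matter how often it is called.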
Created attachment 268480
Use ffi_call directly instead of g_callable_info_invoke

Clean up the internal callable cache and state tracking by removing multiple counting schemes for differently sized "in" and "out" argument arrays. Use a single count based on the total number of arguments passed to C (inclusive of the instance argument and the GError exception where applicable). Size all state tracking arrays to the same size and ensure argument cache indices always line up with these arrays. This cleans up logic which was required by g_callable_info_invoke for splitting "in" and "out" arguments up.

Clean up array marshaling, which can now rely on the new scheme to ensure the "arg_values" array always points to the correct location for length argument values.

Cache the ffi_cif struct in PyGICallableCache via GIFunctionInvoker and related GI methods.

Overall, these changes can give a performance boost of up to 2x for function calls.
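The single-count scheme described above can be sketched roughly as follows. This is an illustration in Python, not the patch itself: c_arg_layout, make_state_arrays, "arg_pointers", and "cleanup" are hypothetical names, and only "arg_values" is taken from the commit message.

```python
def c_arg_layout(arg_names, is_method=False, throws=False):
    """Return {slot_name: index} for every value passed to the C function."""
    slots = []
    if is_method:
        slots.append("self")   # instance argument comes first
    slots.extend(arg_names)    # declared arguments, "in" and "out" alike
    if throws:
        slots.append("error")  # trailing slot for the GError
    return {name: index for index, name in enumerate(slots)}


def make_state_arrays(layout):
    """Allocate the per-call tracking arrays, all sized to the same total count."""
    n_args = len(layout)
    return {
        "arg_values": [None] * n_args,
        "arg_pointers": [None] * n_args,  # hypothetical array name
        "cleanup": [None] * n_args,       # hypothetical array name
    }


# A method taking an array plus its length argument, which can also raise.
layout = c_arg_layout(["ints", "n_ints"], is_method=True, throws=True)
state = make_state_arrays(layout)

# The length value always lives at the length argument's own index.
state["arg_values"][layout["n_ints"]] = 3
```

Because every array shares one index space, an array marshaler only needs the index of its length argument to find the length value, with no translation between separate "in" and "out" positions.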
Notes:

valgrind has been run before and after this patch using test_everything, test_gi, test_gobject, and test_overrides_gtk, with no differences.

Performance improves most for functions that take fewer arguments of lower complexity. A function like Gtk.ListStore.insert_with_valuesv, which marshals an array of Python values to GValues, is not going to see as much of an improvement (although there is some).

Some updated micro benchmarks:

Before:
>>> %timeit Gtk.get_major_version()
100000 loops, best of 3: 3.48 us per loop

After:
>>> %timeit Gtk.get_major_version()
100000 loops, best of 3: 1.85 us per loop

Before:
>>> %timeit GLib.hostname_is_ip_address('192.168.1.1')
100000 loops, best of 3: 5.22 us per loop

After:
>>> %timeit GLib.hostname_is_ip_address('192.168.1.1')
100000 loops, best of 3: 2.82 us per loop

Before:
>>> adjust = Gtk.Adjustment()
>>> %timeit adjust.configure(0, 0, 10, 0.1, 1.0, 1.0)
100000 loops, best of 3: 15.2 us per loop

After:
>>> adjust = Gtk.Adjustment()
>>> %timeit adjust.configure(0, 0, 10, 0.1, 1.0, 1.0)
100000 loops, best of 3: 10.2 us per loop

Before:
>>> model = Gtk.ListStore(str, str, str, int, int, int)
>>> columns = [0, 1, 2, 3, 4, 5]
>>> row = ['a'*16, 'b'*16, 'c'*16, 1, 2, 3]
>>> model.clear()
>>> %timeit model.insert_with_valuesv(-1, columns, row)
100000 loops, best of 3: 19.6 us per loop

After:
...
>>> %timeit model.insert_with_valuesv(-1, columns, row)
100000 loops, best of 3: 14.9 us per loop
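For anyone who wants to repeat measurements like these outside IPython, a minimal sketch using the standard-library timeit module might look like this. best_usec_per_loop and bench_hostname_check are hypothetical helper names, and the lambda wrapper adds a small per-call overhead, so absolute numbers will not match %timeit exactly.

```python
import timeit


def best_usec_per_loop(func, number=100000, repeat=3):
    """Return the best per-call time in microseconds over `repeat` runs."""
    timings = timeit.repeat(func, number=number, repeat=repeat)
    return min(timings) / number * 1e6


def bench_hostname_check():
    """Repeat one benchmark from this report; requires PyGObject to be installed."""
    from gi.repository import GLib

    return best_usec_per_loop(
        lambda: GLib.hostname_is_ip_address("192.168.1.1")
    )
```

With PyGObject available, print("%.2f us per loop" % bench_hostname_check()) gives a figure that can be compared before and after applying the patch on the same machine.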
Notes:

More testing has been completed, including on the armv6 architecture.

Attachment 268480 pushed as 5798f94 - Use ffi_call directly instead of g_callable_info_invoke
Wow, nice work Simon! With that, should we drop the --without-ffi configure option? It's pretty much mandatory now, isn't it?
(In reply to comment #4)
> Wow, nice work Simon!
>
> With that, should we drop the --without-ffi configure option? It's pretty much
> mandatory now, isn't it?

Yep, I meant to check on that but missed it. We've always had a hard dependency on the ffi header files, so I'm not sure what that option was actually supposed to do?