Bug 651581 – Request to merge invoke-rewrite branch

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 651581 - Request to merge invoke-rewrite branch


Summary:	Request to merge invoke-rewrite branch


Status:	RESOLVED FIXED

Product:	pygobject
Classification:	Bindings
Component:	introspection
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	Nobody's working on this now (help wanted and appreciated)
QA Contact:	Python bindings maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2011-05-31 21:42 UTC by johnp
Modified:	2011-07-18 15:12 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
valgrind log (56.61 KB, text/plain) 2011-06-21 07:14 UTC, Tomeu Vizoso	Details

Description johnp 2011-05-31 21:42:05 UTC

Right now the invoke-rewrite branch is stable and complete enough to be merged to master.  It passes all the tests and even cleans up after itself when an error is raised while processing input parameters (something the current code doesn't do).  Here is a brief list of the advantages of this branch:

* over 2 times faster running thought the test suite than the current implementation.

* more robust - handles a couple of corner cases which the current code does not such as multiple lists pointing to the same aux length value, having argument 0 be a length value and correctly blocking all NULL values which are not marked as allow-none

* correctly handles argument parameter counting and reduces the amount of complicated math used to determine argument counts and indexes

* splits code into logical layers such as caching, invoking, marshalling and cleanup layers

* splits code into manageable units so, for example, if there is an error in list processing it is easier to set break points and focus on the list code instead of wading through large switch statements

* ripe for even more optimizations and better layout for profiling

Things that are yet to be done:

* Memory management still leaks and some types such as gclosures are not yet cleaned

* Some of the earlier code like struct marshalling can be split up further

* signals and closures still use the old marshalling paths

* remove the FFI/GArgument layer and marshal directly to FFI

Why merge now?

Code rots and getting more people to use the new codepath makes it easier to fix bugs.  It is already feature complete with the old code and simply trades off some new bugs while fixing a slew of other bugs.  Because of the caching layer the code is somewhat more complex but follows a much more consistent pattern.  In my eyes it is easier to understand once learned but developers need ramp up time since the structure is different.  This is the future, there is no need to keep bandaging the old invoke code. As we get to more and more corner cases it gets harder to patch the old code without breaking other pieces or causing memory leaks that are hard to track down and plug (see garray hack we added to handle C Array vs GArray failures).

A brief overrview of the design:

  New modules
      * pygi-invoke-ng - the new invoke API handles setup/invoking/teardown at a high level. Replaces pygi-invoke and does not attempt to handle memory it did not itself create (see garray handling for example)
      * pygi-invoke-state-structure - the state struct used to pass around changing state during the caching, marshalling, invoking and cleanup stages
      * pygi-cache - sets up the cache which normalizes the information and support methods such as marshalling and cleanup of each interface when called.  This allows us to process method calls in a more efficent manner as well as makes it easier to debug
      * pygi-marshal - marshalling functions for each parameter type. Replace pygi-arguments
      * pygi-marshal-cleanup - cleanup functions for each parameter type.  Includes cleanup for parameters which were not processed due to an error.

Basic flow:

  invoke is called on an GICallableInfo object
    |
    |-> if GICallableInfo does not have a cache we loop over the interface 
    |   creating the PyGICallableCache and ArgCaches for the interface
    |   and its arguments
    |
    |----> use the cache to marshal the parameters into the state struct
    |
    |---------> using the marshalled values invoke the interface using the GI
    |           FFI calls
    |
    |----> use the cache to cleanup any in values that need cleaning
    |
    |---------> use the cache to marshal the out values to python objects
    |
    |----> use the cache to cleanup any out values that need cleaning
    |
    |-> cleanup state
    |
  return out values to python app

Comment 1 Tomeu Vizoso 2011-06-21 07:14:37 UTC

Created attachment 190340 [details]
valgrind log

Do tests pass for you? There seem to be some memory issues at least on 32bit.

Comment 2 johnp 2011-06-21 15:49:47 UTC

Yes, they pass on 64bit.  This seems like an easy fix but I don't have 32bit installed here.  Strange since most of the code is copy and paste from pygi-arguments but there may need to be an extra check in one of the marshallers.

Comment 3 Tomeu Vizoso 2011-06-22 06:31:42 UTC

(In reply to comment #2)
> Yes, they pass on 64bit.  This seems like an easy fix but I don't have 32bit
> installed here.  Strange since most of the code is copy and paste from
> pygi-arguments but there may need to be an extra check in one of the
> marshallers.

If you have a proposed fix, I can test it out here.

Comment 4 johnp 2011-06-24 04:47:24 UTC

Hitting some other issues but I quickly took a look at the code and most likely it is a false positive.  When marshalling we are looking for ValueErrors not OverflowErrors when numbers are out of range.  In the old code we used to encode the ranges as python objects and do the checks that way.  In the new code we do the opposite.  We decode and check the ranges in C because this is way faster.  Unfortunately this means that if the C storage type is smaller than what python has encoded, it throws an OverflowError when it decodes.  I think the tests are looking for ValueErrors.

ABI compliant fix would be to always marshal into the biggest C integer type available, do the checks and then cast to the correct type when stuffing into the GArgument.

We could also break ABI and correctly throw OverflowErrors and then not worry about it but then the error message is inconstant.

Right now GValues are giving us bad reads and writes so I am concentrating on them.

Comment 5 johnp 2011-06-24 15:55:16 UTC

Fixed the OverflowErrors by clearing them and replacing them with ValueErrors in the marshaller.  This fixed the tests.  I still have an issue with 32bit GValue which might also be an issue with the older code too.

invoke-rewrite bug: https://bugzilla.gnome.org/show_bug.cgi?id=653181

old invoke bug: https://bugzilla.gnome.org/show_bug.cgi?id=653151

Comment 6 johnp 2011-06-24 21:45:29 UTC

Fixed the memory corruption issue on 32bit systems by using g_slice_new0(GValue) instead of g_malloc0(gi_element_size).  I think GI gives back the wrong size gor GValues.

Comment 7 Tomeu Vizoso 2011-06-29 09:11:55 UTC

Hi, this is just the beginning, more to follow soon. So far it looks like great work, hope we can land it really soon from now.

General considerations:

- Do you have already an idea of what's the memory penalty of having the cache around?

- Whitespace is inconsistent, but maybe we should just do what we did for pygi and run a code formatter and add a git commit hook, after this lands.

+AC_ARG_ENABLE(invoke-ng,

What's the reason for keeping the old code around?

+       pygi-invoke-ng.c \

Related, I don't think we should have references in the code to "ng", we have git's history to find out what happened.

+    GIArgument *default_value;

Is it intentional?

+    gboolean is_method;
+    gboolean is_constructor;
+    gboolean is_vfunc;
+    gboolean is_callback;

Shouldn't these be mutually exclusive?

   1569 +    if (type == GI_INFO_TYPE_VFUNC)
   1570 +        cache->is_vfunc = TRUE;
   1571 +    else if (type == GI_INFO_TYPE_CALLBACK)
   1572 +        cache->is_callback = TRUE;
   1573 +
   1574 +    if (cache == NULL)
   1575 +        return NULL;

The check should go a bit before.

    cache->name = g_base_info_get_name ((GIBaseInfo *)callable_info);

I don't think we should store the name in the cache if it's just used during debugging and for reporting errors. In those cases we can always call g_base_info_get_name and be it.

_pygi_callable_cache_new (GICallableInfo *callable_info)
{
    PyGICallableCache *cache = _callable_cache_new_from_callable_info (callable_info);

I'm confused by having all of _pygi_callable_cache_new, _callable_cache_new_from_callable_info and _args_cache_generate. I would consider merging them into _pygi_callable_cache_new or finding a more meaningful way to split them.

        instance_cache->in_marshaller = _pygi_marshal_in_interface_instance;

Are you completely sure of the convenience of using function pointers instead of just calling one function or the other based on the other fields in the cache struct? Those function pointers hurt readability and debugging.

typedef enum {
  /* Not an AUX type */
  PYGI_AUX_TYPE_NONE   = 0,
  /* AUX type handled by parent */
  PYGI_AUX_TYPE_IGNORE = 1,
  /* AUX type has an associated pyarg which is modified by parent */
  PYGI_AUX_TYPE_HAS_PYARG = 2
} PyGIAuxType;

If the identifiers cannot be made more clear, then we should make those comments more verbose and explain what is meant by AUX, by parent and by being modified.

Comment 8 johnp 2011-06-29 16:38:59 UTC

(In reply to comment #7)
> Hi, this is just the beginning, more to follow soon. So far it looks like great
> work, hope we can land it really soon from now.
> 
> General considerations:
> 
> - Do you have already an idea of what's the memory penalty of having the cache
> around?

Not really.  I'll look into figuring out how much memory the cache uses during the tests.  Eventually we might want to have a configured limit and some sort of queue that Garbage collects the least used cached interfaces.

> - Whitespace is inconsistent, but maybe we should just do what we did for pygi
> and run a code formatter and add a git commit hook, after this lands.

Ya, I tried to keep it constant but we are still not consistent between the old code and the new code so I may have jumped in-between by accident.  I would say code format everything after the merge.

> +AC_ARG_ENABLE(invoke-ng,
> 
> What's the reason for keeping the old code around?

It was easy for me to jump back and forth to examine behaviour, copy paste, etc.  Also we still use some of the old code paths for properties and such (things that need marshalling but aren't invoked).  Invoke itself can go away now. 

> +       pygi-invoke-ng.c \
> 
> Related, I don't think we should have references in the code to "ng", we have
> git's history to find out what happened.

I did that at first but splitting the files made merging much easier expecially when code was being written in the old branch.  We can always rename after the merge. 

> +    GIArgument *default_value;
> 
> Is it intentional?

It is a shim and isn't hooked up yet but the code it pretty much there to support default values (or at least the infrastructure if not the glue).

> +    gboolean is_method;
> +    gboolean is_constructor;
> +    gboolean is_vfunc;
> +    gboolean is_callback;
> 
> Shouldn't these be mutually exclusive?

I guess we could just use the GI_INFO_TYPE enum.  It makes code look nice to say if (is_method) and not have to remember the names of the enums though. The way the cache is set up is to mentally map exactly how the interface will be processed.  Now that that is pretty much set we can start to optimize.

>    1569 +    if (type == GI_INFO_TYPE_VFUNC)
>    1570 +        cache->is_vfunc = TRUE;
>    1571 +    else if (type == GI_INFO_TYPE_CALLBACK)
>    1572 +        cache->is_callback = TRUE;
>    1573 +
>    1574 +    if (cache == NULL)
>    1575 +        return NULL;
> 
> The check should go a bit before.

Fixed

>     cache->name = g_base_info_get_name ((GIBaseInfo *)callable_info);
> 
> I don't think we should store the name in the cache if it's just used during
> debugging and for reporting errors. In those cases we can always call
> g_base_info_get_name and be it.

The one reason we may want to do it this way is it is easier to debug since you simply print out the struct.  It is a const so all you lose is a pointer.

> _pygi_callable_cache_new (GICallableInfo *callable_info)
> {
>     PyGICallableCache *cache = _callable_cache_new_from_callable_info
> (callable_info);
> 
> I'm confused by having all of _pygi_callable_cache_new,
> _callable_cache_new_from_callable_info and _args_cache_generate. I would
> consider merging them into _pygi_callable_cache_new or finding a more
> meaningful way to split them.

This splits up the business logic of building each of the caches into their own separate units so it is easier to debug.  It is the first code I wrote so some of it may not have been as clear as it could have been but merging into a large function is what I was trying to avoid.  There may be duplicate code but you can look at a particular type of interface and know exactly how it is laid out.

>         instance_cache->in_marshaller = _pygi_marshal_in_interface_instance;
> 
> Are you completely sure of the convenience of using function pointers instead
> of just calling one function or the other based on the other fields in the
> cache struct? Those function pointers hurt readability and debugging.

This speeds things up as well as being less confusing in my opinion.  Once you have the cache set up the invoker no longer guesses which function it should call.  That is set in stone (caches should be considered immutable).  This way most of the branching happens once per interface during the caching phase.  This can be expanded to almost having no if statements in the marshallers themselves.  Of course I didn't go that far and as you see things like the struct marshaller could be broken down further.  Also, this approach work equally well for arguments, return values and elements of arrays, lists and hashes. 

> 
> typedef enum {
>   /* Not an AUX type */
>   PYGI_AUX_TYPE_NONE   = 0,
>   /* AUX type handled by parent */
>   PYGI_AUX_TYPE_IGNORE = 1,
>   /* AUX type has an associated pyarg which is modified by parent */
>   PYGI_AUX_TYPE_HAS_PYARG = 2
> } PyGIAuxType;
> 
> If the identifiers cannot be made more clear, then we should make those
> comments more verbose and explain what is meant by AUX, by parent and by being
> modified.

Ya I didn't have better names for them but in the design AUX types like length arguments for arrays don't marshal themselves since they don't have enough context to so we need to mark them and the marshallers need to ignore them.  Aux types are also random where they might appear in relation to their parent.  Some times they come before their parent and sometimes after.  Parents are the arguments who's marshaller will take care of marshalling the aux type.  For instance the array argument itself.

Comment 9 Tomeu Vizoso 2011-07-01 08:36:37 UTC

(In reply to comment #8)
> (In reply to comment #7)
> > Hi, this is just the beginning, more to follow soon. So far it looks like great
> > work, hope we can land it really soon from now.
> > 
> > General considerations:
> > 
> > - Do you have already an idea of what's the memory penalty of having the cache
> > around?
> 
> Not really.  I'll look into figuring out how much memory the cache uses during
> the tests.  Eventually we might want to have a configured limit and some sort
> of queue that Garbage collects the least used cached interfaces.

We may want to steal https://github.com/chergert/glrucache to put a limit on the amount of memory used for caches.

> > +AC_ARG_ENABLE(invoke-ng,
> > 
> > What's the reason for keeping the old code around?
> 
> It was easy for me to jump back and forth to examine behaviour, copy paste,
> etc.  Also we still use some of the old code paths for properties and such
> (things that need marshalling but aren't invoked).  Invoke itself can go away
> now. 

Can we do this right after the merge?

> > +       pygi-invoke-ng.c \
> > 
> > Related, I don't think we should have references in the code to "ng", we have
> > git's history to find out what happened.
> 
> I did that at first but splitting the files made merging much easier expecially
> when code was being written in the old branch.  We can always rename after the
> merge. 

Can we do this right after the merge?

> > +    GIArgument *default_value;
> > 
> > Is it intentional?
> 
> It is a shim and isn't hooked up yet but the code it pretty much there to
> support default values (or at least the infrastructure if not the glue).

Same as with keeping old code around, I think we shouldn't have future code either in master. It's great that the code structure will make it easier to support default values in the future but the code should reflect current functionality.

> > +    gboolean is_method;
> > +    gboolean is_constructor;
> > +    gboolean is_vfunc;
> > +    gboolean is_callback;
> > 
> > Shouldn't these be mutually exclusive?
> 
> I guess we could just use the GI_INFO_TYPE enum.  It makes code look nice to
> say if (is_method) and not have to remember the names of the enums though. The
> way the cache is set up is to mentally map exactly how the interface will be
> processed.  Now that that is pretty much set we can start to optimize.

I think it will help readability, as I wouldn't expect that construct to be used when they are mutually exclusive.

> >     cache->name = g_base_info_get_name ((GIBaseInfo *)callable_info);
> > 
> > I don't think we should store the name in the cache if it's just used during
> > debugging and for reporting errors. In those cases we can always call
> > g_base_info_get_name and be it.
> 
> The one reason we may want to do it this way is it is easier to debug since you
> simply print out the struct.  It is a const so all you lose is a pointer.

Ah, in that case it sounds great.

> > _pygi_callable_cache_new (GICallableInfo *callable_info)
> > {
> >     PyGICallableCache *cache = _callable_cache_new_from_callable_info
> > (callable_info);
> > 
> > I'm confused by having all of _pygi_callable_cache_new,
> > _callable_cache_new_from_callable_info and _args_cache_generate. I would
> > consider merging them into _pygi_callable_cache_new or finding a more
> > meaningful way to split them.
> 
> This splits up the business logic of building each of the caches into their own
> separate units so it is easier to debug.  It is the first code I wrote so some
> of it may not have been as clear as it could have been but merging into a large
> function is what I was trying to avoid.

Fine, but then we should find descriptive names. Right now I don't know how the logic is split between those funcs without reading their bodies.

> There may be duplicate code but you
> can look at a particular type of interface and know exactly how it is laid out.

The duplicate code should be dealt with unless it's really trivial. We cannot expect that people will remember to fix the same bug in every instance of the duplication.

> >         instance_cache->in_marshaller = _pygi_marshal_in_interface_instance;
> > 
> > Are you completely sure of the convenience of using function pointers instead
> > of just calling one function or the other based on the other fields in the
> > cache struct? Those function pointers hurt readability and debugging.
> 
> This speeds things up as well as being less confusing in my opinion.  Once you
> have the cache set up the invoker no longer guesses which function it should
> call.  That is set in stone (caches should be considered immutable).  This way
> most of the branching happens once per interface during the caching phase. 
> This can be expanded to almost having no if statements in the marshallers
> themselves.  Of course I didn't go that far and as you see things like the
> struct marshaller could be broken down further.  Also, this approach work
> equally well for arguments, return values and elements of arrays, lists and
> hashes. 

I think it's far too early to talk about optimizing branching with function pointers but I will keep reading and see how it affects readability.

> > typedef enum {
> >   /* Not an AUX type */
> >   PYGI_AUX_TYPE_NONE   = 0,
> >   /* AUX type handled by parent */
> >   PYGI_AUX_TYPE_IGNORE = 1,
> >   /* AUX type has an associated pyarg which is modified by parent */
> >   PYGI_AUX_TYPE_HAS_PYARG = 2
> > } PyGIAuxType;
> > 
> > If the identifiers cannot be made more clear, then we should make those
> > comments more verbose and explain what is meant by AUX, by parent and by being
> > modified.
> 
> Ya I didn't have better names for them but in the design AUX types like length
> arguments for arrays don't marshal themselves since they don't have enough
> context to so we need to mark them and the marshallers need to ignore them. 
> Aux types are also random where they might appear in relation to their parent. 
> Some times they come before their parent and sometimes after.  Parents are the
> arguments who's marshaller will take care of marshalling the aux type.  For
> instance the array argument itself.

Ok, but the reader of the code also needs to know all that :) So I think we should try harder to find better names in the identifiers, and make the comments clearer if they are still needed.

Comment 10 Tomeu Vizoso 2011-07-01 09:47:19 UTC

_arg_cache_new_from_type_info (GITypeInfo *type_info,
                               PyGICallableCache *callable_cache,
                               GIArgInfo *arg_info,
                               GITypeTag type_tag,
                               GITransfer transfer,
                               GIDirection direction,
                               gint c_arg_index,
                               gint py_arg_index)

AFAICS, this is always called with g_type_info_get_tag (type_info) as the type_tag argument. This is redundant and I think we should try hard to avoid functions with so may arguments. Haven't checked the other arguments but would be good to remove as many as possible. If after profiling we decide that we need to reuse any precomputed values, that should be done then and not now.

static inline PyGIArgCache *
_arg_cache_new (void)
{
    return g_slice_new0 (PyGIArgCache);
}

static inline void
_arg_cache_in_void_setup (PyGIArgCache *arg_cache)
{
    arg_cache->in_marshaller = _pygi_marshal_in_void;
}

I'm not sure whether those one-liners make the code more or less readable, up to you. Also, the compiler knows probably better than us whether inlining them will be of benefit. I would remove all of the inline keywords and only put them in place if we find that it helps performance or object size (doubtful).

static inline void
_arg_cache_out_interface_callback_setup (void)
{
    PyErr_Format(PyExc_NotImplementedError,
                 "Callback returns are not supported");
}

I think it's a bit dangerous to set an exception without returning NULL there. I would remove this function and move the message to where it's called, so it's next to the return NULL.

That's all about pygi-cache.*

Comment 11 johnp 2011-07-05 18:46:19 UTC

(In reply to comment #9)

> > > What's the reason for keeping the old code around?
> > 
> > It was easy for me to jump back and forth to examine behaviour, copy paste,
> > etc.  Also we still use some of the old code paths for properties and such
> > (things that need marshalling but aren't invoked).  Invoke itself can go away
> > now. 
> 
> Can we do this right after the merge?

Yep, right after the merge this should be done though arguments.[c|h] is still needed for signal processing for now.

> > > +       pygi-invoke-ng.c \
> > > 
> > > Related, I don't think we should have references in the code to "ng", we have
> > > git's history to find out what happened.
> > 
> > I did that at first but splitting the files made merging much easier expecially
> > when code was being written in the old branch.  We can always rename after the
> > merge. 
> 
> Can we do this right after the merge?

Yes this will be easy as a git-mv.  Doing it after the merge means an easier merge and much more readable history.

> > > +    GIArgument *default_value;
> > > 
> > > Is it intentional?
> > 
> > It is a shim and isn't hooked up yet but the code it pretty much there to
> > support default values (or at least the infrastructure if not the glue).
> 
> Same as with keeping old code around, I think we shouldn't have future code
> either in master. It's great that the code structure will make it easier to
> support default values in the future but the code should reflect current
> functionality.

fixed - just removed the pointer.

> > > +    gboolean is_method;
> > > +    gboolean is_constructor;
> > > +    gboolean is_vfunc;
> > > +    gboolean is_callback;
> > > 
> > > Shouldn't these be mutually exclusive?
> > 
> > I guess we could just use the GI_INFO_TYPE enum.  It makes code look nice to
> > say if (is_method) and not have to remember the names of the enums though. The
> > way the cache is set up is to mentally map exactly how the interface will be
> > processed.  Now that that is pretty much set we can start to optimize.
> 
> I think it will help readability, as I wouldn't expect that construct to be
> used when they are mutually exclusive.

So actually this is an amalgamation of a bunch of different GI constructs so if we wanted to remove the booleans I would have to construct a new Enum.  For instance is_method and is_constructor comes from the GICallableInfo flags.  We definitely do not want to be bit comparing flags everywhere as it impedes redability.  is_vfunc and is_callback comes from the GIInfoType and are proper enums.  Before I write this code are you ok with me creating a new enum called PyGICallableType?
 
> > > _pygi_callable_cache_new (GICallableInfo *callable_info)
> > > {
> > >     PyGICallableCache *cache = _callable_cache_new_from_callable_info
> > > (callable_info);
> > > 
> > > I'm confused by having all of _pygi_callable_cache_new,
> > > _callable_cache_new_from_callable_info and _args_cache_generate. I would
> > > consider merging them into _pygi_callable_cache_new or finding a more
> > > meaningful way to split them.
> > 
> > This splits up the business logic of building each of the caches into their own
> > separate units so it is easier to debug.  It is the first code I wrote so some
> > of it may not have been as clear as it could have been but merging into a large
> > function is what I was trying to avoid.
> 
> Fine, but then we should find descriptive names. Right now I don't know how the
> logic is split between those funcs without reading their bodies.

I'll work on this this week.
 
> > There may be duplicate code but you
> > can look at a particular type of interface and know exactly how it is laid out.
> 
> The duplicate code should be dealt with unless it's really trivial. We cannot
> expect that people will remember to fix the same bug in every instance of the
> duplication.

Most of the duplication is due to the need to order setting values getting set because we cascade for things like in/out parameters.  I'll take a look into simplifying it.

> > >         instance_cache->in_marshaller = _pygi_marshal_in_interface_instance;
> > > 
> > > Are you completely sure of the convenience of using function pointers instead
> > > of just calling one function or the other based on the other fields in the
> > > cache struct? Those function pointers hurt readability and debugging.
> > 
> > This speeds things up as well as being less confusing in my opinion.  Once you
> > have the cache set up the invoker no longer guesses which function it should
> > call.  That is set in stone (caches should be considered immutable).  This way
> > most of the branching happens once per interface during the caching phase. 
> > This can be expanded to almost having no if statements in the marshallers
> > themselves.  Of course I didn't go that far and as you see things like the
> > struct marshaller could be broken down further.  Also, this approach work
> > equally well for arguments, return values and elements of arrays, lists and
> > hashes. 
> 
> I think it's far too early to talk about optimizing branching with function
> pointers but I will keep reading and see how it affects readability.

This was the whole premiss of my rewrite.  The function pointers make it easy to do optimizations in the future without effecting whole blocks of code.  It allows one code path for each marshalling type and means when bugs pop up it is easier to pinpoint where it is happening and fix it without breaking other code.  It is all about optimization.  And I have shown that the changes as a whole have sped up this branch more than two fold even if I haven't specifically profiled each individual function. It reduces the clutter of branch statements which in my opinion was the major cause of readability issues in the old branch.  In other words I stand by this design decision and it will take a well written objection for me to change it.

> > > typedef enum {
> > >   /* Not an AUX type */
> > >   PYGI_AUX_TYPE_NONE   = 0,
> > >   /* AUX type handled by parent */
> > >   PYGI_AUX_TYPE_IGNORE = 1,
> > >   /* AUX type has an associated pyarg which is modified by parent */
> > >   PYGI_AUX_TYPE_HAS_PYARG = 2
> > > } PyGIAuxType;
> > > 
> > > If the identifiers cannot be made more clear, then we should make those
> > > comments more verbose and explain what is meant by AUX, by parent and by being
> > > modified.
> > 
> > Ya I didn't have better names for them but in the design AUX types like length
> > arguments for arrays don't marshal themselves since they don't have enough
> > context to so we need to mark them and the marshallers need to ignore them. 
> > Aux types are also random where they might appear in relation to their parent. 
> > Some times they come before their parent and sometimes after.  Parents are the
> > arguments who's marshaller will take care of marshalling the aux type.  For
> > instance the array argument itself.
> 
> Ok, but the reader of the code also needs to know all that :) So I think we
> should try harder to find better names in the identifiers, and make the
> comments clearer if they are still needed.

How does this enum feel to you:

/* Argument meta types denote how we process the argument:
 *  - Parents (PYGI_META_ARG_TYPE_PARENT) may or may not have children
 *    but are always processed via the normal marshaller for their
 *    actual GI type.  If they have children the marshaller will
 *    also handle marshalling the children.
 *  - Children without python argument (PYGI_META_ARG_TYPE_CHILD) are
 *    ignored by the marshallers and handled directly by their parents
 *    marshaller.
 *  - Children with pyargs (PYGI_META_ARG_TYPE_CHILD_WITH_PYARG) are processed
 *    the same as other child args but also have an index into the 
 *    python parameters passed to the invoker
 */
typedef enum {
    PYGI_META_ARG_TYPE_PARENT,
    PYGI_ARG_TYPE_CHILD,
    PYGI_ARG_TYPE_CHILD_WITH_PYARG
} PyGIMetaArgType;

Comment 12 johnp 2011-07-08 16:53:26 UTC

I fixed the issues I had questions on since I need to get this done sooner than later.  I have an issue with another request of yours that I will post in the next comment.

Comment 13 johnp 2011-07-08 17:18:12 UTC

(In reply to comment #10)
> _arg_cache_new_from_type_info (GITypeInfo *type_info,
>                                PyGICallableCache *callable_cache,
>                                GIArgInfo *arg_info,
>                                GITypeTag type_tag,
>                                GITransfer transfer,
>                                GIDirection direction,
>                                gint c_arg_index,
>                                gint py_arg_index)
> 
> AFAICS, this is always called with g_type_info_get_tag (type_info) as the
> type_tag argument. This is redundant and I think we should try hard to avoid
> functions with so may arguments. Haven't checked the other arguments but would
> be good to remove as many as possible. If after profiling we decide that we
> need to reuse any precomputed values, that should be done then and not now.

The type tag is the only one that can be removed as it is the only generic argument that can be accessed from one API.  That is a trivial change that I will fix.

The other parameters are all necessary. This is because we call this function three times in _args_cache_generate.  The first time we call it for the return value which uses different APIs to determine arg_info and transfer than regular parameters.  For direction, c_arg_index and py_arg_index we pass in defaults.

The second time we call is when we want info for the instance type, in which case transfer and direction and py_arg_index are manually set to defaults. 

The third time we call it is in the loop for parameter in which case all inputs are generated via GI API or counters.

So you see, there is no simplifying this function signature other than the type tag.  Special casing for return and instance would complicate the code even more.  As for the name it does what it says and creates a new cache from the type info.

> static inline PyGIArgCache *
> _arg_cache_new (void)
> {
>     return g_slice_new0 (PyGIArgCache);
> }

This is there as a convenience function because _arg_cache_new_from_type_info may create a base arg cache for base types like int or it may create larger caches like sequences or hash caches that inherit from an arg cache.  I can make this clearer by calling it _arg_cache_alloc and _sequence_cache_alloc, etc. and also move the new_from_type_info methods to simply become _new.  I think that is the best course of action here. 

> static inline void
> _arg_cache_in_void_setup (PyGIArgCache *arg_cache)
> {
>     arg_cache->in_marshaller = _pygi_marshal_in_void;
> }
> 
> I'm not sure whether those one-liners make the code more or less readable, up
> to you. Also, the compiler knows probably better than us whether inlining them
> will be of benefit. I would remove all of the inline keywords and only put them
> in place if we find that it helps performance or object size (doubtful).

I will remove the inline but the setup methods all follow a pattern so if you are having issues with a particular type you can put breakpoints in the cache methods, to make sure they are getting set correctly and in the marshallers. 

> static inline void
> _arg_cache_out_interface_callback_setup (void)
> {
>     PyErr_Format(PyExc_NotImplementedError,
>                  "Callback returns are not supported");
> }
> 
> I think it's a bit dangerous to set an exception without returning NULL there.
> I would remove this function and move the message to where it's called, so it's
> next to the return NULL.

You are correct. I should print out warnings here and error out if the API is actually used.  The question is, should they be GLib warnings or Python warnings?

cool, looks like we are on the same page for most of it and I fixed up most of the issues you had with the code including consolidating _pygi_callable_cache_new and _pygi_callable_cache_new_from_type info.  If we can sit down next week we should be able to finish the review.  I need to move on to some fedora stuff and I want most of my time spent on pygobject fixing bugs in master going forward.

Comment 14 Tomeu Vizoso 2011-07-11 07:57:38 UTC

(In reply to comment #11)
> (In reply to comment #9)
> 
> > > > What's the reason for keeping the old code around?
> > > 
> > > It was easy for me to jump back and forth to examine behaviour, copy paste,
> > > etc.  Also we still use some of the old code paths for properties and such
> > > (things that need marshalling but aren't invoked).  Invoke itself can go away
> > > now. 
> > 
> > Can we do this right after the merge?
> 
> Yep, right after the merge this should be done though arguments.[c|h] is still
> needed for signal processing for now.

Pity :/ Do you see any complexity in using the new marshalling functions?

> > > > +    gboolean is_method;
> > > > +    gboolean is_constructor;
> > > > +    gboolean is_vfunc;
> > > > +    gboolean is_callback;
> > > > 
> > > > Shouldn't these be mutually exclusive?
> > > 
> > > I guess we could just use the GI_INFO_TYPE enum.  It makes code look nice to
> > > say if (is_method) and not have to remember the names of the enums though. The
> > > way the cache is set up is to mentally map exactly how the interface will be
> > > processed.  Now that that is pretty much set we can start to optimize.
> > 
> > I think it will help readability, as I wouldn't expect that construct to be
> > used when they are mutually exclusive.
> 
> So actually this is an amalgamation of a bunch of different GI constructs so if
> we wanted to remove the booleans I would have to construct a new Enum.  For
> instance is_method and is_constructor comes from the GICallableInfo flags.  We
> definitely do not want to be bit comparing flags everywhere as it impedes
> redability.  is_vfunc and is_callback comes from the GIInfoType and are proper
> enums.  Before I write this code are you ok with me creating a new enum called
> PyGICallableType?

Yeah, a flag would be about as bad because those are exclusive. An enum would be great.

> > > >         instance_cache->in_marshaller = _pygi_marshal_in_interface_instance;
> > > > 
> > > > Are you completely sure of the convenience of using function pointers instead
> > > > of just calling one function or the other based on the other fields in the
> > > > cache struct? Those function pointers hurt readability and debugging.
> > > 
> > > This speeds things up as well as being less confusing in my opinion.  Once you
> > > have the cache set up the invoker no longer guesses which function it should
> > > call.  That is set in stone (caches should be considered immutable).  This way
> > > most of the branching happens once per interface during the caching phase. 
> > > This can be expanded to almost having no if statements in the marshallers
> > > themselves.  Of course I didn't go that far and as you see things like the
> > > struct marshaller could be broken down further.  Also, this approach work
> > > equally well for arguments, return values and elements of arrays, lists and
> > > hashes. 
> > 
> > I think it's far too early to talk about optimizing branching with function
> > pointers but I will keep reading and see how it affects readability.
> 
> This was the whole premiss of my rewrite.  The function pointers make it easy
> to do optimizations in the future without effecting whole blocks of code.  It
> allows one code path for each marshalling type and means when bugs pop up it is
> easier to pinpoint where it is happening and fix it without breaking other
> code.  It is all about optimization.  And I have shown that the changes as a
> whole have sped up this branch more than two fold even if I haven't
> specifically profiled each individual function. It reduces the clutter of
> branch statements which in my opinion was the major cause of readability issues
> in the old branch.  In other words I stand by this design decision and it will
> take a well written objection for me to change it.

Fair enough, I like how the marshalling functions are set during cache construction and not during invocation as that cannot change between invocations.

> > > > typedef enum {
> > > >   /* Not an AUX type */
> > > >   PYGI_AUX_TYPE_NONE   = 0,
> > > >   /* AUX type handled by parent */
> > > >   PYGI_AUX_TYPE_IGNORE = 1,
> > > >   /* AUX type has an associated pyarg which is modified by parent */
> > > >   PYGI_AUX_TYPE_HAS_PYARG = 2
> > > > } PyGIAuxType;
> > > > 
> > > > If the identifiers cannot be made more clear, then we should make those
> > > > comments more verbose and explain what is meant by AUX, by parent and by being
> > > > modified.
> > > 
> > > Ya I didn't have better names for them but in the design AUX types like length
> > > arguments for arrays don't marshal themselves since they don't have enough
> > > context to so we need to mark them and the marshallers need to ignore them. 
> > > Aux types are also random where they might appear in relation to their parent. 
> > > Some times they come before their parent and sometimes after.  Parents are the
> > > arguments who's marshaller will take care of marshalling the aux type.  For
> > > instance the array argument itself.
> > 
> > Ok, but the reader of the code also needs to know all that :) So I think we
> > should try harder to find better names in the identifiers, and make the
> > comments clearer if they are still needed.
> 
> How does this enum feel to you:

Looks good, but may be better if we can explain why there is a hierarchy.

Comment 15 Tomeu Vizoso 2011-07-11 08:38:51 UTC

gi/pygi-invoke-state-struct.h

Why its own file?

    /* We don't use the class parameter sent in by  the structure
     * so we remove it from the py_args tuple but we keep it 
     * around just in case we want to call actual gobject constructors
     * in the future instead of calling g_object_new
     */

I think this should be at most a TODO comment, without any unneeded code. Also, I don't think we'll want to use the _new* functions as those are mere convenience for C users (and should probably be skipped in the .gir).

        if (state->implementor_gtype == 0)
            return FALSE;

Shouldn't be setting an error string?

                if (arg_cache->py_arg_index >= state->n_py_in_args) {
                    PyErr_Format (PyExc_TypeError,
                                  "%s() takes exactly %zd argument(s) (%zd given)",
                                   cache->name,
                                   cache->n_py_args,
                                   state->n_py_in_args);

                    /* clean up all of the args we have already marshalled,
                     * since invoke will not be called
                     */
                    pygi_marshal_cleanup_args_in_parameter_fail (state,
                                                                 cache,
                                                                 i - 1);
                    return FALSE;
                }

This can only happen because of a bug in pygi-cache.c, right? If so, it should probably be an assertion (would be good to not have duplicated error messages as well).

                if (arg_cache->aux_type != PYGI_AUX_TYPE_IGNORE) {
                    if (arg_cache->py_arg_index >= state->n_py_in_args) {

Same here.

                if (arg_cache->is_caller_allocates) {

Maybe put all the allocation code in its own function so _invoke_marshal_in_args itself doesn't do so much work?

    arg->v_pointer = py_arg;

I vaguely remember that we had decided not to support passing PyObjects to the C side, because of having no guarantees of whether the object will still be alive.

pygi-marshal.c

What do you think of splitting it in pygi-marshal-in.c and pygi-marshal-out.c ? At 2145 lines is one of the biggest files in pygobject.

    /* assume hashtable has boxed key and value */

Should this be a FIXME? If so, can we check if those are indeed boxed and raise an Unimplemented exception otherwise?

That's all, let's merge it!

Comment 16 johnp 2011-07-13 18:35:07 UTC

(In reply to comment #15)
> gi/pygi-invoke-state-struct.h
> 
> Why its own file?

Dependency issues when including the header files.

>     /* We don't use the class parameter sent in by  the structure
>      * so we remove it from the py_args tuple but we keep it 
>      * around just in case we want to call actual gobject constructors
>      * in the future instead of calling g_object_new
>      */
> 
> I think this should be at most a TODO comment, without any unneeded code. Also,
> I don't think we'll want to use the _new* functions as those are mere
> convenience for C users (and should probably be skipped in the .gir).

Unfortunately a lot of things still can't be done without using the proper new function so they should be kept.  But in any case that is a general design question and not really in the scope of the invoke rewrite

>         if (state->implementor_gtype == 0)
>             return FALSE;
> 
> Shouldn't be setting an error string?
> 
>                 if (arg_cache->py_arg_index >= state->n_py_in_args) {
>                     PyErr_Format (PyExc_TypeError,
>                                   "%s() takes exactly %zd argument(s) (%zd
> given)",
>                                    cache->name,
>                                    cache->n_py_args,
>                                    state->n_py_in_args);
> 
>                     /* clean up all of the args we have already marshalled,
>                      * since invoke will not be called
>                      */
>                     pygi_marshal_cleanup_args_in_parameter_fail (state,
>                                                                  cache,
>                                                                  i - 1);
>                     return FALSE;
>                 }
> 
> This can only happen because of a bug in pygi-cache.c, right? If so, it should
> probably be an assertion (would be good to not have duplicated error messages
> as well).

No this happens if the user does not supply enough input arguments.

>                 if (arg_cache->aux_type != PYGI_AUX_TYPE_IGNORE) {
>                     if (arg_cache->py_arg_index >= state->n_py_in_args) {
> 
> Same here.
> 
>                 if (arg_cache->is_caller_allocates) {
> 
> Maybe put all the allocation code in its own function so
> _invoke_marshal_in_args itself doesn't do so much work?

I could do.

>     arg->v_pointer = py_arg;
> 
> I vaguely remember that we had decided not to support passing PyObjects to the
> C side, because of having no guarantees of whether the object will still be
> alive.
> 

We need to support this for TreeModels as PyGTK does.

> pygi-marshal.c
> 
> What do you think of splitting it in pygi-marshal-in.c and pygi-marshal-out.c ?
> At 2145 lines is one of the biggest files in pygobject.

I don't see a reason.  It is easily searched not that it matters much either way.  

>     /* assume hashtable has boxed key and value */
> 
> Should this be a FIXME? If so, can we check if those are indeed boxed and raise
> an Unimplemented exception otherwise?

I would assume any API that sends in a hash table will have the proper free functions.  There is no real way to check that.

> That's all, let's merge it!

Comment 17 johnp 2011-07-13 19:44:25 UTC

Changes made.  Please review for any more needed changes and green light the merge otherwise.

Comment 18 Tomeu Vizoso 2011-07-16 07:48:30 UTC

(In reply to comment #16)
> (In reply to comment #15)
> > gi/pygi-invoke-state-struct.h
> > 
> > Why its own file?
> 
> Dependency issues when including the header files.

A forward declaration isn't enough? I think keeping the size of files balanced is very important, it makes it easier to know where to look at, and also what it is what you are looking at. That's why I suggested splitting pygi-marshal.c.

Think of a search tree, keeping it balanced make searches more efficient. If grep was always enough to find what you are looking for, it wouldn't matter, but often you don't have a good term to grep for and need to go through a lot of files.

> >     /* We don't use the class parameter sent in by  the structure
> >      * so we remove it from the py_args tuple but we keep it 
> >      * around just in case we want to call actual gobject constructors
> >      * in the future instead of calling g_object_new
> >      */
> > 
> > I think this should be at most a TODO comment, without any unneeded code. Also,
> > I don't think we'll want to use the _new* functions as those are mere
> > convenience for C users (and should probably be skipped in the .gir).
> 
> Unfortunately a lot of things still can't be done without using the proper new
> function so they should be kept.

That's a bug in the library that must be fixed, otherwise inheriting won't always work, etc.

> But in any case that is a general design
> question and not really in the scope of the invoke rewrite

Maybe I got confused by "in the future", I thought you meant that in the future we may decide to construct objects by default with _new functions rather than with g_object_newv.

> >     arg->v_pointer = py_arg;
> > 
> > I vaguely remember that we had decided not to support passing PyObjects to the
> > C side, because of having no guarantees of whether the object will still be
> > alive.
> > 
> 
> We need to support this for TreeModels as PyGTK does.

Well, but that's a broken feature because Python may decide to garbage collect that object by the time we take it out from the model. I'm not sure we should aim to support something that is broken by default and will cause crashes.

Why don't we support only Python suclasses of GObject and rely on GObject's recounting to keep the object around?

> > pygi-marshal.c
> > 
> > What do you think of splitting it in pygi-marshal-in.c and pygi-marshal-out.c ?
> > At 2145 lines is one of the biggest files in pygobject.
> 
> I don't see a reason.  It is easily searched not that it matters much either
> way.  

See above for the reason.

> >     /* assume hashtable has boxed key and value */
> > 
> > Should this be a FIXME? If so, can we check if those are indeed boxed and raise
> > an Unimplemented exception otherwise?
> 
> I would assume any API that sends in a hash table will have the proper free
> functions.  There is no real way to check that.

Ah, the comment sounded like a FIXME to me.

> > That's all, let's merge it!

I think we can merge it, but is unfortunate to have to keep the old pygi-argument.c file around because it's a lot of duplicated code.

Comment 19 johnp 2011-07-18 15:12:17 UTC

Ok, code merged and invoke-ng is now renamed back to invoke and there is only one invoke code path.  I also split the marshalling files into pygi-marshal-in.[c|h] and pygi-marshal-out.[c|h].  All issues should now be filed as new bugs.  I'm going to work on making this parallel installable with pygobject2 and then we can do a release.