Bug 696369 – GTask and GSimpleAsyncResult shouldn't keep a reference on the source object

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 696369 - GTask and GSimpleAsyncResult shouldn't keep a reference on the source object


Summary:	GTask and GSimpleAsyncResult shouldn't keep a reference on the source object


Status:	RESOLVED NOTABUG

Product:	glib
Classification:	Platform
Component:	gio
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	gtkdev
QA Contact:	gtkdev

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2013-03-22 09:27 UTC by Mardy
Modified:	2013-03-23 20:38 UTC

See Also:
GNOME target:	---
GNOME version:	---

Description Mardy 2013-03-22 09:27:11 UTC

The reference counting in GTask (and GSimpleAsyncResult) is likely to create cyclic reference loops and forces an (IMHO) unnatural behaviour on the asynchronous calls.

Cyclic references: the object which creates the GTask (which is therefore the owner) is likely to be used as the source_object as well. If GTask adds a reference to it, the objects are holding a reference to each other.

Unnatural behaviour: because of the above, the object which implements the async call will not die until the async call has completed. Consider code like:

1    priv->baker = my_baker_new();
2    /* "self" now holds a reference to "baker" */
3    my_baker_bake_cake_async(priv->baker,
4                             priv->cancellable,
5                             on_cake_ready,
6                             self);
7    g_object_unref(priv->baker);
8    priv->baker = NULL;

Given that "self" is the only owner of "baker", it seems just obvious to expect that once "self" drops its reference to it, "baker" will be destroyed, or that at the very least the on_cake_ready() callback won't be invoked (I recognize that this will sound weird to developers accustomed to GLib, but in other object frameworks this is a very obvious statement).
Not respecting this expected behaviour is especially annoying if you think of having lines 7 and 8 in "self"'s dispose() method: the on_cake_ready() callback will be invoked after "self" has been destroyed, forcing the developer to use some tricks (weak pointers?) in order to prevent accessing an invalid pointer to "self".

Note that for GTask (and GSimpleAsyncResult) the reference to source_object is totally unnecessary: not only because GTask doesn't access source_object in any way (it just passes the pointer back to the callback) but especially because the GTask is owned by source_object, and we can be sure that source_object will always be alive while the GTask is. The source_object destructor should in fact destroy the GTask, in order not to leak it.


I understand that this is a behaviour that cannot be changed at this point, or it would break all the existing code. But we could add a

  g_task_set_reference_source_object(GTask *task, gboolean keep_reference);

which if called with "keep_reference == FALSE" drops the reference from the source_object (and a similar one for g_simple_async_result).

This would allow preserving the current behaviour or the (IMHO) saner one when dropping all the explicit references to an object means deleting it and stopping receiving any events from it.

Comment 1 Dan Winship 2013-03-22 13:31:05 UTC

(In reply to comment #0)
> Unnatural behaviour: because of the above, the object which implements the
> async call will not die until the async call has completed.

That is considered to be a feature. Experience with pre-gio async APIs in GNOME showed that trying to implement "cancel by unreffing" very often leads to bugs.

The gio behavior is also more compatible with bindings for garbage-collected languages, where you would inherently have this behavior of the source object remaining alive as long as there was some piece of code still referring to it.

> Given that "self" is the only owner of "baker", it seems just obvious to expect
> that once "self" drops its reference to it, "baker" will be destroyed, or that
> at the very least the on_cake_ready() callback won't be invoked (I recognize
> that this will sound weird to developers accustomed to GLib, but in other
> object frameworks this is a very obvious statement).

Again, past experience with async APIs in GNOME has shown that having callbacks that are sometimes invoked and sometimes not leads to bugs, and especially memory leaks. (Eg, if you need to allocate a struct to contain the data you want to pass to the async callback, and the callback never gets called, then you leak that struct.)

> Note that for GTask (and GSimpleAsyncResult) the reference to source_object is
> totally unnecessary... because the GTask is owned by source_object

If you look at the code in gio, this is not how GTasks are normally used. The object does not keep a pointer to the task, it just passes it along the chain of async calls and unrefs it at the end.



Note that in GTask (and in GSimpleAsyncResult if you call g_simple_async_result_set_check_cancellable()), it is guaranteed that if you call g_cancellable_cancel() on the operation's GCancellable, then when the callback is invoked, it will receive a G_IO_ERROR_CANCELLED error (even if the async operation code itself never looks at the GCancellable, and even if it calls g_task_return_pointer() or whatever to return a successful response). So that means that you can do basically what you want by doing:

   g_cancellable_cancel (priv->cancellable);
   g_object_unref (priv->baker);
   priv->baker = NULL;

and in on_cake_ready, do:

   cake = my_baker_bake_cake_finish (MY_BAKER (object), result, &error);
   if (g_error_matches (error, G_IO_ERROR, G_IO_ERROR_CANCELLED))
       return;

   /* normal callback activity here */

in this case the baker won't actually be finalized until on_cake_ready() is called and returns, but you don't have to worry about on_cake_ready() accessing any already-freed data, because you know it will bail out right at the beginning.

Comment 2 Mardy 2013-03-22 14:48:11 UTC

Hi Dan, first of all thanks for the suggestion: checking the GError type is a good enough solution for the case I was solving, so that's great. :-)

However, I disagree with you when you say that this is not a bug, for multiple reasons.

First of all, your practical suggestion of checking for the GError type works only if the object destruction is the only reason why the operation could be cancelled. But what if my "self" object from the example above had a cancel() method?
Then, it would implement like this:

    my_kitchen_cancel(self)
    {
        ....
        g_cancellable_cancel (self->priv->cancellable);
    }

but then, in the on_cake_ready() callback, I might not want to just return, but perform other actions (maybe emit a "cancelled" signal?), and I cannot do that, because I don't know if the operation was cancelled because my object was disposed or because the user requested the cancellation.


Secondly (though this is the most important reason for me), the current implementation leads to a lot of confusion over object ownership, which is evident by the fact that there is are cyclic references.

Your reply makes this confusion quite manifest:

(In reply to comment #1)
> > Note that for GTask (and GSimpleAsyncResult) the reference to source_object is
> > totally unnecessary... because the GTask is owned by source_object
> 
> If you look at the code in gio, this is not how GTasks are normally used. The
> object does not keep a pointer to the task, it just passes it along the chain
> of async calls and unrefs it at the end.

Keeping or not keeping a pointer is a detail: what matters are references. The GTask creator keeps a reference to it, and unreferences it at the end.
Which means that this object is the owner of the GTask, and that its lifetime is guaranteed to be longer than that of the GTask, which therefore shouldn't worry to keep a reference to it.

The reason why it does is that it wants to ensure that the source object does not die until the asynchronous operation terminates, which IMHO is very wrong. If I unreference the source object, and I'm the one who created it, I expect it to die immediately, and especially without kicking callbacks from the afterlife.

And this is especially weird for language bindings:

> The gio behavior is also more compatible with bindings for garbage-collected
> languages, where you would inherently have this behavior of the source object
> remaining alive as long as there was some piece of code still referring to it.

But *I* don't have any piece of code still referring to it!
See:

    def make_cake(self):
        baker = MyBaker()
        baker.bake_cake_async(None, self.on_cake_ready, None)
        # baker gets garbage-collected

    def on_cake_ready(self, source, result, data):
        print "Cake is ready!"

Every experienced python developer which is not familiar with GIO would say that the code above is wrong (unless on_cake_ready() is called before bake_cake_async() returns), because the callback will never be invoked since "baker" gets garbage-collected. And it's true that "baker" gets garbage collected, yet the callback is invoked from the GLib mainloop and the python object gets recreated (as the "source" variable).
Still you don't find it confusing?

Note that I understand that this might be convenient in some cases, so I'm not saying that it should be reverted (especially given the amount of code depending on this behaviour!). I'm just asking for the possibility of handling this differently, more in line with what other OO frameworks provide.

Comment 3 Simon Feltman 2013-03-23 03:51:40 UTC

(In reply to comment #2)
> But *I* don't have any piece of code still referring to it!
> See:
> 
>     def make_cake(self):
>         baker = MyBaker()
>         baker.bake_cake_async(None, self.on_cake_ready, None)
>         # baker gets garbage-collected
> 
>     def on_cake_ready(self, source, result, data):
>         print "Cake is ready!"
> 
> Every experienced python developer which is not familiar with GIO would say
> that the code above is wrong (unless on_cake_ready() is called before
> bake_cake_async() returns), because the callback will never be invoked since
> "baker" gets garbage-collected. And it's true that "baker" gets garbage
> collected, yet the callback is invoked from the GLib mainloop and the python
> object gets recreated (as the "source" variable).
> Still you don't find it confusing?

I partially agree with this, but only because I think it is semantically confusing, not necessarily because I think the implementation model is incorrect. In this case "baker.bake_cake_async" is a convenience method that has the side effect of adding a task holding itself to a global queue (or however it works internally). This is potentially non-obvious to an unknowing reader. In this sense it could be clearer if it was using a global queuing function or a method on a queue/loop object:

baker = MyBaker()
task = Gio.Task(baker, ...)
Gio.task_run_in_thread(task, baker.bake_cake_async)

By unrolling things a bit (and leaving out a lot of details), it might give the reader a different (more accurate) model of object ownership.

Out of my own curiosity what other OO frameworks are you talking about?

Comment 4 Mardy 2013-03-23 06:22:41 UTC

(In reply to comment #3)
> I partially agree with this, but only because I think it is semantically
> confusing, not necessarily because I think the implementation model is
> incorrect.

Note that the python example above is equally confusing (to me) even if rewritten in plain C. When I drop my reference to an object which I created myself, and which I didn't pass as parameter to another object's member, I expect the object to be deleted.
What this bug is about: please give developers a way (I can contribute the patch myself, if there's an agreement on the goal) to do this.

> baker = MyBaker()
> task = Gio.Task(baker, ...)
> Gio.task_run_in_thread(task, baker.bake_cake_async)
> 
> By unrolling things a bit (and leaving out a lot of details), it might give the
> reader a different (more accurate) model of object ownership.

Yes, that snippet above is quite clear: "baker" is used by "task", and "task" is not garbage-collected because it's being used by Gio.

> Out of my own curiosity what other OO frameworks are you talking about?

I'm talking of Qt, but I think that any OO framework where you don't have object reference counting will behave the same way.

Comment 5 Dan Winship 2013-03-23 13:15:17 UTC

(In reply to comment #4)
> What this bug is about: please give developers a way (I can contribute the
> patch myself, if there's an agreement on the goal) to do this.

The problem is that then there'd be two possible behaviors for gio async functions, and no way of knowing which was in effect for any given method.

If you want to have async methods that work in a non-gio style, then just don't use GAsyncReadyCallback and GAsyncResult.

Comment 6 Mardy 2013-03-23 13:36:53 UTC

(In reply to comment #5)
> The problem is that then there'd be two possible behaviors for gio async
> functions, and no way of knowing which was in effect for any given method.

Well, there are plenty of things in GLib/Gio which are not 100% consistent, but as long as they are well documented, people live happily with that.

Furthermore, you could simply just decide not to accept this behaviour in code submitted to Gio, but just provide it as a feature for other libraries.

> If you want to have async methods that work in a non-gio style, then just don't
> use GAsyncReadyCallback and GAsyncResult.

Why? There's nothing in these two classes which enforces the current behaviour.

Comment 7 Dan Winship 2013-03-23 20:38:11 UTC

(In reply to comment #6)
> > If you want to have async methods that work in a non-gio style, then just don't
> > use GAsyncReadyCallback and GAsyncResult.
> 
> Why? There's nothing in these two classes which enforces the current behaviour.

There's no *code*, but the docs for GAsyncResult enforce it. ("The callback for an asynchronous operation is called only once, and is always called, even in the case of a cancelled operation.")