After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 793802 - gtask: only free task data and source object from main-context thread
gtask: only free task data and source object from main-context thread
Status: RESOLVED OBSOLETE
Product: glib
Classification: Platform
Component: gio
unspecified
Other Linux
: Normal normal
: ---
Assigned To: gtkdev
gtkdev
Depends on:
Blocks:
 
 
Reported: 2018-02-25 04:22 UTC by Christian Hergert
Modified: 2018-05-24 20:15 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
gtask: prototype to defer finalize to main context (3.08 KB, patch)
2018-02-27 20:58 UTC, Christian Hergert
none Details | Review

Description Christian Hergert 2018-02-25 04:22:08 UTC
I just noticed an assertion in Builder that would indicative some very bad behavior (especially given how insanely much we use GTask in Builder).

The assertion came from our GDestroyNotify for the GTask task-data being called from a non-main thread (which is the active thread when the task created, so therefore the thread for callbacks).

  ide-unsaved-files:ERROR:../src/libide/buffers/ide-unsaved-files.c:92:async_state_free: assertion failed: (IDE_IS_MAIN_THREAD ())

I wanted to check with those here and see if that could indeed happen, and if so, I think we should try really hard to prevent it.

Doing things like ref'ing a GtkWidget as task data could have disastrous behavior (and I know I'm not the only one to do this).
Comment 1 Philip Withnall 2018-02-26 11:16:09 UTC
(In reply to Christian Hergert from comment #0)
> The assertion came from our GDestroyNotify for the GTask task-data being
> called from a non-main thread

> (which is the active thread when the task
> created, so therefore the thread for callbacks).

That’s not 100% accurate. Callbacks are called in the GMainContext which was the thread-default main context at construction time of the GTask. While you’d endeavour to ensure that there is exactly one GMainContext per thread, and that it’s the thread default for the lifetime of that thread, this is a common bug in people’s code (and a hard one to spot unless you explicitly go looking for it).

That would be the first thing I’d check for; I would suspect the bug is in Builder, given how much code Builder has, and the fact that I haven’t seen a failure in the GTask code in other apps in the wild.

Generally, I check for this by:
 • ensuring that each GThreadFunc has a pair of g_main_context_[push|pop]_thread_default() calls bracketing everything it does
 • ensuring that g_main_context_new() is not called in odd places
 • ensuring that each call to g_main_context_new() has an associated g_main_context_[push|pop]_thread_default()

Please let me know what you find!
Comment 2 Christian Hergert 2018-02-26 21:38:01 UTC
(In reply to Philip Withnall from comment #1)
> 
> Please let me know what you find!

My cursory look over gtask.c would indicate this is more of a fundamental "where do you release task data" issue.

The issue at hand is that when returning a result for a GTask from the worker thread, the worker thread owns a reference to the task. Now, that thread may get very unluckily scheduled in such a way where the callback on the main thread fires immediately, and releases it's owning reference. Then we get back to the worker thread and it finally unref's it's last reference to the GTask.

Since the task_data is destroyed in finalize, we have a situation where finalizing happened on a thread different than the task callback.

There is a possible work-around I can do in Builder in that we know we never "return a gtask" while false-sharing the task_data, by creating a wrapper for g_task_new() which connects to notify::completed and manually calls g_task_set_task_data(task, NULL, NULL) which ensures immediate finalization of the task data.
Comment 3 Philip Withnall 2018-02-27 11:32:33 UTC
Ah, that makes sense. This looks like a specific instance of bug #681334.

I can’t immediately think of a great solution. Several options come to mind:
 1) Say this is your problem to deal with, and any GDestroyNotify you use for task data has to be thread safe. 
 2) Implement your workaround in GLib, and have it clear the task data just after emitting the ::completed signal.
 3) Defer calls to task_data_destroy and result_destroy to the correct GMainContext.
 4) Defer the entire g_task_finalize() to the correct GMainContext.

1) is the easiest for GLib, guarantees backwards compatibility with existing, exotic users of GTask, and doesn’t impose a performance overhead unless necessary.

2) seems like the most reasonable approach, with negligible performance impact, but there’s a risk that existing users of GTask depend on the task data not being freed until the GTask is freed. I’m not sure they would be reasonable to do so, though.

3) and 4) are roughly the same, add performance impact, affect the lifetime of the GTask so a load of unit tests will probably break, and generally don’t have much going for them.
Comment 4 Christian Hergert 2018-02-27 20:36:35 UTC
(In reply to Philip Withnall from comment #3)
> 1) is the easiest for GLib, guarantees backwards compatibility with
> existing, exotic users of GTask, and doesn’t impose a performance overhead
> unless necessary.

At the expense of existing and new code having nearly impossible to track down memory corruptions.

> 2) seems like the most reasonable approach, with negligible performance
> impact, but there’s a risk that existing users of GTask depend on the task
> data not being freed until the GTask is freed. I’m not sure they would be
> reasonable to do so, though.

Agreed, I don't think we should do this.

> 3) and 4) are roughly the same, add performance impact, affect the lifetime
> of the GTask so a load of unit tests will probably break, and generally
> don’t have much going for them.

There is a slight impact here, in that we need to check g_main_current_source()/get_context() for a match (like in g_task_return). I expect in most cases we'll have a match and can free on the immediate thread, given I've hit this case exactly once after adding these assertions to Builder a month ago.

However, if we are indeed finalized on the thread worker, we don't need to proxy the whole GObject to the main thread (as a zombie), but instead just a 2-pointer struct with the GDestroyNotify and task_data.
Comment 5 Christian Hergert 2018-02-27 20:39:42 UTC
Looking at g_task_finalize(), I guess we have another issue where the source_object too can be finalized on the thread. Yikes, this might be worse than I thought.
Comment 6 Christian Hergert 2018-02-27 20:58:39 UTC
Created attachment 369061 [details] [review]
gtask: prototype to defer finalize to main context

This defers the finalizing of some critical bits held in the task object
to the main context attached to the task.

My concern with this approach is whether or not we can guarantee that the
context will exist and be iterated by time the task is finalized.

One way to address that might be if contexts had parent contexts, so that
we can ensure they get exhausted on the proper thread.

Anyway, just meant to be an example of the issue at hand, not a
comprehensive solution.
Comment 7 Dan Winship 2018-03-01 15:25:58 UTC
> 3) Defer calls to task_data_destroy and result_destroy to the correct GMainContext.
> 4) Defer the entire g_task_finalize() to the correct GMainContext.

GTask doesn't currently assume that the context is still in use after the task completes; the caller might never iterate it again, in which case trying to defer cleanup to it would just cause a leak.
Comment 8 Christian Hergert 2018-03-12 08:44:11 UTC
I think this is further exacerbated by the fact that Vala now uses GTask to manage their async code. A quick look at some generated output shows unref's in their task_data closure free callback which could easily be widgets that were in scope of their closures.

One possibility might be to make sure that the g_task_run_in_thread() prevents the g_task_return_*() functions from doing anything until yielding back to the function calling the GTaskThreadFunc. Then, if it marshals the results *and* the task back to the threadfunc, we can ensure they don't race against each other.
Comment 9 Carlos Garnacho 2018-03-12 10:56:00 UTC
Actually, it seems the g_task_set_task_data() paths could be surrounded by:

if (!context.require_glib_version (2, 44)) {
}

as it only ever needs that data back if g_task_get_completed() is not available.

But just saying, this is not the only subtle bug around vala async code (cf. bug #789249, triggers pretty much in around the same situations) for the sake of supporting a fringe feature such as  https://wiki.gnome.org/Projects/Vala/AsyncSamples#Generator_example. I personally am trying to move away from it.
Comment 10 Christian Hergert 2018-03-22 00:33:35 UTC
(In reply to Christian Hergert from comment #8)
> One possibility might be to make sure that the g_task_run_in_thread()
> prevents the g_task_return_*() functions from doing anything until yielding
> back to the function calling the GTaskThreadFunc. Then, if it marshals the
> results *and* the task back to the threadfunc, we can ensure they don't race
> against each other.

I implemented something like this in Builder as a replacement for GTask so that we can guarantee finalization semantics until we know better how to fix things here (and if we can/should).

The API is mostly the same but it also provides some things I've wanted:

 - Chaining results (to avoid duplicated work items) which requires changing
   how tasks are stored for boxed/objects/etc.
 - Ensuring the object is passed back to maincontext from thread with the result
 - Integration with Builder's thread pools

https://gitlab.gnome.org/GNOME/gnome-builder/blob/master/src/libide/threading/ide-task.c

Hopefully we can make GTask do something similar by changing how we get the result back to the maincontext from g_task_return() while in the thread.
Comment 11 GNOME Infrastructure Team 2018-05-24 20:15:30 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/1346.