After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 778862 - Crash on exit when implementing Gio.Application.vfunc_dbus_unregister()
Crash on exit when implementing Gio.Application.vfunc_dbus_unregister()
Status: RESOLVED OBSOLETE
Product: gjs
Classification: Bindings
Component: general
1.47.x
Other Linux
: Normal major
: ---
Assigned To: Philip Chimento
gjs-maint
Depends on:
Blocks:
 
 
Reported: 2017-02-18 01:54 UTC by Philip Chimento
Modified: 2018-01-27 12:04 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
object: Downgrade severity of toggle pending on dissociate (1.57 KB, patch)
2017-02-27 00:16 UTC, Philip Chimento
committed Details | Review
object: Fix toggle check in dissociate_js_gobject() (2.17 KB, patch)
2017-03-10 02:53 UTC, Philip Chimento
committed Details | Review
object: Queue object toggles and drain them in idle (18.93 KB, patch)
2017-03-10 02:53 UTC, Philip Chimento
committed Details | Review
object: Fix comments about toggle refs (1.71 KB, patch)
2017-03-10 02:53 UTC, Philip Chimento
committed Details | Review
context: Drain toggle queue at start of GC (2.94 KB, patch)
2017-03-10 02:53 UTC, Philip Chimento
committed Details | Review
WIP - object: Clear toggles in weak pointer callback (1.08 KB, patch)
2017-03-10 02:53 UTC, Philip Chimento
none Details | Review
WIP - object: Clear toggles in weak pointer callback (2.45 KB, patch)
2017-04-01 20:25 UTC, Philip Chimento
needs-work Details | Review

Description Philip Chimento 2017-02-18 01:54:43 UTC
Sample script to reproduce the problem:

    const Gio = imports.gi.Gio;
    const Lang = imports.lang;

    const App = new Lang.Class({
        Name: 'App',
        Extends: Gio.Application,
        vfunc_activate: function () { this.parent(); },
        vfunc_dbus_unregister: function () {},
    });

    new App({ application_id: 'com.example.Foo' }).run(['foo']);

Output:

**
Gjs:ERROR:/blah/gjs/gi/object.cpp:1383:void disassociate_js_gobject(GObject*): assertion failed: (!cancel_toggle_idle(gobj, TOGGLE_UP))
Aborted (core dumped)

Looks like fallout from my recent patches.
Comment 1 Giovanni Campagna 2017-02-18 06:20:30 UTC
I'm not sure this is a recent regression, I remember having similar problems in gnome-weather in old code too.

The problem is that dbus_unregister is called after the last mainloop iteration, so we can't use the idle to clean up the toggle ref.
Comment 2 Philip Chimento 2017-02-21 05:06:24 UTC
I'm pretty sure it's a new problem in the original code which I distilled the above minimal example from. But I guess that just means that it's a race between the toggle ref resolving and the end of the mainloop.
Comment 3 Philip Chimento 2017-02-26 22:51:24 UTC
From the code:

     * Note that one would think that toggling up only happens
     * in the main thread (because toggling up is the result of
     * the JS object, previously visible only to JS code, becoming
     * visible to the refcounted C world), but because of weird
     * weak singletons like g_bus_get_sync() objects can see toggle-ups
     * from different threads too.
     * We could lock the keep alive structure and avoid the idle maybe,
     * but there aren't many peculiar objects like that and it's
     * not a big deal.

I think we should try to avoid the idle function, at least, because otherwise we have to assume there's a main loop running at all times. This certainly won't work in gjs-console.
Comment 4 Philip Chimento 2017-02-26 23:22:11 UTC
I realized that won't work; switching the GjsMaybeOwned wrapper to and from rooted mode, requires being in a request, and that can only happen from the thread that's associated with the JSRuntime. (CurrentThreadCanAccessRuntime() assertion in StartRequest().)

I don't understand why this code in gjs_object_clear_toggles()

    while (g_main_context_pending(NULL) &&
           g_atomic_int_get(&pending_idle_toggles) > 0) {
        g_main_context_iteration(NULL, false);
    }

doesn't clean up the pending toggle refs, though. I guess after the main loop is shut down there will be nothing pending on the default GMainContext?

Maybe a solution would be to put the pending toggles in an async queue, and have the main thread drain that queue in an idle function. That way, when shutting down the GjsContext, we could empty out the queue without depending on there being a main loop running.
Comment 5 Philip Chimento 2017-02-27 00:16:08 UTC
Created attachment 346779 [details] [review]
object: Downgrade severity of toggle pending on dissociate

Since there may be idle toggle refs pending when a GObject is dissociated
from its JS wrapper, during the normal course of GJS usage, we should not
assert in this case. Instead log a critical warning with a diagnostic
message explaining when this is / isn't a bug.
Comment 6 Philip Chimento 2017-02-27 00:21:35 UTC
Since we've determined that this can happen in normal usage of GJS, at least until we get rid of the idle function, here's a patch to downgrade the assert to a critical.
Comment 7 Cosimo Cecchi 2017-02-27 17:05:25 UTC
Review of attachment 346779 [details] [review]:

It's a bit ugly to spam the output with this message, because there's nothing the application can do to fix it, but better than crashing I guess...
I'd be OK with this for now, if we add a FIXME to indicate that it's temporary.
Comment 8 Philip Chimento 2017-02-28 00:04:26 UTC
OK, I'll push it with a FIXME comment referring to this bug.
Comment 9 Philip Chimento 2017-02-28 00:06:08 UTC
Comment on attachment 346779 [details] [review]
object: Downgrade severity of toggle pending on dissociate

Attachment 346779 [details] pushed as e290481 - object: Downgrade severity of toggle pending on dissociate
Comment 10 Philip Chimento 2017-03-10 02:53:26 UTC
Created attachment 347592 [details] [review]
object: Fix toggle check in dissociate_js_gobject()

If both a toggle up and toggle down were queued, we don't want to assert
here, since the object would be scheduled to get freed anyway.
Comment 11 Philip Chimento 2017-03-10 02:53:30 UTC
Created attachment 347593 [details] [review]
object: Queue object toggles and drain them in idle

Previously, object toggle notifications that happened on a non-owner
thread would cause the toggles to happen in an idle function. That was
not good if no main loop was running, because then the toggles would
just queue up indefinitely, eventually causing a crash when cleaning up
objects at context shutdown time.

Instead of doing that, we make a threadsafe FIFO queue of toggles, and
drain it in an idle function. This has just the same effect, except that
we now also have the option of draining it outside of an idle function.
So when preparing to shut down the context, we drain it synchronously.
Comment 12 Philip Chimento 2017-03-10 02:53:34 UTC
Created attachment 347594 [details] [review]
object: Fix comments about toggle refs

These comments were lagging behind the code since we switched from
KeepAlive to GjsMaybeOwned.
Comment 13 Philip Chimento 2017-03-10 02:53:38 UTC
Created attachment 347595 [details] [review]
context: Drain toggle queue at start of GC

In order to minimize the chances of a JS object being garbage collected
while it is queued to toggle up, we handle all pending object toggles
when garbage collection starts.
Comment 14 Philip Chimento 2017-03-10 02:53:41 UTC
Created attachment 347596 [details] [review]
WIP - object: Clear toggles in weak pointer callback

This is not supposed to work, but it does fix the crash from this bug.
Comment 15 Philip Chimento 2017-03-10 03:09:48 UTC
Well, this is still a mystery to me. I refactored it so that the toggles don't _have_ to be handled in an idle function (though they still normally are.)

However, the only thing that really fixes it is clearing the toggles inside the weak pointer callback. That seems fishy to me. The crash meant that a JS object wrapper was being GC'd while its GObject had already been toggled up on a different thread. So, draining the toggle queue there _should_ mean that we are rooting a JS object that is already marked for collection. And in that case, anytime we access the rooted object, we'll crash on a use-after-free.
Comment 16 Cosimo Cecchi 2017-03-12 03:51:12 UTC
Review of attachment 347592 [details] [review]:

Looks good
Comment 17 Cosimo Cecchi 2017-03-12 04:11:15 UTC
Review of attachment 347593 [details] [review]:

Nice, I only have one small comment.

::: gi/toggle.cpp
@@ +104,3 @@
+    
+    if (item.needs_unref)
+        g_object_unref(item.gobj);

It doesn't matter in practice but you could set item.needs_unref = false here
Comment 18 Cosimo Cecchi 2017-03-12 04:11:50 UTC
Review of attachment 347594 [details] [review]:

++
Comment 19 Cosimo Cecchi 2017-03-12 04:14:43 UTC
Review of attachment 347595 [details] [review]:

++
Comment 20 Cosimo Cecchi 2017-03-12 04:28:27 UTC
Review of attachment 347596 [details] [review]:

To be honest I'm not sure if it's worth trading off the critical with code that may have unintended effects...
Maybe we could have a third opinion from Giovanni or the release team.
Comment 21 Philip Chimento 2017-03-12 05:12:23 UTC
I'll push the comment fix, in any case.
Comment 22 Philip Chimento 2017-03-12 05:14:01 UTC
Comment on attachment 347594 [details] [review]
object: Fix comments about toggle refs

Attachment 347594 [details] pushed as d55d10a - object: Fix comments about toggle refs
Comment 23 Philip Chimento 2017-04-01 20:25:45 UTC
Created attachment 349125 [details] [review]
WIP - object: Clear toggles in weak pointer callback

This is not supposed to work, but it does fix the crash from this bug.
Comment 24 Philip Chimento 2017-04-01 20:29:09 UTC
Attachment 347592 [details] pushed as 7735dd2 - object: Fix toggle check in dissociate_js_gobject()
Attachment 347593 [details] pushed as 0cf7f83 - object: Queue object toggles and drain them in idle
Attachment 347595 [details] pushed as 404e705 - context: Drain toggle queue at start of GC
Comment 25 Philip Chimento 2017-04-01 20:30:12 UTC
Review of attachment 349125 [details] [review]:

Marking this "needs work" because we still don't understand why it works, when it actually should not.
Comment 26 GNOME Infrastructure Team 2018-01-27 12:04:00 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/gjs/issues/103.