After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 670200 - Hang during garbage collection
Hang during garbage collection
Status: RESOLVED FIXED
Product: gnome-shell
Classification: Core
Component: general
3.2.x
Other Linux
: Normal major
: ---
Assigned To: gnome-shell-maint
gnome-shell-maint
: 697170 (view as bug list)
Depends on:
Blocks: 697436
 
 
Reported: 2012-02-16 11:10 UTC by Sam Thursfield
Modified: 2013-04-17 07:05 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
debug info (21.63 KB, text/plain)
2012-02-16 11:10 UTC, Sam Thursfield
  Details
jsapi-util,object: Add missing JS_EndRequest calls (10.48 KB, patch)
2013-04-02 20:14 UTC, Ray Strode [halfline]
committed Details | Review
value: handle argc == 0 in closure_marshal (3.55 KB, patch)
2013-04-02 20:14 UTC, Ray Strode [halfline]
committed Details | Review
gi: fix compile error (2.30 KB, patch)
2013-04-02 20:14 UTC, Ray Strode [halfline]
committed Details | Review
context: always set GC callback (3.66 KB, patch)
2013-04-03 21:40 UTC, Ray Strode [halfline]
none Details | Review
object: defer toggle notifications to idle (12.89 KB, patch)
2013-04-03 21:40 UTC, Ray Strode [halfline]
none Details | Review
object: run toggle notifies directly, when possible (13.85 KB, patch)
2013-04-03 21:40 UTC, Ray Strode [halfline]
none Details | Review
One other strange issue (3.34 KB, patch)
2013-04-03 21:56 UTC, Ray Strode [halfline]
rejected Details | Review
context: always set GC callback (3.71 KB, patch)
2013-04-04 18:48 UTC, Ray Strode [halfline]
reviewed Details | Review
object: defer toggle notifications to idle (12.88 KB, patch)
2013-04-04 18:48 UTC, Ray Strode [halfline]
reviewed Details | Review
object: run toggle notifies directly, when possible (13.91 KB, patch)
2013-04-04 18:48 UTC, Ray Strode [halfline]
none Details | Review
context: lock access to GC idle (3.03 KB, patch)
2013-04-04 21:52 UTC, Ray Strode [halfline]
needs-work Details | Review
context: always set GC callback (3.91 KB, patch)
2013-04-04 21:52 UTC, Ray Strode [halfline]
accepted-commit_now Details | Review
object: defer toggle notifications to idle (13.76 KB, patch)
2013-04-04 21:52 UTC, Ray Strode [halfline]
reviewed Details | Review
object: run toggle notifies directly, when possible (12.19 KB, patch)
2013-04-04 21:52 UTC, Ray Strode [halfline]
accepted-commit_now Details | Review
context: lock access to GC idle (3.05 KB, patch)
2013-04-05 16:01 UTC, Ray Strode [halfline]
committed Details | Review
context: always set GC callback (3.90 KB, patch)
2013-04-05 16:01 UTC, Ray Strode [halfline]
committed Details | Review
object: defer toggle notifications to idle (21.93 KB, patch)
2013-04-05 16:01 UTC, Ray Strode [halfline]
needs-work Details | Review
object: run toggle notifies directly, when possible (12.20 KB, patch)
2013-04-05 16:01 UTC, Ray Strode [halfline]
needs-work Details | Review
object: defer toggle notifications to idle (21.83 KB, patch)
2013-04-05 21:06 UTC, Ray Strode [halfline]
committed Details | Review
object: run toggle notifies directly, when possible (14.30 KB, patch)
2013-04-05 21:06 UTC, Ray Strode [halfline]
committed Details | Review

Description Sam Thursfield 2012-02-16 11:10:33 UTC
Created attachment 207744 [details]
debug info

OS: Fedora 16, GNOME Shell 3.2.2.1, js-1.8.5-7

I had pretty much just started up my computer, interacted with the network-manager menu (connected to a VPN) and then opened the 'network settings' dialog. At this point the shell process hung during a garbage collection. I gave it several minutes to recover and nothing happened.
Comment 1 Owen Taylor 2012-02-16 15:27:31 UTC
Are you sure that it was hung during garbage collection - that it wasn't doing stuff, and then garbage collection, and doing more stuff and then garbage collecting, ...?

(Unfortunately, the two choices here are;

 A) Memory corruption at some earlier part of the program caused GC to go into a loop
 B) you were in some infinite loop which we have no data about

either way, it's very hard to proceed at this point unless it happens repeatedly.)
Comment 2 Sam Thursfield 2012-02-16 15:35:25 UTC
I took several backtraces and they were always at the same point, and 'next' executed in the above trace simply hung (I guess still blocking on its mutex).

It's the first time I've seen this problem and I've been using this system for a few months, so odds of recurring sadly are probably low
Comment 3 Sam Thursfield 2012-02-22 12:38:09 UTC
It happened again, this time with --gdm-mode.

My impression is that the JS runtime threading is getting confused. We have one worker thread which is idle (blocking on its 'wakeup' cond). Meanwhile the main loop is inside AutoGCSession::AutoGCSession() in this block of code:

    /*
     * Discount the request on the current thread from contributing to
     * rt->requestCount before we wait for all other requests to finish.
     * JS_NOTIFY_REQUEST_DONE, which will wake us up, is only called on
     * rt->requestCount transitions to 0.
     */
    size_t requestDebit = cx->thread->data.requestDepth ? 1 : 0;
    JS_ASSERT(requestDebit <= rt->requestCount);
    if (requestDebit != rt->requestCount) {
        rt->requestCount -= requestDebit;

        do {
            JS_AWAIT_REQUEST_DONE(rt);
        } while (rt->requestCount > 0);
        rt->requestCount += requestDebit;
    }

I ran the following to test if we are really blocking on the cond:

    (gdb) break jsgc.cpp:2669
    Breakpoint 1 at 0x38204afa96: file jsgc.cpp, line 2670. (2 locations)
    (gdb) cont
    Continuing.

and it never woke up again, so we must really be stuck.

More interestingly, this shows something definitely isn't right with the JS runtime:

    (gdb) print rt->requestCount
    $3 = 4294967295
    (gdb) print requestDebit
    $1 = 1
    (gdb) print cx->thread->data.requestDepth
    $3 = 4294967295

I'm not sure what a 'request' is in this context.
Comment 4 Colin Walters 2012-02-22 14:37:40 UTC
It's possible the bug lies in gjs - if we have a mismatched pair of JS_BeginRequest()/JS_EndRequest(), it might result in this symptom.
Comment 5 Sam Thursfield 2012-06-19 13:51:53 UTC
*potentially* a dupe of https://bugzilla.gnome.org/show_bug.cgi?id=678378 ...
Comment 6 Ray Strode [halfline] 2013-03-28 14:25:44 UTC
oh I ran into this yesterday and discovered the same thing mentioned in comment 3 and came up with the same theory mentioned in comment 4.  I did some code inspection and found cases where we were missing JS_EndRequest, but I didn't find cases were missing JS_BeginRequest (which seems like a more likely cause given how high the numbers are).

I'm going to try doing a JS build today with asserts enabled to see if that will smoke out the issue.  I'll also attach a patch for the missing EndRequest calls once I write one.
Comment 7 Ray Strode [halfline] 2013-04-02 16:54:15 UTC
So I've spent a few days looking into this and it's a pretty tricky problem.

First, this problem has been aggravated recently because of a bug in the background code where we don't clean up the settings object associated with a background when the background actor is destroyed.  I've put a fix for that in bug 697119.

It turns out the problem isn't mismatched JS_EndRequest() calls after all (though like I said earlier there are some missing JS_EndRequest calls that i'll post a patch for). It's because we're calling JS_BeginRequest/JS_EndRequest from multiple threads.  spidermonkey only allows for JS calls to be made from one thread at a time.  We normally keep javascript to one thread, but there is a case where we sometimes don't: toggle reference notification.

Basically, GJS allocates a JSObject for every GObject.  The lifetime of of the JSObject is tied to the lifetime of the GObject.  As long as the JSObject is alive, we want to force the GObject to stay alive (since the javascript code might be using the GObject via the JSObject).  We also want to force the JSObject to stay alive as long as the GObject is alive.  That's because if javascript calls into C to regain access to an object, it should get the same JSObject back from a previous call so that any data stored on it earlier isn't lost [say actor.get_children() should always return the same JSObject wrappers for the children, so if code does this.actor._delegate=this, it can get the delegate back).  

We can force the JSObject to stay alive as long the GObject is alive, by "rooting" the JSObject to the JS global object.   We can force the GObject to stay alive as long as the JSObject is alive by taking a reference on the GObject. Doing both things together is effectively a reference cycle, though.  If the JSObject owns a reference to the GObject it will never die, and so the JSObject will never get unrooted and die, so the GObject will never get unreferenced and die.

This cycle problem is addressed using a "toggle reference".  GObject notifies GJS via a callback function when no one other than javascript is referencing the object.  It also notifies GJS via a callback function when something other than javascript starts referencing the object.  The idea is when javascript is the only thing referencing the GObject, we can "unroot" the JSObject from the global object.  Then the only thing keeping the JSObject alive is the javascript code itself. When the javascript code stops referencing the object it will die the next time the garbage collector runs.
Likewise, when GJS gets notified that something other javascript is referencing the gobject we can re-root the jsobject to the global jsobject and force it stay alive even if no javascript code is referencing it anymore.

The problem is these callbacks can be run from any thread.  unrooting and rerooting the jsobject to the global object is something that can only be done when javascript isn't being used from other threads.  And it really can't be done when the garbage collector is running in another thread.

It's tempting to defer toggle notification to the main thread (with an idle function or g_main_context_invoke).  The problem is, if the JSObject needs to be rooted to the global object, then deferring to idle could mean it gets garbage collected before the rooting takes hold.  We could turn off the garbage collector, but doing that requires using javascript functions which we can only call from the main thread.

So we're sort of stuck.  One thing we could do is block the toggle notification from proceeding until garbage collection finishes (if it's running).  That would help somewhat, but isn't a complete answer.
Comment 8 Ray Strode [halfline] 2013-04-02 18:05:49 UTC
okay, looking in jsgc.cpp I see this:

        /*•
         * Let the API user decide to defer a GC if it wants to (unless this•
         * is the last context).  Invoke the callback regardless. Sample the•
         * callback in case we are freely racing with a JS_SetGCCallback{,RT}•
         * on another thread.•
         */•
        if (JSGCCallback callback = rt->gcCallback) {•
            if (!callback(cx, JSGC_BEGIN) && gckind != GC_LAST_CONTEXT)•
                return;•
        }•


So we can suspend the GC from running by returning false from our GC callback.  We can also suspend the toggle notification callback from starting until the GC finishes (if it's already running) by locking a mutex in the GC callback and trying to lock the mutex in the notification callback.  So, in summary:

1) have the GC callback check a boolean to see if we want the GC suspended.  If so, return FALSE
2) If not, take a lock while the GC is running
3) From the toggle notification callback try to take the same lock.
4) Suspend the GC.
5) Queue an idle to process toggle notification and return
6) When idle runs, process toggle notification and then resume GC

That should be sufficient to fix this bug, I think.
Comment 9 Giovanni Campagna 2013-04-02 18:19:52 UTC
It is not true (according to the documentation) that we can only use BeginRequest from the main thread. What is true is that you cannot call BeginRequest on the same context from different threads.

So a simpler solution would be to have two different contexts: one for normal JS executions, and one for foreign JS toggle references.
Note that refcount can go from 1 to >1 only after a JS call, so only in the main thread. Which means that a toggle notify on a secondary thread will never cause us to root an object, and therefore this flow is safe.
Comment 10 Ray Strode [halfline] 2013-04-02 20:01:49 UTC
I talked with Giovanni about this on IRC. Upshot is I'm going to try comment 8 instead of comment 9, since that has the potential to be more js1.7 friendly (although not completely js1.7 friendly since the callback mentioned in comment 8 changes signature.
Comment 11 Ray Strode [halfline] 2013-04-02 20:14:09 UTC
Created attachment 240426 [details] [review]
jsapi-util,object: Add missing JS_EndRequest calls

There are a few early return cases where a JS_BeginRequest isn't
getting paired with JS_EndRequest.

This commit fixes those cases.
Comment 12 Ray Strode [halfline] 2013-04-02 20:14:33 UTC
Created attachment 240427 [details] [review]
value: handle argc == 0 in closure_marshal

It's possible for a closure to have 0 arguments.

When this happens we end up calling alloca(0) which is
undefined.

This commit protects against argc == 0.

(concretely, I was seeing this happen from
 g_closure_invoke(clsoure, &result_value, 0, NULL, NULL) in glib's
 gsourceclosure.c: source_closure_callback function)
Comment 13 Ray Strode [halfline] 2013-04-02 20:14:54 UTC
Created attachment 240428 [details] [review]
gi: fix compile error

This commit fixes a typo that can only see when compiling gjs with
extra debugging.
Comment 14 Jasper St. Pierre (not reading bugmail) 2013-04-02 20:17:01 UTC
Review of attachment 240428 [details] [review]:

OK.
Comment 15 Jasper St. Pierre (not reading bugmail) 2013-04-02 20:20:57 UTC
Review of attachment 240427 [details] [review]:

OK.
Comment 16 Jasper St. Pierre (not reading bugmail) 2013-04-02 20:21:45 UTC
Review of attachment 240426 [details] [review]:

Hm, I'm not sure I like the flow in some of them, but I think the alternatives would be worse, so OK.
Comment 17 Ray Strode [halfline] 2013-04-02 20:26:35 UTC
Attachment 240426 [details] pushed as 27a1395 - jsapi-util,object: Add missing JS_EndRequest calls
Attachment 240427 [details] pushed as a026cc5 - value: handle argc == 0 in closure_marshal
Attachment 240428 [details] pushed as e2d80eb - gi: fix compile error
Comment 18 Ray Strode [halfline] 2013-04-03 21:40:37 UTC
Created attachment 240544 [details] [review]
context: always set GC callback

Right now we only set the GC callback if gc-notifications are enabled.

This commit makes us always set the callback as prep work toward fixing a GC
deadlock.
Comment 19 Ray Strode [halfline] 2013-04-03 21:40:43 UTC
Created attachment 240545 [details] [review]
object: defer toggle notifications to idle

We don't want to be mucking with javascript from two threads at once.
This commit defers toggle notification processing to the main
thread.
Comment 20 Ray Strode [halfline] 2013-04-03 21:40:48 UTC
Created attachment 240546 [details] [review]
object: run toggle notifies directly, when possible

There are a number of cases where toggle notifies run that we can
get away with handling them directly.

This commit does that, to minimize the number of idle functions
getting queued.
Comment 21 Ray Strode [halfline] 2013-04-03 21:51:03 UTC
I talked with Owen and Colin today and we came up with a slightly different solution to comment 8, which i've posted in the above three patches.

The basic idea is 

1) defer toggle notifications to idle when necessary
2) when toggling up from 1->2 references, take a reference of our own to prevent it from toggling back down (and potentially up again) before we can deal with it
3) accept that JSObjects associated with a gobject may get garbage collected before we have a chance to root them.  This can happen if an object is already on the chopping block waiting to be GC'd and some internal library code refs it (like GDBusConnection's weak reference to the system bus)
4) handle toggle notifications immediately when possible for efficiency reasons.
Comment 22 Ray Strode [halfline] 2013-04-03 21:56:57 UTC
Created attachment 240548 [details] [review]
One other strange issue

There's one other strange thing i've run across.

If i manually force a GC, I sometimes get this assertion failure:

    at jsutil.cpp:83

This patch replaces the macros with what they do under the covers and it "fixes" it.  I haven't yet figured out why.
Comment 23 Ray Strode [halfline] 2013-04-03 21:57:49 UTC
hmm, that trace didn't make it through git bz:

  • #1 JS_Assert
    at jsutil.cpp line 83
  • #2 js::gc::Mark<JSObject>
    at jsgcinlines.h line 206
  • #3 js::gc::MarkKind
    at jsgcinlines.h line 579
  • #4 JS_CallTracer
    at jsapi.cpp line 2328
  • #5 gjs_closure_trace
    at gi/closure.c line 340

Comment 24 Ray Strode [halfline] 2013-04-04 03:50:21 UTC
Comment on attachment 240548 [details] [review]
One other strange issue

This ended up being because I build spidermonkey with -DDEBUG but didn't build gjs with -DDEBUG
Comment 25 Ray Strode [halfline] 2013-04-04 14:34:51 UTC
(this is against the gnome-3-8 branch, not master since I don't have spidermonkey 17 yet)
Comment 26 Ray Strode [halfline] 2013-04-04 18:47:49 UTC
walters gave me a shiny spidermonkey 17 rpm, so i rebased it. will post.
Comment 27 Ray Strode [halfline] 2013-04-04 18:48:11 UTC
Created attachment 240649 [details] [review]
context: always set GC callback

Right now we only set the GC callback if gc-notifications are enabled.

This commit makes us always set the callback as prep work toward fixing a GC
deadlock.
Comment 28 Ray Strode [halfline] 2013-04-04 18:48:15 UTC
Created attachment 240650 [details] [review]
object: defer toggle notifications to idle

We don't want to be mucking with javascript from two threads at once.
This commit defers toggle notification processing to the main
thread.
Comment 29 Ray Strode [halfline] 2013-04-04 18:48:20 UTC
Created attachment 240651 [details] [review]
object: run toggle notifies directly, when possible

There are a number of cases where toggle notifies run that we can
get away with handling them directly.

This commit does that, to minimize the number of idle functions
getting queued.
Comment 30 Colin Walters 2013-04-04 19:31:58 UTC
Review of attachment 240649 [details] [review]:

Looks correct modulo one question:

::: gjs/context.c
@@ +878,3 @@
     GjsContext *gjs_context = data;
 
     gjs_context->idle_emit_gc_id = 0;

(Similarly would have to grab a mutex here)

@@ +896,3 @@
+            if (gjs_context->gc_notifications_enabled) {
+                if (gjs_context->idle_emit_gc_id == 0)
+                    gjs_context->idle_emit_gc_id = g_idle_add (gjs_context_idle_emit_gc, gjs_context);

This is not a new bug, but I believe gjs_context here needs to be protected with a mutex - otherwise we face a possible race with this callback being triggered from the GC thread, and racing with the idle in the main thread?
Comment 31 Colin Walters 2013-04-04 20:38:32 UTC
Review of attachment 240650 [details] [review]:

::: gi/object.c
@@ +906,3 @@
+        idle_id = g_idle_add(idle_handle_toggle_down, operation);
+        g_object_set_data (operation->gobj,
+                           "gjs-toggle-down-notify",

Should use _set_qdata(), it avoids this function having to do the quark lookup each time.

@@ +907,3 @@
+        g_object_set_data (operation->gobj,
+                           "gjs-toggle-down-notify",
+                           GUINT_TO_POINTER(idle_id));

But honestly, I don't quite understand this.  If we go e.g. up -> down -> up, there will be two up idles queued, right?  But then at object finalize, we're only removing one of them?  Won't that crash?

@@ +911,2 @@
     } else {
+        guint idle_id;

Could hoist idle_id from both paths.

@@ +923,2 @@
          */
+        g_object_ref(gobj);

Is there a reason we're doing the ref here outside the lock?  Were we to do it inside the lock, the locking calls could be hoisted outside the conditional.

@@ +1176,3 @@
+        g_mutex_lock(&toggle_notify_lock);
+        idle_id = GPOINTER_TO_UINT(g_object_get_data(priv->gobj,
+                                                     "gjs-toggle-down-notify"));

g_object_steal_qdata() will take care of access and removing it atomically.

@@ +1185,3 @@
+
+        idle_id = GPOINTER_TO_UINT(g_object_get_data(priv->gobj,
+                                                     "gjs-toggle-up-notify"));

Ditto.
Comment 32 Colin Walters 2013-04-04 20:40:00 UTC
Review of attachment 240651 [details] [review]:

Will wait to review this one until previous dependency is ready.
Comment 33 Ray Strode [halfline] 2013-04-04 20:52:52 UTC
Review of attachment 240649 [details] [review]:

::: gjs/context.c
@@ +896,3 @@
+            if (gjs_context->gc_notifications_enabled) {
+                if (gjs_context->idle_emit_gc_id == 0)
+                    gjs_context->idle_emit_gc_id = g_idle_add (gjs_context_idle_emit_gc, gjs_context);

yea i guess you could end up in a race where idle_emit_gc_id gets zeroified from the idle right after getting set in the gc callback
Comment 34 Ray Strode [halfline] 2013-04-04 20:59:43 UTC
Review of attachment 240650 [details] [review]:

::: gi/object.c
@@ +907,3 @@
+        g_object_set_data (operation->gobj,
+                           "gjs-toggle-down-notify",
+                           GUINT_TO_POINTER(idle_id));

we can't go from up to down.  we ref the object when we go up, so we can't go back down again.

@@ +923,2 @@
          */
+        g_object_ref(gobj);

well the lock is only needed to prevent the idle handler from running before the idle id is set on the object.  I think it's better to keep the locked region more compact than to be able to shrink the function, personally. But if you feel strongly, i'm not opposed to changing it.  I had thought about moving the idle+set_data to a separate helper function.  That would make this function shrink. I'm cool with whatever.
Comment 35 Ray Strode [halfline] 2013-04-04 21:52:33 UTC
Created attachment 240665 [details] [review]
context: lock access to GC idle

There's a small race where the GC notification idle id might get
zeroed before the idle runs, if the GC callback is run from a
non-main thread.

That means there's the potential for the GC notification idle to
get dispatched after the context is finalized.

This commit fixes that by protecting access to the idle id with
a lock.
Comment 36 Ray Strode [halfline] 2013-04-04 21:52:39 UTC
Created attachment 240666 [details] [review]
context: always set GC callback

Right now we only set the GC callback if gc-notifications are enabled.

This commit makes us always set the callback as prep work toward fixing a GC
deadlock.
Comment 37 Ray Strode [halfline] 2013-04-04 21:52:47 UTC
Created attachment 240667 [details] [review]
object: defer toggle notifications to idle

We don't want to be mucking with javascript from two threads at once.
This commit defers toggle notification processing to the main
thread.
Comment 38 Ray Strode [halfline] 2013-04-04 21:52:54 UTC
Created attachment 240668 [details] [review]
object: run toggle notifies directly, when possible

There are a number of cases where toggle notifies run that we can
get away with handling them directly.

This commit does that, to minimize the number of idle functions
getting queued.
Comment 39 Ray Strode [halfline] 2013-04-04 21:53:40 UTC
(I haven't had an opportunity to thoroughly test the latest patchset, but I have to go and wanted to post it before I left)
Comment 40 Colin Walters 2013-04-04 22:08:28 UTC
Review of attachment 240665 [details] [review]:

Right.
Comment 41 Colin Walters 2013-04-04 22:09:13 UTC
Review of attachment 240666 [details] [review]:

Looks right.
Comment 42 Colin Walters 2013-04-04 22:52:08 UTC
Review of attachment 240667 [details] [review]:

A few more comments...looking closer to right to me.

::: gi/object.c
@@ +778,3 @@
+{
+    g_object_steal_qdata(gobj,
+                        g_quark_from_static_string("gjs-toggle-down-notify"));

Ah...you're kind of missing the point of the quark stuff =)  You're supposed to cache the GQuark as a static in a good place like class_init or at worst a GOnce.

"git grep g_quark_from_static_string" in e.g. GLib to see some users like gobject/gbinding.c.

@@ +810,3 @@
+    if (idle_id != 0)
+        g_source_remove (idle_id);
+}

You could unify these two functions and have them take the GQuark to remove.

@@ +873,3 @@
+    ToggleRefNotifyOperation *operation = data;
+
+    clear_toggle_down_idle_id(operation->gobj);

When we call these, don't we need to be holding the toggle notify lock?

@@ +886,3 @@
+    ToggleRefNotifyOperation *operation = data;
+
+    clear_toggle_up_idle_id(operation->gobj);

And here?
Comment 43 Colin Walters 2013-04-04 23:05:17 UTC
Review of attachment 240668 [details] [review]:

This looks reasonable to me...definitely tricky code, but I can't think of anything wrong with this.
Comment 44 Ray Strode [halfline] 2013-04-05 01:14:27 UTC
Hi,

(In reply to comment #42)
> Ah...you're kind of missing the point of the quark stuff =)  You're supposed to
> cache the GQuark as a static in a good place like class_init or at worst a
> GOnce.
It was my understanding that quarks created from static strings were very cheap (essentially return (GQuark) static_string;) Looking quickly, that understanding is wrong. I'll change it.

> @@ +810,3 @@
> +    if (idle_id != 0)
> +        g_source_remove (idle_id);
> +}
> 
> You could unify these two functions and have them take the GQuark to remove.
sure, why not.
 
> @@ +873,3 @@
> +    ToggleRefNotifyOperation *operation = data;
> +
> +    clear_toggle_down_idle_id(operation->gobj);
> 
> When we call these, don't we need to be holding the toggle notify lock?
I don't think so.  We'll never have two toggle down idles dispatched at the same time.  If the GC is running in a non-default thread and finalizes a JSObject then we will call cancel_pending_toggle_down_idle_id, but that's harmless to call if the idle is already running. Actually thinking about it more, that case can't actually happen since the JSObject is rooted to the keep-alive object until the idle dispatches.  So really cancel_pending_toggle_down_idle_id should be an assert.
 
> @@ +886,3 @@
> +    ToggleRefNotifyOperation *operation = data;
> +
> +    clear_toggle_up_idle_id(operation->gobj);
> 
> And here?

Again, we can't have two idles dispatched at the same time, so that aspect of needing the lock isn't a concern. But we do have a problem where a JSObject could be getting garbage collected on a secondary thread while being used in the main thread. hmm. The only way we could prevent that from happening is to suspend the gc from running until the idle is dispatched, and block the idle from running while the GC is running.
Comment 45 Ray Strode [halfline] 2013-04-05 15:49:02 UTC
Review of attachment 240665 [details] [review]:

::: gjs/context.c
@@ +883,2 @@
     gjs_context->idle_emit_gc_id = 0;
+    g_mutex_lock(&gc_idle_lock);

this should say unlock
Comment 46 Ray Strode [halfline] 2013-04-05 16:01:08 UTC
Created attachment 240764 [details] [review]
context: lock access to GC idle

There's a small race where the GC notification idle id might get
zeroed before the idle runs, if the GC callback is run from a
non-main thread.

That means there's the potential for the GC notification idle to
get dispatched after the context is finalized.

This commit fixes that by protecting access to the idle id with
a lock.
Comment 47 Ray Strode [halfline] 2013-04-05 16:01:16 UTC
Created attachment 240765 [details] [review]
context: always set GC callback

Right now we only set the GC callback if gc-notifications are enabled.

This commit makes us always set the callback as prep work toward fixing a GC
deadlock.
Comment 48 Ray Strode [halfline] 2013-04-05 16:01:22 UTC
Created attachment 240766 [details] [review]
object: defer toggle notifications to idle

We don't want to be mucking with javascript from two threads at once.
This commit defers toggle notification processing to the main
thread, and introduces a mutex to prevent the garbage collector
and toggle notifications from running at the same time.
Comment 49 Ray Strode [halfline] 2013-04-05 16:01:29 UTC
Created attachment 240767 [details] [review]
object: run toggle notifies directly, when possible

There are a number of cases where toggle notifies run that we can
get away with handling them directly.

This commit does that, to minimize the number of idle functions
getting queued.
Comment 50 Colin Walters 2013-04-05 16:10:15 UTC
(In reply to comment #44)

> > When we call these, don't we need to be holding the toggle notify lock?
> I don't think so.  We'll never have two toggle down idles dispatched at the
> same time.  If the GC is running in a non-default thread and finalizes a
> JSObject then we will call cancel_pending_toggle_down_idle_id, but that's
> harmless to call if the idle is already running. 

Basically when I saw the mutex, I went looking for the "other half" - what's the code that races with this one? 

Oh, are we protecting against e.g. the GDBus thread bouncing it up at the same time another thread is decrementing it, i.e. two threads calling into g_object_set_data() at once?  But that's actually a threadsafe API, so even that's not a concern.

So given that, do we need the mutex here at all?

> Again, we can't have two idles dispatched at the same time, so that aspect of
> needing the lock isn't a concern. But we do have a problem where a JSObject
> could be getting garbage collected on a secondary thread while being used in
> the main thread. hmm. The only way we could prevent that from happening is to
> suspend the gc from running until the idle is dispatched, and block the idle
> from running while the GC is running.

We discussed this one, and were going to accept the "dewrappering" right?  I think the way we'd have to do this correctly is something like a "destroyed" flag in the priv object.  Then when the JSObject gets destroyed, if we have an idle pending, it holds a ref on priv.  When we fire, we check the destroyed flag, if it's set, do nothing.  Then drop the ref on priv.
Comment 51 Ray Strode [halfline] 2013-04-05 17:36:02 UTC
(In reply to comment #50)
> Basically when I saw the mutex, I went looking for the "other half" - what's
> the code that races with this one? 
> So given that, do we need the mutex here at all?
The point of the mutex is because we do

foo = g_idle_add(...)
set_data(foo);

and inside the idle handler unset the data.  The idle handler may run before g_idle_add returns. If it unsets the data before the data gets set, then the data will stay around after the idle handler runs.

> We discussed this one, and were going to accept the "dewrappering" right? 
Right, the dewrappering is unavoidable, and not even really that problematic in practice.
But what we can't do is try to root the JSObject in the idle handler while the JSObject is getting GC'd.
That means running javascript from two threads at once, and that means mucking with an already freed object.
I'm saying we accept the dewrappering, but we have to know when it happens and do the right thing.

> is two threads calling JS at the same time.
> think the way we'd have to do this correctly is something like a "destroyed"
> flag in the priv object.
The priv object is off limits from the second thread (since it requires calling Javascript functions to get at)
I think the patchset i've posted right before lunch should work okay, but I'm going to do read through of it and some more testing. Let me know what you think.
Comment 52 Colin Walters 2013-04-05 17:38:07 UTC
Review of attachment 240765 [details] [review]:

Right.
Comment 53 Colin Walters 2013-04-05 17:38:25 UTC
Review of attachment 240764 [details] [review]:

Still right.
Comment 54 Ray Strode [halfline] 2013-04-05 20:43:43 UTC
Review of attachment 240766 [details] [review]:

we talked on irc about this, and decided two things.

::: gi/object.c
@@ +929,3 @@
+    ToggleRefNotifyOperation *operation = data;
+
+    if (!clear_toggle_idle_id(operation->gobj, operation->direction)) {

if we go with the mutex approach we really do need to block on it here or it's not doing anything.

@@ +998,3 @@
+    qdata_key = get_qdata_key_for_toggle_direction(direction);
+
+    idle_id = g_idle_add_full(G_PRIORITY_HIGH,

but we can avoid the mutex all together by using g_idle_source_new
Comment 55 Ray Strode [halfline] 2013-04-05 20:45:00 UTC
Review of attachment 240767 [details] [review]:

::: gi/object.c
@@ +1049,3 @@
          */
+        if (gc_blocked)
+          handle_toggle_down(context, gobj);

we could potentially end up calling this before queued idles.  we need to maintain ordering.
Comment 56 Ray Strode [halfline] 2013-04-05 21:06:10 UTC
Created attachment 240798 [details] [review]
object: defer toggle notifications to idle

We don't want to be mucking with javascript from two threads at once.
This commit defers toggle notification processing to the main
thread, and introduces a mutex to prevent the garbage collector
and toggle notifications from running at the same time.
Comment 57 Ray Strode [halfline] 2013-04-05 21:06:15 UTC
Created attachment 240799 [details] [review]
object: run toggle notifies directly, when possible

There are a number of cases where toggle notifies run that we can
get away with handling them directly.

This commit does that, to minimize the number of idle functions
getting queued.
Comment 58 Colin Walters 2013-04-05 22:26:25 UTC
Review of attachment 240798 [details] [review]:

Ok, this makes more sense to me.
Comment 59 Colin Walters 2013-04-05 22:33:15 UTC
Review of attachment 240799 [details] [review]:

I'll admit to not running through all the cases here, but this looks on the surface right.  Let's go with it.
Comment 60 Ray Strode [halfline] 2013-04-05 22:51:58 UTC
Attachment 240764 [details] pushed as d112a71 - context: lock access to GC idle
Attachment 240765 [details] pushed as 7ab5e0e - context: always set GC callback
Attachment 240798 [details] pushed as 50a3d86 - object: defer toggle notifications to idle
Attachment 240799 [details] pushed as 0adcb39 - object: run toggle notifies directly, when possible
Comment 61 Ray Strode [halfline] 2013-04-06 00:23:44 UTC
*** Bug 697170 has been marked as a duplicate of this bug. ***
Comment 62 Colin Walters 2013-04-06 15:11:16 UTC
Follow up bug: https://bugzilla.gnome.org/show_bug.cgi?id=697436
Comment 63 Marshall 2013-04-16 16:47:59 UTC
Any news on when the patch for this bug will be pushed to the Ubuntu Quantal (12.10) PPA?  It affects me ~3 times a day.
Comment 64 Ray Strode [halfline] 2013-04-16 20:42:31 UTC
not sure.  (I don't think anyone on the CC list of this bug works on ubuntu or the quantal ppa)  Maybe ask in launchpad?
Comment 65 sangu 2013-04-17 07:05:28 UTC
Can you release new version for GNOME 3.8.1 ?