GNOME Bugzilla – Bug 679832
Remove manual garbage collection on tweeners end
Last modified: 2015-06-24 21:30:09 UTC
The shell currently often locks up when invoking a manual gc pass on the end of tweeners; Jasper thinks it's not necessary anymore, since the garbage collector is being fixed in gjs...but also, in my testing, this gets rid of the lockups.
Created attachment 218668 [details] [review] shell-global: remove unused shell_global_maybe_gc() It's not used anywhere, and we're removing the manual garbage collection invocations anyway.
Created attachment 218669 [details] [review] js: use System.gc() instead of shell_global_gc() gjs now offers a gc() method in the System module, no need to use our own.
Created attachment 218670 [details] [review] shell-global: remove shell_global_gc() It's unused now.
Created attachment 218671 [details] [review] global: don't run a garbage collection on tweeners end This currently causes the shell to freeze very often in a thread deadlock, and the gjs garbage collector behavior is currently getting fixed at the right level in gjs itself.
Review of attachment 218668 [details] [review]: Looks fine.
Review of attachment 218669 [details] [review]: Sure.
Review of attachment 218670 [details] [review]: Yes.
Review of attachment 218671 [details] [review]: Go for it.
Attachment 218668 [details] pushed as e82fe14 - shell-global: remove unused shell_global_maybe_gc() Attachment 218669 [details] pushed as e756c2d - js: use System.gc() instead of shell_global_gc() Attachment 218670 [details] pushed as 85bc8cc - shell-global: remove shell_global_gc() Attachment 218671 [details] pushed as 6f60559 - global: don't run a garbage collection on tweeners end
Review of attachment 218671 [details] [review]: Woah. "This currently causes the shell to freeze very often in a thread deadlock". What changed? It certainly didn't when I wrote the code. This is very likely because of Giovanni's tracing changes, and has *nothing* to do with us calling GC here. Your commit message doesn't address at all the text in the comment you're deleting - namely the concerns about heap fragmentation. Now possibly we're calling the GC "too often", but if the code is locking up, figure out why and fix it. Don't treat the symptoms. See bug 679688 .
Sorry I don't regularly run 3.6, I know I suck...but if this locking-up-in-GC is a serious problem, we can trivially revert 06aa616a8c9b6d356aefc95a6b0c1c317b86c46a in gjs and re-land it when we have the fixes.
It's not due to the tracing stuff. Cosimo had a very reproducible crasher on rawhide, and taking out the gc() on leisure fixed it. Basically, when SpiderMonkey kicked off a GC thread around the same time we called JS_GC(), it deadlocked. I got deep in SpiderMonkey traces before realizing that the code just isn't meant to support that, and people in #jsapi confirmed that. We *really* shouldn't be doing this.
(In reply to comment #12) > It's not due to the tracing stuff. Cosimo had a very reproducible crasher on > rawhide, and taking out the gc() on leisure fixed it. > > Basically, when SpiderMonkey kicked off a GC thread around the same time we > called JS_GC(), it deadlocked. > > I got deep in SpiderMonkey traces before realizing that the code just isn't > meant to support that, and people in #jsapi confirmed that. We *really* > shouldn't be doing this. But how is doing that different from calling JS_GC() from looking glass? If we really shouldn't be doing it, should it be possible to invoke a GC at all while we're inside a request? The jsapi shell has API to do it last I looked. (Also, this doesn't address my concerns about memory fragmentation, the latent problems around having too much native-allocated memory that the GC isn't aware of, etc.)
And when you say "on rawhide", that very likely means someone uploaded a 1.33.3 tarball which includes Giovanni's problematic patch. So I still don't believe you that this *isn't* commit 06aa616a8c9b6d356aefc95a6b0c1c317b86c46a that introduced the problem.
Cosimo, can you please confirm that either applying 54d60c278a39ce681b35d172b675a4b2d4449b65 (or reverting 06aa616a8c9b6d356aefc95a6b0c1c317b86c46a) in gjs fixes the lockups (while also reverting your last patch?)
Yesterday I was testing this scenario with the stock shell from rawhide (3.5.3) and gjs master manually installed, and I could reproduce the lockup. In that situation, removing the manual GC from the shell solved the locks apparently. I am now trying to test various situations and this is the data I have so far: - shell 3.5.3 + gjs before tracing (1311a110db4c9bdf1c8538651dcb6e7cc91d6762) -> lockups - shell 3.5.3 + gjs master -> lockups - shell master + gjs master -> no lockups I am now testing one last scenario: shell 3.5.3 + cherry-pick of this patch + gjs before tracing, and it seems pretty stable so far...will follow up.
(In reply to comment #16) > I am now testing one last scenario: shell 3.5.3 + cherry-pick of this patch + > gjs before tracing, and it seems pretty stable so far...will follow up. For the record, I've been using this configuration all day and I didn't see any lockups.
Which patch is "this patch"?
(In reply to comment #12) > It's not due to the tracing stuff. Cosimo had a very reproducible crasher on > rawhide, and taking out the gc() on leisure fixed it. > > Basically, when SpiderMonkey kicked off a GC thread around the same time we > called JS_GC(), it deadlocked. > > I got deep in SpiderMonkey traces before realizing that the code just isn't > meant to support that, and people in #jsapi confirmed that. We *really* > shouldn't be doing this. Huh? From https://developer.mozilla.org/en/SpiderMonkey/JSAPI_Reference/JS_MaybeGC "Calling JS_MaybeGC when the application is idle can help prevent garbage collection from happening at less convenient times. Calling JS_MaybeGC periodically when the application is busy, from a JSBranchCallback or JSOperationCallback [Added in SpiderMonkey 1.8], can keep memory usage down and improve performance. Both are good practices. In both cases, frequent calls JS_MaybeGC are safe and will not cause the application to spend a lot of time doing redundant garbage collection work." IIRC Colin changed it to JS_GC instead because JS_MaybeGC can't really decide whether to run a GC or not in our case. But from that documentation doing a manual GC is supposed to work. Not sure whether this only applies to being called from a JSBranchCallback / JSOperationCallback but OTOH we have been doing that for a long time without anyone reporting lockups. So this looks like a recent regression.
We've had plenty of reported lockups. I don't know what made them be more frequent recently.
After discussion we have a theory that the code was always buggy, and the "what changed" here (why the lockups are more frequent) is simply that we started allocating enough garbage in animation cycles consistently that the GC will kick in and we'll block on it. We'll revisit this after rebasing on js187, which brings huge improvements in the GC.
*** Bug 678798 has been marked as a duplicate of this bug. ***