Bug 679832 – Remove manual garbage collection on tweeners end

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 679832 - Remove manual garbage collection on tweeners end


Summary:	Remove manual garbage collection on tweeners end


Status:	RESOLVED FIXED

Product:	gnome-shell
Classification:	Core
Component:	general
Version:	unspecified
Hardware:	Other All

Importance:	Normal normal
Target Milestone:	---
Assigned To:	gnome-shell-maint
QA Contact:	gnome-shell-maint

URL:
Whiteboard:

Duplicates:	678798 (view as bug list)
Depends on:
Blocks:

Reported:	2012-07-13 00:11 UTC by Cosimo Cecchi
Modified:	2015-06-24 21:30 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
shell-global: remove unused shell_global_maybe_gc() (1.58 KB, patch) 2012-07-13 00:11 UTC, Cosimo Cecchi	committed	Details \| Review
js: use System.gc() instead of shell_global_gc() (1.99 KB, patch) 2012-07-13 00:11 UTC, Cosimo Cecchi	committed	Details \| Review
shell-global: remove shell_global_gc() (1.55 KB, patch) 2012-07-13 00:12 UTC, Cosimo Cecchi	committed	Details \| Review
global: don't run a garbage collection on tweeners end (1.18 KB, patch) 2012-07-13 00:12 UTC, Cosimo Cecchi	needs-work	Details \| Review

Description Cosimo Cecchi 2012-07-13 00:11:53 UTC

The shell currently often locks up when invoking a manual gc pass on the end of tweeners; Jasper thinks it's not necessary anymore, since the garbage collector is being fixed in gjs...but also, in my testing, this gets rid of the lockups.

Comment 1 Cosimo Cecchi 2012-07-13 00:11:55 UTC

Created attachment 218668 [details] [review]
shell-global: remove unused shell_global_maybe_gc()

It's not used anywhere, and we're removing the manual garbage collection
invocations anyway.

Comment 2 Cosimo Cecchi 2012-07-13 00:11:58 UTC

Created attachment 218669 [details] [review]
js: use System.gc() instead of shell_global_gc()

gjs now offers a gc() method in the System module, no need to use our
own.

Comment 3 Cosimo Cecchi 2012-07-13 00:12:01 UTC

Created attachment 218670 [details] [review]
shell-global: remove shell_global_gc()

It's unused now.

Comment 4 Cosimo Cecchi 2012-07-13 00:12:05 UTC

Created attachment 218671 [details] [review]
global: don't run a garbage collection on tweeners end

This currently causes the shell to freeze very often in a thread
deadlock, and the gjs garbage collector behavior is currently getting
fixed at the right level in gjs itself.

Comment 5 Jasper St. Pierre (not reading bugmail) 2012-07-13 00:13:29 UTC

Review of attachment 218668 [details] [review]:

Looks fine.

Comment 6 Jasper St. Pierre (not reading bugmail) 2012-07-13 00:13:38 UTC

Review of attachment 218669 [details] [review]:

Sure.

Comment 7 Jasper St. Pierre (not reading bugmail) 2012-07-13 00:13:47 UTC

Review of attachment 218670 [details] [review]:

Yes.

Comment 8 Jasper St. Pierre (not reading bugmail) 2012-07-13 00:14:03 UTC

Review of attachment 218671 [details] [review]:

Go for it.

Comment 9 Cosimo Cecchi 2012-07-13 00:16:31 UTC

Attachment 218668 [details] pushed as e82fe14 - shell-global: remove unused shell_global_maybe_gc()
Attachment 218669 [details] pushed as e756c2d - js: use System.gc() instead of shell_global_gc()
Attachment 218670 [details] pushed as 85bc8cc - shell-global: remove shell_global_gc()
Attachment 218671 [details] pushed as 6f60559 - global: don't run a garbage collection on tweeners end

Comment 10 Colin Walters 2012-07-13 00:18:48 UTC

Review of attachment 218671 [details] [review]:

Woah.  "This currently causes the shell to freeze very often in a thread
deadlock".  What changed?  It certainly didn't when I wrote the code.

This is very likely because of Giovanni's tracing changes, and has *nothing* to do with us calling GC here.

Your commit message doesn't address at all the text in the comment you're deleting - namely the concerns about heap fragmentation.

Now possibly we're calling the GC "too often", but if the code is locking up, figure out why and fix it.  Don't treat the symptoms.

See bug 679688 .

Comment 11 Colin Walters 2012-07-13 00:20:36 UTC

Sorry I don't regularly run 3.6, I know I suck...but if this locking-up-in-GC is a serious problem, we can trivially revert 06aa616a8c9b6d356aefc95a6b0c1c317b86c46a in gjs and re-land it when we have the fixes.

Comment 12 Jasper St. Pierre (not reading bugmail) 2012-07-13 00:27:04 UTC

It's not due to the tracing stuff. Cosimo had a very reproducible crasher on rawhide, and taking out the gc() on leisure fixed it.

Basically, when SpiderMonkey kicked off a GC thread around the same time we called JS_GC(), it deadlocked.

I got deep in SpiderMonkey traces before realizing that the code just isn't meant to support that, and people in #jsapi confirmed that. We *really* shouldn't be doing this.

Comment 13 Colin Walters 2012-07-13 02:11:17 UTC

(In reply to comment #12)
> It's not due to the tracing stuff. Cosimo had a very reproducible crasher on
> rawhide, and taking out the gc() on leisure fixed it.
> 
> Basically, when SpiderMonkey kicked off a GC thread around the same time we
> called JS_GC(), it deadlocked.
> 
> I got deep in SpiderMonkey traces before realizing that the code just isn't
> meant to support that, and people in #jsapi confirmed that. We *really*
> shouldn't be doing this.

But how is doing that different from calling JS_GC() from looking glass?  If we really shouldn't be doing it, should it be possible to invoke a GC at all while we're inside a request?

The jsapi shell has API to do it last I looked.

(Also, this doesn't address my concerns about memory fragmentation, the latent problems around having too much native-allocated memory that the GC isn't aware of, etc.)

Comment 14 Colin Walters 2012-07-13 02:12:51 UTC

And when you say "on rawhide", that very likely means someone uploaded a 1.33.3 tarball which includes Giovanni's problematic patch.  So I still don't believe you that this *isn't* commit 06aa616a8c9b6d356aefc95a6b0c1c317b86c46a that introduced the problem.

Comment 15 Colin Walters 2012-07-13 02:53:38 UTC

Cosimo, can you please confirm that either applying 54d60c278a39ce681b35d172b675a4b2d4449b65 (or reverting 06aa616a8c9b6d356aefc95a6b0c1c317b86c46a) in gjs fixes the lockups (while also reverting your last patch?)

Comment 16 Cosimo Cecchi 2012-07-13 16:07:13 UTC

Yesterday I was testing this scenario with the stock shell from rawhide (3.5.3) and gjs master manually installed, and I could reproduce the lockup. In that situation, removing the manual GC from the shell solved the locks apparently.

I am now trying to test various situations and this is the data I have so far:
- shell 3.5.3 + gjs before tracing (1311a110db4c9bdf1c8538651dcb6e7cc91d6762) -> lockups
- shell 3.5.3 + gjs master -> lockups
- shell master + gjs master -> no lockups

I am now testing one last scenario: shell 3.5.3 + cherry-pick of this patch + gjs before tracing, and it seems pretty stable so far...will follow up.

Comment 17 Cosimo Cecchi 2012-07-13 22:43:31 UTC

(In reply to comment #16)

> I am now testing one last scenario: shell 3.5.3 + cherry-pick of this patch +
> gjs before tracing, and it seems pretty stable so far...will follow up.

For the record, I've been using this configuration all day and I didn't see any lockups.

Comment 18 Jasper St. Pierre (not reading bugmail) 2012-07-13 22:48:30 UTC

Which patch is "this patch"?

Comment 19 drago01 2012-07-14 18:20:30 UTC

(In reply to comment #12)
> It's not due to the tracing stuff. Cosimo had a very reproducible crasher on
> rawhide, and taking out the gc() on leisure fixed it.
> 
> Basically, when SpiderMonkey kicked off a GC thread around the same time we
> called JS_GC(), it deadlocked.
> 
> I got deep in SpiderMonkey traces before realizing that the code just isn't
> meant to support that, and people in #jsapi confirmed that. We *really*
> shouldn't be doing this.

Huh?

From https://developer.mozilla.org/en/SpiderMonkey/JSAPI_Reference/JS_MaybeGC

"Calling JS_MaybeGC when the application is idle can help prevent garbage collection from happening at less convenient times. Calling JS_MaybeGC periodically when the application is busy, from a JSBranchCallback or JSOperationCallback [Added in SpiderMonkey 1.8], can keep memory usage down and improve performance. Both are good practices. In both cases, frequent calls JS_MaybeGC are safe and will not cause the application to spend a lot of time doing redundant garbage collection work."

IIRC Colin changed it to JS_GC instead because JS_MaybeGC can't really decide whether to run a GC or not in our case.

But from that documentation doing a manual GC is supposed to work. Not sure whether this only applies to being called from a JSBranchCallback / JSOperationCallback but OTOH we have been doing that for a long time without anyone reporting lockups. So this looks like a recent regression.

Comment 20 Jasper St. Pierre (not reading bugmail) 2012-07-14 19:13:26 UTC

We've had plenty of reported lockups. I don't know what made them be more frequent recently.

Comment 21 Colin Walters 2012-07-16 13:50:28 UTC

After discussion we have a theory that the code was always buggy, and the "what changed" here (why the lockups are more frequent) is simply that we started allocating enough garbage in animation cycles consistently that the GC will kick in and we'll block on it.

We'll revisit this after rebasing on js187, which brings huge improvements in the GC.

Comment 22 Cosimo Cecchi 2012-12-14 14:20:02 UTC

*** Bug 678798 has been marked as a duplicate of this bug. ***