GNOME Bugzilla – Bug 679688
Make GC much more aggressive
Last modified: 2018-01-27 11:51:03 UTC
This is a follow up of bug 678504. Essentially, our current GC strategies tend to root much more than necessary, and this results in high memory consumption (if not outright leaks, like bug 679376). I prepared a series of patches that address this. I've been running them for a while, and didn't see regressions. They introduce a subtle API change, as callbacks become rooted to the object they're installed on, and thus become invalid if nothing keeps the object alive but the callback. OTOH, it's yet to be proven that this patches are effective at reducing memory usage. Considering normal usage patterns, I saw little to no difference for gnome-shell (around 1000 objects in lg). I did observe a marked difference (up to 30/40 MB) in gnome-documents: memory is now correctly freed when closing a document. There are still leaks when opening and closing a collection, although that could be a lifetime issue in JS.
Created attachment 218421 [details] [review] Use a GTypeModule for registering custom types This makes so that the types are installed as dynamic types, and thus loaded and unloaded on demand. In particular, resources such as signal class closures are now properly released when the use count of these types drops to zero. Additionally, to avoid unloading too early, each GObject prototype has a now a reference on the GObjectClass. This means that the above only applies at context destruction, and preserves previous semantics wrt property and signal installation.
Created attachment 218422 [details] [review] Fix object counters Previously object counters were defined as int, which allowed them to be negative. Replacing with unsigned int revealed a number of places where we weren't incrementing properly, in particular in dynamic class definitions (prototypes are finalized, and thus would decrement the counter).
Created attachment 218423 [details] [review] Keep Alive: add a debugging method to print all active roots Adds a print_roots() to keep-alive objects (such as __gc_this_on_context_destroy, where GObjects are stored), which prints a dump of all objects known to it on stderr. This is primarily meant to debug GObject cycles and leaks that are not detected.
Created attachment 218424 [details] [review] Rework typechecking and private data access Previously, all type safety checks were inside priv_from_js, which returned NULL if the object was of the wrong class. Except that a lot of code was not checking for NULL, causing potential crashes. Also, we were only checking the JSClass, not the actual type of object, so for example a function accepting a boxed would take in any boxed instance, not just the right one (and then that could crash JS code). Rework this by splitting typechecking and private data access. Internal API for typechecking is provided by all object classes (object, boxed, union, param, gerror) and it takes the necessary amount of information (GType, GIBaseInfo) to ensure type safety. priv_from_js becomes unsafe, and thus ok to call in dangerous data paths such as tracing and finalization.
Created attachment 218425 [details] [review] Associate callbacks with the object on which their installed Callbacks installed with GObject methods are now traced from that object instead of being rooted, so cycles no longer cause leaks. This implementing by making GjsCallbackTrampoline use GjsClosure internally, and generalizing the existing code that dealt with signal connections. Also, to keep tests ok, we now have class finalization functions for custom types (to unregister the closures).
Created attachment 218426 [details] [review] GObject: don't use toggle references unless necessary Many GObject (such as widgets and actors) don't have JS state, but they're kept alive by C code. We can therefore save some memory by GCing these objects, and creating new wrappers when needed. If state is ever set, we transparently switch to toggle refs, so no change should be visible at the JS level.
Created attachment 218427 [details] [review] Assorted memory leak fixes Spotted by valgrind.
Review of attachment 218425 [details] [review]: The signals equivalent to this made total sense to me - signals are clearly associated with a given GObject. However, this patch feels too "magical" to me in that callbacks on functions which happen to be GObject methods get special treatment, but static methods and global functions don't. It's worth noting too that callbacks have a scope - all you can possibly be doing here is cleaning them up *earlier*. I'm also worried about it potentially breaking previously valid code. Can you give me a specific example of code that was "fixed" by this patch?
Review of attachment 218427 [details] [review]: ::: gi/object.c @@ +381,3 @@ old_parent = parent; parent = g_object_info_get_parent(old_parent); + g_base_info_unref(old_parent); This change I understand, makes sense. @@ +2487,3 @@ priv = priv_from_js(cx, obj); pspec = gjs_g_param_from_param(cx, pspec_js); + g_param_spec_ref(pspec); But this one doesn't look like a memory leak fix. What's going on? Something with the pspec being destroyed during a GC?
Review of attachment 218426 [details] [review]: This is a pretty interesting idea. Did you just think of it? Or is there some literature or other bindings you were looking at? I need to think about this one a bit.
Review of attachment 218421 [details] [review]: The original intention of the GTypeModule stuff was for dlopen(), but in reality I think everyone (a11y/input modules) switched to just keeping their type modules alive indefinitely because it's too insane to unload code. Repurposing it for JavaScript subclasses makes some sense. I don't see a problem with the code. If anything though the danger lies in the fact that g_type_module is barely used.
Review of attachment 218422 [details] [review]: Ok.
Review of attachment 218423 [details] [review]: Why not actually return the objects, rather than printing them? Then you could e.g. use it from looking glass, and/or write some JS code to filter them at runtime.
(In reply to comment #11) > Review of attachment 218426 [details] [review]: > > This is a pretty interesting idea. Did you just think of it? Or is there some > literature or other bindings you were looking at? > > I need to think about this one a bit. I copied the idea from PyGObject, which has been doing this for a while.
Review of attachment 218426 [details] [review]: This is an interesting idea, but I'm curious if it's worth the risk. I don't want to get into a scenario where we're constantly allocating/deallocating the same wrapper object because the GC keeps cleaning it up. Do we know where this would help? ::: gi/object.c @@ +294,3 @@ + if (!gjs_get_string_id(context, id, &name)) { + /* not resolved, but no error */ + ensure_uses_toggle_ref(context, obj); Can we put this in a similar path? goto set_expando; set_expando: /* We need to keep the wrapper object alive * in order to not lose expando properties. */ ensure_uses_toggle_ref(context, obj); return JS_TRUE; @@ +317,3 @@ ret = JS_FALSE; case NO_SUCH_G_PROPERTY: + ensure_uses_toggle_ref(context, obj); I didn't think you meant to set a toggle ref on SOME_ERROR_OCCURRED.
Review of attachment 218424 [details] [review]: ::: gi/boxed.c @@ +43,3 @@ typedef struct { GIBoxedInfo *info; + GType gtype; This should be a separate cleanup patch. Yes, I already know it's lying around somewhere, and I think I rejected it the first time around. Attach it again :)
Review of attachment 218424 [details] [review]: Conceptually most of this patch is changing callers of gjs_g_foo_from_foo() to also call a typecheck beforehand, which makes sense. However...you've also changed how the "dynamic js class" stuff works, and that's a very tricky part of the code. Can that be separated conceptually or are they really tied together? ::: gjs/jsapi-util.c @@ -62,3 @@ - JSClass base; - JSClass *static_class; -} DynamicJSClass; Like I said in the overall comment...this is a nontrivial change to a tricky part of the code. @@ +463,3 @@ + g_critical("Dynamic class must have a name in the JSClass struct"); + return NULL; + } Just use g_error() here. No reason to continue the process. @@ +509,3 @@ + class_copy = g_slice_new0(JSClass); + memcpy(class_copy, clasp, sizeof(JSClass)); This worries me a bit; is it really valid to just memcpy() JSClass instances? That may be one reason for the hoops the previous code went through. @@ +607,3 @@ g_assert(obj_class != NULL); + if (obj_class->finalize != static_clasp->finalize) { Er...can't we just check obj_class != static_clasp ? @@ -677,3 @@ - if (g_hash_table_lookup(rd->dynamic_classes, proto_class) == NULL) { - gjs_throw(context, "Prototype is not for a dynamically-registered class"); - goto error; This assertion just got deleted, is anything replacing it? Though it looks like gjs_construct_object_dynamic() is only called in one place which is probably safe, so maybe it's OK to just delete. ::: gjs/jsapi-util.h @@ +71,3 @@ +#define GJS_DEFINE_PRIV_FROM_JS(type, class) \ + __attribute__((unused)) static inline JSBool \ + do_typecheck(JSContext *context, \ Can you call this "do_base_typecheck"? Since we're extending it for object/boxed etc. @@ +107,3 @@ + { \ + return JS_GetPrivate(context, object); \ + } \ Do we even need to wrap this? I can see deleting it would significantly bloat the patch...maybe do that as a step 2?
(In reply to comment #15) > (In reply to comment #11) > > Review of attachment 218426 [details] [review] [details]: > > > > This is a pretty interesting idea. Did you just think of it? Or is there some > > literature or other bindings you were looking at? > > > > I need to think about this one a bit. > > I copied the idea from PyGObject, which has been doing this for a while. Is this commit 66cec4c99c32016388e8c968284f72c8bdbd0e62 ? Doing "git log -S g_object_add_toggle_ref" in pygobject only shows two commits.
This point: http://git.gnome.org/browse/pygobject/tree/gi/_gobject/pygobject.c#n560
Review of attachment 218421 [details] [review]: I don't understand the purpose of this. Is the idea that for something like this: function makeSubClass() { const MySubClass = new Lang.Class({ Extends: GObject.Object }); return MySubClass; } that when the MySubClass wrapper is GC'd, the GType class is too? I can't think of a case where this would be handy immediately, besides some crazy GDBus magic, as usual. I'm also curious how this would work if this was: function makeSubClass() { const MySubClass = new Lang.Class({ Extends: GObject.Object }); return MySubClass.$gtype; } What will the GType wrapper do in this case?
Review of attachment 218423 [details] [review]: Having the objects at runtime sounds like a bad idea. I don't think we should fall into the same pitfall that got Python wrt. gc.get_objects().
(In reply to comment #22) > Review of attachment 218423 [details] [review]: > > Having the objects at runtime sounds like a bad idea. I don't think we should > fall into the same pitfall that got Python wrt. gc.get_objects(). What's the pitfall?
(In reply to comment #21) > Review of attachment 218421 [details] [review]: > > I don't understand the purpose of this. The purpose of this is much more mundane: without it, tests with fixed counters don't pass, because the closure installed by GObject.signal_override_class_closure in GObject.Class is never finalized (so tests correctly report we leak memory) Doing any of the magic you mention is wrong, and would break badly, with or without this patch. GTypes, once registered, are there forever, and the GTypeModule is only about calling class_finalize when the GjsContext is teared down (which is a bad idea)
This depends on 678504 to re-land.
(In reply to comment #24) > Doing any of the magic you mention is wrong, and would break badly, with or > without this patch. GTypes, once registered, are there forever, and the > GTypeModule is only about calling class_finalize when the GjsContext is teared > down (which is a bad idea) I wonder if we could make custom types use g_type_register_dynamic, and if it would be worth it.
(In reply to comment #23) > What's the pitfall? (Sorry, email shenanigans prevented me from seeing this) If the mark and sweep is running, finds a few leaves, and you call this, the object might be freed out from under you. There's also programs that do disgusting things with gc.objects() (Google for more info), and I do not want to encourage that by making a System.getGCObjects();. If someone is debugging, chances are they just want debug spew, and in that case, they can modify the "print objects" function to do what they want.
(In reply to comment #27) > (In reply to comment #23) > > What's the pitfall? > > (Sorry, email shenanigans prevented me from seeing this) > > If the mark and sweep is running, finds a few leaves, and you call this, the > object might be freed out from under you. But if objects can be freed out from under us, can't that also happen while we're just printing them? > If someone is debugging, chances are they just want debug spew, and in that > case, they can modify the "print objects" function to do what they want. Perhaps...but I'd be happier with that if we had some better logging framework than ~/.xsession-errors. To use this with the login gnome-shell (instead of using --replace from a terminal) would be annoying.
(In reply to comment #8) > Review of attachment 218425 [details] [review]: > > The signals equivalent to this made total sense to me - signals are clearly > associated with a given GObject. > > However, this patch feels too "magical" to me in that callbacks on functions > which happen to be GObject methods get special treatment, but static methods > and global functions don't. It's worth noting too that callbacks have a scope > - all you can possibly be doing here is cleaning them up *earlier*. > > I'm also worried about it potentially breaking previously valid code. > > Can you give me a specific example of code that was "fixed" by this patch? As I mentioned before, this was totally unscientific, so no, I don't have specific examples. Generally speaking though, all code paths that doesn't have "remove callback" API and just expect to clean it up on object dispose (or worse, finalize) are prone to this problem. (In reply to comment #9) > Review of attachment 218427 [details] [review]: > > @@ +2487,3 @@ > priv = priv_from_js(cx, obj); > pspec = gjs_g_param_from_param(cx, pspec_js); > + g_param_spec_ref(pspec); > > But this one doesn't look like a memory leak fix. What's going on? Something > with the pspec being destroyed during a GC? I don't remember. Probably a combination of that and finalizing classes due to the GTypeModule. The issue is, g_object_class_install_properties takes ownership of the GParamSpec, so it must be referenced if it's not newly created. (In reply to comment #18) > Review of attachment 218424 [details] [review]: > > Conceptually most of this patch is changing callers of gjs_g_foo_from_foo() to > also call a typecheck beforehand, which makes sense. However...you've also > changed how the "dynamic js class" stuff works, and that's a very tricky part > of the code. Can that be separated conceptually or are they really tied > together? I guess I could split the typecheck part to use the previous DynamicClass system, as priv_from_js becomes unprotected anyway, so it doesn't need JS_BeginRequest to lock the GHashTable. > @@ +509,3 @@ > > + class_copy = g_slice_new0(JSClass); > + memcpy(class_copy, clasp, sizeof(JSClass)); > > This worries me a bit; is it really valid to just memcpy() JSClass instances? > That may be one reason for the hoops the previous code went through. JSClass is just a vtable, and it's meant to be statically allocated, so I'd say memcpy is fine (as long as it's contents are duplicated properly). The JS runtime never frees JSClasses. > @@ +607,3 @@ > g_assert(obj_class != NULL); > > + if (obj_class->finalize != static_clasp->finalize) { > > Er...can't we just check obj_class != static_clasp ? No, obj_class is a different pointer than static_clasp. We need to compare contents to see if obj_class was memcpy'ed from static_clasp. > @@ -677,3 @@ > - if (g_hash_table_lookup(rd->dynamic_classes, proto_class) == NULL) { > - gjs_throw(context, "Prototype is not for a dynamically-registered > class"); > - goto error; > > This assertion just got deleted, is anything replacing it? Though it looks > like gjs_construct_object_dynamic() is only called in one place which is > probably safe, so maybe it's OK to just delete. gjs_construct_object_dynamic, as well as JS_ConstructObject, need to go, replaced by JS_NewObjectWithGivenProto or JS_New. They're deprecated in modern JS API anyway. At that point, we can replace JS_InitClass, as what it does (for non standard classes) is essentially: - create a prototype with JS_NewObjectWithGivenProto - create a constructor with JS_NewFunction - link the constructor and the prototype with appropriate properties This is something we already kind of do at the JS level (in lang.Class and overrides.GObject.Class), so doing it with JSAPI doesn't look bad, and it means we can use the same JSClass for all constructor/prototype pairs and further reduce memory usage.
(In reply to comment #27) > (In reply to comment #23) > > What's the pitfall? > > (Sorry, email shenanigans prevented me from seeing this) > > If the mark and sweep is running, finds a few leaves, and you call this, the > object might be freed out from under you. There's also programs that do > disgusting things with gc.objects() (Google for more info), and I do not want > to encourage that by making a System.getGCObjects();. > > If someone is debugging, chances are they just want debug spew, and in that > case, they can modify the "print objects" function to do what they want. Those objects can't be freed, because it's not a list of all known JS objects, just a list of gjs-managed roots (which, by definition, cannot be freed). Also, once the object is outside the JS native, it becomes reachable and therefore won't be collected, even if removed from the KeepAlive.
We went a bit back and forth on this, and in the meanwhile I wrote some more code on the dynamic class stuff. I'm going to attach here what I've got, and then we decide what to do. All unit tests pass, and I've been using gnome-shell under it for a while with no issues. Full branch at github (master).
Created attachment 220593 [details] [review] GIRepositoryFunction: don't install a resolve hook GI functions don't have lazy properties, so they don't need a resolve hook. This is mainly a code cleanup.
Created attachment 220594 [details] [review] Boxed: cache the GType in the internal structure Accessing the GType from the GIRegisteredTypeInfo is not free, while caching it in the instance structure is also cleaner.
Created attachment 220595 [details] [review] Rework typechecking and private data access Previously, all type safety checks were inside priv_from_js, which returned NULL if the object was of the wrong class. Except that a lot of code was not checking for NULL, causing potential crashes. Also, we were only checking the JSClass, not the actual type of object, so for example a function accepting a boxed would take in any boxed instance, not just the right one (and then that could crash JS code). Rework this by splitting typechecking and private data access. Internal API for typechecking is provided by all object classes (object, boxed, union, param, gerror) and it takes the necessary amount of information (GType, GIBaseInfo) to ensure type safety. priv_from_js becomes unsafe, and thus ok to call in dangerous data paths such as tracing and finalization.
Created attachment 220596 [details] [review] GObject: don't use toggle references unless necessary Many GObject (such as widgets and actors) don't have JS state, but they're kept alive by C code. We can therefore save some memory by GCing these objects, and creating new wrappers when needed. If state is ever set, we transparently switch to toggle refs, so no change should be visible at the JS level.
Created attachment 220597 [details] [review] Don't use deprecated JSCLASS_CONSTRUCT_PROTOTYPE flags Not only that flag is removed in newer versions of libjs, it will break with the new dynamic class system.
Created attachment 220598 [details] [review] Don't use JS_ConstructObject JS_ConstructObject is deprecated with modern JSAPI, and should be replaced by JS_NewObject and JS_New. Also, while we're there, replace constructors with functions that throw, as it is generally an error to call them from JS anyway.
Created attachment 220599 [details] [review] Rework dynamic class system Stop creating multiple JS classes on demand, and instead use the same JS class and create multiple constructors with different prototypes, but hand-rolling our own version of JS_InitClass, that also avoids polluting the global object. This means that JS_ConstructObject is no longer usable with dynamic classes, use gjs_construct_object_dynamic or JS_New instead.
Review of attachment 220593 [details] [review]: I swore I removed this already. Good to go.
Review of attachment 220594 [details] [review]: Yes.
Attachment 218421 [details] pushed as 235a3c0 - Use a GTypeModule for registering custom types Attachment 218422 [details] pushed as b055704 - Fix object counters Attachment 220593 [details] pushed as e8a9d36 - GIRepositoryFunction: don't install a resolve hook Attachment 220594 [details] pushed as dfbcc35 - Boxed: cache the GType in the internal structure
Review of attachment 220595 [details] [review]: This looks like an awesome cleanup. I like it.
Review of attachment 220597 [details] [review]: This patch looks like it's missing something...there's a new jsapi-dynamic-class.c but we're not deleting the old functions?
(In reply to comment #43) > Review of attachment 220597 [details] [review]: > > This patch looks like it's missing something...there's a new > jsapi-dynamic-class.c but we're not deleting the old functions? Bad rebase, jsapi-dynamic-class.c is part of the next patch (Rework dynamic class system)
Comment on attachment 220595 [details] [review] Rework typechecking and private data access Attachment 220595 [details] pushed as 6ed5e40 - Rework typechecking and private data access
So, where do we stand on landing this? I understand the aggressive GC part might or might not be an improvement, but the dynamic class patches are a good cleanup, IMHO. FWIW, this is what I've been using for quite a while, so I strongly believe they should not regress.
Review of attachment 220599 [details] [review]: This is a major cleanup too...I'm fine with landing this. Just one comment/question: ::: gjs/jsapi-dynamic-class.c @@ +81,3 @@ + JSObject *global; + /* Force these variables on the stack, so the conservative GC will + find them */ Hmm...if we had to "volatile" everything on the stack, we'd run into a lot more bugs. The system must support scanning from the CPU registers too; looking at the js185 source, it appears to reuse setjmp(). See ConservativeGCThreadData::recordStackTop Right?
Review of attachment 220598 [details] [review]: Makes sense.
Review of attachment 220596 [details] [review]: Remind me about this one a bit later after everything else has landed...I need a bit of time to study it.
(In reply to comment #47) > Review of attachment 220599 [details] [review]: > > Hmm...if we had to "volatile" everything on the stack, we'd run into a lot more > bugs. The system must support scanning from the CPU registers too; looking at > the js185 source, it appears to reuse setjmp(). See > ConservativeGCThreadData::recordStackTop > > Right? Honestly, boh! I think we use explicit rooting everywhere else, but I'm ok with removing volatile (after all, they'll be in the stack anyway when registers are saved for the JSAPI call)
Attachment 220597 [details] pushed as 99ddd29 - Don't use deprecated JSCLASS_CONSTRUCT_PROTOTYPE flags (it was a prerequisite for the two other patches, and had no substantial review comment) Attachment 220598 [details] pushed as b483b9a - Don't use JS_ConstructObject Attachment 220599 [details] pushed as 349ea62 - Rework dynamic class system
Owen, can I pull in your brain here to look at "GObject: don't use toggle references unless necessary (7.24 KB, patch)"?
Review of attachment 218423 [details] [review]: We're using SpiderMonkey's persistent roots instead of keep alive now, so this does not apply anymore. This debugging feature will hopefully be fulfilled in bug 780106.
Comment on attachment 218427 [details] [review] Assorted memory leak fixes Marking this one obsolete; the param code doesn't exist anymore and the other memory leak was already fixed.
Created attachment 359950 [details] [review] object: Associate callbacks with the object on which they're installed Callbacks installed with GObject methods are now traced from that object instead of being rooted, so cycles no longer cause leaks. This implementing by making GjsCallbackTrampoline use GjsClosure internally, and generalizing the existing code that dealt with signal connections. Also, to keep tests ok, we now have class finalization functions for custom types (to unregister the closures).
Comment on attachment 218425 [details] [review] Associate callbacks with the object on which their installed I rebased this patch and re-attached it. It passes all the tests with GC verification enabled, so it should be safe to commit this, especially early in the cycle. On the other hand, like Colin's review said, it's not clear to me what it actually improves... Giovanni, do you remember?
Created attachment 359952 [details] [review] object: Associate callbacks with the object on which they're installed Callbacks installed with GObject methods are now traced from that object instead of being rooted, so cycles no longer cause leaks. This implementing by making GjsCallbackTrampoline use GjsClosure internally, and generalizing the existing code that dealt with signal connections. Also, to keep tests ok, we now have class finalization functions for custom types (to unregister the closures).
Created attachment 359953 [details] [review] object: don't use toggle references unless necessary Many GObjects (such as widgets and actors) don't have JS state, but they're kept alive by C code. We can therefore save some memory by GCing these objects, and creating new wrappers when needed. If state is ever set, we transparently switch to toggle refs, so no change should be visible at the JS level.
In any case, it might be good to run gnome-shell with these for a while before committing them.
In fact, what would really be cool is to use js::DumpHeap() to output the total heap size, instead of the heap graph, and use that to make a massif-like graph over time, then run some common gnome-shell operations under that in order to determine if this makes any difference.
(In reply to Philip Chimento from comment #56) > Comment on attachment 218425 [details] [review] [review] > Associate callbacks with the object on which their installed > > I rebased this patch and re-attached it. It passes all the tests with GC > verification enabled, so it should be safe to commit this, especially early > in the cycle. On the other hand, like Colin's review said, it's not clear to > me what it actually improves... > > Giovanni, do you remember? Basically it helps avoiding cycles that would cause an object to become uncollectable. Before the patch, if you attach a callback to an object (using a function pointer, not a signal), that object becomes uncollectable until you remove the callback. If an app writes is not paying attention, this can easily cause a leak. After the patch, the callback becomes traced from the object. If both the callback and the object are unreachable, they can both be collected, even if the callback is still attached. The risk is that the C library wants to use the callback after the object is disposed (eg. it saves the callback to a static variable). In that case, if the object is dead the C library will see a NULL pointer. Hopefully C libraries don't do that. Practically, I don't know how big of a difference this makes. Callbacks attached to objects are mostly used to replace signals (for speed or for better gobject introspectability), so they are not that common.
Review of attachment 359952 [details] [review]: LGTM
Review of attachment 359953 [details] [review]: LGTM
Review of attachment 359953 [details] [review]: Tried this on a different machine and I get a test failure in installed-tests/js/testGDBus.js: (process:4843): Gjs-CRITICAL **: Object 0x1bc80a0 (a GDBusConnection) resurfaced after the JS wrapper was finalized. This is some library doing dubious memory management inside dispose() Gjs-Message: JS LOG: Acquired name null So, still needs some looking into apparently.
Comment on attachment 359952 [details] [review] object: Associate callbacks with the object on which they're installed Attachment 359952 [details] pushed as 5db1bb3 - object: Associate callbacks with the object on which they're installed
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/gjs/issues/62.