GNOME Bugzilla – Bug 303266
uncollectable refount cycle
Last modified: 2012-02-10 08:18:42 UTC
The attached program shows an uncollectable cycle in pygtk. Whenever you click the button, one more such cycle is added to the consumed memory. Feel free to click the button until your whole computer happily swaps.
Created attachment 46098 [details] consume all memory. Run with ./test.py bigimage The cycle here is this: Parent: glib-refs child child: python-refs signal signal: python-refs parent
Minor observation: python tells me it is leaking 12 python object refs per button press.
The following patch fixes the memory leak. Of course it is an absolutely incorrect patch (may collect other objects too soon); it's just a small experiment. It confirms the theory that the cycle python(parent)->gobject(parent)->gobject(child)->python(parent) is causing the gobject refcount on the child gobject to be == 2 (one from parent gobject, another from python wrapper), and refcount == 2 makes the pygobject deceive the python GC to stay alive, although in this case erroneously. --- pygobject.c 27 Nov 2004 19:12:34 -0000 1.39 +++ pygobject.c 6 May 2005 22:14:25 -0000 @@ -506,7 +506,7 @@ pygobject_traverse(PyGObject *self, visi if (ret != 0) return ret; } - if (self->obj && self->obj->ref_count == 1) + if (self->obj && self->obj->ref_count <= 2) ret = visit((PyObject *)self, arg); if (ret != 0) return ret;
Created attachment 46111 [details] diagram showing what is happening
One possible solution that comes to mind would be to, when creating a PyGClosure, check if the callable is a python bound method, and if it is, instead of keeping a reference to the bound method, keep a reference to the corresponding unbound method plus a _weak reference_ to the method instance.
Bah.. never mind, it's a stupid idea. It fixes this scenario, but introduces other unwanted side-effects. I need sleep :-P
Created attachment 46126 [details] improved diagram showing why python cyclic GC fails
Created attachment 46127 [details] improved diagram showing why python cyclic GC fails (PNG)
Created attachment 46145 [details] alternate test case A modified test case; it shows that the problem is not specific to signal handlers. In this case, the signal handler is a plain function, but a cycle is created this time using a simple member variable pointing from the button to the box.
I don't see the pygtk bug -- every gtk.Widget that is created must be destroyed and the box is left undestroyed because it is removed from the window before the window is destroyed. Therefore it can leak memory. This is unfortunate and can lead to long debugging sessions to try and figure out where the leak is, but it is not solvable without modifying the C gtk code to add something like Python's cyclic GC.
No, this is not true. At least in the C code it is not necessary to call destroy() at all. Also note the documentation for gtk_object/widget_destroy, which says that this function is only used to explicitly break references. And requiring to explicitly remove references in Python sounds like a bad idea to me.
John: Note that while it is true that GtkWindows should be destroy()ed, that's only because gtk always holds the last reference to any GtkWindow, so unreffing it from user code will not destroy it. For any other widget, removing all references causes the widget to be destroy()ed automatically. Anyway, GtkWidget is only an example, but this could happen too for other GObject types. Any GObject that can contain/own other GObjects potentially sufferes from this bug. All you need is for the python wrapper for the contained object to keep some reference to the python wrapper of the container object. The reason why destroy() in GtkWidget fixes the leak is because it 1) destroys signal handlers, and 2) unlinks containers from its children. I also agree we shouldn't make the programmer depend on some magic destroy()-like method for GC to work. For one thing, it's complicated enough having to remember which objects need destroy() and which don't. I think gtk.Window() needing to be destroy()ed as an exception is acceptable, otherwise it's too complicated for python. I have not proposed any solution, but here are a couple of proposals: 1. Stop relying on Python's cyclic GC, and instead try to use the new "toggle references" being proposed now in glib; 2. Try to push into GObject a new tp_traverse-like virtual method, so that a GObject may report which child GObjects it "owns". Unfortunately neither is easy to implement... :|
Created attachment 46176 [details] [review] a fix using g_object_traverse
I agree it would be nice to solve this -- I've spent more time than I want to remember getting my custom tree models to not leak. My point was really that it can't be done in pygtk alone and changes to gobject and many other places would be required. I don't think the toggle references will solve this; I believe Owen identified this as one of the cases that toggle references would not solve in his initial proposal.
Note that there are reference cycles in certain GTK widgets that are only broken with gtk_object_destroy(), independent of the Python bindings. The simplest forming with GtkMenuItem and GtkAccelLabel: 1. the menu item holds a reference to the label contained inside 2. accel labels hold a reference to the widget they are tracking the shortcut key combo for in ->accel_widget. In the general case of a menu item holding an accel label that watches it, you have a fairly tight reference loop that is only broken when gtk_object_destroy() is called on one of the widgets. If I write a C function that removes such a menu item, destroys the window and retains no external references to the menu item, I'll have a leak too. Expecting arbitrary GTK programs to be leak free without destroy() would probably require adding functionality like the Python cycle GC, which seems out of the scope for pygtk.
OK, but I'd rather have scenarios where leaks happen without destroy() as exceptions and not the rule. Although the test case in this bug report is gtk based, I think it was opened because of a similar program in gst-python, right Benjamin? Is there a destroy() equivalent in the GStreamer objects that can fix the leak?
Yes, the problem originally comes from GStreamer. And no, GStreamer elements do not have a destroy function, and it is not necessary to do anything but release all references to GStreamer objects via g_object_unref. You could start requiring to call g_object_run_dispose or similar on all elements, but that doesn't sound like a good idea. (GStreamer internals following) I discovered the problem when setting pad functions like gst_pad_set_link/chain/get/getcaps_function. These are implemented as closures in gst-python, so the pad's PyGObject takes care of them. However, most of the time you are going to set element functions on the pad, so you end up with a cycle like this: element =gref=> pad =pyref=> function =pyref=> element On a sidenote, I'd consider refcount cycles in Gtk a bug and would always have expected that one reference in that cycle is a weak reference. (In the example case I'd have made the GtkAccelLabel weakref the accel_widget) Gtk should probably prominently state that all widgets require an explicit destroy.
Created attachment 46276 [details] example of collectable GStreamer element
Created attachment 46277 [details] example of uncollectable GStreamer element The attached files nicely show the difference between a collectable and an uncollectable GStreamer element: Move the chain function out of the element class. The elements are otherwise identical.
Created attachment 46289 [details] example of collectable GStreamer element Now here's the real one. And it's attached as text/plain so poor Epiphany users don't get their security threatened.
Unfortunately the required GObjectClass API didn't make it to glib 2.8 API freeze :-( Benjamin, if you have any suggestions, even hacks, now is the time to bring them forward :| I guess later when/if the new glib API is accepted we can deprecate any hacks we introduce now and use the real thing instead.
Comment on attachment 46098 [details] consume all memory. Run with ./test.py bigimage If you add a del self.pixbuf in the close() method this will stop leaking memory.
The question is not whether or not memory can be freed by help from the programmer. The issue is that the memory should be freed _automatically_, without resorting to del or destroy(). And the pixbuf is there only to "amplify" the memory leak; removing the pixbuf doesn't mean there's no leak, only that the ammount of memory leaked is too small be easily noticed by a resource monitor.
Ping?
Alas.. this needs API from GObject, that's why it hasn't been fixed yet :( Of course we could also add a pygobject specific hack... letting 3rd party modules register 'traverse' functions..
Just my $.02, but it took quite a while to get the bugs out of Python's GC (and there still might be some remaining) where it is easier to traverse most objects can just their traverse dictionaries or lists. A traverse for gobject would mean that nearly every object implementation would need a function that visited everything it holds a reference to. The better solution may be to design api's that don't rely on ref counting to release resources, e.g. destroy in GtkObject.
the main point in having g_object_dispose() is to have a function at the very base of all object implementations to break all object references. this was introduced precisely to help language bindings to break reference cycles. that's because implementing a traverse function that reveals all object references an object implemetnation uses was found to be much too hard to implement (especially with custom non-gtk objects and in the presence of *multiple* language bindings in the same process) and because relying solely on ref/unref simply isn't going to work to break cycles. i.e. Benjamin's assertion that in a cycle of references, one reference would always be weak isn't implementable in practice (and may even be undecidable in terms of which reference canonically had to be weak). trying to clear/break/collect cycles by bypassing g_object_dispose() will definitely fail, because no single entity (libgobject, gtk, python bindings, scheme bindings, etc...) can track *all* references that possibly form cycles. so use g_object_dispose() rigorously to get your issues fixed (which is also why gtk always demands _destroy() which is just the GtkObject name for _dispose()).
GObject.dispose is not currently exposed in pygobject. But if we were to wrap it, we'd also need to wrap GObjectClass->dispose, calling e.g. __gdispose__ in a python class. Then, for each class we had in python that contains other objects, it would need to define a __gdispose__ which would call dispose() on each contained object reference. I have no problem with that, but this starts to look a lot like C, not Python! Even Python developers are talking about getting rid of __del__ in Python 3; should we be thinking of adding dispose/__gdispose__ in pygobject? I wonder what do C#/Perl/lisp/whatever bindings do; do they wrap dispose?
well g_object_dispose() is usually used by just obejct implementations to get their destroy/destruct/delete/dispose/breakrefs/whateveryoucallit semantics straight. so for e.g. Gtk+, you'd not wrap and offer dispose, but destroy. For other object hierarchies this may vary, the respective hierarchy authors should know though.
Created attachment 118387 [details] adaptation of test program showing number and new objects According to this program, Python (2.5) leaks 5 objects per button press (6 new and garbage-collects 1 back). I just enhanced attachment 46145 [details] a bit.
codegen and the static bindings are obsolete since pygobject 3.0, which is GI only. Thanks!