GNOME Bugzilla – Bug 320428
Break PyGObject<->GObject reference cycle
Last modified: 2007-04-14 14:22:49 UTC
Since the PyGObject refs the GObject, and the GObject refs back the PyGObject, pure reference counting doesn't work for freeing PyGObjects. We have to wait for the cyclic GC to get off his butt and give a hand. I propose to move the PyGObject instance dictionary and closures to qdata stashed in the GObject, and make the reference graph one-way, from PyGObject to the GObject, but not in the reverse. This makes garbage collection immediate in most cases.
Created attachment 54196 [details] [review] patch, v1 Caveat: this patch breaks PyGTK C ABI (breaks extensions on top of PyGTK until they are recompiled)
Note that you still get reference loops for the "connect instance methods as signal handlers" use case: method -> python instance -> GObject -> closure -> method
Yes, I know. That's why I said "This makes garbage collection immediate in *most* cases." (not all cases). tp_traverse is still there to take care of these, if need be. But it is still useful. It eliminates quite a few gochas in pygtk. I am recalling the memory "problems" when dealing with large pixbufs, for example.
Created attachment 54201 [details] [review] patch, v2 This time it "should" keep API/ABI. But I haven't tested much..
I'm not convinced yet that this will work because the Python object may be DECREF'd out of existance even though the GObject continues to live on. Won't this trigger any __del__ method or weakref callbacks? What about Python subclasses that are not registered, so they don't create different types of GObject's? It would be good if something like this worked, but I'm not convinced it's possible yet.
Created attachment 54240 [details] [review] patch, v3 Removed tp_dictoffset=0, tp_getattro/tp_setattro hacks. Now it doesn't trigger a "instance layout conflict" error in gazpacho. In reply to John's comments, yes, it's true that "the Python object may be DECREF'ed out of existence even though the GObject continues to live on". And "this trigger any __del__ method or weakref callbacks" also true. However, the instance dictionary and closures list are preserved even when the PyObject is destroyed, and when a new wrapper is requested for the same GObject it is assigned the same instance dictionary as before.
Created attachment 54241 [details] [review] patch, v3.1, corrects a minor problem
What about the following pseudo-code; wouldn't it break, assuming the original PyObject is DECREF'd? def on_click(w): w.custom_method() class MyButton(gtk.Button): def custom_method(self): do_something() b = MyButton() b.connect('clicked', on_click) I think that a generic wrapper will be created when the clicked signal is emitted and the w.custom_method() call will fail.
You're right. I'll try to fix this problem...
Created attachment 54257 [details] [review] patch, v4, corrects custom subtype problem
Created attachment 54275 [details] [review] patch v4.1, missing GIL in pygobject_data_free, more unit tests
I have built breezy packages with the 4.1 patch for testing: http://telecom.inescporto.pt/~gjc/pygtk/
Patch seems to break meld: Traceback (most recent call last):
+ Trace 63898
ret = task()
self._disconnect_buffer_handlers()
assert hasattr(buf,"handlers") AssertionError
I still have doubts about this approach because it makes pygtk instances act less like other python objects. I wonder if it's possible to keep esseantially a weak ref from the GObject to the PyObject and if the PyObject's ref count goes to 0, to keep it in a ghost state in case other refs are needed. __del__ and weakref callbacks would not be invoked until the object is really destroyed. I haven't had time to trace through python to see it this would work, but I think it might.
I think that if the PyObject's ref count goes to 0, then it was already destroyed even before calling the weakref callback. But I'm not sure, I'll have to check... OTOH, I think you can 'revive' a PyObject from tp_dealloc, but I have to check.. Note that you already cannot use __del__ with PyGObjects because it makes automatic garbage collection stop working. See in the gc module documentation: "Objects that have __del__() methods and are part of a reference cycle cause the entire reference cycle to be uncollectable, including objects not necessarily in the cycle but reachable only from it.".
Created attachment 54358 [details] [review] patch, v5 (new approach) This new approach changes tp_dealloc to: 1. "revive" the python wrapper object; 2. add a weak ref to the GObject -> callback Py_DECREFs the wrapper again; 3. unref the GObject; Also pygobject_new cooperates in this scheme. When a wrapper already exists attached to a GObject: 1. remove the weak ref to the GObject; 2. add a strong ref to the GObject; 3. self->obj = obj This patch seems to work pretty well, _except_ that test_glade.py now fails. I added a test case to test that objects with circular references are being collected, and that passes, but somehow the glade test fails, while it was working before. I'm not sure whether it is a problem in the test program, or in libglade, or if something is wrong about the patch... :| Warning: lots of debug spew in this patch.
Gustavo: your patch is starting to sound a lot like the approach I took before the current cyclic GC approach. It was very fragile, and was difficult to get right for the case of Python subclasses (you'd essentially have to write the code so as to not trigger any of the cleanups done by subtype_dealloc(), and then recheck the code each time a new Python release came out).
With the metaclass that we have now, we should be able to avoid those problems: static int pygobjectmeta_init(PyTypeObject *subtype, PyObject *args, PyObject *kwargs) { if (PyType_Type.tp_init((PyObject *) subtype, args, kwargs)) return -1; + subtype->tp_dealloc = (destructor) pygobject_dealloc; return pygobjectmeta_register(subtype); } This way, subtype_dealloc is not used, even for custom subclasses.
The subtype_dealloc() routine is designed to perform any cleanup required by object features added to the subclass (I think it is in subtype_new(), or similar). Does the metaclass replace that code path entirely?
Ok, maybe it's better to not override subtype_dealloc. From what I see, subtype_dealloc doesn't do much for subtypes whose base classes already had weakrefs and inst_dict. It calls __del__ and clears __slots__. Both of them could simply be considered unsupported in pygtk objects. Then the base class tp_dealloc is called. Sounds reasonably harmless, at least now in Python 2.4.
Created attachment 54368 [details] [review] patch, v5.1 (fixed glade leak) OK. Found the leak! It was due to a unwanted Py_INCREF in pygobject_new_full when reviving a wrapper from a sleeping state. Patch seems perfect now. Removed all debugging spew, and added a new test. Also fixed the glade test; it had the self.assertEqual in an idle callback instead of in the test function, causing errors to be printed but otherwise ignored.
Created attachment 54370 [details] [review] patch, v5.2 (test_glade.py seems to be missing from CVS)
breezy packages for testing here: http://telecom.inescporto.pt/~gjc/pygtk/%23320428,%20patch%205.2/
Created attachment 54512 [details] [review] patch, v5.3 OK, one more patch following the same approach. However, I discovered that this breaks weak refs, as demonstrated by testGhostWeakref in tests/test_subtype.py. When a programmer gets a weak ref from an objet that is in an "hibernate" or "ghost" state, the link to the GObject should be changed from weak to strong ref. but since pygobject does not know when a weak reference is dereferenced (called), we have no trigger to make this conversion. So, in the end we have two approaches that break weak references in different ways. If we have to keep perfect API compatibility, neither patch is good enough. Of course, if we stipulate that weak refs were never part of the API, then... ;-)
We could perhaps offset the weak refs problem by disabling python weak refs on PyGObjects, and wrapping g_object_weak_ref/g_object_weak_unref. This would not be API compatible, but it would be easy to migrate applications. Instead of: myobjref = weakref.ref(myobj) [...] myobj = myobjref() They would be changed to: myobjref = myobj.weak_ref() [...] myobj = myobjref() I suppose weak ref proxy objects could also be done...
Created attachment 54536 [details] [review] patch, v5.4 (fixes weakref problem) OK, problem solved! ;-) I did something half evil: monkey patched weakref.tp_call :P
I asked about this in Python-Dev and received some feedback: http://mail.python.org/pipermail/python-dev/2005-November/057961.html At least Martin Löwis seems to think it's better to follow the approach of patch 4.x. Of course this means breaking weak refs. Would that be acceptable? I could create a new weakref type that wraps g_object_weak_ref/g_object_weak_unref to compensate..
That'd be okay for me, as long as its clearly stated somewhere. If possible, we should print a warning/raise an exception when someone is trying to create a weakref. I don't think we ever explicitly said that weakref were allowed.
Created attachment 54671 [details] [review] patch v4.2, fixes meld problem, wrap GObject weak refs So, back to the 4.x development direction, I fixed the problem with meld, although I was unable to find a test case for it. I also implemented a new weak_ref([callback, *args]) GObject method that works the same way as weakref.ref, but tracks the GObject instance instead of the wrapper.
Created attachment 54738 [details] [review] patch, v4.3, missing NULL terminator in pygobject_weak_ref_methods
I'd really love to include this in pygobject 2.9.x. jpe, I seem to recall you saying you wanted to review this? Well, API freeze is on next monday... I've been running a patched pygtk on my ubuntu system for about a month now, and didn't discover any broken apps so far.
Hmm, did you consider using the new gobject toggle ref support?
No. As far as I could remember, there were some limitations with the toggle refs. I think owen mentioned there could be no two different bindings wrapping the same object with toggle refs. I don't think I we should live with that limitation.
I was looking at it to see if python weak refs could be supported through it. The lack of python weak refs is still a problem in my mind, though not one that would keep this patch out. I think the inability to support multiple toggle refs is a problem in gobject and I wonder if this could be fixed. If this means that language bindings can't use toggle refs and toggle refs were added for language bindings, I think there's a problem. I didn't consider toggle refs when we were discussing this back in Nov.
The patch introduces gobject weak refs; surely a good substitute for the loss of python weak refs?.. Regarding toggle refs, it sounds like a complex solution, even if it could be fixed, and complex solutions tend to be more fragile...
I'd prefer python weak refs so there's one less interface for people to learn and to maintain. I also suspect the pygtk side of a toggle ref implementation to be simpler and the gobject code is fairly straightforward (at least for the current single toggle ref case)
To be honest, reimplementing the patch using toggle refs is out of the question for me. I've spent a lot of effort already on this, not only coding but also testing, and I'm not willing to restart from scratch.
Created attachment 57293 [details] [review] Patch using toggle ref api, v6.0 This uses the toggle ref api to synchronize the life cycle of the python object with the gobject object. The advantage is it supports python weakrefs and even __del__ methods in limited cases and also removes the need to visit self twice in the gc visit function. The disadvantage is that it doesn't play as well with other language runtimes because the pyobject always holds a ref to the gobject. This could be improved somewhat for trivial objects at the cost of complicating the patch. My bias is towards tighter python integration, but a case can be made for playing nice with other language runtimes. Note that there will be problems if multiple language runtimes do something nontrivial with the objects, such as connecting signal handlers. Solving this would require something akin to a cross-language cyclic gc, which is not going to happen any time soon.
The patch looks good; I especially love the simplicity of it all. I was expecting something far more complicated. :) But I do feel bad about limiting the scope of pygtk when interacting with other languages. For instance, if we wanted to share a GObject between python and perl, then (according to what I can interpret from the documentation) you basically create a leak, and the object will never be freed, since the toggle ref callback will never be called if there are multiple toggle refs. This problem will have repercussions in the long term future, especially in the GNOME development platform. In the past, if you wanted to make some code that can be shared by others, regardless of language preference, you had to do it in C. Nowadays, pygtk automatically registers types and makes sure they work even through g_object_new from C, so we're getting closer to allowing code in python to be used outside python. If we add toggle refs, we move backwards by not allowing python gobjects to be shared with most other languages. Pressure will increase again to write any kind of libraries in C. That's what I'm afraid of.
I think it's useful to distinguish between 2 mult-language scenarios: 1) When 2 languages & their runtimes (think Java & Python) both create cycles that include wrappers for the GObject. Here no amount of being careful with reference counts is going to help. This is probably not solvable short of everything using something like the jvm or clr. This was Owen's point in some of his initial emails about the toggle ref api. 2) When a language uses an object implemented in another language, it can create basically a proxy for the gobject and delete it when no code in the language directly references the objct. This is basically what approach 4.x does. I do have an open mind about this, particularly after looking at the various options. Maybe the proxy approach is better for objects that are primarily implemented in other languages. We are not going to solve problem #1, we can make it easier to cross the runtime boundaries.
Created attachment 57428 [details] [review] patch, v4.4 (update to cvs)
Created attachment 57442 [details] Failure due to the gc Attached test case fails w/ patch 4.4 because the gc detects a cycle due to the connected self.cb handler and clears the object in preparation for deleting it. The refcount of __dict__ can be increased by one to prevent it from beng collected, but the cycle shouldn't be collected at all since self.cb creates an instance method object with a pointer to self that will be used when notify is emitted. If we just don't traverse the closure list, then cycles like this may not be collected. btw, the toggle ref patch also has problems at shutdown which would need to fixed if that approach is taken
Note1: increasing refcount doesn't seem to help; Note2: when updating the patch I forgot to call pygobject_clear from pygobject_free, hence introducing a bug; Note3: Python GC is evil; it is uses tp_traverse's visitproc to decref objects.
Created attachment 57873 [details] [review] patch, v4.5 After INCREF'ing inst_dict and overriding tp_setattro to make it copy inst_dict to inst_data as soon as the dict is created, the jpe's test passes. Unfortunately, in this case the testGCCollection test stops working. I couldn't figure out any way to make both tests pass. Next on my TODO list: change refcount of __dict__ back to what it was and change tp_traverse to make it traverse dict items but not the dict itself. Not sure it will help, but...
I get a feeling 4.x path leads to a dead end, just like 5.x. Here's a different idea: implement a hybrid model, * Until __dict__ is created, we let the wrapper die before the GObject, but don't store __dict__ in the GObject, since it doesn't even exist, maybe just the wrapper type. The wrapper owns the GObject, but the GObject doesn't own the wrapper, in this state. * Once __dict__ is created (it's easy to detect by hooking to tp_setattro), we revert to the 'old' GC behaviour (which is the current one). I think 90% of the pygtk objects never actually have a __dict__ created. It is only created when 1) the __dict__ attribute is referenced, or 2) an attributed i s assigned to the object. It would at least take care of the dreaded "pixbuf leak" gotcha...
Okay, it's on to 7.0 model :). I think the hybrid model is the way to go and should use the toggle ref api because the current gc approach fails in the problematic toggle ref situations -- where another runtime holds a ref that won't be released until pygtk releases its ref. The python gc approach will never release its ref in this situation. btw, I'd say that 90% of interesting and probably 50% of the total pygtk objects I work with do have a __dict__ associated with them.
(In reply to comment #46) > Okay, it's on to 7.0 model :). I think the hybrid model is the way to go and > should use the toggle ref api because the current gc approach fails in the > problematic toggle ref situations -- where another runtime holds a ref that > won't be released until pygtk releases its ref. The python gc approach will > never release its ref in this situation. Are you sure? From the description in documentation that doesn't sound like that is true. The docs only mention a failure when toggle_ref is used twice. Although I didn't bother to understand how toggle refs really work underneath to find out why... :P > btw, I'd say that 90% of interesting and probably 50% of the total pygtk > objects I work with do have a __dict__ associated with them. Hm.. you must be heavily subclassing every pygtk class in sight :)
Assuming only one ref from the python runtime, the toggle refs fail if the reference count never reaches 1 because two runtimes both own a reference; the current pygtk gc method also fails if the ref count never reaches 1. An example of where this might occur is if both python and java ref an object, but each can't release the ref. Solving this is essentially cross runtime gc; it's not something that can be solved easily and is one of the unavoidable consequences of ref counting.
Created attachment 60131 [details] [review] patch, v7.0 (hybrid) New patch implements the 'hybrid' approach described earlier: the back reference from GObject to PyGObject doesn't exist initially, then a toggle ref is created as soon as the instance dict is created.
jpe, now that pygobject is branched, any objections to committing this (7.0)? It would be nice to commit early to CVS, so that it can be tested for a long time. If anything goes wrong, I suppose it can always be reverted later on.
Sorry, pygobject hasn't branched yet after all, so it waits a bit more. But you can starting thinking about it... ;)
jpe? This should go in real soon, please review...
Created attachment 69990 [details] [review] patch, v7.1 (hybrid, updated to CVS HEAD)
Created attachment 81331 [details] [review] Implement undead wrappers Here is another possible implementation for removing the cycle between PyObject and GObject. This is similar to how the gtk2-perl wrappers handle the situation. When the wrapper is created, it ref's the gobject, but not itself. When the wrapper goes out of scope, it checks if the gobject has more than 1 reference count. If it doesn't, both are freed as one would expect. If there are multiple references, then the wrapper becomes "undead" and steals a reference from the gobject. If the gobject gets unref'd while the wrapper is undead, the wrapper is freed by pyg_destroy_notify. If the wrapper is brought back to life, then it refs the gobject instead of itself (to compensate for the reference stolen earlier). As far as I can see, this patch would neither break ABI nor weak refs. Comments?
Steven, about your patch, been there, done that, check the previous patches and comments, in particular comment #17 and comment #13. We moved on to better code, now, which is just waiting for a window of opportunity to apply to SVN.
gjc@nazgul:pygobject$ svn ci -m "Bug 320428 – Break PyGObject<->GObject reference cycle" Sending ChangeLog Sending gobject/gobjectmodule.c Sending gobject/pygobject-private.h Sending gobject/pygobject.c Sending gobject/pygobject.h Sending gobject/pygtype.c Sending tests/test_subtype.py Transmitting file data ....... Committed revision 642.