GNOME Bugzilla – Bug 92955
gc.collect() destroys __dict__?
Last modified: 2013-02-03 21:40:27 UTC
Using today's CVS version. This test prog should display a window and print 42 when the window is closed. This works, unless we do a gc.collect() in which case it prints an error ("AttributeError: 'T' object has no attribute 'a'"). This error happens in more complex programs without gc.collect(). import pygtk; pygtk.require('2.0') import gtk import gc class T(gtk.Window): def __init__(self): gtk.Window.__init__(self) self.a = 42 self.connect('destroy', self.destroy) def destroy(self, w): print self.a gtk.mainquit() T().show() print "gc:", gc.collect() gtk.mainloop()
mass reassign of open pygtk and gnome-python bugs.
Instead of doing: T().show() do: t = T() t.show() T().show() seems to have the same effect as t = (); t.show(); del t. Is this really a problem with PyGtk and not the way python handle references to instances?
Obviously if you keep an extra reference to the object in a global variable then you can hide the reference counting bug. But it should be fixed; a reference to an object via a signal callback is just as valid as any other. Thanks,
I have the same kind of problem (I think). It happens with my DiaCanvas2 python bindings: when I create a new instance of an object (a python object, which is inherited from a DiaCanvas2 class written in C). I add some variables to the item and add the item to the canvas (to which I hold a reference). I release the reference to the item (but the item is referenced by the canvas). After a gc.collect() the dictionary of the item seems empty. Note that the item is still referenced by the canvas, so it's not deleted itself. I'll try to catch the problem in a test case.
Created attachment 14232 [details] Test case exposing the "__dict__ is empty" bug.
I've added a small test case which exposes the behavior. I've been adding quite some documentation to the test case. In short: it exposes itself when there are no (direct) references from the python main environment (read: your code) to a PyGObject object and the PyGObject owns a reference to your GObject. The problem can be described as: - we have a GObject 'win' and a GObject 'l', and their Python wrappers. - 'win' and 'l' are connected to each other on a GObject level (such as gtk_container_add() for example). - Now, we create a cyclic reference which includes 'l' (the GC is used to unwrap cyclic references). In this example this is done with the simple Python instance 'a' of class A. - Now we can remove our references to 'l' and 'a'. Since 'l' references 'a' and visa versa, there is no need to need to add an extra reference on 'l'.. - The garbage collector examines our objects and sees 'l' and 'a' being connected to each other and cyclic. No relation is found with class 'win'.. - 'l', 'a' and their dictionaries (l.__dict__) are set for garbage collection. Their 'tp_clear' functions are invoked: - 'l' clears its connected signals - 'a' has nothing to do - the dictionaries of both 'a' and 'l' are cleared (causing 'a' to be destroyed) Possible solution: - Do not traverse (tp_traverse) if the GObject has a refcount > 1. (note that the GC only clear cyclic references).
Created attachment 14241 [details] [review] Fix.
... There is one 'but' about this approach: GObjects should not be involved in cyclic references which are part GObject, part Python objects... I also updated the test case, showing proper output.
Here is my take on the cycles caused by the example given in the initial comment. Lets call the unnamed instance 't': Calling tp_traverse() on t after the T().show() call will traverse to the following objects: - t's __dict__ - the bound method object 't.destroy', which is used as a signal handler (code for this added in bug 71435). - an empty tuple as arguments to the signal. Calling tp_traverse() on 't.destroy' will traverse to: - t (the instance this method is bound to) - T.destroy (the unbound method). So the main cycle we have is t -> t.destroy -> t. The following references to 't' are held: - t.destroy's "im_self" pointer Because this reference isn't held, the hack in pygobject.c to keep the python wrapper alive doesn't trigger, so the C level GObject doesn't grab a reference. When the cycle GC runs, it sees that the only references to 't' are held by cycles, so calls the tp_clear() slot on 't', and other objects implementing cycle GC that are only alive because of 't'. This kills off the dict. (I am a bit surprised it didn't kill the signal handler connection too, as tp_clear() on 't' invalidates all the watched closures). Maybe we need to rethink the entire "single wrapper per GObject" implementation so that it looks less like a big hack. From talks with Owen a few years back, he described how the Perl bindings handle this. - each LB wrapper holds a reference to its corresponding GObject. - each GObject holds a reference to its corresponding LB wrapper (through g_object_set_data()) - maintain a linked list of Python wrappers - In a timeout hanlder, go through the list looking for wrappers with a reference count of 1 (which means that the ref is owned by the GObject). If it finds such a wrapper, it breaks the link between the wrapper and GObject so that they can both be released. This of course won't work quite right with Python cycle GC (it would represent a cycle that it doesn't know about, so it would mean that wrappers wouldn't be able to participate in cycle GC (however, it would get rid of the hackish tp_dealloc implementation we currently have). I wonder if it would be possible to inject knowledge that there is a "wrapper -> gobject -> wrapper" cycle, but the gobject has other references, into the Python cycle GC, as that would be the other way to handle it.
Been thinking a bit more about this problem, and I think I have come up with a possible solution. It uses some of Arjan's ideas, and should simplify things greatly. 1. since we want a single wrapper per GObject, it makes sense to have the wrapper hold a ref to the GObject and vice versa (creating a cycle). We know the following: refcount(wrapper)==1 -> no references to wrapper other than GObject. refcount(GObject)==1 -> no references to GObject other than wrapper. refcount(wrapper)==refcount(GObject)==1 -> wrapper and object should be released. 2. As the GObject is not a PyObject, we need some way to feed information about references to the GObject into Python's cycle GC. I think this can be represented by presence or absence of a direct cycle wrapper -> wrapper: if GObject has external references, don't add cycle if GObject has refcount==1, add cycle This way, the cycle GC will consider the reference the GObject holds on the wrapper a cycle if the GObject has no other references. So the plan would be to: - change things so that when setting up a wrapper for a GObject, the GObject gets a ref to the PyObject and vice versa - cut all the wrapper resurrection stuff out of pygobject_dealloc() - Add the following code to pygobject_traverse(): if (self->obj->ref_count == 1) ret = visit(self, arg); if (ret != 0) return ret; - Add code to pygobject_clear() to break the cycle between the GObject and wrapper. - Change pygobject_new() so that it does the following: if the GObject has no wrapper associated with it: create the wrapper (with reference cycle), and associate it with the GObject. Incref the wrapper associated with the GObject and return it. I'll need to think about this a bit more, but I think this should work quite well, and be less likely to break with new versions of Python. It does mean that PyGTK programs will require the cycle GC in order to not leak, but I don't think that is a problem. The default build for Python 2.2.x is with the GC, and Python 2.3 will require the cycle GC as well.
Created attachment 14788 [details] [review] patch with my proposed solution
The above patch implements my proposed solution. It would be useful if some people want to test this. I have only done a little testing, but will do some more. The patch reduces the number of lines of code, which is good: ChangeLog | 15 +++++ pygobject.c | 127 ++++++++++++++++++++------------------------------ pygobject.h | 1 3 files changed, 66 insertions(+), 77 deletions(-) Some of Arjan's older unit tests will probably not pass with this modification, but that is not necessarily a bad thing. In the usual case, the wrapper will have one extra reference.
Still debugging this one. I am using a modified version of Thomas' original example with a bit of instrumentation. When the gc runs (with debugging stats turned on), it prints: gc: collecting generation 2... gc: objects in each generation: 0 0 4364 gc: done, 2 unreachable, 0 uncollectable. The two objects are the wrapper and its instance dictionary. Neither is classed as uncollectable. For some reason, it doesn't seem to be calling the tp_clear() on my wrapper though, which means things aren't getting freed. However, the 'a' attribute in the instance dict goes away, so the dictionary's tp_clear() method must be getting called. If I turn on DEBUG_SAVEALL, the instance finds its way into gc.garbage list though.
*** Bug 107697 has been marked as a duplicate of this bug. ***
Created attachment 14812 [details] [review] fixed patch
New patch that actually works. pygobject_register_class() was not propagating the tp_clear() implementation for subclasses of GObject. This is why the tp_clear() function was not being called :) With the above patch, the garbage collector successfully freed the object. One difference with this patch applied is that subclasses with __del__ methods won't get freed. This is simply a limitation of the cycle GC. I am not sure how much of a problem this will be in practice though ...
I'm getting segfaults when testing jamesh's patch on a largeish app where we've seen the disappearing __dict__ bug. The beginning of the (very long) backtrace looks like:
+ Trace 34588
I tried running under valgrind, but didn't get much more useful information... just many, many warnings about an invalid free at the point where the above segfault occurred. The bit of code in pygobject_dealloc that is causing the problem says: /* the following causes problems with subclassed types */ if (self->ob_type->tp_free) self->ob_type->tp_free((PyObject *)self); else PyObject_GC_Del(self); Since our app is full of subclassed objects, the comment seems to be accurate. :)
Created attachment 14829 [details] [review] fix up the dealloc routine so we ignore the tp_free slot again
The above patch reverts that small part of the patch. John: I am not quite sure why it would cause problems for you unless the tp_free implementation for your subclass was incorrect. With the Python cycle GC, an extra header is added to the start of container objects, so calling the wrong free function will cause problems. Maybe the function you had in the tp_free slot was doing the wrong type of free?
How can tp_free be overriden? There's no field tp_free created in the TypeObject (by codegen.py). Is it default NULL then?
Checked the patch into CVS, so resolving this bug.
I've tested the fix in CVS and it takes care of the problem I'd been seeing. Thanks James.