GNOME Bugzilla – Bug 651795
GC-related crash in pygobject_dealloc
Last modified: 2011-06-06 13:51:45 UTC
Created attachment 189168 [details] [review] the fix We are encountering an occasional Python crash under Sugar (written in Python based on pygobject2 and pygtk2 static bindings), which can be triggered simply by interacting with widgets, launching programs, etc. The sugar process exits with this error: python: Modules/gcmodule.c:262: update_refs: Assertion `gc->gc.gc_refs != 0' failed. This error occurs when Python is doing garbage collection (triggered during allocation of new objects when the number of recent allocations passes a threshold). The comment above this assertion in the Python source code neatly explains a possible cause of this: (Modules/gcmodule.c in Python-2.7.1) /* Python's cyclic gc should never see an incoming refcount * of 0: if something decref'ed to 0, it should have been * deallocated immediately at that time. * Possible cause (if the assert triggers): a tp_dealloc * routine left a gc-aware object tracked during its teardown * phase, and did something-- or allowed something to happen -- * that called back into Python. gc can trigger then, and may * see the still-tracked dying object. Before this assert * was added, such mistakes went on to allow gc to try to * delete the object again. In a debug build, that caused * a mysterious segfault, when _Py_ForgetReference tried * to remove the object from the doubly-linked list of all * objects a second time. In a release build, an actual * double deallocation occurred, which leads to corruption * of the allocator's internal bookkeeping pointers. That's * so serious that maybe this should be a release-build * check instead of an assert? */ And the backtrace from this issue strongly suggests we're looking at the same thing: http://bugzilla.redhat.com/attachment.cgi?id=409514 The backtrace of thread 1 shows that in frame #21 we are inside the tp_dealloc function for pygobect (pygobject_dealloc), and the calls that continue on from that clearly show it "calling back into Python" (PyObject_ClearWeakRefs -> PyEval_EvalFrameEx -> _PyObject_GC_New -> _PyObject_GC_Malloc) and causing further allocations to happen, which occasionally trigger the GC, leading to this crash. pygobject_dealloc clears weak refs before asking the GC to untrack the object: PyObject_ClearWeakRefs((PyObject *)self); PyObject_GC_UnTrack((PyObject *)self); yet the stack trace above shows that ClearWeakRefs is clearly a case of "calling into python" which leads to further allocations, meaning that we've done exactly what the comment told us not to do. We should untrack the object first. I found an example of a python module that calls UnTrack before ClearWeakRefs: http://www.google.com/codesearch/p?hl=en#3UmC8RnwYSc/greenlet/greenlet.c&l=653 pygi gets it right in 3 places too: pygi-struct.c _struct_dealloc pygi-boxed.c _boxed_dealloc pygi-info.c _base_info_dealloc So I'm convinced this is the right thing to do. Testing the attached patch while launching and closing a small Sugar app from a script for over an hour indicates that it does indeed solve the problem; previously this test case would trigger the crash in less than 10 minutes. If this passes review, I'm happy to commit (I have access)
For completeness, the 3(!) bug trackers where this issue has been looked at: http://bugzilla.redhat.com/show_bug.cgi?id=640972 http://dev.laptop.org/ticket/10724 http://bugs.sugarlabs.org/ticket/2064
Comment on attachment 189168 [details] [review] the fix But just to play safe, please make sure that 'make check' passes before pushing it.
pushed as 92aca4416a7930e5, thanks!