GNOME Bugzilla – Bug 321552
Double entries in trackable list when binding a slot to a slot
Last modified: 2011-02-22 10:09:26 UTC
This is a rather hard bug to describe - it's easier to read the testcase I've just written - but the basic idea is as follows: 1) You have a method 'foo' on a trackable object that takes a slot as a parameter. 2) You want to use another method 'bar' on that object as that slot. 3) You build a slot out of 'foo' and bind a slot of 'bar' at the same time. 4) You connect this slot to some signal. 5) The object goes away. Something magically wrong happens at step 3 causing each of the slots to appear in the objects notify target list twice. When the object goes away, it notifies each slot once. Then one of the slots (not sure which yet) notifies the signal, which in turn releases and deletes the slot. Then the object notifies that slot the second time causing an invalid memory access. Due to the layout of all this in memory, there is no segfault but valgrind will complain. Interesting observations: 1) If the bound slot is on another object, there is no problem. 2) If the bound slot is first created and assigned to a named variable and then this variable is bound, there is no problem. *Only* the inline case exhibits this problem. So, the trivial workaround is to assign the bound slot first. I will continue to investigate but this looks very strange.
I am clearly not paying attention. This is a libsigc++ bug.
Created attachment 54808 [details] Testcase for the binding problem
Maybe the copying in this patch has caused this somehow, though it's a wild guess: http://bugzilla.gnome.org/show_bug.cgi?id=166217 You could test a version of libsigc++ before the patch was applied.
While I could understand why it might be responsible, that's not the case here; I just tested with 2.0.9 and the problem still existed back then.
Philip, The only difference I can see between the normal and workaround cases is: o In the normal case, storage for the bound slot is allocated on the stack (it is pushed) right before the call to bind(), and it is freed (it is poped) right after the call to bind(). So the storage does not exist anymore at the time 'foo' is deleted. o In the workaround case, storage for the bound slot ('two') is allocated on the stack upon entering main(). So the storage still exists at the time 'foo' is deleted. Valgrind is not perfect: it only knows about accesses to the address space, it does not know when objects are constructed or destructed. I believe that the bug actually happens in _both_ cases, but it is only detected by valgrind in the normal case (where a memory access occurs beyond the end of the stack), and it is not detected in the workaround case (where a memory access occurs inside the stack, so it appears valid to valgrind although it happens after the object that was stored there has been destructed). In other words, I don't think the workaround case is an actual workaround. I wanted to verify this theory by myself, by crafting a more careful workaround case: allocate 'two' on the _heap_ right before the call to bind(), then delete it after the call to bind(). I believe this will help valgrind detect the problem even in the "workaround" case. Unfortunately, I'm unable to reproduce the problem on my system (which uses gcc 4.0.3) under valgrind, i.e. there is nothing after the "DoFoo" line in valgrind's output. Can you please verify my theory on your system?
Régis. I get no errors with two on the heap.
I see no valgrind errors. I'm using g++ 4.0.2-4ubuntu3 (in Ubuntu dapper - unstable, which also has valgrind 3.0.1-debian). I get "Do Foo" as output.
Actually it looks like I too have g++ 4.0.3 (the other version is the package version).
Hmm. The implication then seems to be that it's a g++ error that has since been fixed. I'm using 3.3.6 here and we use 3.3.3 for the binary that we ship, so we can't ignore it yet. As we're going to bump our compiler in due time, that will be how we fix the problem.
> The implication then seems to be that it's a g++ error that has since been fixed. I wouldn't rely on that. Could you upload the valgrind output, even if it isn't completely useful, please? If we can reproduce the problem then surely we should be able to discover when/how it is happening. Slots as bound slot parameters: You really are exercising the API.
Reopening, just in case.
Created attachment 55549 [details] Valgrind log output As requested.
Created attachment 174656 [details] tar file with modified trackable.cc and valgrind output The bug reports 321552 (this one) and 589202 "Segfault when destroying a sigc::trackable referenced multiple times in a sigc::slot" are very similar, and the solution that I propose in bug 589202, comment 8, works also for the test case in 321552. The attached archive file contains: - trackable.cc: The same modified version as attached to bug 589202. Some extra output shows when callback functions are added to and removed from trackable::callback_list_. It also shows when the callback functions are called. By activating a #define directive, trackable::remove_destroy_notify_callback can be modified as described in bug 589202. - output1.txt: Output when the test case in comment 2 is run without the work- around, with only the output activated in trackable.cc. Error reported even without valgrind. - valgrind-output1.txt: Output when the test case in comment 2 is run without the workaround, with only the output activated in trackable.cc. valgrind reports accesses to freed memory. - valgrind-output2.txt: With the workaround, with only the output activated in trackable.cc. - valgrind-output3.txt: Without the workaround, trackable::remove_destroy_notify_callback modified. It can remove (or actually disable) entries in the list of callback functions while trackable::notify_callbacks is executing. - valgrind-output4.txt: With the workaround, trackable::remove_destroy_notify_callback modified. In valgrind-output[234].txt, valgrind reports no errors, neither reading from freed memory nor memory leaks. I've run the tests on Ubuntu 10.10, with a version of libsigc++ that I cloned from the master branch in Git on November 5, 2010.
Many thanks for being so thorough. I'm at a conference now but I hope to have time to go over this properly in a couple of days.
We think this is all fixed as part of the fix in bug #589202 . *** This bug has been marked as a duplicate of bug 589202 ***