GNOME Bugzilla – Bug 589202
Segfault when destroying a sigc::trackable referenced multiple times in a sigc::slot
Last modified: 2011-02-22 10:33:06 UTC
Steps to reproduce: If a sigc::trackable object is referenced multiple times in a slot (e.g.: sigc::bind(sigc::mem_fun(this, &Foo::bar), sigc::ref(*this)) or sigc::bind(sigc::mem_fun(this, &Foo::bar), sigc::mem_fun(this, &Foo::baz)) ), if that slot is connected to a sigc::signal, destroying that object causes sigc::internal::slot_rep::destroy() to be called on the slot_rep object *after* leaving ~slot_rep(). I will attach sample code that demonstrates this problem. This code crashes readily when compiled on Windows (compiled with MSVC 2008 and run on Windows XP x64 SP2). This code does not crash when run on Linux, but I believe that this is because the Windows debug runtime fills freed memory with a non-zero bit-pattern, and Linux manages to be saved by slot_rep::destroy()'s "if (destroy_) ..." check. Through logging code that I added to libsigc++, I did verify that the slot_rep::destroy() is called as described above for both Windows and Linux. I've reproduced this with both libsigc++ 2.0.17 and with 2.2.3. One workaround is to explicitly call sigc::connection::disconnect() before destroying the trackable object. Stack trace: Other information:
Created attachment 138874 [details] Sample code that exhibits the crash on Windows.
Created attachment 138875 [details] Debugging output This is the output I get when I added some logging code to libsigc++ 2.2.3 (compiled with gcc 3.2.2, run on Linux). The addresses shown are those of the corresponding 'this' pointers (or in the case of the static sigc::internal::typed_slot_rep::destroy function, the value of its 'data' argument). You can see that in each case, sigc::internal::slot_rep::destroy is called after sigc::internal::slot_rep::~slot_rep() exited on the same object.
This is the call stack from the offending call to sigc::internal::slot_rep::destroy: feeefeee sigc::internal::slot_rep::destroy() Line 98 C++ sigc::internal::slot_rep::notify() Line 63 C++ sigc::internal::trackable_callback_list::~trackable_callback_list() Line 89 C++ sigc::internal::trackable_callback_list::`scalar deleting destructor'() C++ sigc::trackable::notify_callbacks() Line 67 C++ sigc::trackable::~trackable() Line 52 C++ main() Line 70 C++ (This is on Windows with libsigc++ 2.2.3.)
I'd like to see valgrind's memcheck output for this, probably on Linux.
The same now happens with G++ 4.5 on Linux. foo(float 1.234) 6 /bin/sh: line 4: 6870 Segmentation fault ${dir}$tst FAIL: test_retype_return with a similar backtrace: foo(float 1.234) 6 Program received signal SIGSEGV, Segmentation fault. 0x00007ffff7bdad4a in sigc::internal::trackable_callback_list::add_callback ( this=0x60, data=0x6050b0, func=0x4010e8 <_ZN4sigc8internal8slot_rep6notifyEPv@plt>) at trackable.cc:94 94 if (!clearing_) // TODO: Is it okay to silently ignore attempts to add dependencies when the list is being cleared? (gdb) bt
+ Trace 220563
I built libsigc++ with -O0 -g here. Valgrind reports foo(float 1.234) 6 ==2558== Invalid read of size 1 ==2558== at 0x4E32D4A: sigc::internal::trackable_callback_list::add_callback(void*, void* (*)(void*)) (trackable.cc:94) with the same backtrace. The analysis of the reporter looks correct to me.
Ugh. I just hit this one too, I think. I have a class like so: class Foo : public Glib::Object { public: ... Gtk::EventBox eventBox; sigc::connection enterNotify; }; and then a method like this: bool Bar::OnArrowEnter(GdkEventCrossing* ev, Foo* foo) { gdk_window_set_cursor(ev->window, Gdk::Cursor(Gdk::HAND2).gobj()); foo->enterNotify.disconnect(); return false; } and is connected like this: foo->enterNotify = foo->eventBox.signal_enter_notify_event().connect( sigc::bind(sigc::mem_fun(this, &Bar::OnArrowEnter), foo)); While this doesn't crash in Linux, it does g_log this in the console window: GLib-GObject: instance of invalid non-instantiatable type `(null)' GLib-GObject: g_signal_handler_is_connected: assertion `G_TYPE_CHECK_INSTANCE (instance)' failed and gdb shows this stack:
+ Trace 221018
If I'm off and this isn't related, I apologize. It seemed similar in nature in that if I don't pass the Glib::Object class into the same method that it's connecting too, then I don't see this at all.
This doesn't crash for me on Ubuntu Karmic with g++ 4.4.1, but I do see some valgrind warnings like this: ==24681== Invalid read of size 4 ==24681== at 0x4044F0A: sigc::internal::slot_rep::notify(void*) (in /usr/lib/libsigc-2.0.so.0.0.0) ==24681== by 0x4044810: sigc::internal::trackable_callback_list::~trackable_callback_list() (in /usr/lib/libsigc-2.0.so.0.0.0) ==24681== by 0x404489A: sigc::trackable::notify_callbacks() (in /usr/lib/libsigc-2.0.so.0.0.0) ==24681== by 0x40448DC: sigc::trackable::~trackable() (in /usr/lib/libsigc-2.0.so.0.0.0) ==24681== by 0x8048EBE: main (in /home/murrayc/a.out) ==24681== Address 0x42c6210 is 8 bytes inside a block of size 52 free'd ==24681== at 0x402454D: operator delete(void*) (vg_replace_malloc.c:346) ==24681== by 0x404522E: sigc::slot_base::~slot_base() (in /usr/lib/libsigc-2.0.so.0.0.0) ==24681== by 0x40445C6: std::list<sigc::slot_base, std::allocator<sigc::slot_base> >::erase(std::_List_iterator<sigc::slot_base>) (in /usr/lib/libsigc-2.0.so.0.0.0) ==24681== by 0x4044217: sigc::internal::signal_impl::sweep() (in /usr/lib/libsigc-2.0.so.0.0.0) ==24681== by 0x404424F: sigc::internal::signal_impl::notify(void*) (in /usr/lib/libsigc-2.0.so.0.0.0) ==24681== by 0x4044ED6: sigc::internal::slot_rep::disconnect() (in /usr/lib/libsigc-2.0.so.0.0.0) ==24681== by 0x4044F24: sigc::internal::slot_rep::notify(void*) (in /usr/lib/libsigc-2.0.so.0.0.0) ==24681== by 0x4044810: sigc::internal::trackable_callback_list::~trackable_callback_list() (in /usr/lib/libsigc-2.0.so.0.0.0) ==24681== by 0x404489A: sigc::trackable::notify_callbacks() (in /usr/lib/libsigc-2.0.so.0.0.0) ==24681== by 0x40448DC: sigc::trackable::~trackable() (in /usr/lib/libsigc-2.0.so.0.0.0) ==24681== by 0x8048EBE: main (in /home/murrayc/a.out)
Created attachment 174496 [details] tar file with modified trackable.cc and valgrind output The attached archive file contains - trackable.cc: A modified version of the file, with some extra output that shows when callback functions are added to and removed from trackable::callback_list_. It also shows when the callback functions are called. By activating a #define directive, trackable::remove_destroy_notify_callback can be modified as described later. - valgrind-output1.txt: Output when the test case in comment 1 is run with only the output activated in trackable.cc. valgrind reports accesses to freed memory. - valgrind-output2.txt: trackable::remove_destroy_notify_callback has been modified. It can remove (or actually disable) entries in the list of callback functions while trackable::notify_callbacks is executing. valgrind reports no errors, neither reading from freed memory nor memory leaks. The cause of the problem in this bug report is that an object (a slot_rep, I believe) is registered twice in the list of callback functions in another trackable object. When the program is finished, and all destructors are called, slot_rep::notify is called twice with the same data (a pointer to slot_rep). When it's called the first time, it tries to unregister itself from the calling object, but trackable::remove_destroy_notify_callback does nothing while trackable::notify_callbacks is executing. During this first call of slot_rep::notify the referenced slot_rep is deleted. When slot_rep::notify is called a second time, the referenced slot_rep does not exist. Since it clears most of itself before it's deleted, the second call does not do much harm, provided the freed memory is still cleared. There is no request to delete the slot_rep object a second time, but the freed memory is read. The source code of the test case contains two commented calls to connection::disconnect. If they are uncommented, the valgrind errors disappear from those parts of the test. That's because trackable::remove_destroy_notify_callback accepts all requests for removal. I can think of some alternative solutions to this bug. 1. Let trackable::add_destroy_notify_callback check if the callback function is already in the list of callback functions, and in that case, don't insert a second copy. Possible problems: Can we be sure that there is no application that depends on the present behavior? A callback function (not in libsigc++) that needs to be called twice with the same data? In an object with a very long list of callback functions, the time needed for checking can be significant. The time to add n functions will be O(n^2) instead of O(n). (^ denotes exponentiation.) 2. Let trackable::remove_destroy_notify_callback remove entries when clearing_ is set, i.e. while trackable::notify_callbacks is executing. I have tested this solution and it works for the test case in comment 1. Actually, entries are not removed, since that could invalidate the iterator used in ~trackable_callback_list. Instead an entry is disabled by clearing the pointer trackable_callback::data_. Possible problem: If a callback is registered with data==0, it will never be called. If there are callbacks that use data for e.g. an index, this can be a problem. It's probably safer to disable a callback by setting trackable_callback::func_ to 0. What about comment 6? I don't know if it's the same bug. The source code in that comment is not a complete compilable program. I haven't tried to fill in the gaps. Bug 564005 "Valgrind errors and crash on exit with Gtk::UIManager" shows similar symptoms to this one, but the cause of the problem is not the same, as far as I can tell. When I run the test case in bug 564005 with my modified trackable.cc, it does not show any doubly registered objects.
Could you provide a git patch? That would be much easier for me to review.
Ah I see that your file was just for debugging purposes. 2 sounds fairly sensible. Could you try that?
Created attachment 181566 [details] [review] patch: Avoid calling the same callback function twice. Here's a patch. It fixes this bug and bug 321552. One of those bugs can be a duplicate of the other, I think. I don't know if this patch fixes the problem with the code in comment 6, though. I have not tried to make a complete program out of those code snippets. But the gdb trace shows that Glib::SignalProxyConnectionNode is involved. Therefore it's perhaps more related to bug 154498 (glibmm) or bug 564005 (gtkmm) with similar symptoms (accesses to deleted objects). Those bugs are not fixed by this patch.
Wow, you are fast. Thanks. I have pushed that. Let's see how that works for us and then make a tarball release later if everything seems OK. I don't like touching this code much. Jason, could you file your bug separately, please, with a complete (but small-as-possible) test case?
*** Bug 321552 has been marked as a duplicate of this bug. ***
(In reply to comment #12) > Wow, you are fast. Thanks. Not quite as fast as you might think. I began working on the patch yesterday, and hadn't even noticed your comment 10 when I filed the patch. And you pushed it really fast.