Bug 589202 – Segfault when destroying a sigc::trackable referenced multiple times in a sigc::slot

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 589202 - Segfault when destroying a sigc::trackable referenced multiple times in a sigc::slot


Summary:	Segfault when destroying a sigc::trackable referenced multiple times in a sig...


Status:	RESOLVED FIXED

Product:	libsigc++
Classification:	Bindings
Component:	general
Version:	2.2.x
Hardware:	Other All

Importance:	Normal critical
Target Milestone:	---
Assigned To:	Martin Schulze
QA Contact:	Martin Schulze

URL:
Whiteboard:

Duplicates:	321552 (view as bug list)
Depends on:
Blocks:

Reported:	2009-07-21 03:53 UTC by James Lin
Modified:	2011-02-22 10:33 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Sample code that exhibits the crash on Windows. (1.62 KB, text/plain) 2009-07-21 03:54 UTC, James Lin		Details
Debugging output (6.66 KB, text/plain) 2009-07-21 04:02 UTC, James Lin		Details
tar file with modified trackable.cc and valgrind output (3.51 KB, application/x-compressed-tar) 2010-11-15 08:58 UTC, Kjell Ahlstedt		Details
patch: Avoid calling the same callback function twice. (2.48 KB, patch) 2011-02-22 09:48 UTC, Kjell Ahlstedt	none	Details \| Review

Description James Lin 2009-07-21 03:53:54 UTC

Steps to reproduce:
If a sigc::trackable object is referenced multiple times in a slot (e.g.:

   sigc::bind(sigc::mem_fun(this, &Foo::bar),
              sigc::ref(*this))

or

   sigc::bind(sigc::mem_fun(this, &Foo::bar),
              sigc::mem_fun(this, &Foo::baz))

), if that slot is connected to a sigc::signal, destroying that object causes sigc::internal::slot_rep::destroy() to be called on the slot_rep object *after* leaving ~slot_rep().

I will attach sample code that demonstrates this problem.  This code crashes readily when compiled on Windows (compiled with MSVC 2008 and run on Windows XP x64 SP2).  This code does not crash when run on Linux, but I believe that this is because the Windows debug runtime fills freed memory with a non-zero bit-pattern, and Linux manages to be saved by slot_rep::destroy()'s "if (destroy_) ..." check.

Through logging code that I added to libsigc++, I did verify that the slot_rep::destroy() is called as described above for both Windows and Linux.  I've reproduced this with both libsigc++ 2.0.17 and with 2.2.3.

One workaround is to explicitly call sigc::connection::disconnect() before destroying the trackable object.


Stack trace:


Other information:

Comment 1 James Lin 2009-07-21 03:54:52 UTC

Created attachment 138874 [details]
Sample code that exhibits the crash on Windows.

Comment 2 James Lin 2009-07-21 04:02:11 UTC

Created attachment 138875 [details]
Debugging output

This is the output I get when I added some logging code to libsigc++ 2.2.3 (compiled with gcc 3.2.2, run on Linux).  The addresses shown are those of the corresponding 'this' pointers (or in the case of the static sigc::internal::typed_slot_rep::destroy function, the value of its 'data' argument).

You can see that in each case, sigc::internal::slot_rep::destroy is called after sigc::internal::slot_rep::~slot_rep() exited on the same object.

Comment 3 James Lin 2009-07-21 04:20:00 UTC

This is the call stack from the offending call to sigc::internal::slot_rep::destroy:

feeefeee	
sigc::internal::slot_rep::destroy() Line 98	C++
sigc::internal::slot_rep::notify() Line 63	C++
sigc::internal::trackable_callback_list::~trackable_callback_list() Line 89	C++
sigc::internal::trackable_callback_list::`scalar deleting destructor'()	C++
sigc::trackable::notify_callbacks() Line 67	C++
sigc::trackable::~trackable() Line 52	C++
main() Line 70	C++

(This is on Windows with libsigc++ 2.2.3.)

Comment 4 Murray Cumming 2009-12-29 12:44:57 UTC

I'd like to see valgrind's memcheck output for this, probably on Linux.

Comment 5 rguenther 2010-02-14 15:11:53 UTC

The same now happens with G++ 4.5 on Linux.

foo(float 1.234)
6
/bin/sh: line 4:  6870 Segmentation fault      ${dir}$tst
FAIL: test_retype_return

with a similar backtrace:

foo(float 1.234)
6

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7bdad4a in sigc::internal::trackable_callback_list::add_callback (
    this=0x60, data=0x6050b0, 
    func=0x4010e8 <_ZN4sigc8internal8slot_rep6notifyEPv@plt>)
    at trackable.cc:94
94        if (!clearing_)  // TODO: Is it okay to silently ignore attempts to add dependencies when the list is being cleared?
(gdb) bt

+ Trace 220563

#0 sigc::internal::trackable_callback_list::add_callback
at trackable.cc line 94
#1 sigc::trackable::add_destroy_notify_callback
at trackable.cc line 56
#2 sigc::internal::slot_do_bind::operator()
at ../sigc++/functors/slot_base.h line 148
#3 sigc::internal::with_type_pointer<true, bar, sigc::internal::limit_derived_target<sigc::trackable*, sigc::internal::slot_do_bind> >::execute_
at ../sigc++/visit_each.h line 88
#4 sigc::internal::limit_derived_target<sigc::trackable*, sigc::internal::slot_do_bind>::operator()<bar>
at ../sigc++/visit_each.h line 100
#5 sigc::visit_each<sigc::internal::limit_derived_target<sigc::trackable*, sigc::internal::slot_do_bind>, bar>
at ../sigc++/visit_each.h line 144
#6 sigc::visit_each<sigc::internal::limit_derived_target<sigc::trackable*, sigc::internal::slot_do_bind>, bar>
at ../sigc++/adaptors/adaptor_trait.h line 267
#7 sigc::visit_each<sigc::internal::limit_derived_target<sigc::trackable*, sigc::internal::slot_do_bind>, void, bar>
at ../sigc++/adaptors/retype_return.h line 279
#8 sigc::visit_each_type<sigc::trackable*, sigc::internal::slot_do_bind, sigc::retype_return_functor<void, bar> >
at ../sigc++/visit_each.h line 170
#9 sigc::internal::typed_slot_rep<sigc::retype_return_functor<void, bar> >::typed_slot_rep
at ../sigc++/functors/slot.h line 43
#10 sigc::internal::typed_slot_rep<sigc::retype_return_functor<void, bar> >::dup
at ../sigc++/functors/slot.h line 77
#11 sigc::internal::slot_rep::dup
at ../sigc++/functors/slot_base.h line 104
#12 sigc::slot_base::operator=
at functors/slot_base.cc line 119
#13 sigc::slot1<void, int>::operator=
at ../sigc++/functors/slot.h line 540
#14 sigc::slot<void, int, sigc::nil, sigc::nil, sigc::nil, sigc::nil, sigc::nil, sigc::nil>::operator=
at ../sigc++/functors/slot.h line 1091
#15 main
at test_retype_return.cc line 46


I built libsigc++ with -O0 -g here.


Valgrind reports

foo(float 1.234)
6
==2558== Invalid read of size 1
==2558==    at 0x4E32D4A: sigc::internal::trackable_callback_list::add_callback(void*, void* (*)(void*)) (trackable.cc:94)

with the same backtrace.

The analysis of the reporter looks correct to me.

Comment 6 Jason 'vanRijn' Kasper 2010-03-19 23:03:25 UTC

Ugh. I just hit this one too, I think. I have a class like so:

   class Foo
      : public Glib::Object
   {
   public:
      ...
      Gtk::EventBox eventBox;
      sigc::connection enterNotify;
   };

and then a method like this:

bool
Bar::OnArrowEnter(GdkEventCrossing* ev, Foo* foo)
{
   gdk_window_set_cursor(ev->window, Gdk::Cursor(Gdk::HAND2).gobj());
   foo->enterNotify.disconnect();
   return false;
}

and is connected like this:

    foo->enterNotify =
       foo->eventBox.signal_enter_notify_event().connect(
         sigc::bind(sigc::mem_fun(this, &Bar::OnArrowEnter),
                    foo));


While this doesn't crash in Linux, it does g_log this in the console window:

GLib-GObject: instance of invalid non-instantiatable type `(null)'
GLib-GObject: g_signal_handler_is_connected: assertion `G_TYPE_CHECK_INSTANCE (instance)' failed

and gdb shows this stack:

+ Trace 221018

#0 g_log
from /lib/libglib-2.0.so.0
#1 g_type_check_instance
from /usr/lib/libgobject-2.0.so.0
#2 g_signal_handler_is_connected
from /usr/lib/libgobject-2.0.so.0
#3 Glib::SignalProxyConnectionNode::notify(void*)
from /usr/lib/libglibmm-2.4.so.1
#4 sigc::internal::slot_rep::notify(void*)
from /usr/lib/libsigc-2.0.so.0
#5 sigc::internal::trackable_callback_list::~trackable_callback_list()
from /usr/lib/libsigc-2.0.so.0
#6 sigc::trackable::notify_callbacks()
from /usr/lib/libsigc-2.0.so.0
#7 ~Bar


If I'm off and this isn't related, I apologize. It seemed similar in nature in that if I don't pass the Glib::Object class into the same method that it's connecting too, then I don't see this at all.

Comment 7 Murray Cumming 2010-04-16 08:11:17 UTC

This doesn't crash for me on Ubuntu Karmic with g++ 4.4.1, but I do see some valgrind warnings like this:

==24681== Invalid read of size 4
==24681==    at 0x4044F0A: sigc::internal::slot_rep::notify(void*) (in /usr/lib/libsigc-2.0.so.0.0.0)
==24681==    by 0x4044810: sigc::internal::trackable_callback_list::~trackable_callback_list() (in /usr/lib/libsigc-2.0.so.0.0.0)
==24681==    by 0x404489A: sigc::trackable::notify_callbacks() (in /usr/lib/libsigc-2.0.so.0.0.0)
==24681==    by 0x40448DC: sigc::trackable::~trackable() (in /usr/lib/libsigc-2.0.so.0.0.0)
==24681==    by 0x8048EBE: main (in /home/murrayc/a.out)
==24681==  Address 0x42c6210 is 8 bytes inside a block of size 52 free'd
==24681==    at 0x402454D: operator delete(void*) (vg_replace_malloc.c:346)
==24681==    by 0x404522E: sigc::slot_base::~slot_base() (in /usr/lib/libsigc-2.0.so.0.0.0)
==24681==    by 0x40445C6: std::list<sigc::slot_base, std::allocator<sigc::slot_base> >::erase(std::_List_iterator<sigc::slot_base>) (in /usr/lib/libsigc-2.0.so.0.0.0)
==24681==    by 0x4044217: sigc::internal::signal_impl::sweep() (in /usr/lib/libsigc-2.0.so.0.0.0)
==24681==    by 0x404424F: sigc::internal::signal_impl::notify(void*) (in /usr/lib/libsigc-2.0.so.0.0.0)
==24681==    by 0x4044ED6: sigc::internal::slot_rep::disconnect() (in /usr/lib/libsigc-2.0.so.0.0.0)
==24681==    by 0x4044F24: sigc::internal::slot_rep::notify(void*) (in /usr/lib/libsigc-2.0.so.0.0.0)
==24681==    by 0x4044810: sigc::internal::trackable_callback_list::~trackable_callback_list() (in /usr/lib/libsigc-2.0.so.0.0.0)
==24681==    by 0x404489A: sigc::trackable::notify_callbacks() (in /usr/lib/libsigc-2.0.so.0.0.0)
==24681==    by 0x40448DC: sigc::trackable::~trackable() (in /usr/lib/libsigc-2.0.so.0.0.0)
==24681==    by 0x8048EBE: main (in /home/murrayc/a.out)

Comment 8 Kjell Ahlstedt 2010-11-15 08:58:21 UTC

Created attachment 174496 [details]
tar file with modified trackable.cc and valgrind output

The attached archive file contains

- trackable.cc: A modified version of the file, with some extra output that
  shows when callback functions are added to and removed from
  trackable::callback_list_. It also shows when the callback functions are
  called.
  By activating a #define directive, trackable::remove_destroy_notify_callback
  can be modified as described later.

- valgrind-output1.txt: Output when the test case in comment 1 is run with
  only the output activated in trackable.cc. valgrind reports accesses to
  freed memory.

- valgrind-output2.txt: trackable::remove_destroy_notify_callback has been
  modified. It can remove (or actually disable) entries in the list of
  callback functions while trackable::notify_callbacks is executing.
  valgrind reports no errors, neither reading from freed memory nor memory
  leaks.

The cause of the problem in this bug report is that an object (a slot_rep,
I believe) is registered twice in the list of callback functions in another
trackable object. When the program is finished, and all destructors are called,
slot_rep::notify is called twice with the same data (a pointer to slot_rep).
When it's called the first time, it tries to unregister itself from the calling
object, but trackable::remove_destroy_notify_callback does nothing while
trackable::notify_callbacks is executing. During this first call of
slot_rep::notify the referenced slot_rep is deleted. When slot_rep::notify
is called a second time, the referenced slot_rep does not exist. Since it
clears most of itself before it's deleted, the second call does not do much
harm, provided the freed memory is still cleared. There is no request to delete
the slot_rep object a second time, but the freed memory is read.

The source code of the test case contains two commented calls to
connection::disconnect. If they are uncommented, the valgrind errors disappear
from those parts of the test. That's because
trackable::remove_destroy_notify_callback accepts all requests for removal.

I can think of some alternative solutions to this bug.

1. Let trackable::add_destroy_notify_callback check if the callback function
   is already in the list of callback functions, and in that case, don't insert
   a second copy.
   Possible problems: Can we be sure that there is no application that depends
   on the present behavior? A callback function (not in libsigc++) that needs
   to be called twice with the same data?
   In an object with a very long list of callback functions, the time needed for
   checking can be significant. The time to add n functions will be O(n^2)
   instead of O(n). (^ denotes exponentiation.)

2. Let trackable::remove_destroy_notify_callback remove entries when clearing_
   is set, i.e. while trackable::notify_callbacks is executing.
   I have tested this solution and it works for the test case in comment 1.
   Actually, entries are not removed, since that could invalidate the iterator
   used in ~trackable_callback_list. Instead an entry is disabled by clearing
   the pointer trackable_callback::data_.
   Possible problem: If a callback is registered with data==0, it will never
   be called. If there are callbacks that use data for e.g. an index, this can
   be a problem. It's probably safer to disable a callback by setting
   trackable_callback::func_ to 0.

What about comment 6? I don't know if it's the same bug. The source code in
that comment is not a complete compilable program. I haven't tried to fill
in the gaps.

Bug 564005 "Valgrind errors and crash on exit with Gtk::UIManager" shows similar
symptoms to this one, but the cause of the problem is not the same, as far as
I can tell. When I run the test case in bug 564005 with my modified
trackable.cc, it does not show any doubly registered objects.

Comment 9 Murray Cumming 2011-02-21 10:55:45 UTC

Could you provide a git patch? That would be much easier for me to review.

Comment 10 Murray Cumming 2011-02-22 09:21:51 UTC

Ah I see that your file was just for debugging purposes.

2 sounds fairly sensible. Could you try that?

Comment 11 Kjell Ahlstedt 2011-02-22 09:48:19 UTC

Created attachment 181566 [details] [review]
patch: Avoid calling the same callback function twice.

Here's a patch. It fixes this bug and bug 321552. One of those bugs can be
a duplicate of the other, I think.

I don't know if this patch fixes the problem with the code in comment 6,
though. I have not tried to make a complete program out of those code snippets.
But the gdb trace shows that Glib::SignalProxyConnectionNode is involved.
Therefore it's perhaps more related to bug 154498 (glibmm) or bug 564005
(gtkmm) with similar symptoms (accesses to deleted objects). Those bugs are not
fixed by this patch.

Comment 12 Murray Cumming 2011-02-22 10:08:12 UTC

Wow, you are fast. Thanks.

I have pushed that. Let's see how that works for us and then make a tarball release later if everything seems OK. I don't like touching this code much.

Jason, could you file your bug separately, please, with a complete (but small-as-possible) test case?

Comment 13 Murray Cumming 2011-02-22 10:09:26 UTC

*** Bug 321552 has been marked as a duplicate of this bug. ***

Comment 14 Kjell Ahlstedt 2011-02-22 10:33:06 UTC

(In reply to comment #12)
> Wow, you are fast. Thanks.

Not quite as fast as you might think. I began working on the patch yesterday, and hadn't even noticed your comment 10 when I filed the patch. And you pushed it really fast.