After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 169225 - Crash with multiple inheritance and "delete this".
Crash with multiple inheritance and "delete this".
Status: RESOLVED FIXED
Product: libsigc++
Classification: Bindings
Component: adaptors
2.0.x
Other Linux
: Normal normal
: ---
Assigned To: Martin Schulze
Martin Schulze
Depends on:
Blocks:
 
 
Reported: 2005-03-04 19:06 UTC by Régis Duchesne
Modified: 2005-05-01 00:59 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
The code for the small test case (658 bytes, text/plain)
2005-03-04 19:07 UTC, Régis Duchesne
  Details
Pre-processed file output (.ii file), compressed with bzip2 (-9) (222.26 KB, application/octet-stream)
2005-03-04 19:07 UTC, Régis Duchesne
  Details
Simplified test case (644 bytes, text/x-c++src)
2005-04-20 21:00 UTC, Christian Hammond
  Details
valgrind_output.txt (7.59 KB, text/plain)
2005-04-21 09:17 UTC, Murray Cumming
  Details
gdb_backtrace.txt (2.95 KB, text/plain)
2005-04-21 09:18 UTC, Murray Cumming
  Details
show_bad_cast.patch (7.86 KB, patch)
2005-04-27 08:00 UTC, Murray Cumming
none Details | Review
two_pointers.patch (3.29 KB, patch)
2005-04-27 14:30 UTC, Murray Cumming
none Details | Review
glibmm_two_pointers.patch (2.42 KB, text/x-patch)
2005-04-27 17:14 UTC, Murray Cumming
  Details
two_pointers_for_trackables.patch (5.49 KB, patch)
2005-04-27 21:46 UTC, Murray Cumming
none Details | Review
New test case (478 bytes, text/plain)
2005-04-28 00:54 UTC, Régis Duchesne
  Details
New backtrace (1.82 KB, text/plain)
2005-04-28 00:56 UTC, Régis Duchesne
  Details
test_castdeleted.cc (838 bytes, text/plain)
2005-04-29 08:01 UTC, Murray Cumming
  Details
test_castdeleted2.cc (1.02 KB, text/plain)
2005-04-29 08:08 UTC, Murray Cumming
  Details

Description Régis Duchesne 2005-03-04 19:06:54 UTC
Version details: 2.0.6

Folks,

We are reporting this issue both to the GCC people and the libsigc++ people,
because the issue could be in either of them or both. On the surface, it looks
like a g++ issue, but when you look deeper, libsigc++ might be relying on a
behavior that is unspecified by any C++ standard.

Input
-----
File test1.cc is attached.

Command line
------------
g++ -v -save-temps -O -g `pkg-config --cflags --libs gtkmm-2.4` test1.cc -o test1

Where `pkg-config --cflags --libs gtkmm-2.4` expands to:
-DXTHREADS -D_REENTRANT -DXUSE_MTSAFE_API -I/usr/include/gtkmm-2.4
-I/usr/lib/gtkmm-2.4/include -I/usr/include/glibmm-2.4
-I/usr/lib/glibmm-2.4/include -I/usr/include/gdkmm-2.4
-I/usr/lib/gdkmm-2.4/include -I/usr/include/pangomm-1.4 -I/usr/include/atkmm-1.6
-I/usr/include/gtk-2.0 -I/usr/include/sigc++-2.0 -I/usr/lib/sigc++-2.0/include
-I/usr/include/glib-2.0 -I/usr/lib/glib-2.0/include -I/usr/lib/gtk-2.0/include
-I/usr/X11R6/include -I/usr/include/pango-1.0 -I/usr/include/freetype2
-I/usr/include/freetype2/config -I/usr/include/atk-1.0  -Wl,--export-dynamic
-lgtkmm-2.4 -lgdkmm-2.4 -latkmm-1.6 -lgtk-x11-2.0 -lpangomm-1.4 -lglibmm-2.4
-lsigc-2.0 -lgdk-x11-2.0 -latk-1.0 -lgdk_pixbuf-2.0 -lm -lpangoxft-1.0
-lpangox-1.0 -lpango-1.0 -lgobject-2.0 -lgmodule-2.0 -ldl -lglib-2.0

Command line output
-------------------
o TTY output:
Reading specs from /usr/lib/gcc-lib/i486-slackware-linux/3.3.4/specs
Configured with: ../gcc-3.3.4/configure --prefix=/usr --enable-shared
--enable-threads=posix --enable-__cxa_atexit --disable-checking --with-gnu-ld
--verbose --target=i486-slackware-linux --host=i486-slackware-linux
Thread model: posix
gcc version 3.3.4
 /usr/lib/gcc-lib/i486-slackware-linux/3.3.4/cc1plus -E -D__GNUG__=3 -quiet -v
-I/usr/include/gtkmm-2.4 -I/usr/lib/gtkmm-2.4/include -I/usr/include/glibmm-2.4
-I/usr/lib/glibmm-2.4/include -I/usr/include/gdkmm-2.4
-I/usr/lib/gdkmm-2.4/include -I/usr/include/pangomm-1.4 -I/usr/include/atkmm-1.6
-I/usr/include/gtk-2.0 -I/usr/include/sigc++-2.0 -I/usr/lib/sigc++-2.0/include
-I/usr/include/glib-2.0 -I/usr/lib/glib-2.0/include -I/usr/lib/gtk-2.0/include
-I/usr/X11R6/include -I/usr/include/pango-1.0 -I/usr/include/freetype2
-I/usr/include/freetype2/config -I/usr/include/atk-1.0 -D__GNUC__=3
-D__GNUC_MINOR__=3 -D__GNUC_PATCHLEVEL__=4 -D_GNU_SOURCE -DXTHREADS -D_REENTRANT
-DXUSE_MTSAFE_API test1.cc -O test1.ii
ignoring nonexistent directory "/usr/include/freetype2/config"
ignoring nonexistent directory "/usr/i486-slackware-linux/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/include/gtkmm-2.4
 /usr/lib/gtkmm-2.4/include
 /usr/include/glibmm-2.4
 /usr/lib/glibmm-2.4/include
 /usr/include/gdkmm-2.4
 /usr/lib/gdkmm-2.4/include
 /usr/include/pangomm-1.4
 /usr/include/atkmm-1.6
 /usr/include/gtk-2.0
 /usr/include/sigc++-2.0
 /usr/lib/sigc++-2.0/include
 /usr/include/glib-2.0
 /usr/lib/glib-2.0/include
 /usr/lib/gtk-2.0/include
 /usr/X11R6/include
 /usr/include/pango-1.0
 /usr/include/freetype2
 /usr/include/atk-1.0
 /usr/lib/qt/include
 /usr/include/c++/3.3.4
 /usr/include/c++/3.3.4/i486-slackware-linux
 /usr/include/c++/3.3.4/backward
 /usr/local/include
 /usr/lib/gcc-lib/i486-slackware-linux/3.3.4/include
 /usr/include
End of search list.
 /usr/lib/gcc-lib/i486-slackware-linux/3.3.4/cc1plus -fpreprocessed test1.ii
-quiet -dumpbase test1.cc -auxbase test1 -g -O -version -o test1.s
GNU C++ version 3.3.4 (i486-slackware-linux)
        compiled by GNU C version 3.3.4.
GGC heuristics: --param ggc-min-expand=90 --param ggc-min-heapsize=113063
usr/lib/gcc-lib/i486-slackware-linux/3.3.4/../../../../i486-slackware-linux/bin/as
-V -Qy -o test1.o test1.s
GNU assembler version 2.15.92.0.2 (i486-slackware-linux) using BFD version
2.15.92.0.2 20040927
 /usr/lib/gcc-lib/i486-slackware-linux/3.3.4/collect2 --eh-frame-hdr -m elf_i386
-dynamic-linker /lib/ld-linux.so.2 -o test1
/usr/lib/gcc-lib/i486-slackware-linux/3.3.4/../../../crt1.o
/usr/lib/gcc-lib/i486-slackware-linux/3.3.4/../../../crti.o
/usr/lib/gcc-lib/i486-slackware-linux/3.3.4/crtbegin.o
-L/usr/lib/gcc-lib/i486-slackware-linux/3.3.4
-L/usr/lib/gcc-lib/i486-slackware-linux/3.3.4/../../../../i486-slackware-linux/lib
-L/usr/lib/gcc-lib/i486-slackware-linux/3.3.4/../../.. --export-dynamic
-lgtkmm-2.4 -lgdkmm-2.4 -latkmm-1.6 -lgtk-x11-2.0 -lpangomm-1.4 -lglibmm-2.4
-lsigc-2.0 -lgdk-x11-2.0 -latk-1.0 -lgdk_pixbuf-2.0 -lpangoxft-1.0 -lpangox-1.0
-lpango-1.0 -lgobject-2.0 -lgmodule-2.0 -ldl -lglib-2.0 test1.o -lstdc++ -lm
-lgcc_s -lgcc -lc -lgcc_s -lgcc
/usr/lib/gcc-lib/i486-slackware-linux/3.3.4/crtend.o
/usr/lib/gcc-lib/i486-slackware-linux/3.3.4/../../../crtn.o

o Pre-processed file output
A compressed (bzip2 -9) version of file test1.ii is attached.

o Binary output
When running the binary, a window pops up on the screen. In that window, there
is a button. Click on the button. The binary gets a SEGV. This is the issue.

Why it could be a g++ issue
---------------------------
If you remove -O from the command line, then the binary does not get a SEGV,
which is the expected behavior.

Why it could be a libsigc++ issue
---------------------------------
o Here is the relevant backtrace at the time of the SEGV

  • #0 sigc::trackable::callback_list
    from /usr/lib/libsigc-2.0.so.0
  • #1 sigc::trackable::remove_destroy_notify_callback
    from /usr/lib/libsigc-2.0.so.0
  • #2 sigc::visit_each<sigc::internal::limit_derived_target<sigc::trackable*, sigc::internal::slot_do_unbind>, Dlg>
    at slot_base.h line 166
  • #3 sigc::visit_each<sigc::internal::limit_derived_target<sigc::trackable*, sigc::internal::slot_do_unbind>, void, Dlg>
    at mem_fun.h line 1798
  • #4 sigc::visit_each<sigc::internal::limit_derived_target<sigc::trackable*, sigc::internal::slot_do_unbind>, sigc::bound_mem_functor0<void, Dlg> >
    at adaptor_trait.h line 264
  • #5 sigc::visit_each_type<sigc::trackable*, sigc::internal::slot_do_unbind, sigc::adaptor_functor<sigc::bound_mem_functor0<void, Dlg> > >
    at visit_each.h line 124
  • #6 sigc::internal::typed_slot_rep<sigc::bound_mem_functor0<void, Dlg> >::destroy
    at slot_base.h line 160
  • #7 sigc::internal::slot_rep::notify
    from /usr/lib/libsigc-2.0.so.0
  • #8 sigc::internal::trackable_callback_list::~trackable_callback_list
    from /usr/lib/libsigc-2.0.so.0
  • #9 sigc::trackable::notify_callbacks
    from /usr/lib/libsigc-2.0.so.0
  • #10 sigc::trackable::~trackable
    from /usr/lib/libsigc-2.0.so.0
  • #11 ~Dlg
    at test1.cc line 14
  • #12 Dlg::OnClicked
    at test1.cc line 34
  • #13 sigc::adaptor_functor<sigc::bound_mem_functor0<void, Dlg> >::operator()
    at mem_fun.h line 1781
  • #14 sigc::internal::slot_call0<sigc::bound_mem_functor0<void, Dlg>, void>::call_it
    at slot.h line 103
  • #15 Glib::SignalProxyNormal::slot0_void_callback
    from /usr/lib/libglibmm-2.4.so.1

o Explanation of the backtrace

Essentially, we are inside the invokation of the sigc::mem_fun(this,
&Dlg::OnClicked) slot (frame 14) when we are deleting the Dlg instance (frame
11). Destroying the Dlg instance destroys the button, but the destruction of the
sigc::mem_fun(this, &Dlg::OnClicked) slot is delayed because we are in the
middle of invoking it. Once the Dlg destructor has finished, the sigc::trackable
destructor is called (frame 10) because Dlg inherits from Gtk::Window, which
inherits from trackable. Because the trackable is going away, it notifies all
interested parties (frame 8). One such party happens to be the
sigc::mem_fun(this, &Dlg::OnClicked) slot itself (frame 7), and in response to
that notification, the slot is telling all the trackables it is bound to that it
doesn't want to be notified by them anymore (frame 2). One such trackable is the
Dlg instance itself. But the first time the address of the trackable is
dereferenced (frame 0), the binary gets a SEGV because it is dereferencing an
invalid pointer in file trackable.cc:

internal::trackable_callback_list* trackable::callback_list() const
{
  if (!callback_list_) <--- *** SEGV occurs here ***---
    callback_list_ = new internal::trackable_callback_list;

  return callback_list_;
}

o Possible explanation for the SEGV

It seems that the address of the trackable is coming from this snippet in
visit_each.h:

  template <class T_type> struct with_type<true,T_type>
  { static void execute_(const T_type& _A_type, const T_self& _A_action)
     { _A_action.action_(&_A_type); } <--- Here
  };

At that very moment, the type of '_A_type' is 'Dlg' as shown in frame 2, and
libsigc++ is trying to obtain a sigc::trackable pointer from it by using the '&'
operator. Apparently, because the 'Dlg' facet of the object is already gone (we
are in frames < 11), this confuses g++ big time and the '&' operator returns an
invalid value. Note that Dlg inherits _virtually_ from Foo. It might be the
cause of g++'s confusion.

Conclusion
----------
We don't know if there is a bug in the g++ optimizer, or if what libsigc++ is
doing is unspecified by C++ standards.
Comment 1 Régis Duchesne 2005-03-04 19:07:31 UTC
Created attachment 38264 [details]
The code for the small test case
Comment 2 Régis Duchesne 2005-03-04 19:07:55 UTC
Created attachment 38265 [details]
Pre-processed file output (.ii file), compressed with bzip2 (-9)
Comment 3 Murray Cumming 2005-03-04 23:45:30 UTC
What platform is this? Do you have the same result with my simplified test case?
Comment 4 Murray Cumming 2005-03-21 20:01:21 UTC
Please respond.
Comment 5 Régis Duchesne 2005-04-20 20:35:05 UTC
Sorry for the lack of response, Murray. For some reason I never got an email
when you updated the bug.

My platform is x86 Linux. A co-worker of mine (Christian Hammond) came up with a
much simpler test case that only involves libsigc++ (i.e. it doesn't have any
dependency on gtkmm), and he will update this bug with the new test very soon.
Comment 6 Christian Hammond 2005-04-20 20:59:25 UTC
Hi Murray.

I've verified that this is still a problem using libsigc++ 2.0.11. I'm attaching
the simplified test case that Regis mentioned. It uses only libsigc++, and
simulates a condition like the former test case (setting up a dialog,
SpecificDialog, that is a subclass of a window, BaseDialog, and then simulating
a click).

There are a few interesting things to point out. First of all, as you'd suspect,
calling notify_callbacks() in SpecificDialog fixes the problem.

This problem only happens when our derived class inherits from more than one
class. If we removed Foo from the list of parent classes that SpecificDialog
inherits from, the problem goes away. Also, disconnecting the signal or clearing
it before deleting the class does not fix the problem, nor does storing a
sigc::connection and disconnecting that.

This is becoming a rather large problem for us in areas. I've been chasing
sigc::trackable-related crashes for days now.

I'm ChipX86 on freenode and GIMPNet, btw, if you want to discuss any of this
personally.
Comment 7 Christian Hammond 2005-04-20 21:00:19 UTC
Created attachment 45500 [details]
Simplified test case

Added the simplified test case that causes this crash with or without an
optimization level.
Comment 8 Murray Cumming 2005-04-21 09:17:52 UTC
Created attachment 45504 [details]
valgrind_output.txt
Comment 9 Murray Cumming 2005-04-21 09:18:26 UTC
Created attachment 45505 [details]
gdb_backtrace.txt
Comment 10 Murray Cumming 2005-04-21 09:24:37 UTC
Hopefully we'll find time to investigate this thoroughly. In the meantime, I
notice that this might be a temporary workaround:

- class Foo
+ class Foo : virtual public sigc::trackable

so that both base classes inherit from trackable. I don't think that should be
necessary, because it's the signal handler's class that must normally inherit
from trackable, not the signal's class.
Comment 11 Murray Cumming 2005-04-21 09:25:57 UTC
You might also t
Comment 12 Murray Cumming 2005-04-21 09:28:59 UTC
Not sure where that last coment came from.
Comment 13 Régis Duchesne 2005-04-21 18:53:09 UTC
Hi Murray, glad you could repro the bug.

I have been thinking about what would be the proper fix recently. While reading
what follows, please keep in mind that I only have very superficial knowledge of
the libsigc++ code (I read it for a few hours), so I could be completely wrong.

Essentially, mem_fun-type slots care about 2 kinds of trackable objects:
1) The instance on which to invoke the method
2) The arguments bound by reference

Currently, slots store a pointer to 1) and 2) by using their _original_ type.

Example:
   Bar *bar = new Bar();
   ...
   sigc::bind(sigc::mem_fun(this, &Foo::Frobnicate), sigc::ref(*bar))

Here, the inner slot will store a pointer of kind 1) 'this' of type Foo *, and
the outer slot will store a pointer of kind 2) 'bar' of type Bar *.

3) When the slot is invoked, it uses pointers of kind 1) and 2) as is, i.e.
their type is unmodified.

4) When the slot is being destroyed, it calls remove_destroy_notify_callback()
on all pointers of kind 1) and 2), and that is where an _implicit_ pointer
conversion occurs: in our example, the compiler converts a Foo * and a Bar * to
a sigc::trackable *. Because we use virtual inheritance, the _value_ of the
pointer is not the same.

In our test case, the bug occurs because at the time 4) happens, the Foo * facet
of the instance is gone, so the compiler is unable to make the implicit pointer
conversion: it is returning a garbage sigc::trackable pointer value (which might
be a bug, you would imagine that it could throw an exception instead), which
leads to a SEGV.

It is unclear who is the culprit here: is it libsigc++ that tries to convert a
pointer of a type that does not exist anymore, or is it g++ that should allow
for the conversion to be possible at all times until the very bottom destructor
is called? I don't know what the C++ standard says.

Assuming that it is a libsigc++ bug, what is the way to fix it? I think a good
way would be for the slots to store not one, but two pointers for instance of
the kind 1) or 2). The first pointer would be a pointer with the original type
(Foo *, Bar *) to be used in case 3), and the second pointer would be a pointer
with the sigc::trackable * type, to be used in case 4). The conversion from the
original type to the sigc::trackable * type would be done with a
dynamic_cast<sigc::trackable *>() _at the time the slot is created_, which gives
a guarantees that:
o it works fine with virtual inheritance
o the compiler can always do the conversion

I have been thinking about prototyping this solution, but I haven't done it yet.
What do you think?
Comment 14 Murray Cumming 2005-04-24 14:41:50 UTC
> 4) When the slot is being destroyed, it calls remove_destroy_notify_callback()
> on all pointers of kind 1) and 2), and that is where an _implicit_ pointer
> conversion occurs: in our example, the compiler converts a Foo * and a Bar * to
> a sigc::trackable *. Because we use virtual inheritance, the _value_ of the
> pointer is not the same.

What line of what libsigc++ source file does this "implicit cast"? A
dynamic_cast<> is usually necessary with virtual inheritance.
Comment 15 Régis Duchesne 2005-04-25 18:42:56 UTC
Here is an excerpt of your attachment in comment #9:

===
  • #0 sigc::trackable::callback_list
    at trackable.cc line 74
  • #1 sigc::trackable::remove_destroy_notify_callback
    at trackable.cc line 61
  • #2 sigc::internal::slot_do_unbind::operator()
  • #3 sigc::internal::limit_derived_target<sigc::trackable*, sigc::internal::slot_do_unbind>::with_type<true, SpecificDialog>::execute_

The source code for frame 3 is in visit_each.h. In our case,
   T_target is sigc::trackable,
   T_action is sigc::internal::slot_do_unbind,
   T_type   is SpecificDialog.
Consequently, 'execute_' takes 2 arguments:
   const SpecificDialog& _A_type
   const limit_derived_target<sigc::trackable*, sigc::internal::slot_do_unbind>
&_A_action
So by reading the code, all of this results in the invokation of the
sigc::internal::slot_do_unbind functor, with 1 argument: &_A_type. Now this is
very important: the type of this argument is 'const SpecificDialog*'.

In frame 2, we can verify that the practice correlates with our reading of the
code: the sigc::internal::slot_do_unbind functor is invoked. Now the code for
that is in sigc++/functors/slot_base.h. There, the code clearly chows that
sigc::internal::slot_do_unbind::operator() takes 1 argument, of type 'const
trackable*'.

Therefore, I believe that between frame 3 and frame 2, the compiler casts a
'const SpecificDialog*' argument to a 'const trackable*' argument. That is what
I call the "implicit cast". The compiler does it, naturally, without the need
for us to know about it. Most of the time, this works perfectly because both
pointer's numerical values are the same, so the compiler's job is a no-op. But
in our case, because of the virtual inheritance involved, the pointer's
numerical values are different, and at the time this cast is made, the
SpecificDialog side of the instance has already been destroyed, so the compiler
is unable to make the conversion of values, and the compiler returns a garbage
value for the 'const trackable*', which leads to a SEGV in frame 0, the first
time the pointer is dereferenced.

Hence my proposal: have the slot store both the actual pointer (SpecificDialog*)
and the trackable pointer (trackable*) to the instance. Do the conversion once
for all with a dynamic_cast at the time the slot is constructed. Then have the
slot use the actual pointer at invokation time, and the trackable pointer at
unbind time: this way, the compiler's conversion facility is not needed at
unbind time.

What do you think?
Comment 16 Murray Cumming 2005-04-25 20:03:56 UTC
I'm still thinking about that. But in the meantime, I notice that the
alternative code in visit_each() also removes the crash. Look for line 131 in
visit_each.h:

  //With g++ 3.3.4, 
  //specifying the types of the template specialization causes a segfault in
tests/test_bind.
  //But there is no segfault with g++ 3.4.2, and this is required by the AIX
(and maybe IRIX MipsPro  and Tru64) compilers. 
  //visit_each<type_limited_action, T_functor>(limited_action, _A_functor);
  
  //g++ (even slightly old ones) is our primary platform, so we use the
non-crashing version.
  //Users (and distributors) of libsigc++ on AIX (and maybe IRIX MipsPro  and
Tru64) may 
  //need to use this version above instead, to allow compilation.
  visit_each(limited_action, _A_functor);

Try using the first call line instead of the second, and tell me if that fixes
your crashes. What version of g++ do you have?


This makes sense in terms of your theory - by calling the correct template
instantiation, the compiler has the correct type information, so maybe it can
then cast correctly. dynamic_cast<> would only be necessary when casting from a
base to a derived class, not from derived to base.
Comment 17 Régis Duchesne 2005-04-26 00:34:13 UTC
Murray,

We are using version 2.0.6, so I was unaware of the alternative code. I just
downloaded version 2.0.11 to play with it, and Christian or I will report on our
findings.

> This makes sense in terms of your theory - by calling the correct template
> instantiation, the compiler has the correct type information, so maybe it can
> then cast correctly.

Are you saying that by using this code
 visit_each<type_limited_action, T_functor>(limited_action, _A_functor);
instead of this code
 visit_each(limited_action, _A_functor);
the compiler will use a different template instance? How is it so?

To me, there is no possible ambiguity on the template instance that the compiler
should use here, so specifying it by hand should be unnecessary, and it is just
a workaround for compiler bugs (g++ is buggy because it cannot stand it, and AIX
is buggy because it requires it).

More importantly, I fail to see why specifying the template instance we want by
hand would help solving the cast problem I describe in comment #15:
single-stepping the code should show exactly the same backtrace, and at some
point, the compiler will implicitly cast from SpecificDialog * to trackable *.

While investigating this issue here at VMware, we have noticed in several
occasions that modifying our code even very slightly, for example just adding
trivial log statements, sometimes is enough to make the SEGV in frame 0 go away.

So it seems to me that, similarly, using the alternative code for visit_each()
might seem like it makes the bug go away, but this is just luck. The bug is
still here, but for some reason (unknown to me) it does not trigger ... until
you modify the code a little bit more, and it comes back to haunt you.

I think a more important question, and the reason why we posted this bug in the
g++ mailing list in the first place, is to determine whether it is legal C++ to
try to cast a pointer to a derived instance into a pointer to the base instance,
although the derived instance has already been destroyed.

If the answer to the question is "yes, this is legal", then g++ has a bug and
the g++ maitainers should fix it, and when they do, libsigc++ will work perfectly.

If the answer to the question is "no, this is not legal" or "it is unspecified
by the C++ standard, so compilers are free to do whatever they want", then
libsigc++ should not rely on this behavior, and it should be fixed (for example
by using the 2-pointers proposal in comment #15).

> dynamic_cast<> would only be necessary when casting from a
> base to a derived class, not from derived to base.

Agreed. Forget about that.
Comment 18 Murray Cumming 2005-04-26 08:46:46 UTC
> To me, there is no possible ambiguity on the template instance that the compiler
> should use here, so specifying it by hand should be unnecessary

Yes, but many compilers do need the extra hint and g++ 3.3.4 and 3.3.5 do behave
differently. I don't know how to begin exploring that for the g++ developers.
IT's not easy to make a simple test case from this. With the specific version,
it's only some uses of sigc::ref() in test_bind that crash. Luckily, I don't
think anybody uses that.

I'm still thinking about this cast theory.
Comment 19 Régis Duchesne 2005-04-26 17:08:25 UTC
For the record: in our code here, we use g++ 3.3.4, and we use sigc::ref()
extensively (because we want slots to be invalidated when the objects bound to
them go away).

While we are talking about this, let me point out that we are actually seeing
_2_ classes of this kind of "remove_destroy_notify_callback" issues. Consider
the scenario that is implemented by the attachment provided in comment #7:

Class s(pecific) inherits from class b(ase), which itself nherits from class
sigc::(t)rackable. B defines a signal, and s connects a slot to it, which points
to one of its member functions Foo. Finally, at some point, an instance of s is
destroyed.

So we end up with something like this:

           t
           +---+
           |   |
           +---+
             ^
             | inherits
        b    |
        +-----------+
        | signal    |
        |     slot -+--\
        |           |  |
        +-----------+  |
            ^          |
            | inherits |
        s   |          |
        +-------+      |
        |       |      |
        | Foo <-+------/
        |       |
        +-------+

o The first class of issues is exactly what is reported in this bug: the
destructor for s is called, then the destructor for b, then the destructor for
the signal, then the destructor for the slot. At this very moment, the slot
tries to unbind from the s pointer, that the compiler implicitly casts to a t
pointer, and kaboom. This class of issues can be fixed, for example by using the
2-pointers proposal in comment #15.

o Now assuming that the first class of issues is fixed, there is a second class
of issues: imagine that inside the destructor for b, the signal is emit()ed.
Then s::Foo() will be invoked, with the s pointer as its 'this' argument. The
problem is: s has already been destroyed, so really the slot is holding an
invalid pointer, but it does not know it, because the destructor for t has not
been reached yet, so the slot has not been notified yet that s has gone away.
This second class of issues is annoying to solve. Fortunately, firing a signal
in a destructor is rather infrequent, and there is an easy workaround: s can
simply disconnect its s::Foo() slot in its destructor. However, it would be nice
if libsigc++ could fully take care of this second class of issues as well, so
the code that uses the library does not have to worry about these implementation
subtleties.
Comment 20 Murray Cumming 2005-04-26 17:31:59 UTC
> For the record: in our code here, we use g++ 3.3.4, and we use sigc::ref()
> extensively (because we want slots to be invalidated when the objects bound to
> them go away).

You are using sigc::ref() with sigc::bind()?
Comment 21 Murray Cumming 2005-04-26 20:58:53 UTC
> Are you saying that by using this code
>  visit_each<type_limited_action, T_functor>(limited_action, _A_functor);
> instead of this code
>  visit_each(limited_action, _A_functor);
> the compiler will use a different template instance? How is it so?

I think, because the visit_each<> template specializations are declared with
various different sets of types, in various orders. For instance, there's
  template <class T_action, int T_loc, class T_functor, class T_bound>
  void visit_each(const T_action& _A_action,
                const bind_functor<T_loc, T_functor, T_bound>& _A_target)

as well as the normal
  template <class T_action, class T_functor>
  void visit_each(const T_action& _A_action, const T_functor& _A_functor)

I also think I see a memory leak when explictly specifying the template
instantiation. So, I'll assume that g++ was calling the correct template, and
that I was just hiding the problem that exists in one particular visit_each<>
specialization. Sorry if I seem to be ignoring your replies - I have to make
sense of my own theories first.

Comment 22 Régis Duchesne 2005-04-26 23:59:57 UTC
> You are using sigc::ref() with sigc::bind()?

Yes

> Sorry if I seem to be ignoring your replies - I have to make
> sense of my own theories first.

No problem. It is cool. We perfectly understand, and we do not want to pressure
you in any way.
Comment 23 Murray Cumming 2005-04-27 07:57:13 UTC
I agree that the cast to trackable* is going wrong (during the trackable's
destructor), in 

limit_derived_target<T_target*,
T_action>::with_type<true,T_type>::execute_(const T_type& _A_type, const T_self&
_A_action)

This patch (show_bad_cast.patch) adds some output to demonstrate that (see
"TESTING CAST"). It's possible that the cast would fail earlier, but casting to
trackable* is not always valid in the previously-called templates.
Comment 24 Murray Cumming 2005-04-27 08:00:50 UTC
Created attachment 45732 [details] [review]
show_bad_cast.patch

part of the output from test_custom.cc:

OnClicked 1
~SpecificDialog this=0x804d008
~SpecificDialog (trackable*)this=0x804d018
~Foo this=0x804d008
~BaseDialog this=0x804d00c
~BaseDialog (trackable*)this=0x804d018
~trackable() before notify_callbacks: this=0x804d010
trackable::notify_callbacks: this=0x804d010
trackable::notify_callbacks: after delete callback_list this=0x804d010
~trackable() after notify_callbacks: this=0x804d010
~trackable() before notify_callbacks: this=0x804d018
trackable::notify_callbacks: this=0x804d018
visit_each<>(): const T_functor* _A_functor = 0x804d008
limit_derived_target<T_target*, T_action>::operator(): T_type*
_A_type=0x804d008
limit_derived_target<T_target*, T_action>::execute_(): TESTING CAST:
  limit_derived_target<T_target*, T_action>::execute_(): const
T_self_A_type=0x804d008
  limit_derived_target<T_target*, T_action>::execute_(): trackable*
_A_type=0x10097098

That 0x10097098 should be 0x804d018, as seen at the start.
Comment 25 Murray Cumming 2005-04-27 13:35:06 UTC
My summary so far:
- The SpecificDialog part has been destroyed, so libsigc++ shouldn't still be
using pointers to it.
- It can use the not-yet-destroyed base sigc::trackable part, but it shouldn't
expect a cast from SpecificDialog to trackable* to work.
- Where the object is a trackable, and where it is being used by reference, it
could store only a trackable*. It might not need any derived information.

- The object is stored as a templated derived pointer type in
functors/mem_fun.h's bound_mem_functor*::obj_ptr_.
- That obj_ptr_ is used in mem_fun.h's visit_each(), where the trouble starts.
If it was a trackable* instead of a specific type then maybe it would not
instantiate further template instantiations involving the derived type.
 
Comment 26 Murray Cumming 2005-04-27 14:30:07 UTC
Created attachment 45740 [details] [review]
two_pointers.patch

This patch tests our theory, successfully. It stores the pointer to base
(trackable) and uses that instead of the pointer to the (sometimes dead)
derived class. I don't think two pointers should be necessary, but I can't get
the dynamic_cast<Derived>(base) to work here yet.

Even with one pointer, this would still break API, because it requires all
signal handler classes to derive from sigc::trackable. Some (I'm not sure
which) do not need to at the moment.

Maybe this is useful to you even though we couldn't put it in libsigc++ 2.0. Or
maybe you can improve on it. It might very possibly have awful side-effects,
and  the same thing might be needed in other parts of the code.
Comment 27 Murray Cumming 2005-04-27 17:14:53 UTC
Created attachment 45744 [details]
glibmm_two_pointers.patch

Make glibmm build when using the two_pointers.patch libsigc++ patch.
Comment 28 Murray Cumming 2005-04-27 21:46:47 UTC
Created attachment 45759 [details] [review]
two_pointers_for_trackables.patch

This patch does the same thing, but only for types deriving from
sigc::trackable. Therefore it is not ABI/API-breaking. Please test this.

However, I will not commit it yet, because I want to release a version with a
SUN compilation fix, before risking build problems with this change.

If you want to do the same for bound parameters then I suggest you start with a
test case, in a new bug report. However, that patch could be more complicated
because you have to worry about copying derived types by value and calling
their derived destructors. Making trackable's destructor virtual might help
partly.
Comment 29 Régis Duchesne 2005-04-27 21:57:14 UTC
> Your comment #23 and your comment #24

Thanks for confirming that the problem is exactly what we thought it was.

Btw, in your attachment in comment #24, this part of the patch is wrong
(although it is harmless):
 test_exception_catch_SOURCES = test_exception_catch.cc
-
+test_exception_catch_SOURCES = test_exception_catch.cc

Also I believe the test case in comment #7 (on which you based your
test_custom.cc in the patch) can be simplified (which would also simplify the
backtrace): as described in comment #19, 'OnClicked()' does not have to call
'delete this;' to trigger the bug, it is not important who deletes the
SpecificDialog instance, and it could be done in main(), instead of emitting the
'clicked' signal.

> My summary so far:
> - The SpecificDialog part has been destroyed, so libsigc++ shouldn't still be
> using pointers to it.
> - It can use the not-yet-destroyed base sigc::trackable part, but it shouldn't
> expect a cast from SpecificDialog to trackable* to work.

Yes, but as I mentionned in comment #17, it might not be libsigc++'s fault. We
need to know if this is legal C++ or not: it is entirely possible that the C++
standard mandates that the compiler must ensure the correctness of all casts
until the topmost class(es) of an instance have been destroyed.

> - Where the object is a trackable, and where it is being used by reference, it
> could store only a trackable*. It might not need any derived information.

Correct.

> - The object is stored as a templated derived pointer type in
> functors/mem_fun.h's bound_mem_functor*::obj_ptr_.
> - That obj_ptr_ is used in mem_fun.h's visit_each(), where the trouble starts.
> If it was a trackable* instead of a specific type then maybe it would not
> instantiate further template instantiations involving the derived type.

Correct, but incomplete: here you are only considering the first kind (see "1)"
in comment #13) of pointer that is stored by a slot. There is another kind (see
"2)" in comment #13), which correspond to all the trackable arguments that you
bind by reference to a slot. These as stored adaptors/bind.h's
bind_functor*::bound*_. When a slot is destroyed, it also calls
remove_destroy_notify_callback() on all these trackables, and does a similar
implicit cast.

> Patch in comment #26

This looks good. A few remarks though:

1)
+    { return ( (obj_ptr_derived_)->*(this->func_ptr_))(LOOP(_A_a%1, $1)); }
I think the "this->" is not needed. I have found a similar "issue" in the
mem_functor macro, where "this->" is used for the for "pointer" version of
operator(), but interestingly "this->" is (correctly) not used for the
"reference" version of operator().

2)
+//    { return ( (($3 T_obj*)(obj_ptr_) )->*(this->func_ptr_))(LOOP(_A_a%1, $1)); }
> but I can't get the dynamic_cast<Derived>(base) to work here yet.
I think it doesn't work because you don't use a dynamic_cast here. You use an
old C-style cast, which will return the same pointer value, which is wrong in
the case of virtual inheritance.

> Even with one pointer, this would still break API, because it requires all
> signal handler classes to derive from sigc::trackable. Some (I'm not sure
> which) do not need to at the moment.

Actually, I think this API breakage can be avoided: by using
dynamic_cast<trackable *>(derived_pointer) to initialize obj_ptr_ (now I
remember why I mentionned that at the end of comment #13, and stupidly denied it
at the end of comment #17). If you do that, then the stored sigc::trackable *
will be NULL for non-trackable signal handler classes, but we don't care because
in that case the pointer is never used!

If the API is not broken, then there is no need to patch files
tests/test_compatibility.cc and tests/test_mem_fun.cc in the patch in comment
#26, and there is no need for the glibmm patch in comment #27.

My summary at this point
========================
Your patch in comment #26:
a) Fixes the problem for trackable signal handler instances, but break the API
by forcing all signal handlers to be trackable.
b) Can be trivially improved as suggested above in this comment to not break the
API.
c) Can be improved as suggested above in this comment to handle trackable
arguments that are bound by reference as well.

Once b) and c) are done, I think we will have a serious candidate for inclusion
in version 2.0.12 :)
Comment 30 Régis Duchesne 2005-04-27 22:06:57 UTC
> This patch does the same thing, but only for types deriving from
> sigc::trackable. Therefore it is not ABI/API-breaking. Please test this.

Apparently our bug updates crossed.

I'm glad we came to the same conclusion about not breaking the API, though. Your
solution looks more heavyweight than mine (dynamic_cast, a one-liner) though.
I'm still reading your patch.

> However, I will not commit it yet, because I want to release a version with a
> SUN compilation fix, before risking build problems with this change.

Makes plenty of sense. Let's test this heavily, and let's add the smallest
possible test case to the test suite to ensure no future regressions.

> If you want to do the same for bound parameters then I suggest you start with
> a test case, in a new bug report. However, that patch could be more
> complicated because you have to worry about copying derived types by value and
> calling their derived destructors. Making trackable's destructor virtual might
> help partly.

I don't think it has to be that complicated: all we have to do is to store 2
pointers the same way in adaptors/bind.h's bind_functor*. And then we need to
add template specialization for visit_each in the same file, to deal with the
trackable case specially.

Expect a new test case and a new bug report soon.
Comment 31 Murray Cumming 2005-04-27 23:10:40 UTC
> but as I mentionned in comment #17, it might not be libsigc++'s fault. We
> need to know if this is legal C++ or not.

I really doubt that it is. But it's easy to make a test case that casts from
Derived to Base after the Derived destructor has finished. It might not crash
but you can ask C++ experts about the theory of it.

> I think it doesn't work because you don't use a dynamic_cast here. You use an
> old C-style cast, which will return the same pointer value.

I didn't use it because I couldn't get it to work. I think that trackable was
not fully defined there, possibly due to header guards and prototypes.

Yes, it's best to use a static_cast<>, and I'd welcome patches to use them
instead of C-style casts. C++ will not allow a static_cast<> where a
dynamic_cast<> is necessary, so this has real benefits.

I'm happy to see a simpler patch, but I do think you need the separate
specialization. You can't cast to trackable* for types that don't derive from
trackable. I did try to move some of the implementation into a base class,
instead of copy/pasting it, but the constructors and operator() must be in the
most-derived class.
Comment 32 Régis Duchesne 2005-04-28 00:14:08 UTC
> I didn't use it because I couldn't get it to work. I think that trackable was
> not fully defined there, possibly due to header guards and prototypes.

Weird.

> Yes, it's best to use a static_cast<>, and I'd welcome patches to use them
> instead of C-style casts. C++ will not allow a static_cast<> where a
> dynamic_cast<> is necessary, so this has real benefits.

You would be surprised: we have seen cases in our code, involving virtual
inheritance, where the compiler was perfectly happy to compile with a
static_cast, but would crash at run-time. When we replaced with a dynamic_cast,
it started to work fine.

> I'm happy to see a simpler patch, but I do think you need the separate
> specialization. You can't cast to trackable* for types that don't derive from
> trackable.

Really? I thought that was the whole point of a dynamic_cast, to do this
determination at run-time. I find it surprising. We do it all the time.

If B inherits from A, but C does not inherit from A, then I think
dynamic_cast<B *>(pointer to an A) returns a valid pointer, whereas
dynamic_cast<C *>(pointer to an A) returns NULL.

I'm starting to test your patches. I will keep you posted. Thanks for your
valuable help with this so far.
Comment 33 Régis Duchesne 2005-04-28 00:54:44 UTC
Created attachment 45764 [details]
New test case

This test case is shorter and simpler than the one in comment #7.

It corresponds precisely to the scenario described in comment #19, first class
of issues.
Comment 34 Régis Duchesne 2005-04-28 00:56:46 UTC
Created attachment 45765 [details]
New backtrace

It corresponds precisely to the scenario described in comment #19, first class
of issues.
Comment 35 Régis Duchesne 2005-04-28 02:06:06 UTC
Alright Murray, I took a deeper look at the patch you submitted in comment #28:

o After unsuccessfully trying to avoid the template specialization, I now agree
that it is the right thing to do :) What I said about dynamic_cast in comment
#32 was nonsense.

o After applying the patch, the test case submitted in comment #33 passes, as
expected. Can you please make sure that this exact test case makes it to the
libsigc++ test suite?

Boy, I'm glad this problem is fixed. I will be feverishly waiting for version
2.0.12 of the lib!

I'm working on a test case/bug report for the similar "bound parameters" issue.

Thanks again for taking this issue seriously and for fixing it so quickly. You rock!
Comment 36 Régis Duchesne 2005-04-28 17:45:10 UTC
The corollary bug (for trackable arguments bound by reference) is now filed as
bug 302327.
Comment 37 Murray Cumming 2005-04-29 08:01:28 UTC
Created attachment 45812 [details]
test_castdeleted.cc

I tried tp create a test case without libsigc++ (test_castdeleted.cc), but I
can't reproduce the bad cast. Maybe you'll have more luck.
Comment 38 Murray Cumming 2005-04-29 08:08:51 UTC
Created attachment 45813 [details]
test_castdeleted2.cc

Actually I can reproduce it (test_castdeleted2.cc). The cast was wrong, just
not so wildly wrong.
Comment 39 Murray Cumming 2005-04-29 18:59:38 UTC
Here's the gcc bug report: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21279
Comment 40 Murray Cumming 2005-05-01 00:59:42 UTC
I've committed this now because the SUN fix I was waiting for might take longer
and anyway only affects the latest SUN compiler version.