GNOME Bugzilla – Bug 169225
Crash with multiple inheritance and "delete this".
Last modified: 2005-05-01 00:59:42 UTC
Version details: 2.0.6 Folks, We are reporting this issue both to the GCC people and the libsigc++ people, because the issue could be in either of them or both. On the surface, it looks like a g++ issue, but when you look deeper, libsigc++ might be relying on a behavior that is unspecified by any C++ standard. Input ----- File test1.cc is attached. Command line ------------ g++ -v -save-temps -O -g `pkg-config --cflags --libs gtkmm-2.4` test1.cc -o test1 Where `pkg-config --cflags --libs gtkmm-2.4` expands to: -DXTHREADS -D_REENTRANT -DXUSE_MTSAFE_API -I/usr/include/gtkmm-2.4 -I/usr/lib/gtkmm-2.4/include -I/usr/include/glibmm-2.4 -I/usr/lib/glibmm-2.4/include -I/usr/include/gdkmm-2.4 -I/usr/lib/gdkmm-2.4/include -I/usr/include/pangomm-1.4 -I/usr/include/atkmm-1.6 -I/usr/include/gtk-2.0 -I/usr/include/sigc++-2.0 -I/usr/lib/sigc++-2.0/include -I/usr/include/glib-2.0 -I/usr/lib/glib-2.0/include -I/usr/lib/gtk-2.0/include -I/usr/X11R6/include -I/usr/include/pango-1.0 -I/usr/include/freetype2 -I/usr/include/freetype2/config -I/usr/include/atk-1.0 -Wl,--export-dynamic -lgtkmm-2.4 -lgdkmm-2.4 -latkmm-1.6 -lgtk-x11-2.0 -lpangomm-1.4 -lglibmm-2.4 -lsigc-2.0 -lgdk-x11-2.0 -latk-1.0 -lgdk_pixbuf-2.0 -lm -lpangoxft-1.0 -lpangox-1.0 -lpango-1.0 -lgobject-2.0 -lgmodule-2.0 -ldl -lglib-2.0 Command line output ------------------- o TTY output: Reading specs from /usr/lib/gcc-lib/i486-slackware-linux/3.3.4/specs Configured with: ../gcc-3.3.4/configure --prefix=/usr --enable-shared --enable-threads=posix --enable-__cxa_atexit --disable-checking --with-gnu-ld --verbose --target=i486-slackware-linux --host=i486-slackware-linux Thread model: posix gcc version 3.3.4 /usr/lib/gcc-lib/i486-slackware-linux/3.3.4/cc1plus -E -D__GNUG__=3 -quiet -v -I/usr/include/gtkmm-2.4 -I/usr/lib/gtkmm-2.4/include -I/usr/include/glibmm-2.4 -I/usr/lib/glibmm-2.4/include -I/usr/include/gdkmm-2.4 -I/usr/lib/gdkmm-2.4/include -I/usr/include/pangomm-1.4 -I/usr/include/atkmm-1.6 -I/usr/include/gtk-2.0 -I/usr/include/sigc++-2.0 -I/usr/lib/sigc++-2.0/include -I/usr/include/glib-2.0 -I/usr/lib/glib-2.0/include -I/usr/lib/gtk-2.0/include -I/usr/X11R6/include -I/usr/include/pango-1.0 -I/usr/include/freetype2 -I/usr/include/freetype2/config -I/usr/include/atk-1.0 -D__GNUC__=3 -D__GNUC_MINOR__=3 -D__GNUC_PATCHLEVEL__=4 -D_GNU_SOURCE -DXTHREADS -D_REENTRANT -DXUSE_MTSAFE_API test1.cc -O test1.ii ignoring nonexistent directory "/usr/include/freetype2/config" ignoring nonexistent directory "/usr/i486-slackware-linux/include" #include "..." search starts here: #include <...> search starts here: /usr/include/gtkmm-2.4 /usr/lib/gtkmm-2.4/include /usr/include/glibmm-2.4 /usr/lib/glibmm-2.4/include /usr/include/gdkmm-2.4 /usr/lib/gdkmm-2.4/include /usr/include/pangomm-1.4 /usr/include/atkmm-1.6 /usr/include/gtk-2.0 /usr/include/sigc++-2.0 /usr/lib/sigc++-2.0/include /usr/include/glib-2.0 /usr/lib/glib-2.0/include /usr/lib/gtk-2.0/include /usr/X11R6/include /usr/include/pango-1.0 /usr/include/freetype2 /usr/include/atk-1.0 /usr/lib/qt/include /usr/include/c++/3.3.4 /usr/include/c++/3.3.4/i486-slackware-linux /usr/include/c++/3.3.4/backward /usr/local/include /usr/lib/gcc-lib/i486-slackware-linux/3.3.4/include /usr/include End of search list. /usr/lib/gcc-lib/i486-slackware-linux/3.3.4/cc1plus -fpreprocessed test1.ii -quiet -dumpbase test1.cc -auxbase test1 -g -O -version -o test1.s GNU C++ version 3.3.4 (i486-slackware-linux) compiled by GNU C version 3.3.4. GGC heuristics: --param ggc-min-expand=90 --param ggc-min-heapsize=113063 usr/lib/gcc-lib/i486-slackware-linux/3.3.4/../../../../i486-slackware-linux/bin/as -V -Qy -o test1.o test1.s GNU assembler version 2.15.92.0.2 (i486-slackware-linux) using BFD version 2.15.92.0.2 20040927 /usr/lib/gcc-lib/i486-slackware-linux/3.3.4/collect2 --eh-frame-hdr -m elf_i386 -dynamic-linker /lib/ld-linux.so.2 -o test1 /usr/lib/gcc-lib/i486-slackware-linux/3.3.4/../../../crt1.o /usr/lib/gcc-lib/i486-slackware-linux/3.3.4/../../../crti.o /usr/lib/gcc-lib/i486-slackware-linux/3.3.4/crtbegin.o -L/usr/lib/gcc-lib/i486-slackware-linux/3.3.4 -L/usr/lib/gcc-lib/i486-slackware-linux/3.3.4/../../../../i486-slackware-linux/lib -L/usr/lib/gcc-lib/i486-slackware-linux/3.3.4/../../.. --export-dynamic -lgtkmm-2.4 -lgdkmm-2.4 -latkmm-1.6 -lgtk-x11-2.0 -lpangomm-1.4 -lglibmm-2.4 -lsigc-2.0 -lgdk-x11-2.0 -latk-1.0 -lgdk_pixbuf-2.0 -lpangoxft-1.0 -lpangox-1.0 -lpango-1.0 -lgobject-2.0 -lgmodule-2.0 -ldl -lglib-2.0 test1.o -lstdc++ -lm -lgcc_s -lgcc -lc -lgcc_s -lgcc /usr/lib/gcc-lib/i486-slackware-linux/3.3.4/crtend.o /usr/lib/gcc-lib/i486-slackware-linux/3.3.4/../../../crtn.o o Pre-processed file output A compressed (bzip2 -9) version of file test1.ii is attached. o Binary output When running the binary, a window pops up on the screen. In that window, there is a button. Click on the button. The binary gets a SEGV. This is the issue. Why it could be a g++ issue --------------------------- If you remove -O from the command line, then the binary does not get a SEGV, which is the expected behavior. Why it could be a libsigc++ issue --------------------------------- o Here is the relevant backtrace at the time of the SEGV
+ Trace 56434
o Explanation of the backtrace Essentially, we are inside the invokation of the sigc::mem_fun(this, &Dlg::OnClicked) slot (frame 14) when we are deleting the Dlg instance (frame 11). Destroying the Dlg instance destroys the button, but the destruction of the sigc::mem_fun(this, &Dlg::OnClicked) slot is delayed because we are in the middle of invoking it. Once the Dlg destructor has finished, the sigc::trackable destructor is called (frame 10) because Dlg inherits from Gtk::Window, which inherits from trackable. Because the trackable is going away, it notifies all interested parties (frame 8). One such party happens to be the sigc::mem_fun(this, &Dlg::OnClicked) slot itself (frame 7), and in response to that notification, the slot is telling all the trackables it is bound to that it doesn't want to be notified by them anymore (frame 2). One such trackable is the Dlg instance itself. But the first time the address of the trackable is dereferenced (frame 0), the binary gets a SEGV because it is dereferencing an invalid pointer in file trackable.cc: internal::trackable_callback_list* trackable::callback_list() const { if (!callback_list_) <--- *** SEGV occurs here ***--- callback_list_ = new internal::trackable_callback_list; return callback_list_; } o Possible explanation for the SEGV It seems that the address of the trackable is coming from this snippet in visit_each.h: template <class T_type> struct with_type<true,T_type> { static void execute_(const T_type& _A_type, const T_self& _A_action) { _A_action.action_(&_A_type); } <--- Here }; At that very moment, the type of '_A_type' is 'Dlg' as shown in frame 2, and libsigc++ is trying to obtain a sigc::trackable pointer from it by using the '&' operator. Apparently, because the 'Dlg' facet of the object is already gone (we are in frames < 11), this confuses g++ big time and the '&' operator returns an invalid value. Note that Dlg inherits _virtually_ from Foo. It might be the cause of g++'s confusion. Conclusion ---------- We don't know if there is a bug in the g++ optimizer, or if what libsigc++ is doing is unspecified by C++ standards.
Created attachment 38264 [details] The code for the small test case
Created attachment 38265 [details] Pre-processed file output (.ii file), compressed with bzip2 (-9)
What platform is this? Do you have the same result with my simplified test case?
Please respond.
Sorry for the lack of response, Murray. For some reason I never got an email when you updated the bug. My platform is x86 Linux. A co-worker of mine (Christian Hammond) came up with a much simpler test case that only involves libsigc++ (i.e. it doesn't have any dependency on gtkmm), and he will update this bug with the new test very soon.
Hi Murray. I've verified that this is still a problem using libsigc++ 2.0.11. I'm attaching the simplified test case that Regis mentioned. It uses only libsigc++, and simulates a condition like the former test case (setting up a dialog, SpecificDialog, that is a subclass of a window, BaseDialog, and then simulating a click). There are a few interesting things to point out. First of all, as you'd suspect, calling notify_callbacks() in SpecificDialog fixes the problem. This problem only happens when our derived class inherits from more than one class. If we removed Foo from the list of parent classes that SpecificDialog inherits from, the problem goes away. Also, disconnecting the signal or clearing it before deleting the class does not fix the problem, nor does storing a sigc::connection and disconnecting that. This is becoming a rather large problem for us in areas. I've been chasing sigc::trackable-related crashes for days now. I'm ChipX86 on freenode and GIMPNet, btw, if you want to discuss any of this personally.
Created attachment 45500 [details] Simplified test case Added the simplified test case that causes this crash with or without an optimization level.
Created attachment 45504 [details] valgrind_output.txt
Created attachment 45505 [details] gdb_backtrace.txt
Hopefully we'll find time to investigate this thoroughly. In the meantime, I notice that this might be a temporary workaround: - class Foo + class Foo : virtual public sigc::trackable so that both base classes inherit from trackable. I don't think that should be necessary, because it's the signal handler's class that must normally inherit from trackable, not the signal's class.
You might also t
Not sure where that last coment came from.
Hi Murray, glad you could repro the bug. I have been thinking about what would be the proper fix recently. While reading what follows, please keep in mind that I only have very superficial knowledge of the libsigc++ code (I read it for a few hours), so I could be completely wrong. Essentially, mem_fun-type slots care about 2 kinds of trackable objects: 1) The instance on which to invoke the method 2) The arguments bound by reference Currently, slots store a pointer to 1) and 2) by using their _original_ type. Example: Bar *bar = new Bar(); ... sigc::bind(sigc::mem_fun(this, &Foo::Frobnicate), sigc::ref(*bar)) Here, the inner slot will store a pointer of kind 1) 'this' of type Foo *, and the outer slot will store a pointer of kind 2) 'bar' of type Bar *. 3) When the slot is invoked, it uses pointers of kind 1) and 2) as is, i.e. their type is unmodified. 4) When the slot is being destroyed, it calls remove_destroy_notify_callback() on all pointers of kind 1) and 2), and that is where an _implicit_ pointer conversion occurs: in our example, the compiler converts a Foo * and a Bar * to a sigc::trackable *. Because we use virtual inheritance, the _value_ of the pointer is not the same. In our test case, the bug occurs because at the time 4) happens, the Foo * facet of the instance is gone, so the compiler is unable to make the implicit pointer conversion: it is returning a garbage sigc::trackable pointer value (which might be a bug, you would imagine that it could throw an exception instead), which leads to a SEGV. It is unclear who is the culprit here: is it libsigc++ that tries to convert a pointer of a type that does not exist anymore, or is it g++ that should allow for the conversion to be possible at all times until the very bottom destructor is called? I don't know what the C++ standard says. Assuming that it is a libsigc++ bug, what is the way to fix it? I think a good way would be for the slots to store not one, but two pointers for instance of the kind 1) or 2). The first pointer would be a pointer with the original type (Foo *, Bar *) to be used in case 3), and the second pointer would be a pointer with the sigc::trackable * type, to be used in case 4). The conversion from the original type to the sigc::trackable * type would be done with a dynamic_cast<sigc::trackable *>() _at the time the slot is created_, which gives a guarantees that: o it works fine with virtual inheritance o the compiler can always do the conversion I have been thinking about prototyping this solution, but I haven't done it yet. What do you think?
> 4) When the slot is being destroyed, it calls remove_destroy_notify_callback() > on all pointers of kind 1) and 2), and that is where an _implicit_ pointer > conversion occurs: in our example, the compiler converts a Foo * and a Bar * to > a sigc::trackable *. Because we use virtual inheritance, the _value_ of the > pointer is not the same. What line of what libsigc++ source file does this "implicit cast"? A dynamic_cast<> is usually necessary with virtual inheritance.
Here is an excerpt of your attachment in comment #9: ===
+ Trace 58711
The source code for frame 3 is in visit_each.h. In our case, T_target is sigc::trackable, T_action is sigc::internal::slot_do_unbind, T_type is SpecificDialog. Consequently, 'execute_' takes 2 arguments: const SpecificDialog& _A_type const limit_derived_target<sigc::trackable*, sigc::internal::slot_do_unbind> &_A_action So by reading the code, all of this results in the invokation of the sigc::internal::slot_do_unbind functor, with 1 argument: &_A_type. Now this is very important: the type of this argument is 'const SpecificDialog*'. In frame 2, we can verify that the practice correlates with our reading of the code: the sigc::internal::slot_do_unbind functor is invoked. Now the code for that is in sigc++/functors/slot_base.h. There, the code clearly chows that sigc::internal::slot_do_unbind::operator() takes 1 argument, of type 'const trackable*'. Therefore, I believe that between frame 3 and frame 2, the compiler casts a 'const SpecificDialog*' argument to a 'const trackable*' argument. That is what I call the "implicit cast". The compiler does it, naturally, without the need for us to know about it. Most of the time, this works perfectly because both pointer's numerical values are the same, so the compiler's job is a no-op. But in our case, because of the virtual inheritance involved, the pointer's numerical values are different, and at the time this cast is made, the SpecificDialog side of the instance has already been destroyed, so the compiler is unable to make the conversion of values, and the compiler returns a garbage value for the 'const trackable*', which leads to a SEGV in frame 0, the first time the pointer is dereferenced. Hence my proposal: have the slot store both the actual pointer (SpecificDialog*) and the trackable pointer (trackable*) to the instance. Do the conversion once for all with a dynamic_cast at the time the slot is constructed. Then have the slot use the actual pointer at invokation time, and the trackable pointer at unbind time: this way, the compiler's conversion facility is not needed at unbind time. What do you think?
I'm still thinking about that. But in the meantime, I notice that the alternative code in visit_each() also removes the crash. Look for line 131 in visit_each.h: //With g++ 3.3.4, //specifying the types of the template specialization causes a segfault in tests/test_bind. //But there is no segfault with g++ 3.4.2, and this is required by the AIX (and maybe IRIX MipsPro and Tru64) compilers. //visit_each<type_limited_action, T_functor>(limited_action, _A_functor); //g++ (even slightly old ones) is our primary platform, so we use the non-crashing version. //Users (and distributors) of libsigc++ on AIX (and maybe IRIX MipsPro and Tru64) may //need to use this version above instead, to allow compilation. visit_each(limited_action, _A_functor); Try using the first call line instead of the second, and tell me if that fixes your crashes. What version of g++ do you have? This makes sense in terms of your theory - by calling the correct template instantiation, the compiler has the correct type information, so maybe it can then cast correctly. dynamic_cast<> would only be necessary when casting from a base to a derived class, not from derived to base.
Murray, We are using version 2.0.6, so I was unaware of the alternative code. I just downloaded version 2.0.11 to play with it, and Christian or I will report on our findings. > This makes sense in terms of your theory - by calling the correct template > instantiation, the compiler has the correct type information, so maybe it can > then cast correctly. Are you saying that by using this code visit_each<type_limited_action, T_functor>(limited_action, _A_functor); instead of this code visit_each(limited_action, _A_functor); the compiler will use a different template instance? How is it so? To me, there is no possible ambiguity on the template instance that the compiler should use here, so specifying it by hand should be unnecessary, and it is just a workaround for compiler bugs (g++ is buggy because it cannot stand it, and AIX is buggy because it requires it). More importantly, I fail to see why specifying the template instance we want by hand would help solving the cast problem I describe in comment #15: single-stepping the code should show exactly the same backtrace, and at some point, the compiler will implicitly cast from SpecificDialog * to trackable *. While investigating this issue here at VMware, we have noticed in several occasions that modifying our code even very slightly, for example just adding trivial log statements, sometimes is enough to make the SEGV in frame 0 go away. So it seems to me that, similarly, using the alternative code for visit_each() might seem like it makes the bug go away, but this is just luck. The bug is still here, but for some reason (unknown to me) it does not trigger ... until you modify the code a little bit more, and it comes back to haunt you. I think a more important question, and the reason why we posted this bug in the g++ mailing list in the first place, is to determine whether it is legal C++ to try to cast a pointer to a derived instance into a pointer to the base instance, although the derived instance has already been destroyed. If the answer to the question is "yes, this is legal", then g++ has a bug and the g++ maitainers should fix it, and when they do, libsigc++ will work perfectly. If the answer to the question is "no, this is not legal" or "it is unspecified by the C++ standard, so compilers are free to do whatever they want", then libsigc++ should not rely on this behavior, and it should be fixed (for example by using the 2-pointers proposal in comment #15). > dynamic_cast<> would only be necessary when casting from a > base to a derived class, not from derived to base. Agreed. Forget about that.
> To me, there is no possible ambiguity on the template instance that the compiler > should use here, so specifying it by hand should be unnecessary Yes, but many compilers do need the extra hint and g++ 3.3.4 and 3.3.5 do behave differently. I don't know how to begin exploring that for the g++ developers. IT's not easy to make a simple test case from this. With the specific version, it's only some uses of sigc::ref() in test_bind that crash. Luckily, I don't think anybody uses that. I'm still thinking about this cast theory.
For the record: in our code here, we use g++ 3.3.4, and we use sigc::ref() extensively (because we want slots to be invalidated when the objects bound to them go away). While we are talking about this, let me point out that we are actually seeing _2_ classes of this kind of "remove_destroy_notify_callback" issues. Consider the scenario that is implemented by the attachment provided in comment #7: Class s(pecific) inherits from class b(ase), which itself nherits from class sigc::(t)rackable. B defines a signal, and s connects a slot to it, which points to one of its member functions Foo. Finally, at some point, an instance of s is destroyed. So we end up with something like this: t +---+ | | +---+ ^ | inherits b | +-----------+ | signal | | slot -+--\ | | | +-----------+ | ^ | | inherits | s | | +-------+ | | | | | Foo <-+------/ | | +-------+ o The first class of issues is exactly what is reported in this bug: the destructor for s is called, then the destructor for b, then the destructor for the signal, then the destructor for the slot. At this very moment, the slot tries to unbind from the s pointer, that the compiler implicitly casts to a t pointer, and kaboom. This class of issues can be fixed, for example by using the 2-pointers proposal in comment #15. o Now assuming that the first class of issues is fixed, there is a second class of issues: imagine that inside the destructor for b, the signal is emit()ed. Then s::Foo() will be invoked, with the s pointer as its 'this' argument. The problem is: s has already been destroyed, so really the slot is holding an invalid pointer, but it does not know it, because the destructor for t has not been reached yet, so the slot has not been notified yet that s has gone away. This second class of issues is annoying to solve. Fortunately, firing a signal in a destructor is rather infrequent, and there is an easy workaround: s can simply disconnect its s::Foo() slot in its destructor. However, it would be nice if libsigc++ could fully take care of this second class of issues as well, so the code that uses the library does not have to worry about these implementation subtleties.
> For the record: in our code here, we use g++ 3.3.4, and we use sigc::ref() > extensively (because we want slots to be invalidated when the objects bound to > them go away). You are using sigc::ref() with sigc::bind()?
> Are you saying that by using this code > visit_each<type_limited_action, T_functor>(limited_action, _A_functor); > instead of this code > visit_each(limited_action, _A_functor); > the compiler will use a different template instance? How is it so? I think, because the visit_each<> template specializations are declared with various different sets of types, in various orders. For instance, there's template <class T_action, int T_loc, class T_functor, class T_bound> void visit_each(const T_action& _A_action, const bind_functor<T_loc, T_functor, T_bound>& _A_target) as well as the normal template <class T_action, class T_functor> void visit_each(const T_action& _A_action, const T_functor& _A_functor) I also think I see a memory leak when explictly specifying the template instantiation. So, I'll assume that g++ was calling the correct template, and that I was just hiding the problem that exists in one particular visit_each<> specialization. Sorry if I seem to be ignoring your replies - I have to make sense of my own theories first.
> You are using sigc::ref() with sigc::bind()? Yes > Sorry if I seem to be ignoring your replies - I have to make > sense of my own theories first. No problem. It is cool. We perfectly understand, and we do not want to pressure you in any way.
I agree that the cast to trackable* is going wrong (during the trackable's destructor), in limit_derived_target<T_target*, T_action>::with_type<true,T_type>::execute_(const T_type& _A_type, const T_self& _A_action) This patch (show_bad_cast.patch) adds some output to demonstrate that (see "TESTING CAST"). It's possible that the cast would fail earlier, but casting to trackable* is not always valid in the previously-called templates.
Created attachment 45732 [details] [review] show_bad_cast.patch part of the output from test_custom.cc: OnClicked 1 ~SpecificDialog this=0x804d008 ~SpecificDialog (trackable*)this=0x804d018 ~Foo this=0x804d008 ~BaseDialog this=0x804d00c ~BaseDialog (trackable*)this=0x804d018 ~trackable() before notify_callbacks: this=0x804d010 trackable::notify_callbacks: this=0x804d010 trackable::notify_callbacks: after delete callback_list this=0x804d010 ~trackable() after notify_callbacks: this=0x804d010 ~trackable() before notify_callbacks: this=0x804d018 trackable::notify_callbacks: this=0x804d018 visit_each<>(): const T_functor* _A_functor = 0x804d008 limit_derived_target<T_target*, T_action>::operator(): T_type* _A_type=0x804d008 limit_derived_target<T_target*, T_action>::execute_(): TESTING CAST: limit_derived_target<T_target*, T_action>::execute_(): const T_self_A_type=0x804d008 limit_derived_target<T_target*, T_action>::execute_(): trackable* _A_type=0x10097098 That 0x10097098 should be 0x804d018, as seen at the start.
My summary so far: - The SpecificDialog part has been destroyed, so libsigc++ shouldn't still be using pointers to it. - It can use the not-yet-destroyed base sigc::trackable part, but it shouldn't expect a cast from SpecificDialog to trackable* to work. - Where the object is a trackable, and where it is being used by reference, it could store only a trackable*. It might not need any derived information. - The object is stored as a templated derived pointer type in functors/mem_fun.h's bound_mem_functor*::obj_ptr_. - That obj_ptr_ is used in mem_fun.h's visit_each(), where the trouble starts. If it was a trackable* instead of a specific type then maybe it would not instantiate further template instantiations involving the derived type.
Created attachment 45740 [details] [review] two_pointers.patch This patch tests our theory, successfully. It stores the pointer to base (trackable) and uses that instead of the pointer to the (sometimes dead) derived class. I don't think two pointers should be necessary, but I can't get the dynamic_cast<Derived>(base) to work here yet. Even with one pointer, this would still break API, because it requires all signal handler classes to derive from sigc::trackable. Some (I'm not sure which) do not need to at the moment. Maybe this is useful to you even though we couldn't put it in libsigc++ 2.0. Or maybe you can improve on it. It might very possibly have awful side-effects, and the same thing might be needed in other parts of the code.
Created attachment 45744 [details] glibmm_two_pointers.patch Make glibmm build when using the two_pointers.patch libsigc++ patch.
Created attachment 45759 [details] [review] two_pointers_for_trackables.patch This patch does the same thing, but only for types deriving from sigc::trackable. Therefore it is not ABI/API-breaking. Please test this. However, I will not commit it yet, because I want to release a version with a SUN compilation fix, before risking build problems with this change. If you want to do the same for bound parameters then I suggest you start with a test case, in a new bug report. However, that patch could be more complicated because you have to worry about copying derived types by value and calling their derived destructors. Making trackable's destructor virtual might help partly.
> Your comment #23 and your comment #24 Thanks for confirming that the problem is exactly what we thought it was. Btw, in your attachment in comment #24, this part of the patch is wrong (although it is harmless): test_exception_catch_SOURCES = test_exception_catch.cc - +test_exception_catch_SOURCES = test_exception_catch.cc Also I believe the test case in comment #7 (on which you based your test_custom.cc in the patch) can be simplified (which would also simplify the backtrace): as described in comment #19, 'OnClicked()' does not have to call 'delete this;' to trigger the bug, it is not important who deletes the SpecificDialog instance, and it could be done in main(), instead of emitting the 'clicked' signal. > My summary so far: > - The SpecificDialog part has been destroyed, so libsigc++ shouldn't still be > using pointers to it. > - It can use the not-yet-destroyed base sigc::trackable part, but it shouldn't > expect a cast from SpecificDialog to trackable* to work. Yes, but as I mentionned in comment #17, it might not be libsigc++'s fault. We need to know if this is legal C++ or not: it is entirely possible that the C++ standard mandates that the compiler must ensure the correctness of all casts until the topmost class(es) of an instance have been destroyed. > - Where the object is a trackable, and where it is being used by reference, it > could store only a trackable*. It might not need any derived information. Correct. > - The object is stored as a templated derived pointer type in > functors/mem_fun.h's bound_mem_functor*::obj_ptr_. > - That obj_ptr_ is used in mem_fun.h's visit_each(), where the trouble starts. > If it was a trackable* instead of a specific type then maybe it would not > instantiate further template instantiations involving the derived type. Correct, but incomplete: here you are only considering the first kind (see "1)" in comment #13) of pointer that is stored by a slot. There is another kind (see "2)" in comment #13), which correspond to all the trackable arguments that you bind by reference to a slot. These as stored adaptors/bind.h's bind_functor*::bound*_. When a slot is destroyed, it also calls remove_destroy_notify_callback() on all these trackables, and does a similar implicit cast. > Patch in comment #26 This looks good. A few remarks though: 1) + { return ( (obj_ptr_derived_)->*(this->func_ptr_))(LOOP(_A_a%1, $1)); } I think the "this->" is not needed. I have found a similar "issue" in the mem_functor macro, where "this->" is used for the for "pointer" version of operator(), but interestingly "this->" is (correctly) not used for the "reference" version of operator(). 2) +// { return ( (($3 T_obj*)(obj_ptr_) )->*(this->func_ptr_))(LOOP(_A_a%1, $1)); } > but I can't get the dynamic_cast<Derived>(base) to work here yet. I think it doesn't work because you don't use a dynamic_cast here. You use an old C-style cast, which will return the same pointer value, which is wrong in the case of virtual inheritance. > Even with one pointer, this would still break API, because it requires all > signal handler classes to derive from sigc::trackable. Some (I'm not sure > which) do not need to at the moment. Actually, I think this API breakage can be avoided: by using dynamic_cast<trackable *>(derived_pointer) to initialize obj_ptr_ (now I remember why I mentionned that at the end of comment #13, and stupidly denied it at the end of comment #17). If you do that, then the stored sigc::trackable * will be NULL for non-trackable signal handler classes, but we don't care because in that case the pointer is never used! If the API is not broken, then there is no need to patch files tests/test_compatibility.cc and tests/test_mem_fun.cc in the patch in comment #26, and there is no need for the glibmm patch in comment #27. My summary at this point ======================== Your patch in comment #26: a) Fixes the problem for trackable signal handler instances, but break the API by forcing all signal handlers to be trackable. b) Can be trivially improved as suggested above in this comment to not break the API. c) Can be improved as suggested above in this comment to handle trackable arguments that are bound by reference as well. Once b) and c) are done, I think we will have a serious candidate for inclusion in version 2.0.12 :)
> This patch does the same thing, but only for types deriving from > sigc::trackable. Therefore it is not ABI/API-breaking. Please test this. Apparently our bug updates crossed. I'm glad we came to the same conclusion about not breaking the API, though. Your solution looks more heavyweight than mine (dynamic_cast, a one-liner) though. I'm still reading your patch. > However, I will not commit it yet, because I want to release a version with a > SUN compilation fix, before risking build problems with this change. Makes plenty of sense. Let's test this heavily, and let's add the smallest possible test case to the test suite to ensure no future regressions. > If you want to do the same for bound parameters then I suggest you start with > a test case, in a new bug report. However, that patch could be more > complicated because you have to worry about copying derived types by value and > calling their derived destructors. Making trackable's destructor virtual might > help partly. I don't think it has to be that complicated: all we have to do is to store 2 pointers the same way in adaptors/bind.h's bind_functor*. And then we need to add template specialization for visit_each in the same file, to deal with the trackable case specially. Expect a new test case and a new bug report soon.
> but as I mentionned in comment #17, it might not be libsigc++'s fault. We > need to know if this is legal C++ or not. I really doubt that it is. But it's easy to make a test case that casts from Derived to Base after the Derived destructor has finished. It might not crash but you can ask C++ experts about the theory of it. > I think it doesn't work because you don't use a dynamic_cast here. You use an > old C-style cast, which will return the same pointer value. I didn't use it because I couldn't get it to work. I think that trackable was not fully defined there, possibly due to header guards and prototypes. Yes, it's best to use a static_cast<>, and I'd welcome patches to use them instead of C-style casts. C++ will not allow a static_cast<> where a dynamic_cast<> is necessary, so this has real benefits. I'm happy to see a simpler patch, but I do think you need the separate specialization. You can't cast to trackable* for types that don't derive from trackable. I did try to move some of the implementation into a base class, instead of copy/pasting it, but the constructors and operator() must be in the most-derived class.
> I didn't use it because I couldn't get it to work. I think that trackable was > not fully defined there, possibly due to header guards and prototypes. Weird. > Yes, it's best to use a static_cast<>, and I'd welcome patches to use them > instead of C-style casts. C++ will not allow a static_cast<> where a > dynamic_cast<> is necessary, so this has real benefits. You would be surprised: we have seen cases in our code, involving virtual inheritance, where the compiler was perfectly happy to compile with a static_cast, but would crash at run-time. When we replaced with a dynamic_cast, it started to work fine. > I'm happy to see a simpler patch, but I do think you need the separate > specialization. You can't cast to trackable* for types that don't derive from > trackable. Really? I thought that was the whole point of a dynamic_cast, to do this determination at run-time. I find it surprising. We do it all the time. If B inherits from A, but C does not inherit from A, then I think dynamic_cast<B *>(pointer to an A) returns a valid pointer, whereas dynamic_cast<C *>(pointer to an A) returns NULL. I'm starting to test your patches. I will keep you posted. Thanks for your valuable help with this so far.
Created attachment 45764 [details] New test case This test case is shorter and simpler than the one in comment #7. It corresponds precisely to the scenario described in comment #19, first class of issues.
Created attachment 45765 [details] New backtrace It corresponds precisely to the scenario described in comment #19, first class of issues.
Alright Murray, I took a deeper look at the patch you submitted in comment #28: o After unsuccessfully trying to avoid the template specialization, I now agree that it is the right thing to do :) What I said about dynamic_cast in comment #32 was nonsense. o After applying the patch, the test case submitted in comment #33 passes, as expected. Can you please make sure that this exact test case makes it to the libsigc++ test suite? Boy, I'm glad this problem is fixed. I will be feverishly waiting for version 2.0.12 of the lib! I'm working on a test case/bug report for the similar "bound parameters" issue. Thanks again for taking this issue seriously and for fixing it so quickly. You rock!
The corollary bug (for trackable arguments bound by reference) is now filed as bug 302327.
Created attachment 45812 [details] test_castdeleted.cc I tried tp create a test case without libsigc++ (test_castdeleted.cc), but I can't reproduce the bad cast. Maybe you'll have more luck.
Created attachment 45813 [details] test_castdeleted2.cc Actually I can reproduce it (test_castdeleted2.cc). The cast was wrong, just not so wildly wrong.
Here's the gcc bug report: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21279
I've committed this now because the SUN fix I was waiting for might take longer and anyway only affects the latest SUN compiler version.