GNOME Bugzilla – Bug 564005
Valgrind errors and crash on exit with Gtk::UIManager
Last modified: 2014-07-25 11:45:33 UTC
Please describe the problem: I have a problem with valgrind throwing errors on exit when using Gtk::UIManager. I also get "Send report to MS" dialogs in Windows and segfaults on exit on Solaris 10 with suncc. The test case is attached, as well as valgrind output. The errors go away if, instead of Glib::RefPtr<Gtk::Action>, Gtk::Action* is used. However, getting Gtk::Action* from RefPtr requires awkward action.operator->(). Another way is to keep a separate map of action enum members and Glib::RefPtr<Gtk::Action> objects. Steps to reproduce: 1. Compile the attached test case. 2. Run under windows - you should get a "Send Report" dialog box. or Run with valgrind under linux - you should get errors on exit. Actual results: See above. Expected results: No errors. Does this happen every time? yes. Other information:
Created attachment 124358 [details] Test case Compile instructions inside
Created attachment 124359 [details] Valgrind output
Does it help to use a const RefPtr& instead of just a RefPtr? Does it help to use sigc::ref()?
const RefPtr& - same errors. sigc::ref() helped, the errors are gone. Does that mean the errors occur because of changes in action object lifetime?
I really don't think that sigc::ref is the right answer. This may no longer give you valgrind warnings because you're binding a *reference* to the smart pointer, which doesn't take a reference on the underlying Action object. That means that the Action object doesn't get kept alive for longer, but it also means that you've got an invalid refptr reference as soon as the refptr that you've passed to sigc::ref() goes out of scope, and as far as I can tell, that will be at the end of the constructor. Be very careful about using sigc::ref (or using references at all) with smart pointers.
(In reply to comment #5) > I really don't think that sigc::ref is the right answer. No, I doubt that too. I was just looking for quick clues. I might look at this in more detail later.
I investigate things a big. It seems that the destroy order is the problem: - the ActionGroup is destroyed - thus - the Action is destroyed - now TestWindow itself is destroyed, with calls the deconstrutor of the slot that accesses the action that was previously destroyed. I have no clue how to fix this though. The valgrind errors disappear when action->reference() is called after connecting the signal but that leads to leaked memory later of course.
Created attachment 175472 [details] valgrind output with complete call stack traces The call stack traces in the valgrind output in comment 2 are not complete, probably because valgrind was run with the default value --num-callers=12. The attached file contains valgrind output with --num-callers=30. It also contains output when using the modified version of trackable.cc, described in bug 589202 comment 8. An excerpt from the valgrind output with comments: operator delete(void*) // Delete slot_rep sigc::slot_base::~slot_base() Glib::SignalProxyConnectionNode::destroy_notify_handler(void*, _GClosure*) Glib::RefPtr<Gtk::Action>::~RefPtr() // Delete the last ref to Action sigc::bound_argument<Glib::RefPtr<Gtk::Action> >::~bound_argument() sigc::internal::typed_slot_rep<...>::destroy(void*) sigc::internal::slot_rep::notify(void*) sigc::internal::trackable_callback_list::~trackable_callback_list() sigc::trackable::notify_callbacks() sigc::trackable::~trackable() TestWindow::~TestWindow() And here is slot_rep::notify(): //static void* slot_rep::notify(void* data) { slot_rep* self_ = reinterpret_cast<slot_rep*>(data); self_->call_ = 0; // Invalidate the slot. self_->destroy(); // Detach the stored functor from the other referred // trackables and destroy it. self_->disconnect(); // Disconnect the slot (might lead to deletion of self_) return 0; } What happens, I believe, is that *self_ is deleted during the call to self_->destroy(). Then during the call to self_->disconnect() there are several accesses to the deleted slot_rep instance. The last invalid read is an access to the deleted SignalProxyConnectionNode instance. The reason why the test case behaves differently in different operating systems can probably be found in the description in bug 589202: This code does not crash when run on Linux, but I believe that this is because the Windows debug runtime fills freed memory with a non-zero bit- pattern, and Linux manages to be saved by slot_rep::destroy()'s "if (destroy_) ..." check. In this bug 564005 it's not the "if (destroy_)" check that saves from segfault, since the faults occur after the call to slot_rep::destroy(). But both slot_rep and SignalProxyConnectionNode clear enough of themselves before they are deleted, so there is no attempt to delete a second time. A sketch of a possible solution: slot_rep::notify() checks after the call to self_->destroy() if *self_ has been deleted, and if it has, returns without calling self_->disconnect(). The check must be done without accessing *self_. If it has been deleted, it must not be accessed. One way to handle this is to keep a list of all existing slot_rep instances. The constructor stores the address of each constructed instance in a container (e.g. std::list or std::set). The destructor removes the address of each destructed instance from the container. I don't really like this solution. I hope that someone can come up with something better. What makes the test case special is that 'action' has two roles: 1. It's the receiver of signal_activate. 2. It's bound to a slot which is connected to the signal. If another Gtk::Action instance is bound to the slot, valgrind does not complain. E.g.: Glib::RefPtr<Gtk::Action> action = Gtk::Action::create("action_quit", Gtk::Stock::QUIT); Glib::RefPtr<Gtk::Action> action2 = Gtk::Action::create("action_about", Gtk::Stock::ABOUT); actiongroup_main->add(action, Gtk::AccelKey("<control>Q"), sigc::bind(sigc::bind(sigc::mem_fun(*this, &TestWindow::on_action_activated), action_quit), action2));
As noted by Kjell Ahlstedt, I think https://bugzilla.gnome.org/show_bug.cgi?id=642203 is probably the same issue. Any hope of a fix?
*** Bug 642203 has been marked as a duplicate of this bug. ***
In bug #642203, Kjell said "here is also the glibmm bug 154498, filed in the year 2004, probably describing the same error. The error is most probably in libsigc++. It is a fairly small module, but it's tricky."
Created attachment 190867 [details] [review] patch: slot_rep: Avoid access to deleted object in notify(). (libsigc++) Here's a libsigc++ patch that fixes this bug and the remaining part of bug 154498. It's a light-weight version of the solution I proposed in comment 8. There is no list of all existing slot_rep's, just a slot_rep* with the address of the last deleted slot_rep instance. This solution is perhaps not perfect, but it's the best I can think of without breaking ABI. slot_rep's constructor and destructor are inlined, which severely restricts what you can change without breaking ABI. I've added a static variable in slot_rep. It's used only in functions that are not inlined. That can't break ABI, can it? If this patch is accepted as the fix of this bug, the bug ought to be moved from gtkmm to libsigc++, I suppose.
Many thanks. If this is a bug in libsigc++ then I'd prefer to have a libsigc++-only test case for it.
Created attachment 191004 [details] A libsigc++-only test case Here's a test case that uses only libsigc++ and Glib::RefPtr<>. I think that it's necessary to include a reference counting object in order to show this bug.
Created attachment 191629 [details] A boost.signals test case I have modified the test case in comment 14 to use boost.signals and boost.bind instead of libsigc++. It works perfectly! valgrind reports no errors, neither accesses to released memory nor memory leaks. (Boost version 1.46.1.) I also added a test where Cairo::RefPtr<> is used instead of Glib::RefPtr<>. I must confess that such a test case with libsigc++ fails even with the patch in comment 12 (accesses to released memory), but with boost.signals there are no error reports from valgrind. There are lots of small differences between boost.signals and libsigc++. Judging from these test cases, some of the differences are important, and make boost.signals more robust than libsigc++. Is there a long-term plan to replace libsigc++ by boost.signals or boost.signals2 (the thread-safe variant of boost.signals)? Of course it would very much break both ABI and API.
(In reply to comment #15) > Is there a long-term plan to replace libsigc++ by boost.signals or > boost.signals2 (the thread-safe variant of boost.signals)? Of course it would > very much break both ABI and API. Boost in PITA when it comes to configuration stage (it has no pkg-config file) and "API/ABI stability" is rather a foreign statement for them. Or it was last time I checked.
We were waiting for signals to get into C++0x. But it seems that that didn't happen at the last minute, though everyone expected it.
Created attachment 191974 [details] [review] patch: slot_rep: Avoid access to deleted object in notify(). (libsigc++) Here's a libsigc++ patch that fixes this bug and the remaining part of bug 154498. It's better than the patch in comment 12. The test case in comment 14 also runs without valgrind errors, as does a similar test case with Cairo::RefPtr<> instead of Glib::RefPtr<>. Now I've used the fact that slot_rep is trackable. A local object in slot_rep::notify() is notified if the slot_rep object is deleted, and further accesses to it can be avoided. Obvious, once you think of it. Stupid of me not to think of it long ago! I have also swapped the calls to destroy() and disconnect() in slot_rep::notify(). disconnect() is now called before destroy(). This makes libsigc++ slightly more similar to boost.signals. More important, it makes it possible to implement the patch in bug 167714 comment 1, or something similar, without introducing memory leaks.
> patch: slot_rep: Avoid access to deleted object in notify(). Thanks. Pushed to master. I hate to touch the libsigc++ code, but it looks small and sane enough.
(In reply to comment #14) > Created an attachment (id=191004) [details] > A libsigc++-only test case > > Here's a test case that uses only libsigc++ and Glib::RefPtr<>. > I think that it's necessary to include a reference counting object in order to > show this bug. I have added this to libsigc++ like so: http://git.gnome.org/browse/libsigc++2/commit/?id=2f6fdd52f88ad97e7a260fca08d31c815e426dad I have also added this to glibmm, using the real RefPtr: http://git.gnome.org/browse/glibmm/commit/?id=a4df86a4e8da9a0310a3c80ba690c0178f1a5250
(In reply to comment #20) When you added the new test case to libsigc++, you did not change ChangeLog and tests/Makefile.am. You changed the corresponding files in glibmm. A drawback with this test case is that if the fixed bug creeps into the code again, it appears clearly only if the test case is run with valgrind. We have to accept that, I think. It would be difficult to construct a test case without that drawback.
Sorry. Fixed. Yes, it's better (than nothing) to have something that will show up in valgrind.
(In reply to comment #16) > Boost in PITA when it comes to configuration stage (it has no pkg-config file) > and "API/ABI stability" is rather a foreign statement for them. Or it was last > time I checked. It could be forked as libsigc++3. Definitely less work than fixing libsigc++2. (Bug #167714 most likely requires an ABI break anyway.)
The patch in comment 18 did not fix all problems. The test case in comment 14 is now tests/test_bind_refptr.cc. When that test case was run with Visual C++ in debug mode, it crashed. The reason for the crash was that a sigc::internal::signal_impl object was deleted while signal_impl::notify() was calling signal_impl::slots_.erase(). Deleting a std::list<> in the middle of a call to erase() on that very list is definitely risky. Surprising that no illegal access to deallocated memory occurred when the code was compiled with g++ or clang++. I have pushed a patch that fixes this problem. https://git.gnome.org/browse/libsigc++2/commit/?id=dfa4998657946b3ef6c769f74ceb00748b1049c4