Bug 564005 – Valgrind errors and crash on exit with Gtk::UIManager

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 564005 - Valgrind errors and crash on exit with Gtk::UIManager


Summary:	Valgrind errors and crash on exit with Gtk::UIManager


Status:	RESOLVED FIXED

Product:	libsigc++
Classification:	Bindings
Component:	signals
Version:	2.2.x
Hardware:	Other All

Importance:	Normal normal
Target Milestone:	---
Assigned To:	Martin Schulze
QA Contact:	Martin Schulze

URL:
Whiteboard:

Duplicates:	642203 (view as bug list)
Depends on:
Blocks:	154498

Reported:	2008-12-10 16:08 UTC by Alexander Shaduri
Modified:	2014-07-25 11:45 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Test case (1.68 KB, text/plain) 2008-12-10 16:09 UTC, Alexander Shaduri		Details
Valgrind output (15.19 KB, text/plain) 2008-12-10 16:09 UTC, Alexander Shaduri		Details
valgrind output with complete call stack traces (21.89 KB, text/plain) 2010-11-29 15:25 UTC, Kjell Ahlstedt		Details
patch: slot_rep: Avoid access to deleted object in notify(). (libsigc++) (3.06 KB, patch) 2011-06-28 14:30 UTC, Kjell Ahlstedt	none	Details \| Review
A libsigc++-only test case (4.23 KB, application/x-compressed-tar) 2011-06-30 11:50 UTC, Kjell Ahlstedt		Details
A boost.signals test case (5.51 KB, application/x-compressed-tar) 2011-07-10 14:41 UTC, Kjell Ahlstedt		Details
patch: slot_rep: Avoid access to deleted object in notify(). (libsigc++) (2.45 KB, patch) 2011-07-14 15:13 UTC, Kjell Ahlstedt	none	Details \| Review

Description Alexander Shaduri 2008-12-10 16:08:18 UTC

Please describe the problem:
I have a problem with valgrind throwing errors on exit when using Gtk::UIManager.
I also get "Send report to MS" dialogs in Windows and segfaults on exit on Solaris 10 with suncc.

The test case is attached, as well as valgrind output.

The errors go away if, instead of Glib::RefPtr<Gtk::Action>, Gtk::Action* is used. However, getting
Gtk::Action* from RefPtr requires awkward action.operator->(). Another way is to keep a separate
map of action enum members and Glib::RefPtr<Gtk::Action> objects.

Steps to reproduce:
1. Compile the attached test case.
2. Run under windows - you should get a "Send Report" dialog box.
or
Run with valgrind under linux - you should get errors on exit.

Actual results:
See above.

Expected results:
No errors.

Does this happen every time?
yes.

Other information:

Comment 1 Alexander Shaduri 2008-12-10 16:09:03 UTC

Created attachment 124358 [details]
Test case

Compile instructions inside

Comment 2 Alexander Shaduri 2008-12-10 16:09:38 UTC

Created attachment 124359 [details]
Valgrind output

Comment 3 Murray Cumming 2008-12-12 01:41:42 UTC

Does it help to use a const RefPtr& instead of just a RefPtr? Does it help to use sigc::ref()?

Comment 4 Alexander Shaduri 2008-12-12 11:02:40 UTC

const RefPtr& - same errors.
sigc::ref() helped, the errors are gone.
Does that mean the errors occur because of changes in action object lifetime?

Comment 5 Jonathon Jongsma 2008-12-12 14:52:42 UTC

I really don't think that sigc::ref is the right answer.  This may no longer give you valgrind warnings because you're binding a *reference* to the smart pointer, which doesn't take a reference on the underlying Action object.  That means that the Action object doesn't get kept alive for longer, but it also means that you've got an invalid refptr reference as soon as the refptr that you've passed to sigc::ref() goes out of scope, and as far as I can tell, that will be at the end of the constructor.  Be very careful about using sigc::ref (or using references at all) with smart pointers.

Comment 6 Murray Cumming 2008-12-12 14:56:07 UTC

(In reply to comment #5)
> I really don't think that sigc::ref is the right answer.

No, I doubt that too. I was just looking for quick clues. I might look at this in more detail later.

Comment 7 Johannes Schmid 2009-04-27 08:34:35 UTC

I investigate things a big. It seems that the destroy order is the problem:

- the ActionGroup is destroyed
- thus - the Action is destroyed
- now TestWindow itself is destroyed, with calls the deconstrutor of the slot that accesses the action that was previously destroyed.

I have no clue how to fix this though. The valgrind errors disappear when action->reference() is called after connecting the signal but that leads to leaked memory later of course.

Comment 8 Kjell Ahlstedt 2010-11-29 15:25:17 UTC

Created attachment 175472 [details]
valgrind output with complete call stack traces

The call stack traces in the valgrind output in comment 2 are not complete,
probably because valgrind was run with the default value --num-callers=12.
The attached file contains valgrind output with --num-callers=30. It also
contains output when using the modified version of trackable.cc, described
in bug 589202 comment 8.

An excerpt from the valgrind output with comments:

   operator delete(void*)  // Delete slot_rep
   sigc::slot_base::~slot_base()
   Glib::SignalProxyConnectionNode::destroy_notify_handler(void*, _GClosure*)
   Glib::RefPtr<Gtk::Action>::~RefPtr()  // Delete the last ref to Action
   sigc::bound_argument<Glib::RefPtr<Gtk::Action> >::~bound_argument()
   sigc::internal::typed_slot_rep<...>::destroy(void*)
   sigc::internal::slot_rep::notify(void*)
   sigc::internal::trackable_callback_list::~trackable_callback_list()
   sigc::trackable::notify_callbacks()
   sigc::trackable::~trackable()
   TestWindow::~TestWindow()

And here is slot_rep::notify():

   //static
void* slot_rep::notify(void* data)
{
  slot_rep* self_ = reinterpret_cast<slot_rep*>(data);

  self_->call_ = 0; // Invalidate the slot.
  self_->destroy(); // Detach the stored functor from the other referred
                    // trackables and destroy it.
  self_->disconnect(); // Disconnect the slot (might lead to deletion of self_)
  return 0;
}

What happens, I believe, is that *self_ is deleted during the call to
self_->destroy(). Then during the call to self_->disconnect() there are
several accesses to the deleted slot_rep instance. The last invalid read is an
access to the deleted SignalProxyConnectionNode instance.

The reason why the test case behaves differently in different operating systems
can probably be found in the description in bug 589202:
   This code does not crash when run on Linux, but I believe that this is
   because the Windows debug runtime fills freed memory with a non-zero bit-
   pattern, and Linux manages to be saved by slot_rep::destroy()'s
   "if (destroy_) ..." check.

In this bug 564005 it's not the "if (destroy_)" check that saves from segfault,
since the faults occur after the call to slot_rep::destroy(). But both
slot_rep and SignalProxyConnectionNode clear enough of themselves before they
are deleted, so there is no attempt to delete a second time.

A sketch of a possible solution:

slot_rep::notify() checks after the call to self_->destroy() if *self_
has been deleted, and if it has, returns without calling self_->disconnect().
The check must be done without accessing *self_. If it has been deleted, it
must not be accessed. One way to handle this is to keep a list of all existing
slot_rep instances. The constructor stores the address of each constructed
instance in a container (e.g. std::list or std::set). The destructor removes
the address of each destructed instance from the container.

I don't really like this solution. I hope that someone can come up with
something better.

What makes the test case special is that 'action' has two roles:
1. It's the receiver of signal_activate.
2. It's bound to a slot which is connected to the signal.

If another Gtk::Action instance is bound to the slot, valgrind does not
complain. E.g.:

Glib::RefPtr<Gtk::Action> action =
  Gtk::Action::create("action_quit", Gtk::Stock::QUIT);
Glib::RefPtr<Gtk::Action> action2 =
  Gtk::Action::create("action_about", Gtk::Stock::ABOUT);
actiongroup_main->add(action, Gtk::AccelKey("<control>Q"),
  sigc::bind(sigc::bind(sigc::mem_fun(*this, &TestWindow::on_action_activated),
  action_quit), action2));

Comment 9 Chris MacGregor 2011-03-27 01:11:34 UTC

As noted by Kjell Ahlstedt, I think https://bugzilla.gnome.org/show_bug.cgi?id=642203 is probably the same issue.

Any hope of a fix?

Comment 10 Murray Cumming 2011-03-30 09:39:35 UTC

*** Bug 642203 has been marked as a duplicate of this bug. ***

Comment 11 Murray Cumming 2011-03-30 09:40:34 UTC

In bug #642203, Kjell said
"here is also the glibmm bug 154498, filed in the year 2004, probably describing the same error. The error is most probably in libsigc++. It is a fairly small module, but it's tricky."

Comment 12 Kjell Ahlstedt 2011-06-28 14:30:45 UTC

Created attachment 190867 [details] [review]
patch: slot_rep: Avoid access to deleted object in notify(). (libsigc++)

Here's a libsigc++ patch that fixes this bug and the remaining part of
bug 154498. It's a light-weight version of the solution I proposed in
comment 8. There is no list of all existing slot_rep's, just a slot_rep*
with the address of the last deleted slot_rep instance.

This solution is perhaps not perfect, but it's the best I can think of without
breaking ABI. slot_rep's constructor and destructor are inlined, which
severely restricts what you can change without breaking ABI.

I've added a static variable in slot_rep. It's used only in functions that are
not inlined. That can't break ABI, can it?

If this patch is accepted as the fix of this bug, the bug ought to be moved
from gtkmm to libsigc++, I suppose.

Comment 13 Murray Cumming 2011-06-29 11:49:48 UTC

Many thanks. If this is a bug in libsigc++ then I'd prefer to have a libsigc++-only test case for it.

Comment 14 Kjell Ahlstedt 2011-06-30 11:50:01 UTC

Created attachment 191004 [details]
A libsigc++-only test case

Here's a test case that uses only libsigc++ and Glib::RefPtr<>.
I think that it's necessary to include a reference counting object in order to
show this bug.

Comment 15 Kjell Ahlstedt 2011-07-10 14:41:45 UTC

Created attachment 191629 [details]
A boost.signals test case

I have modified the test case in comment 14 to use boost.signals and boost.bind
instead of libsigc++. It works perfectly! valgrind reports no errors, neither
accesses to released memory nor memory leaks. (Boost version 1.46.1.)

I also added a test where Cairo::RefPtr<> is used instead of Glib::RefPtr<>.
I must confess that such a test case with libsigc++ fails even with the patch
in comment 12 (accesses to released memory), but with boost.signals there are
no error reports from valgrind.

There are lots of small differences between boost.signals and libsigc++.
Judging from these test cases, some of the differences are important, and make
boost.signals more robust than libsigc++.

Is there a long-term plan to replace libsigc++ by boost.signals or
boost.signals2 (the thread-safe variant of boost.signals)? Of course it would
very much break both ABI and API.

Comment 16 Krzesimir Nowak 2011-07-11 07:05:22 UTC

(In reply to comment #15)
> Is there a long-term plan to replace libsigc++ by boost.signals or
> boost.signals2 (the thread-safe variant of boost.signals)? Of course it would
> very much break both ABI and API.

Boost in PITA when it comes to configuration stage (it has no pkg-config file) and "API/ABI stability" is rather a foreign statement for them. Or it was last time I checked.

Comment 17 Murray Cumming 2011-07-11 08:58:21 UTC

We were waiting for signals to get into C++0x. But it seems that that didn't happen at the last minute, though everyone expected it.

Comment 18 Kjell Ahlstedt 2011-07-14 15:13:19 UTC

Created attachment 191974 [details] [review]
patch: slot_rep: Avoid access to deleted object in notify(). (libsigc++)

Here's a libsigc++ patch that fixes this bug and the remaining part of
bug 154498. It's better than the patch in comment 12. The test case in
comment 14 also runs without valgrind errors, as does a similar test case with
Cairo::RefPtr<> instead of Glib::RefPtr<>.

Now I've used the fact that slot_rep is trackable. A local object in
slot_rep::notify() is notified if the slot_rep object is deleted, and further
accesses to it can be avoided. Obvious, once you think of it. Stupid of me not
to think of it long ago!

I have also swapped the calls to destroy() and disconnect() in
slot_rep::notify(). disconnect() is now called before destroy(). This makes
libsigc++ slightly more similar to boost.signals. More important, it makes it
possible to implement the patch in bug 167714 comment 1, or something similar,
without introducing memory leaks.

Comment 19 Murray Cumming 2011-07-19 09:15:00 UTC

> patch: slot_rep: Avoid access to deleted object in notify().

Thanks. Pushed to master.

I hate to touch the libsigc++ code, but it looks small and sane enough.

Comment 20 Murray Cumming 2011-07-19 09:36:08 UTC

(In reply to comment #14)
> Created an attachment (id=191004) [details]
> A libsigc++-only test case
> 
> Here's a test case that uses only libsigc++ and Glib::RefPtr<>.
> I think that it's necessary to include a reference counting object in order to
> show this bug.

I have added this to libsigc++ like so:
http://git.gnome.org/browse/libsigc++2/commit/?id=2f6fdd52f88ad97e7a260fca08d31c815e426dad

I have also added this to glibmm, using the real RefPtr:
http://git.gnome.org/browse/glibmm/commit/?id=a4df86a4e8da9a0310a3c80ba690c0178f1a5250

Comment 21 Kjell Ahlstedt 2011-07-19 14:05:31 UTC

(In reply to comment #20)

When you added the new test case to libsigc++, you did not change ChangeLog and
tests/Makefile.am. You changed the corresponding files in glibmm.

A drawback with this test case is that if the fixed bug creeps into the code
again, it appears clearly only if the test case is run with valgrind.
We have to accept that, I think. It would be difficult to construct a test case
without that drawback.

Comment 22 Murray Cumming 2011-07-20 06:53:34 UTC

Sorry. Fixed.

Yes, it's better (than nothing) to have something that will show up in valgrind.

Comment 23 Krzysztof Kosiński 2011-07-25 18:45:11 UTC

(In reply to comment #16)
> Boost in PITA when it comes to configuration stage (it has no pkg-config file)
> and "API/ABI stability" is rather a foreign statement for them. Or it was last
> time I checked.

It could be forked as libsigc++3. Definitely less work than fixing libsigc++2. (Bug #167714 most likely requires an ABI break anyway.)

Comment 24 Kjell Ahlstedt 2014-07-25 11:45:33 UTC

The patch in comment 18 did not fix all problems. The test case in comment 14
is now tests/test_bind_refptr.cc. When that test case was run with Visual C++
in debug mode, it crashed.

The reason for the crash was that a sigc::internal::signal_impl object was
deleted while signal_impl::notify() was calling signal_impl::slots_.erase().
Deleting a std::list<> in the middle of a call to erase() on that very list
is definitely risky. Surprising that no illegal access to deallocated memory
occurred when the code was compiled with g++ or clang++.

I have pushed a patch that fixes this problem.
https://git.gnome.org/browse/libsigc++2/commit/?id=dfa4998657946b3ef6c769f74ceb00748b1049c4