GNOME Bugzilla – Bug 167714
Large performance leak in signal_impl::notify and sweep
Last modified: 2017-11-09 08:56:18 UTC
The static member function signal_impl::notify() is called every time a connection to the signal is disconnected. It then sweeps the entire list of slots to find the one that was disconnected and removes it from the list. If notify() is called during an emission, it defers this sweeping until the end of the emission. However, the flag that is set to trigger the sweep at the end of emission is never unset, which causes sweeping to happen even more often than necessary.

sweep() is O(n), and performing it repeatedly (on every disconnection) causes big problems: 52% of my application's running time was spent in these functions.

I have produced a patch that fixes this problem, although I don't think it is ideal; the ideal solution would likely require more fundamental internal design changes. With the patch, notify() is O(1) (it no longer calls sweep()) unless it runs during an emission, in which case it is still O(n) (and still defers the work to sweep() at the end of the emission).
Created attachment 37598 [details] [review]
Fix performance leak in signal_impl::notify
I have committed the 'deferred_ = false' fix in signal_impl::sweep(). For the O(n) problem we need a more thorough fix. I will try to add, in a less hackish way, a list that keeps track of all slots for which notify() is called during signal emission.
From what I can see, a thorough fix requires breaking the ABI of signal_impl. I will prepare a bugfix that can go into the next major release.
I read over the comments again, and I think I had originally misunderstood one. I get the impression that comment #2 implies a belief that this is only a problem for slots disconnecting during signal emission. I wanted to clarify that this occurs on -every- disconnect. If this was already clear, ignore this additional comment :)
Even if it breaks ABI, a patch for this would be informative.
A patch would still be nice.
As bug 154498, bug 564005, and bug 642203 show, there are cases where the slot, which shall call signal_impl::notify(), is deleted before it makes that call. I have suggested in bug 564005, comments 8 and 12, that slot_rep::notify() shall not call the slot's parent (e.g. signal_impl::notify()) in such cases. If my suggested fix is implemented, the patch in this bug's comment 1 will replace a performance leak with a memory leak: in cases where signal_impl::notify() is not called, the self_and_iter struct allocated in signal_impl::insert() will never be deallocated.

sigc::signal_impl is not involved in the problematic parts of the test cases in the 3 mentioned bugs; instead it's Glib::SignalProxyConnectionNode which is part of the problem. But in bug 564005 comment 14, I have attached a test case without Glib::SignalProxyConnectionNode. If that test case is run without the patch in bug 564005 comment 12, signal_impl::notify() is called, the self_and_iter object is deallocated, and there is no memory leak. But the call to signal_impl::notify() is made while the called signal_impl object is being destructed. valgrind reports several invalid reads and writes during that call.

Conclusion: Avoid allocating memory which will be deallocated only if signal_impl::notify() is called. Or suggest another fix of bug 564005.
What's the status now that those bugs are fixed?
The status of this bug is the same as it was before I added comment 7. Bug 564005 and its relatives have been fixed in a better way than discussed in comment 7. The pushed patch in bug 564005 comment 18 does not introduce new memory leaks if it's combined with the patch in this bug's comment 1.

But now that I've looked at it once again, I've discovered another way of getting a memory leak with the patch in comment 1. This is not related to bug 564005, but the test case in bug 564005 comment 14 can be used to demonstrate the memory leak. (Also available at http://git.gnome.org/browse/libsigc++2/tree/tests/test_bind_refptr.cc) Comment out or delete the line "#define ACTIVATE_BUG 1" in test_bind_refptr.cc, compile with this bug's patch applied, and run with valgrind. It will report a memory leak: the self_and_iter object, allocated by signal_impl::insert(), is not deleted. In this case the Action object, and thus also Action::sig1, are deleted before the slot connected to Action::sig1 has been disconnected. signal_impl::notify(), which should have deleted self_and_iter, is not called. Without the patch in comment 1, this behaviour does not cause any problem.

Boost.signals uses a similar technique to the one proposed by Neal in this bug, disconnecting slots in O(1) time instead of O(n). There's no memory leak there. (Test case in bug 564005 comment 15.) I don't know how it's done.

Conclusion: Avoid allocating memory which will be deallocated only if signal_impl::notify() is called, unless you can be sure that it _will_ be called.
(In reply to comment #9)
> Boost.signals uses a similar technique as the one proposed by Neal in this
> bug, disconnecting slots in O(1) time instead of O(n). There's no memory
> leak there. (Test case in bug 564005 comment 15.) I don't know how it's done.

Probably the slots are stored in some kind of intrusive list. In other words, each slot has "next" and "prev" pointers that can be used to add it to a list. The list itself is implemented as a cyclic list with a sentinel: the parent list structure also has those next and prev pointers, which point at the head and tail, respectively. An empty list has prev == next == &list. This way, a slot can remove itself from the list in O(1) time:

  slot->next->prev = slot->prev;
  slot->prev->next = slot->next;
[Resetting QA Contact and Assignee to default - see bug 659799 for more info]
Created attachment 206439 [details] [review]
patch: signal_base, signal_impl: Speed up disconnection of slots.

This patch adds Neal's self_and_iter struct without causing memory leaks. The trick is to disconnect every slot (calling signal_impl::notify()) before it's removed from the signal's slot list. I believe that this is also how boost::signals avoids memory leaks.
Created attachment 206440 [details]
A test case to measure execution time for disconnection.

This program can be used for measuring the execution time with and without the patch in comment 12. Some results on my fairly old PC:

Slot list size   # disconnections   Without patch   With patch
--------------   ----------------   -------------   ----------
           100                100            5 ms         4 ms
           100               1000            6 ms         5 ms
           100              10000           16 ms        15 ms
          1000                100            6 ms         5 ms
          1000               1000           14 ms         6 ms
          1000              10000          103 ms        16 ms
         10000              10000         1080 ms        26 ms

The difference is significant only when at least about 1000 slots are connected to the signal and slots are connected and disconnected thousands of times. Neal's program, where half of the execution time is spent disconnecting slots, must be exceptional.
I haven't pushed the patch in comment 12 because I think the time measurements in comment 13 indicate that it's unnecessary. Is there an application where the patch makes a noticeable difference? Neal's application, mentioned in comment 0, might be an example, but it's possible that Martin's commit in comment 2 plugged most of the performance leak. As mentioned in TODO comments in the patch, there is a marginally better way to make slot disconnection O(1) when ABI can be broken. Perhaps a fix can be reconsidered at the next ABI break.
Krzysztof, you say in https://mail.gnome.org/archives/gtkmm-list/2013-July/msg00052.html that this is the worst libsigc++ bug. You should have objected to my conclusion in comment 14. Have you got a program where disconnection of slots makes up a considerable part of the program's execution time? Is something wrong with my timings in comment 13? The patch in comment 12 (or an updated version of it) can be applied without breaking ABI.
My opinion was very subjective. Some time ago I tried using libsigc++ in Inkscape's node editor, and the result was that selecting/deselecting a large path caused pauses of several seconds even on fast machines. I later rewrote the code to use virtual functions to work around this performance problem. I'm not sure whether it was caused by this O(n) behavior or just by the overhead of creating and deleting so many slots. I assumed that nothing was done about the bug for such a long time because the fix was ABI-breaking. I must have missed your earlier comments.
I have pushed the signal_base.cc part of the patch in comment 12. The other parts are obsolete and not related to this bug. https://git.gnome.org/browse/libsigc++2/commit/?id=ab3f6cd266e9cfa342847cde908b717cdacecea1
I forgot to update the documentation of signal_impl::notify() when that function was updated. Done now, at last. https://git.gnome.org/browse/libsigc++2/commit/?id=755a723b721da6b490b7f98ad29f0a5b2e0cd5a6
Created attachment 322971 [details] [review]
0001-signal_base-clear-signal_impl-in-its-own-destructor.patch

> As mentioned in TODO comments in the patch, there is a marginally better way
> to make slot disconnection O(1) when ABI can be broken. Perhaps a fix can be
> reconsidered at the next ABI break.

What ABI-breaking change would have to be made? I guess it's more than just what's in this patch to libsigc++-3.0 (in the variadic_bind4 branch today). This patch causes several test cases to segfault.
It's slightly more difficult than I thought. But only slightly. All test cases pass (in the master branch), if you change ~signal_impl() to

signal_impl::~signal_impl()
{
  // unreference() must not call ~signal_impl() while clear() is executing.
  ++ref_count_;

  // Disconnect all slots before *this is deleted.
  clear();

  // Now ref_count_ can be cleared again (not really necessary), but don't do it
  // with a call to unreference(). That would invoke ~signal_impl() recursively.
  --ref_count_;
}

Of course you can skip --ref_count_ and the corresponding comment.
Thanks. I have pushed that for libsigc++-3.0: https://git.gnome.org/browse/libsigc++2/commit/?h=variadic_bind4&id=06e86e782035b187ccd9509af0d82276dfc21698
However, what about that would have been ABI breaking?
Now ~signal_impl() is compiler-generated. A compiler can choose to inline it. Don't you think this scenario is possible?

- An application is built with a compiler and an optimization level that makes ~signal_impl() inlined.
- libsigc++ is upgraded to a version where ~signal_impl() is user-supplied, and can't be inlined. The application is not rebuilt.
- Now the application calls its old inlined ~signal_impl(), which does not call clear(), and the new user-supplied ~signal_base(), which does not call impl_->clear().

Result: signal_impl::clear() will not be called when a signal is deleted.
Yes, well spotted.
These changes appear to have introduced leaks in at least 2 cases I've found. These leaks were not present in libsigc++-2.3.1 (RHEL 7.4 updated libsigc++ from that version to 2.10.0). Both patterns result in leaks if existing applications follow them but the dynamically linked library is upgraded.

The first occurs when an empty sigc::slot is connected to a signal:

#include <sigc++/signal.h>

int main(void)
{
  sigc::slot<void()> empty;
  sigc::signal<void> sig;
  sig.connect(empty);
}

The empty slot has no rep_, and thus the set_parent() call that would normally hook signal_impl::notify() does not happen, resulting in the self_and_iter object never getting deleted. An interesting related case: if empty is defined using the deprecated sigc::slot<void> syntax, the leak vanishes when recompiled against 2.10.0 headers, as the slot then goes through copy construction (which results in a rep_).

I'm not a sigc expert, so this may have issues I haven't found, but perhaps a fix here would be to alter slot_base::set_parent() to create a dummy rep_ so the cleanup callback can be held. It looks to me like this should be safe, though again, I'm not an expert in this codebase. At present, slot_base::set_parent() will only call slot_rep::set_parent() iff rep_ is set. This change would construct a dummy rep if it's not set and always call slot_rep::set_parent():

void slot_base::set_parent(void* parent, void* (*cleanup)(void*)) const noexcept
{
  if (!rep_)
    rep_ = new slot_rep(nullptr, nullptr, nullptr);
  rep_->set_parent(parent, cleanup);
}

============

The second case involves deletion of the signal while signal emission is in progress. This leaks the same self_and_iter object, though through a different path. It's unclear whether this is intended to be supported. It worked fine in earlier versions, and the bookkeeping of signal_impl by the signal_exec objects suggests it was intended to work.
Here's a minimal example of this pattern:

#include <sigc++/adaptors/bind.h>
#include <sigc++/signal.h>

void reap_sig(sigc::signal<void>* p_sig)
{
  delete p_sig;
}

int main(void)
{
  sigc::signal<void>* p_sig = new sigc::signal<void>();
  p_sig->connect(sigc::bind(sigc::ptr_fun(&reap_sig), p_sig));
  p_sig->emit();
}

This has the signal object deleted by the function called during emission. The pattern is typically not so bare; more often an object is constructed and fed into a reactor, and the signal is an event triggering its self-destruction.

The leak happens because, during signal emission, reference counting means that notify() (where self_and_iter would be deleted) is never called, as destruction of signal_base does not call clear() if the impl is still referenced. The reference-count drop in the unwind after signal emission ends up done via signal_impl::unreference_exec(), which does a "delete this" without any path to clear() or notify(). The signal_exec/signal_impl interaction appears to be entirely in header code, meaning there's no easy way to fix this purely in library code (I haven't been able to come up with one, at least).

This appears to be the hazard described in comment #7 and comment #23. It's unclear from the documentation I've dug through whether this is a pattern that's intended to be supported. Given that libsigc++-2.10.0 has been out for more than a year and nobody has pointed this out until now, it may not be; but if it is a forbidden pattern, there should either be some protection against it, or at least documentation, especially since this behavior worked fine prior to this version.
I forgot to mention in my report above: a workaround for the deletion-during-emission leak is to copy the signal prior to emission, though that obviously has big performance implications. Changing the emit in my above example to the following works around the leak successfully:

sigc::signal<void>(*p_sig).emit();
Thanks for your very detailed bug comment. I've pushed fixes for the empty slot problem to the master branch and the libsigc++-2-10 branch. Your suggested fix of set_parent() would work fine in the libsigc++-2-10 branch, but I chose a slightly different solution, mainly because your fix can't be applied to the master branch with the coming ABI-breaking libsigc++3: slot_rep is an abstract base class there, and only subclasses of slot_rep can be instantiated.

I don't know if deletion of a signal during emission was ever supposed to work, but perhaps it was. Anyway, it works without a memory leak in the master branch, where signal_impl::clear() is called from the signal_impl dtor instead of the signal_base dtor. This is an ABI-breaking change, and can't be applied to libsigc++2.

I've added some documentation to signal_base in the libsigc++-2-10 branch, mentioning your workaround.