GNOME Bugzilla – Bug 651942
DispatchNotifier tries to dereference destroyed Dispatcher
Last modified: 2012-05-31 14:05:06 UTC
Consider the following scenario: 1) Thread A constructs (first) Dispatcher object, which leads to construction of per-thread-singleton DispatchNotifier 2) Thread A connects 1. dispatcher to a slot 3) Thread A constructs another Dispatcher, which gets pointer to the already existing instance of DispatchNotifier 4) Thread A connects 2. dispatcher to a slot 5) Thread B calls emit() few times on A's 1. dispatcher 6) Thread C calls emit() few times on A's 2. dispatcher 7) Thread A destroys one of its Dispatchers with the possibility of leaving undelivered messages in DispatchNotifier's pipe (and DispatchNotifier continues to live since there is another Dispatcher referencing it) 8) DispatchNotifier::pipe_io_handler() is called where message from now destroyed Dispatcher is read from the pipe; since the message contains pointer to that Dispatcher and DispatchNotifier dereferences it, SEGFAULT follows Quick and dirty solution: Add std::set into the DispatchNotifier with pointers to valid Dispatchers. Registration / deregistration occurs in Dispatcher 'ctors via (un)reference_instance.
Bump.
Daniel Elstner (CCed) might be interested, but even he probably needs a simple-as-possible test case.
Created attachment 193428 [details] Test case
All right, I have created quite simple test case. I hope that it is, together with the former bug report and the comments inside, comprehensible enough.
Ok, are we going to do something about this issue? I've googled for "pipe_io_handler crash" and looks that I'm not the only one having problems - for example see this 4 years old post: http://lists-archives.org/gtk/06593-how-to-clear-pending-events.html
Daniel, are you interested in looking at Petr's test case in comment #3?
Fine, this is my last attempt. If nothing happens, I am going to give up on this and just use my patched version of Glib::Dispatcher.
(In reply to comment #7) > If nothing happens, I am going to give up on > this and just use my patched version of Glib::Dispatcher. You have a patch for Glib::Dispatcher to fix this problem?
(In reply to comment #8) > (In reply to comment #7) > > If nothing happens, I am going to give up on > > this and just use my patched version of Glib::Dispatcher. > > You have a patch for Glib::Dispatcher to fix this problem? Yes, see http://nohous.cz/gitweb/?p=bwfembed2/.git;a=blob;f=src/dispatcher2.h;h=70c70b782a67eec09649311d1189179940a089bf;hb=HEAD and http://nohous.cz/gitweb/?p=bwfembed2/.git;a=blob;f=src/dispatcher.cpp;h=33b9b583a5575d334e740ba6a01920297f1bfa6d;hb=HEAD
But as I stated in the former report, it's a quick and dirty solution. If the memory gets reused for another dispatcher, you are delivering messages of the old dispatcher to handlers of a new one.
OK, but could you attach an actual patch, even if it's just for explanation.
Allright, I will post it later today.
Created attachment 209501 [details] Test case 1 An updated version of the test case in comment 3. Uses classes in namespace Glib::Threads. I've also removed the dependence on Boost.
Created attachment 209502 [details] Test case 2 The Dispatcher2 class, linked to in comment 9, contains these lines in DispatchNotifier::pipe_io_handler(): // we don't emit directly since no more dispatcher events are processed // if nested main loop is called from signal handler (don't know why) context_->signal_idle().connect_once( sigc::mem_fun(data.dispatcher->signal_, &sigc::signal<void>::emit)); //emit Why would the signal handler start a nested main loop? It's not as far-fetched as you might think. It happens if the signal handler shows a modal dialog box, as demonstrated by this test case. It's a highly modified version of the dialogs/messagedialog example program in the gtkmm tutorial. The reason dispatcher events are not processed while a nested main loop is running, is that pipe_io_handler() is connected to an IOSource event source, and event sources are created with Source::get_can_recurse() == false. pipe_io_handler() will not be called again before the previous call has returned. The solution in Dispatcher2 is to create a new event source from which the signal handler is called. An alternative, and in my opinion better, solution is to call Source::set_can_recurse(true).
Created attachment 209518 [details] [review] patch: Dispatcher: Don't send messages to a deleted Dispatcher. (no ABI break) This patch is similar to Dispatcher2 in comment 9. There are two differences. 1. Instead of a std::set with addresses of all existing Dispatchers there is a std::set with addresses of deleted Dispatchers to which there may be messages in the pipe. This set is cleared when the pipe is empty. Almost always this set will be empty, and a search for an address will be fast. 2. Glib::Source::set_can_recurse(true) is called in DispatchNotifier's constructor. (See comment 14.) This solution suffers from the drawback mentioned in comment 10. Copied from DispatchNotifier::reference_instance(): // In the possible but unlikely case that a new dispatcher gets the same // address as a newly deleted one, if the pipe still contains messages to // the deleted dispatcher, those messages will be delivered to the new one. // Not ideal, but perhaps the best that can be done without breaking ABI. // The alternative would be to remove the following erase(), and risk not // delivering messages sent to the new dispatcher. See bug 651942. instance->deleted_dispatchers_.erase(dispatcher); This patch does not break ABI or API. (I assume that Glib::DispatchNotifier is not part of ABI.) I have added two usage rules for Dispatcher in dispatcher.h. These rules are not really new. They have just not been documented until now.
Created attachment 209522 [details] [review] patch: Dispatcher: Don't send messages to a deleted Dispatcher. (ABI break) This patch breaks ABI, but not API. It does not suffer from the drawback mentioned in comment 10 and comment 15. It's an alternative to the patch in comment 15. Each Dispatcher contains a pointer to a Dispatcher::Impl with all private data. When a Dispatcher is deleted, the Dispatcher::Impl is deleted only if the pipe is empty. Otherwise the pointer to the Dispatcher::Impl is copied to a list in the DispatchNotifier and the deletion of Dispatcher::Impl is delayed until the pipe is empty.
Is the ABI-preserving patch in comment 15 good enough? (Apart from some comments. I understand now that comments shall contain a URL to a bug report instead of just a bug number. And I should add a TODO that a better solution can be implemented in the next ABI-breaking release of glibmm.) As I've said, it's not perfect, but it does avoid the segfault that this bug is about. Is an ABI-breaking release in sight? If it's not too far ahead, it's perhaps better to wait for it, and skip the ABI-preserving patch.
(In reply to comment #17) > Is the ABI-preserving patch in comment 15 good enough? At this point you are probably the best judge of that. If it improves things and doesn't make them worse, please push it. > Is an ABI-breaking release in sight? If it's not too far ahead, it's perhaps > better to wait for it, and skip the ABI-preserving patch. I think we might do it around when GTK+ 4 happens. But that won't be particularly soon. I think we should not wait.
Pushed the ABI-preserving patch with an added TODO comment, suggesting implementation of the ABI-breaking solution when ABI can be broken. http://git.gnome.org/browse/glibmm/commit/?id=0f64aef58ac9d8ad1604800d3ba74bb998f63987
Created attachment 211301 [details] [review] Don't send a message, meant for a deleted Dispatcher, to a new Dispatcher. This patch replaces the ABI-preserving solution by the ABI-breaking solution. Probably more useful than the patch in comment 16 now when the ABI-preserving patch has been pushed. (ChangeLog is not updated. That can easily be added if and when this patch or a similar one is pushed.)
*** Bug 676119 has been marked as a duplicate of this bug. ***