GNOME Bugzilla – Bug 512348
Glib::Thread::create() does not appear to be thread safe
Last modified: 2015-09-16 13:41:29 UTC
Please describe the problem: Glib::Thread::create() constructs a new slot and passes it by pointer to the new thread through the thread function call_thread_entry_slot(). Where the method bound to the slot belongs to a class object derived from sigc::trackable, its creation appears to manipulate a list of notification callbacks (a std::list<> object) maintained by the sigc::trackable object. The same slot is disposed of in the new thread (a different thread), which also invokes the same list. That is not of itself problematic since clearly the deregistration of the slot in call_thread_entry_slot() cannot occur concurrently with its registration. However, it could occur at the same time that the original thread is creating new slots for the same target object whose method was bound to the slot: in that case, there would be undefined behaviour. See http://mail.gnome.org/archives/gtkmm-list/2008-January/msg00261.html for a more detailed explanation. Steps to reproduce: 1. 2. 3. Actual results: Expected results: Does this happen every time? Other information: It is difficult to see a way around this with the current libsigc++ implementation - it would probably require a new argument on slot creation allowing sigc::trackable to be ignored. In the meantime I have knocked up an implementation of a callback class which avoids this and which I will post and is trivial to use (but it is probably not the way glibmm will want to go). It also disposes of bug 396958
Created attachment 103813 [details] Example of one solution
"That is not of itself problematic ...". Actually, quite apart from what further slots the original thread may create, the temporary slot bound to the const reference argument of Glib::Thread::create() in that thread has to die at some stage, and thus apparently through its destructor invoke sigc::trackable functions if bound to a method of an object deriving from trackable. The standard says the bound temporary must last until the call to that function has returned, but does not specify when after that the destruction of the object is to occur. Even if on a particular compiler it occurs immediately after that return, it is at least possible that it might occur around the time that function call_thread_entry_slot() deletes the slot it has received if the thread call is short; and in particular on multi-processor machines before caches have synchronised to enable memory visibility.
Just to help me understand this bug report, have you ever actually seen this issue manifest itself in a program, or is this mostly theoretical at the moment?
Short explanation: No, it's not theoretical. It involves undefined behaviour (according to POSIX) and like most thread unsafe code the occasions of failure will be infrequent, random and difficult to attribute to any particular code path. It arises because on slot construction the list object maintained by the underlying sigc::trackable object is manipulated, and the same object is manipulated in a different thread on slot destruction without memory synchronisation. If the thread function is to continue to be passed as a sigc::slot object, the best course is to document that the slot shouldn't represent a non-static method of a class object derived from sigc::trackable. In the long term, the solution is to pass the callback by something like boost::function (or the callback object referred to in bug #396958). Long explanation: Libsigc++ advertises that if a slot created via sigc::mem_fun represents a non-static method of a class object derived from sigc::trackable, the slot will automatically be invalidated and disconnected if that object is destroyed. I have verified empirically that that is the case. It would be achieved on slot creation by a call to sigc::trackable::add_destroy_notify_callback() to register the slot with the relevant sigc::trackable target object belonging to the object whose method is bound to the slot. That function manipulates the std::list<sigc::trackable_callback> object held by the sigc::trackable object's sigc::trackable_callback_list member (it adds the relevant trackable_callback to the list using std::list<sigc::trackable_callback>::push_back()). On slot destruction the reverse would happen - a call would be made to sigc::trackable::remove_destroy_notify_callback(), which removes the relevant trackable_callback from the list by calling std::list<sigc::trackable_callback>::erase(). The "may" in the title arises from the fact that I have not been able to penetrate the sigc++ code to identify the code path through which the calls to sigc::trackable::add_destroy_notify_callback()/remove_destroy_notify_callback() are made. However, as I say, I have established empirically that automatic slot disconnection does occur. According to POSIX (and also Windows for those using that OS), modifying a memory location in two different threads, or modifying it in one and reading it in another, without synchronisation gives rise to undefined behaviour. Chris
My use of the word 'theoretical' was not intended to be pejorative. I was just curious about whether you started this analysis because you noticed a program behaving strangely, or whether you discovered this issue just by examining code. I understand most of what you've said here fairly well. The thing that's not completely clear to me yet is what situations are vulnerable and what situations are not vulnerable. I had originally understood the problem to only affect classes who have a bound member used as a thread creation slot and then also have other members bound to other slots. But your comment #2 above suggests that this may actually be a problem for any thread creation slot that is bound to a member function of a sigc::trackable object regardless of whether additional slots are bound to other member functions or not. And if I understand your analysis, it seems correct to me, though it seems unlikely to affect anything other than *very* short thread functions. So it appears that the 'safe' situations are the following: - using static and standalone functions - member functions of objects that do not derive from sigc::trackable - member functions of sigc::trackable objects that are relatively long-running and you never bind another member to a slot Does this sound correct to you?
Yes, I think that is a fair summary, although the third point hides a little complexity. For a fair non-summary on your third point, we both accept that if a std::list object can be manipulated (pushed onto, popped from or examined by eg a call to std::list<>::front()) by more than one thread without synchronisation then in due course it will reach an invalid state (empirical evidence suggests that the first time it seriously blows up will be when you are demonstrating your program to a prospective client). Each std::trackable object contains a std::list object containing destroy notify callbacks. New destroy notify callbacks appear to be pushed, and old ones popped, whenever a new slot referencing a non-static method of the object deriving from sigc::trackable is created or deleted. Therefore, we would be OK in the case of a non-static thread method of an object deriving from sigc::trackable if: (a) when Glib::Thread::create() is called and until the temporary bound to the Glib::Thread::create() function by the call to sigc::mem_fun has been destroyed, no other thread could create a new slot referencing a non-static method (any non-static method, not just the thread function) of that object until memory has re-synchronised, the object containing the std::list object remains in existence until that time, and no other slot previously bound to a non-static method of that object is destroyed in another thread until that time[1], which is an unknown time in the absence of specific synchronisation to deal with the point, but your "relatively long-running" addresses this; and (b) there is no unsynchronised memory affecting the std::list object (by the creation or deletion of any other slot objects or the going out of existence of the object containing the std::list object) when the slot copied in Glib::Thread::create() is later deleted in a different thread in call_thread_entry_slot(). What this really shows is that because libsigc++ is not thread safe (nor in my opinion should it be), it is not the right tool for this particular job. It is however the right tool for many other jobs. [1] Including the slot object deleted in call_thread_entry_slot() in the new thread.
Created attachment 112897 [details] [review] Documentation patch for Glib::Thread::create() Here is a documentation patch.
Created attachment 112898 [details] [review] Corrected patch Sorry the last patch failed to patch the second Glib::Thread::create() function.
Committed. Thanks. Are there other things to do for this bug, or can it be closed?
It would be nice to overload Glib::Thread::create() with a version taking a new callback class as an argument, such as in the example posted in earlier comments in this bug report, or to do so when c++0x comes out by using the std::function callback wrapper, and deprecate the version taking a sigc::slot object. (This would also provide a route for dealing with bug #396958.) Whether you want to keep the bug alive for that purpose I am not sure. If not I should close it.
I think it would probably be good to actually fix this at some point, though I'm still not sure which way we should do it. So i guess I'd recommend leaving it open.
> when c++0x comes out by using the std::function callback wrapper, and deprecate > the version taking a sigc::slot object. So std::function is safer than sigc::slot? Can't we make sigc::slot as safe as std::function?
The sigc::slot object itself is thread safe, but sigc::trackable, with which the slot is registered, is not. At the sigc level, the best thing would be to add functionality to be able to "switch off" trackability for particular slots. I remember some time ago looking at whether passing the address of the slot to sigc::trackable::remove_destroy_notify_callback() might do it (I think I found it would not, but I remember also thinking that the sigc code was rather opaque so that might be wrong). If it does do it, if the trackable object itself could be identified in Glib::Thread::create(), sigc::trackable::remove_destroy_notify_callback() could be called in Glib::Thread::create() on slot_copy immediately after: sigc::slot_base *const slot_copy = new sigc::slot<void>(slot); However I think I also concluded that information about the trackable object had been lost at that stage, but I am afraid it is some time since I looked at this. In any event, if this is to be dealt with at the sigc level, a better solution would be either (i) to be able to pass an additional argument to sigc::mem_fun() on slot creation which could, if changed from the default, prevent slot registration/deregistration in the first case, (ii) to have a compile time option which includes code which protects operations on the std::list object maintained by each sigc::trackable object with a mutex. The latter gives rise to dependency issues. It is the best solution however as it would allow different slots representing methods of the same object to be created in different threads. (Possibly there ought to be a more general health warning about this in the Glib::Thread documentation, as I suspect there might be a quantity of thread-unsafe code around because of sigc::trackable.) Chris
Even better, a third solution would be to be able to pass a custom container object to the sigc::trackable constructor as a constructor argument, or to have the container type as a template type (as with std::queue) - it would then be possible to pass in a thread-safe container (I would choose std::set rather than std::list anyway). That would avoid any dependency issues.
Wait a moment -- can't you just *not derive* your class from sigc::trackable and that's it? There used to be a class_slot() in older versions of sigc++, but I think since the switch to the boost syntax it is now implicitly untracked if the object does not derive from trackable.
Comment #15: Of course, and that is what the documentation patch I submitted for Glib::Thread::create() (now in glibmm-2.20) says. However, though now documented, this is a trap for the unwary - trackability is, or should be, irrelevant to the passing of a slot to start a new thread. Rather than document the problem, it would be better to prevent it by "switching off" trackability for the slot copied to call_thread_entry_slot(). This cannot be done at present.
Would someone like to summarize the current state of this bug, to make my life easier?
Summary: 1. This bug is a subset of the lack of thread safety of libsigc++ in general, and sigc::trackable in particular. In this instance, starting a thread via Glib::Thread::create() by calling a non-static member function of a class object deriving from sigc::trackable will cause undefined behaviour (and I believe I have come across a case where this has caused program malfunction). 2. The temporary fix, adopted since glibmm-2.20, is to document that no thread should be started in respect of such a function. 3. The permanent fix is either (a) to provide a new custom functor object other than sigc::slot to pass to Glib::Thread::create(), (b) to adopt tr1 and use std::tr1::function to pass to Glib::Thread::create() as an optional overload in those cases where std::tr1 is supported, or (c) to wait for glibmm to move to C++11/12 and either use std::function or abandon glib threads entirely and use std::thread. I suspect that none of the solutions in 3 is likely to seem attractive to you, so I suggest we stick with 2. A propos, I think glibmm/gtkmm would benefit greatly from some explanation about how to use gtkmm and libsigc++ in as safe a way as is practicable, because that is not obvious (even Glib::Dispatcher has significant hidden gotchas). I am willing to provide documentation patches with that in mind if there is a prospect of them being merged. There would be an issue about where to put this: probably as an addendum to the documentation on Glib::Thread and Glib::Dispatcher.
(In reply to comment #18) > I am willing to provide documentation patches with that in mind if > there is a prospect of them being merged. There would be an issue about where > to put this: probably as an addendum to the documentation on Glib::Thread and > Glib::Dispatcher. Or how about a chapter or appendix in the gtkmm book, with a link to it from the API documentation wherever it's relevant? You are the best judge of what's most appropriate.
Chris?
I'll do something this week. A new section 27.3 (recommended techniques) seems the best spot. I don't think it deserves a chapter to itself. Where is the source for the tutorial available? (It doesn't seem to be in the tarball.)
(In reply to comment #21) > Where is the source for the tutorial available? (It doesn't seem to be in the > tarball.) It's in the gtkmm-documentation module, in git, and released as a tarball.
Created attachment 183736 [details] [review] Documentation patch As I had no comments following my request for comments in the gtkmm mailing list, here is a patch to the gtkmm tutorial. This patch is against gtkmm-documentation-2.23.0. In the end, because of its length I have incorporated it as a separate chapter, but if you want to put it somewhere else that's fine. It is an adjusted (slightly expanded) version of the one I posted to the mailing list. It might be worth making a link to this from the documentation for Glib::Thread::create(). Is it OK for the glibmm documentation to link to other things in this way, and if so do you have any examples of how to do so? (If it is too complex it is probably not worthwhile: a textual cross-reference could be made instead.)
Created attachment 183741 [details] [review] ChangeLog diff Here is a separate patch for the ChangeLog. You will probably want to redate it with whatever date you apply the main patch and merge it into git.
Created attachment 184615 [details] [review] Supplementary documentation patch This is a supplementary patch (it applies over the one of two weeks ago), the purpose of which is to make point 1 under "The rules" a little more explicit and helpful.
Review of attachment 183736 [details] [review]: I have not finished reviewing this, but here is a start. ::: gtkmm-documentation-2.23.0/docs/tutorial/C/gtkmm-tutorial-in.xml.orig @@ +7378,3 @@ +provides the normal set of thread launching functions, mutexes, +condition variables and scoped locking classes required for writing +multi-threaded programs using C++. Those using recent versions of We generally try not to talk about API by first mentioning that it wraps glib API. Ideally, people don't need to think about glib. And anyway it's a general issue for everything, not just to be mentioned for this part of the API. @@ +7381,3 @@ +g++'s C++0x implementation can also make use of the thread facilities +to be provided by the forthcoming C++11/12 standard in combination +with <application>glibmm</application> and >kmm;. This should mention specific API, ideally with a link. Also, are you saying that people could/should use it instead, or in combination? @@ +7388,3 @@ +multiple threads of execution, arising from the fact that +<application>libsigc++</application> is not thread-safe. Amongst +other things, a class inheriting from Should the libsigc++ reference documentation have a comment about this? @@ +7405,3 @@ +connection. None of these complex interactions is protected by a +mutex or other means of synchronization. +</para> Yikes. That's a long bug report that's incomprehensible to the casual reader. I'd skip it and just link to a libsigc++ bug report or documentation. @@ +7421,3 @@ +Although code written for the X11 backend of GTK+ can use the GDK +global lock, as invoked by <function>gdk_threads_enter()</function> +and <function>gdk_threads_leave()</function>, so as to address GTK+ in What is X11-specific about gdk_threads_enter/leave()? Their documentation doesn't mention any platform specifics. @@ +7423,3 @@ +and <function>gdk_threads_leave()</function>, so as to address GTK+ in +more than one thread, because of <application>libsigc++</application> +this will not usually work with >kmm;. Instead, if a worker thread I would clarify "work". @@ +7424,3 @@ +more than one thread, because of <application>libsigc++</application> +this will not usually work with >kmm;. Instead, if a worker thread +needs to invoke a >kmm; function, it should do so via I would first mention that we discourage calling GTK+ from multiple threads anyway, though gdk_threads_enter/leave() is a possible solution when you absolutely must. Only then would I mention that there's a problem with that in gtkmm. @@ +7440,3 @@ +<classname>sigc::signal</classname> object should be regarded as owned +by the thread which created it. Only that thread should connect slots +with respect to the signal object, and only that thread should Can "with respect" be removed here without loss of meaning? @@ +7444,3 @@ +<methodname>operator()()</methodname> on it. It follows (amongst +other things) that any signal object provided by a >kmm; widget +should only be operated on in the main GUI thread. Replace "operated on" with "connected to"?
Created attachment 196472 [details] [review] Revised documentation patch "We generally try not to talk about API by first mentioning that it wraps glib API. Ideally, people don't need to think about glib. And anyway it's a general issue for everything, not just to be mentioned for this part of the API." Done. "[Comments about C++11 threads] should mention specific API, ideally with a link. Also, are you saying that people could/should use it instead, or in combination?" There is no linkable documentation that I know of about C++11 threads. They can be used in combination or instead of glibmm threads (and because Glib::Thread::create() is not thread safe on non-static methods of classes deriving from trackable possibly should be). However I can see that this could be seen as off-topic and if there must be documentation on C++11 threads if they are to be referred to, the best thing is just to omit the text. "Should the libsigc++ reference documentation have a comment about [its lack of thread safety]?"] I think so, but I think it is best to do one thing at a time. Once this text is agreed it can be carried across. "Yikes. That's a long bug report that's incomprehensible to the casual reader. I'd skip it and just link to a libsigc++ bug report or documentation." It's more of an architectural report than a bug report: it explains why the basic design of libsigc++ is inimicable to thread safety. That is why boost didn't try to make its first, quite similar, signal library thread safe, but instead adopted a completely different approach in its signal2 module. A libsigc++ bug report saying "please rewrite this library" is probably not going to be helpful or well received. (As an aside, if the casual reader cannot understand the text in question then I doubt they should be programming with threads using C++ because they are the things anyone programming with shared resources must think about.) Instead I have shortened it in the attached redraft. The more advanced information could go in a footnote, but I won't do that unless it is likely to be accepted (and footnotes are possible using docbook, I have no idea). "What is X11-specific about gdk_threads_enter/leave()? Their documentation doesn't mention any platform specifics." The GDK global lock is specific to the X11 implementation - see http://developer.gnome.org/gdk/stable/gdk-Threads.html#gdk-Threads.description , fourth paragraph ("Unfortunately ..."). "I would clarify 'work'" Done. I have had to refer to "relevant signal object" and "relevant trackable object" without explaining what "relevant" means, which would take a significant amount of text - in effect a relevant signal object is any gtkmm widget's signal proxy whose methods might be invoked by a worker thread or which holds a slot of a relevant tracked object, and a relevant trackable object is any object derived from sigc::trackable which has any of its non-static methods referenced by a slot held by any signal object. That risks bringing back in all the gruesome detail that you would rather see omitted. If we are going to have footnotes, that could perhaps go in a footnote. "I would first mention that we discourage calling GTK+ from multiple threads anyway, though gdk_threads_enter/leave() is a possible solution when you absolutely must. Only then would I mention that there's a problem with that in gtkmm." Done. "Can 'with respect' be removed here without loss of meaning?" Done. "Replace "operated on" with "connected to"?" "Operated on" is correct here - the issue is wider than just connecting to it. I have also added to this paragraph to explain what is meant by "nulling" a slot, as this originally referred back to the lengthy explanation (now omitted) of libsigc++ thread interactions.
Review of attachment 196472 [details] [review]: Here is a partial review of the first part. Sorry for nitpicking, but I think it's really worthwhile to make this very clear and, if possible, provide some clear rules for people to follow, without requiring them to understand all the details. ::: gtkmm-documentation-3.0.3/docs/tutorial/C/gtkmm-tutorial-in.xml.orig @@ -7132,0 +7137,17 @@ +<chapter id="chapter-multi-threaded-programs"> +<title>Multi-threaded programs</title> + ... 14 more ... I would change "None of" to "That's because none of" is not thread safe -> are not thread safe. @@ -7132,0 +7137,19 @@ +<chapter id="chapter-multi-threaded-programs"> +<title>Multi-threaded programs</title> + ... 16 more ... is protected -> are protected @@ +7171,3 @@ +Use <classname>Glib::Dispatcher</classname> to invoke gtkmm functions +from worker threads (this is dealt with in more detail in the next +section). Although code written for the X11 backend of GTK+ can use This sentence is too long and therefore confusing. Also, it might be best as a <note>. I think that any mention of trackable or slot needs to make clearer that it's a libsigc++ thing. Using sigc:: would probably be the least-awkward way to do that. @@ +7178,3 @@ +usually be sufficient to achieve thread safety unless every operation +which might affect a relevant signal object, or might affect any +relevant trackable object's lifetime or slots representing any "relevant" is a bit odd here. It might be fine without it. @@ +7180,3 @@ +relevant trackable object's lifetime or slots representing any +relevant trackable object's non-static methods, is both within the +global lock and avoids any out-of-order locking with respect to any "sequence" is less ambiguous than "order" @@ +7181,3 @@ +relevant trackable object's non-static methods, is both within the +global lock and avoids any out-of-order locking with respect to any +other locks employed in the program. (As it happens, using I'd remove the "As it happens", and I'd take this out of brackets. @@ +7190,3 @@ +<listitem> +<para> +Unless special additional synchronization is employed, any particular We can probably remove "Unless ... is employed" with no great loss of meaning. And "any particular" can be replaced with "A" @@ +7197,3 @@ +<methodname>operator()()</methodname> on it, or null any of the +connected slots. It follows (amongst other things) that without +additional synchronization any signal object provided by a >kmm; I'd remove the "without additional synchronization" here too. @@ +7198,3 @@ +connected slots. It follows (amongst other things) that without +additional synchronization any signal object provided by a >kmm; +widget should only be operated on in the main GUI thread and any I'd restructure this, to put "Any Gtk::Widget signal should only be used in the main GUI thread. And any sigc::trackable used by a Gtk::Widget signal should..." at the start of the paragraph, moving the explanation after that. @@ +7207,3 @@ +<listitem> +<para> +Unless special additional synchronization is employed, any Again, I'd remove the "unless .. is employed". @@ +7209,3 @@ +Unless special additional synchronization is employed, any +<classname>sigc::connection</classname> object should be regarded as +owned by the thread which created the slot and called the method which "called the method" here is not gramatically correct, but I don't understand what it should be instead. @@ +7218,3 @@ +<listitem> +<para> +A slot which references a non-static method of a class deriving from Is this "non-static" distinction really useful, even if correct? @@ +7221,3 @@ +<classname>sigc::trackable</classname> should never be copied to +another thread, nor destroyed by a different thread than the one which +created it (one consequence of this is that I would break the sentence instead of putting part of it in brackets. @@ +7230,3 @@ +<listitem> +<para> +If a particular class object derives from I'd make that sentence shorter, focusing on the use of mem_fun(). @@ +7243,3 @@ +<listitem> +<para> +Although <application>glib</application> is itself thread-safe thread_init() is now deprecated and unnecessary, in glib 3.1/3.2.. Let's mention it only in a <note> if at all. @@ +7247,3 @@ +any <application>glibmm</application> wrappers which use +<application>libsigc++</application> will not be. So for example, the +only thread which should call I'd move the the list to the end of the sentence instead of putting a list in the middle of a sentence. @@ +7257,3 @@ +and manipulate any <classname>sigc::connection</classname> object +returned by them, is the thread in which the main loop to which the +connection is made runs. (The connect*_once() variants could be made I would remove this connect_*_once() issue and just leave a <!-- TODO --> comment pointing to a separate bug report about it. @@ +7267,3 @@ +</orderedlist> + +<para> This overall message belongs near the top.
Created attachment 199924 [details] [review] Second revise of documentation patch "I would change "None of" to "That's because none of"" Done "is not thread safe -> are not thread safe." Done. "is protected -> are protected" Done "This sentence is too long and therefore confusing. Also, it might be best as a <note>." I have made this a footnote rather than a note, and shortened it a little but I think if any more is removed it is impossible to make the point. Your previous comments invited me to explain what was meant by my previous "will not usually work" and I can think of no way of explaining it in any meaningful way which is shorter. The alternative is to go back to my former shorter but unexplanatory version. I think we should stay with it as it is now: I think your previous suggestion was right. Since there is now a footnote I have also included a footnote for the second paragraph incorporating the text of the first draft. Those who want to know more can read the footnote (it is actually the core of the problem and does help show to the knowledgeable that we know what we are talking about); those that are content just to read the advice in the tutorial needn't look at the footnote. "I think that any mention of trackable or slot needs to make clearer that it's a libsigc++ thing. Using sigc:: would probably be the least-awkward way to do that." Done, unless "the/a slot" or "slots" refer back to a reference to "sigc::slot" in the immediately preceding text. I am not completely convinced that it improves clarity but four eyes are better than two. ""relevant" is a bit odd here. It might be fine without it." Done. A competent programmer will implicitly insert "relevant". A less competent one perhaps should have a more absolute rule. ""sequence" is less ambiguous than "order"" Done. "I'd remove the "As it happens", and I'd take this out of brackets." I have taken all the words in brackets out as part of the shortening exercise. "We can probably remove "Unless ... is employed" with no great loss of meaning. And "any particular" can be replaced with "A"" Done. Given that we are deprecating the use of the GDK global lock, and the global lock is the only way to apply external synchronisation to the signal of a gtkmm widget, I think we can lose it. "I'd remove the "without additional synchronization" here too." As above. "I'd restructure this, to put "Any Gtk::Widget signal should only be used in the main GUI thread. And any sigc::trackable used by a Gtk::Widget signal should..." at the start of the paragraph, moving the explanation after that." I don't think "Gtk::Widget signal" is synonymous with "any signal object provided by a gtkmm widget", because the latter (my version) covers the additional signals of derived objects. Aside from that, I think your version really does obscure the meaning: what does "any ... signal should only be used" mean? It will only be "used" in the main GUI thead in one sense, because that is the one which emits. It is all the other things that can be done with a signal object which is the problem. I do think "operated on" is a better description and ties in with the second sentence of this paragraph. "Again, I'd remove the "unless .. is employed"." As above. "called the method" here is not gramatically correct, but I don't understand what it should be instead. I have reworded this. "Is this "non-static" distinction really useful, even if correct?" It is correct and I think it is useful. Trackability does not apply to static methods, because there is no object associated with them. I would not want to discourage starting threads with static methods: that may often be the best solution given sigc::trackable's lack of thread safety. That is, start the thread with a static method and if you need to pass in the object concerned to the thread function do so by binding in a pointer to it. "I would break the sentence instead of putting part of it in brackets." Done. "I'd make that sentence shorter, focusing on the use of mem_fun()." I think doing it that way risks the user failing to understand the point at issue. If it needs shortening, I would prefer to omit the reference to sigc::mem_fun(). "thread_init() is now deprecated and unnecessary, in glib 3.1/3.2.. Let's mention it only in a <note> if at all." I have omitted it. "I'd move the the list [of connection methods] to the end of the sentence instead of putting a list in the middle of a sentence." Done "I would remove this connect_*_once() issue and just leave a <!-- TODO --> comment pointing to a separate bug report about it." Done. "This overall message belongs near the top." I think I preferred the original, but not by much, so done.
Created attachment 199926 [details] [review] Third revise of documentation Sorry, I attached an old version for revision 2 which did not move the "overall message" to the start of "The Rules" section. (I was mulling it over in my mind and then failed to regenerate the patch.) Apart from that it is identical. Please look at this revision (revision 3) and not revision 2.
Review of attachment 199926 [details] [review]: It's looking pretty good. ::: gtkmm-documentation-3.0.3/docs/tutorial/C/gtkmm-tutorial-in.xml.orig @@ +7150,3 @@ +However, care is required when writing programs based on >kmm; using +multiple threads of execution, arising from the fact that +<application>libsigc++</application>, and in particular Doesn't GTK+ itself strongly discourage use of GTK+'s API from from non-GUI threads? Not mentioning that makes it looks like it's all our fault. Also, is there some bug or discussion somewhere about why libsigc++ can't be made to be thread-safe, so we can link to that? @@ +7162,3 @@ +keeping track of slots representing any of its non-static methods +(more particularly it keeps a list of callbacks which will null the +connected slots on its destruction). Each I've just noticed that you use double spaces after the period. That's unnecessary. Please fix them. @@ +7284,3 @@ +case where the slot does not relate to a non-static method of a class +deriving from <classname>sigc::trackable</classname>, but that has not +yet been implemented. Update this when that has happened. --> That should have a link to a bug report, so that it could happen, and so that you have a way to know later if it has happened, when reading that comment later.
Created attachment 201915 [details] [review] Fourth revise of documentation "Doesn't GTK+ itself strongly discourage use of GTK+'s API from from non-GUI threads? Not mentioning that makes it looks like it's all our fault. Also, is there some bug or discussion somewhere about why libsigc++ can't be made to be thread-safe, so we can link to that?" I have not seen use of the GDK global lock actively discouraged with GTK+, and http://developer.gnome.org/gdk/stable/gdk-Threads.html does not do so, nor is any special mention given to using g_idle_add() and cognates as the alternative. Employing the global lock can be useful under X11 with, for example, some graphics heavy uses. Some high performance users definitely invoke the global lock and there are a few cases where (if I happened not to be using gtkmm) I would do so also. The problem with thread safety is not gtkmm's "fault" but it does arise from libsigc++'s lack of thread safety which in turn springs from libsigc++ automating a lot of stuff behind the user's back. Design simplicity and minimisation of side effects are important for writing thread-safe code, and libsigc++ has, for its own reasons (mainly because it was conceived at a time when use of threading in desktop applications was not widespread) gone in the opposite direction. It has in my opinion become the single biggest blemish with gtkmm. libsigc++ could be made thread safe but it would involve work throughout that library. The trackability model adopted by libsigc++ makes it difficult and, because of the locking that would be required even within slot objects, somewhat less than ideally efficient: but it is certainly not impossible to make it adequately thread-safe while maintaining the API. There was some discussion of this on the message thread beginning with http://mail.gnome.org/archives/gtkmm-list/2010-July/msg00087.html but I doubt the value in linking to it here. The conclusion I reached, reported at http://mail.gnome.org/archives/gtkmm-list/2010-July/msg00119.html in that message thread, was that because of its design the whole libsigc++ library would need attention rather than just sigc::trackable, which I think was also the conclusion of Paul Davis and Krzysztof Kosiński. "I've just noticed that you use double spaces after the period. That's unnecessary. Please fix them." Done. "That should have a link to a bug report, so that it could happen, and so that you have a way to know later if it has happened, when reading that comment later." Done.
(In reply to comment #32) > Created an attachment (id=201915) [details] [review] > Fourth revise of documentation > > "Doesn't GTK+ itself strongly discourage use of GTK+'s API from from non-GUI > threads? Not mentioning that makes it looks like it's all our fault. Also, is > there some bug or discussion somewhere about why libsigc++ can't be made to be > thread-safe, so we can link to that?" > > I have not seen use of the GDK global lock actively discouraged with GTK+, and > http://developer.gnome.org/gdk/stable/gdk-Threads.html does not do so, nor is > any special mention given to using g_idle_add() and cognates as the > alternative. Employing the global lock can be useful under X11 with, for > example, some graphics heavy uses. Some high performance users definitely > invoke the global lock and there are a few cases where (if I happened not to be > using gtkmm) I would do so also. Doesn't this confirm this? http://mail.gnome.org/archives/gtk-devel-list/2012-March/msg00005.html
Comment #33: Presumably so, when gtk+-3.4 is released (if deprecation is to arise then). Where do you want to go with all this? The question of the gdk global lock is somewhat tangential to the thread-safety issues dealt with in the tutorial text, which is concerned principally with libsigc++'s thread safety. The text with respect to which you raise this point - the second paragraph - has no relevance to the gdk global lock. (The deprecation of the gdk global lock might call for the omission of footnote 2, but that is not the text in relation to which you raised the point: by all means cut it if you wish.)
(In reply to comment #34) > Comment #33: Presumably so, when gtk+-3.4 is released (if deprecation is to > arise then). gdk_threads_enter() isn't deprecated yet: http://developer.gnome.org/gdk/unstable/gdk-Threads.html#gdk-threads-enter though it still seems to be planned. > Where do you want to go with all this? The question of the gdk global lock is > somewhat tangential to the thread-safety issues dealt with in the tutorial > text, which is concerned principally with libsigc++'s thread safety. The text > with respect to which you raise this point - the second paragraph - has no > relevance to the gdk global lock. (The deprecation of the gdk global lock > might call for the omission of footnote 2, but that is not the text in relation > to which you raised the point: by all means cut it if you wish.) So maybe you want to omit mention of that API, at least for now.
What a pity if this discussion of a new "Multi-threaded programs" chapter dies here, when Chris has made such an effort! I can push this new chapter to the gtkmm tutorial, if no one objects. But some details have changed since the discussion started. 1. gdk_threads_enter() and gdk_threads_leave() have been deprecated in gtk+ 3.6. See code http://git.gnome.org/browse/gtk+/tree/gdk/gdk.c. (They are not yet marked deprecated at http://developer.gnome.org/gdk/unstable/gdk-Threads.html#gdk-threads-enter which is a bit odd.) These deprecated functions should not be mentioned in the tutorial, i.e. remove footnote 2 in the "The rules" section, as suggested in comment 35. 2. Change Glib::Thread to Glib::Threads::Thread. 3. "The connect*_once() variants could be made thread-safe for any case where the slot does not relate to a non-static method of a class deriving from sigc::trackable, but that has not yet been implemented: Bug 396958." I suppose the reference should have been to bug 396963. That bug has been fixed. The only remaining thread-unsafety in the connect*_once() methods relates to non-static methods of a class deriving from sigc::trackable. 4. "One other point to note is that the user code must ensure that any Glib::Dispatcher object remains in existence from the time it is emitted on by a worker thread to the time that the receiver thread completes executing its connected slots for that emission." This sentence can be removed, because bug 651942 has been fixed. I can make these changes, if Chris does not want to work with this bug any more.
The original problem in this bug, the thread-unsafety of Glib::Thread:: create(), can most easily be fixed by adding static Thread* create(const std::function<void()>& func, const std::string& name = std::string()); to Glib::Threads::Thread, as suggested in comment 10 and comment 18. It requires a C++11 capable compiler. If must be protected by some kind of #ifdef. This was discussed on the gtkmm-list in connection with move constructors, starting at https://mail.gnome.org/archives/gtkmm-list/2012-November/msg00072.html We need to decide how to handle new code in glibmm and gtkmm that uses C++11 features. I can make a patch for the Thread::create() case, so we have something to comment on.
(In reply to comment #36) > 1. gdk_threads_enter() and gdk_threads_leave() have been deprecated in > gtk+ 3.6. See code http://git.gnome.org/browse/gtk+/tree/gdk/gdk.c. > (They are not yet marked deprecated at > http://developer.gnome.org/gdk/unstable/gdk-Threads.html#gdk-threads-enter > which is a bit odd.) Oops, I didn't notice that it's a link to gdk2. Here's a better link: http://developer.gnome.org/gdk3/stable/gdk3-Threads.html#gdk-threads-enter
Comment #36 and #38: I agree with all your changes. If you have commit access I should make them (I had rather assumed this was destined for the great bit bucket in the sky and that I was wasting my time), and take joint credit. To be honest you take more interest in glibmm than I do nowadays. Comment #37: Using std::function requires either C++03-TR1, or C++11. I suggest having the functor argument as a templatized type which will work (after a fashion) with any version of C++, as an overload of the existing create functions. The nice thing in C++11 is that you can use collapsible references to provide perfect forwarding. In C++98/03 you can at least take it by reference to const of the template type, and it would take any callable object. I cannot see that this could be less efficient than taking a sigc::slot argument. It would probably be necessary to keep the existing thread create functions as sigc::slot specializations. The overload resolution rules (which prefer matching non-template functions to a templated version) will ensure everything works correctly. However this point also applies elsewhere. I have done some work on task composition using thread pools with asynchronous result handling of late, and I noticed that glibmm's thread pool suffers from the same problem as do the thread creation functions. (So I ended up not using glibmm's thread pool.)
(In reply to comment #39) > Comment #36 and #38: I agree with all your changes. If you have commit > access I should make them. Yes, I have commit access. I can push it to the git repository, when you have made the necessary changes. From my comment 36: "The only remaining thread-unsafety in the connect*_once() methods relates to non-static methods of a class deriving from sigc::trackable." This is slightly misleading. It's thread-unsafe if you use sigc::mem_fun(), whether the method is static or non-static. If it's static, you can use sigc::ptr_fun() instead. Then it will be thread-safe, because no object is involved. > Comment #37: Using std::function requires either C++03-TR1, or C++11. I > suggest having the functor argument as a templatized type which will work > (after a fashion) with any version of C++, as an overload of the existing > create functions. ...... Can you show with some code what you mean? Either a patch, or a sketch where I can fill in the details. One possibility that does not require much work would be to take a copy of all libsigc++ code that relates to sigc::slot, call it something else, and remove everything that has to do with sigc::trackable. I just don't like the idea of adding new code that mimics something that C++11 provides (in this case std::function).
On comment #37 I suggest the functor is passed in a similar way to how the standard algorithms take their operators or predicates. Similar, but not the same, because the standard algorithms keep their copy of the functor as an argument and so take it by value (if they took it by reference to const they could only execute operator()() const). glibmm does its own copying (on the heap) of the slot argument and passes the heap allocated slot as the data argument to g_thread_new(). This means the functor can be taken by reference to const. In C++11 it could be taken by collapsing reference, as does the equivalent std::thread constructor. (I use the expression "functor" in lieu of "function object" to avoid confusion with std::function, a particular type of function object. As our thread functor has void return type technically it is not a functor in the strict sense). So I would envisage an overload for Threads::Thread::create() as follows: template <class F> static Thread* Glib::Threads::Thread::create(const F& func); documenting that F is a callable object which takes no argument (which could of course be a std::function<void()> object, but could also be any function object not taking an argument including, in C++11, the result of std::bind or a lambda expression). In C++11 this could comprise (but doesn't have to): template <class F> static Thread* Glib::Threads::Thread::create(F&& func); which will take both lvalues and rvalues and if an rvalue carry out a move operation if a move operator is available, assuming appropriate use of std::forward. (The C++11 standard states that the functor object created by a lambda expression must have a copy constructor and may have a move constructor: I think it probably says the same about the return value of std::bind although I have not looked that up.) The overload for sigc::slot must still be retained, even though the function template could take it, as otherwise ABI compatibility would be broken - the template function is not compiled into the library, so a sigc::slot version is still required which is compiled in. Taking a template argument is more efficient than using std::function. std::function requires an additional allocation and an additional virtual dispatch. That is why the equivalent C++11 library functions always take their callable objects as a template parameter not as a std::function object, and that is very much the recommended approach. std::function is really for cases where you cannot for some reason or other use static polymorphism via templates. On comment #36 I would much prefer it if you were to make the changes concerned. We both agree what they are.
Actually I now see the fundamental problem with my suggestion in comment #41: you need to do something with the object pointed to by the data argument of g_thread_new(), and you cannot if you do not know what it is. So use std::function, as you suggest.
I suppose I ought to add for completeness that, as I am sure you know, you can get around the g_thread_new() problem using type erasure, which is almost certainly what std::thread does if handing off to pthread_create(), and what std::function does internally, viz: struct Eraser { virtual void exec() = 0; // not const so we can execute mutable lambdas }; template <class F> class Wrapper: public Eraser { public: void exec() {func();} template <class Arg> Wrapper(Arg&& arg): func(std::forward<Arg>(arg)) {} private: F func; }; template <class F> Eraser* make_eraser(F&& func) {return new Wrapper<F>{std::forward<F>(func)};} Glib::Threads::Thread::create() can then call make_eraser() and pass the return value to g_thread_new()'s data argument. g_thread_new()'s thread function can then call static_cast<Eraser*>(data)->exec(). This will save one allocation and one level of indirection compared with using std::function to do the type erasure for you, and is probably not worth the effort. E&OE on the example code: it is based on some of my own classes but not copy-and-pasted, so may contain typos.
I have pushed your new chapter to the gtkmm tutorial with the changes we agreed on. http://git.gnome.org/browse/gtkmm-documentation/commit/?id=3ba65250fb2313ba2662b55f0e23b136ee73c0ff The second sentence in comment 39 (If you have commit access I should make them) made me believe that you wanted to do the changes yourself. Perhaps I misunderstood. When I add static Thread* create(const std::function<void()>& func, const std::string& name = std::string()); normal calls to Glib::Threads::Thread::create() become ambiguous. This method must probably have a different name.
Ah, the ambiguities of modal auxiliaries used to construct subjunctive expressions in English. My apologies. On the second ambiguity (overload resolution), in glibmm-2.34.1, Glib::Threads::Thread::create() has no second argument. I guess you are using the development version in git. However, I do not think that affects your issue. I cannot see how an ambiguity arises if you actually pass a std::function object since there is then a perfect match, but perhaps you are passing a lambda expression or the result of std::bind expecting implicit conversion, in which case there would be an ambiguity if sigc::slot has a conversion operator for these also. The latest libsigc++ documentation says that it "supports arbitrary functor types", and inspecting the source shows that sigc::slot does indeed now have an overload for its constructor which is templated and takes any callable object (which is not dissimilar to the approach I mentioned above for callable objects). Hence the ambiguity. In which case a change of name may be inevitable, since it would be desirable to permit implicit conversion from a lambda or std::bind-produced function object to the std::function argument. What a bummer. However, the template and type erasure approach may provide the answer: on overload resolution, I think a template match takes precedence over a template match with implicit conversion. If that is right, then the constructor taking any arbitrary function object would be preferred except where a sigc::slot object is passed (for a sigc::slot object the non-template match is preferred over a template match), so that would appear to solve the problem without having to rename the function. By the way, if you do do it that way and so use type erasure, my Eraser class should have a virtual destructor: my example omitted that by mistake. It would also be trivial to convert the type eraser to using reference to const for argument passing instead of collapsible references in order to cover C++98/03, albeit at the expense of losing the possibility of move construction of rvalues in C++11 to avoid the copy operation. If you would like to test this approach and need help in implementing this, let me know.
A further point arises here. Since sigc::slot now takes any function object for its argument, it means presumably that in C++11 any slot can be constructed safely for a non-static member function of an object deriving from sigc::trackable by using std::bind instead of sigc::mem_fun, so as to eliminate the tracking. It may be that that is what users should be recommended to do. This means changing the new text in the gtkmm tutorial so that: * instead of referring to slots representing non-static member functions of classes deriving from sigc::trackable, it refers to slots constructed with sigc::mem_fun representing non-static member functions of classes deriving from sigc::trackable. * it points out that the problems with the thread safety of sigc::trackable can be avoided in C++11 by constructing the slot by using std::bind or a lambda expression. This probably means it is not necessary to add further overloads for Glib::Threads::Thread::create(). But this would need to be tested first.
Created attachment 236487 [details] Test case for std::bind and sigc::trackable Here is a test case which shows that std::bind appears to do all that is necessary to work around sigc::trackable. In this test case, the third printed call to Test::non_static_func() is technically undefined behaviour, because the Test object has been destroyed at that point, but that doesn't in practice matter because the method does not operate on any data members of Test. It demonstrates that trackability is ignored where std::bind is employed.
Created attachment 236488 [details] Test case for std::bind and Glib::Threads::Thread::create() And here for completeness is a test case for std::bind used with Glib::Threads::Thread::create(). On comment #40, I was surprised when you mentioned that slots for static member functions could be created with sigc::mem_fun, and when I tried that in the previous test case it won't compile (so it is commented out). Are you sure you are right about that? I think your added paragraph on this should perhaps be omitted. If you would like me to amend the tutorial text now in git to cover std::bind, let me know. If so, can you give me the tutorial's git repository address for cloning?
Created attachment 236589 [details] Modified test cases for std::bind and lambda expressions. (In reply to comment #45) > Glib::Threads::Thread::create() has no second argument. I guess you are using > the development version in git. Yes, I'm using the development version in git. It has a second create() method with two arguments. (Bug 689863) > I cannot see how an ambiguity > arises if you actually pass a std::function object I did not pass a std::function object. I compiled the example program http://git.gnome.org/browse/glibmm/tree/examples/hread/thread.cc, with the calls Glib::Threads::Thread *const producer = Glib::Threads::Thread::create( sigc::mem_fun(queue, &MessageQueue::producer)); Glib::Threads::Thread *const consumer = Glib::Threads::Thread::create( sigc::mem_fun(queue, &MessageQueue::consumer)); We must not make changes that break such programs. (In reply to comment #46) > Since sigc::slot now takes any function object for > its argument, it means presumably that in C++11 any slot can be constructed > safely for a non-static member function of an object deriving from > sigc::trackable by using std::bind instead of sigc::mem_fun, so as to > eliminate the tracking. > > It may be that that is what users should be recommended to do. Yes, I think this is a good idea. Even if your type erasure in comment 43 should work, it's too much job for this problem. Next time ABI can be broken in glibmm, all supported compilers probably understand std::function, and then the present create() methods can be replaced with static Thread* create(const std::function<void()>& func, const std::string& name = std::string()); However I've experimented with your sigc.cpp program, and I noticed a problem with std::bind() when combined with the SIGC_FUNCTORS_DEDUCE_RESULT_TYPE_WITH_DECLTYPE macro, which is necessary if the program connects std::bind() expressions or C++11 lambda expressions to a sigc::slot that does not return void. The program did not compile. I don't know why. I've seen similar problems in the test program http://git.gnome.org/browse/libsigc++2/tree/tests/test_cpp11_lambda.cc. Some calls with std::bind() are commented out there, because they did not compile. C++11 lambda expressions can be used in such cases. Both possibilities (std::bind() and lambda expressions) should be mentioned in the gtkmm tutorial and in the documentation of Glib::Threads::Thread::create(). I've attached modified versions of your sigc.cpp and thread.cpp files. (In reply to comment #48) > I was surprised when you mentioned that slots for static member > functions could be created with sigc::mem_fun. I was wrong. I had not tested it. I assumed that because static and non-static methods can be called as test->any_method(), sigc::mem_fun() would not notice any difference between a static and a non-static method. But that's wrong. Even if the syntax can be the same, the calls are not identical. > I think your added paragraph on this should perhaps be omitted. Yes, it shall be omitted. > If you would like me to amend the tutorial text now in git to cover > std::bind, let me know. If so, can you give me the tutorial's git repository > address for cloning? git clone git://git.gnome.org/gtkmm-documentation If you have a git account: git clone ssh://[login@]git.gnome.org/git/gtkmm-documentation
Created attachment 236660 [details] [review] Tutorial patch for std::bind/boost::bind Here is a patch to cover the use of std::bind() and boost::bind() to safely call Glib::Threads::Thread::create(). I notice that the example file you refer to (thread.cc) displays the very incorrect code that this documentation warns about. It starts threads for the MessageQueue::producer() and MessageQueue::consumer() methods from slots constructed with sigc::mem_fun(), despite MessageQueue deriving from sigc::trackable. It shows how insidious this problem is: it has no doubt passed the inspection of several experienced programmers, yet is not thread safe.
I should say that I could not get git's gtkmm-documentation to build. It seems to produce a defective configure file (perhaps my installed autoconf is insufficiently recent). It would be helpful if you could do a test build to make sure it is syntactically correct.
Created attachment 236809 [details] [review] patch: Amend the "Multi-threaded programs" chapter. No problem with your patch. Still, I've made two small changes: 1. Mention C++11 lambda expressions as a third alternative. 2. Modified the paragraph on the connect*_once() methods. If I'm not mistaken, they can also be made thread-safe by avoiding sigc::mem_fun(). If you have no objections, I'll push this patch. I'll also change the documentation of Glib::Threads::Thread::create() and the connect*_once() methods, and remove the inheritance from sigc::trackable in http://git.gnome.org/browse/glibmm/tree/examples/thread/thread.cc. Then, at last, I can close this bug.
Yes that is excellent: it looks fine. Thanks for your help on that. You might want to add yourself to the authors, but I leave that to you. You could leave the bug open to remind yourself to move to std::function on an API/ABI break, but again that is a matter for you.
I've pushed the patch in comment 52, and a patch that updates the description of Glib::Threads::Thread::create(). The glibmm patch also adds a TODO comment in threads.hg, recommending a change to std::function at the next ABI break. I think that's enough of a reminder, so I close this bug. The descriptions of Glib::SignalTimeout::connect(), connect_once(), etc. should also be updated, but that's more closely related to bug 396958 and bug 396963.
Great. The description of Glib::ThreadPool::push() also needs to be updated. The same rule applies to that function as to Glib::Threads::Thread::create() and *connect_once() and so forth.
I've updated the description of Glib::ThreadPool::push(). http://git.gnome.org/browse/glibmm/commit/?id=52009d77cc4247ce40f0bb4caaeab69d8bb983f4
Please see https://bugzilla.gnome.org/show_bug.cgi?id=755091 about updating this for C++11 instead of using Glib::Threads::*.
Also, now that C++11 makes it easier, maybe we should start an ABI-breaking libsigc++ that is thread-safe, so people don't need to create their own. I forget the name(s) of the thread-safe versions of libsigc++ that are out there.
For reference: https://mail.gnome.org/archives/libsigc-list/2014-September/msg00001.html "[sigc] remarks on using (or not using) sigc++"
Comment #58: There is no thread safe version of libsigc++ out there. (There is an old now unmaintained library called libSigCX, aka libSigC++ extras, based I believe on libsigc++-1.2, but it was not really thread safe - it suffered from the same deficiencies as libsigc++. What it in effect did was to combine libsigc++ with a Glib::Dispatcher type interface to execute connected slots in a program's main loop.) The only well known thread-safe signal/slot implementation with trackability which is out there is boost::signals2. The documentation claims that boost::signals2::signal and boost::signals2::connection and its associated trackability are thread safe, although Paul Davis said in the attachment to comment #59 that he found that some aspects were not thread safe after all (note: boost::signals is not thread safe, but boost::signals2 says it is). Leaving that aside boost::signals2 has a reputation for being somewhat heavyweight, and its major deficiency in my view is that, where a slot represents an object method, then for a connection to be trackable it requires the object concerned to be constructed on free store. There are various mini-implementations out there. Paul Davis said he has written one for ardour. I have a minimal one with SafeEmitter and Releaser ( http://cxx-gtk-utils.sourceforge.net/2.2/emitter_8h.html ). When I last looked at libsigc++ to think about how to resolve this issue, I came to the conclusion that making it thread safe would require a complete redesign, and in particular trackability should be a property of a particular connection and not of the object from which a slot happens to be created.