GNOME Bugzilla – Bug 620099
Make it safe to call into Glib from foreign threads
Last modified: 2012-02-19 15:28:30 UTC
Currently, when a foreign thread calls into Glib (e.g., invoking a signal handler or a callback), we use PERL_SET_CONTEXT to install a pointer to the main thread's perl interpreter into the current thread's local storage. A few bug reports and some experimentation suggest that this is not safe: one cannot use a single perl interpreter from multiple threads concurrently. I'll attach a set of patches that work towards fixing this by creating a clone of the master interpreter whenever we are called from a new foreign thread. In my limited testing, they seem to work fine. But they are not finished yet. For example, one of the changes is pthread-specific, and that might be problematic. But more importantly, I'm not convinced yet that we really want to take this approach, because creating clones is really expensive. Opinions appreciated.
Created attachment 162317 [details] [review] Fix a threading bug in dGPERL_CLOSURE_MARSHAL_ARGS Previously, dGPERL_CLOSURE_MARSHAL_ARGS used the dSP macro to declare the stack pointer. But dSP involves accessing the my_perl pointer which might be invalid if we've been called from a thread other than the main one. To avoid this, manually declare SV **sp first and let GPERL_CLOSURE_MARSHAL_INIT initialize it after it has set up my_perl via PERL_SET_CONTEXT.
Created attachment 162318 [details] [review] Clone the master perl interpreter if called from another thread Rename GPERL_SET_CONTEXT to GPERL_ENSURE_CONTEXT and make it clone the master interpreter if we find that the current thread doesn't have an interpreter associated with it. Previously, we would just install the master interpreter itself. And according to L<perlguts/"Should I do anything special if I call perl from multiple threads?">, this should be all there is to it. But apparently, this is not safe if we actually call into the interpreter concurrently. The discussion at <http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2005-12/msg00452.html> suggests that we need to create a clone. And when a thread already has an interpreter installed, don't replace it. We cannot use one thread's interpreter in another thread. To make this have an effect on closure invocation, change GPERL_CLOSURE_MARSHAL_INIT to use GPERL_ENSURE_CONTEXT instead of PERL_SET_CONTEXT. Also, add a few debug prints to the CLONE machinery in Glib::Object.
Created attachment 162320 [details] [review] Turn the macro GPERL_ENSURE_CONTEXT into a function Turn the macro GPERL_ENSURE_CONTEXT into the function gperl_ensure_context, and make it accept NULL as an argument to indicate that the master interpreter should be used. This allows us to make the _gperl_set_master_interp and _gperl_get_master_interp static. Also, make the thread-safety code conditional on !GPERL_DISABLE_THREADSAFE.
Created attachment 162321 [details] [review] Free interpreter clones when their threads exit
Created attachment 162322 [details] [review] Make the GObject ↔ SV association thread-safe Instead of associating to every GObject just one SV, use a hash table of perl interpreter → SV pairs so that we can store and hand out a thread-specific SV wrapper. Previously, when running in a new thread, we would still hand out the SV wrapper that was created in the main thread and which thus belongs to the main thread's interpreter.
Created attachment 187851 [details] [review] Make signal marshalling thread-safe When a signal that has Perl handlers is emitted from a non-main non-Perl thread, we used to simply use PERL_SET_CONTEXT to associated this thread with the main Perl interpreter. But since it is not safe to call into one Perl interpreter concurrently, this does not work. Instead we now, upon noticing that we are being called from a thread without asoociated Perl interpreter, hand the marshalling request over to the main loop which in turn later wakes up the main thread and lets it handle the request. dGPERL_CLOSURE_MARSHAL_ARGS also needed to be adjusted not to use dSP since that dereferences the current thread's Perl interpreter without NULL check.
Alright, I've finally convinced myself that my previous approach cannot work. So I caved in and implemented what muppet has been suggesting all along: when invoked from a non-main thread, hand the request over to the main loop so that it gets executed on the main thread. This works fine in the following common case: * main thread connects handler to a signal, then goes into main loop * non-main thread gets created somewhere to do some work, then emits said signal * our marshallers hands the request over to the main loop * main loop executes the request on the main thread It does not work, however, if the main thread is busy doing something else. In this case, the main loop does not run, and so the signal invocation is blocked until the main thread returns control to the main loop. The attached patch, still containing some debug spew and FIXME question, implements this approach in our normal signal marshaller, gperl_closure_invoke. It does not fix custom signal marshallers, or stuff that uses direct callbacks. Comments, criticism, suggestions welcome.
Thanks a lot. I did only a few tests so far, but it seems good enough for using playbin2 and the about_to_finish signal in my app. I did a few quick tests with other sources of crashes related to gstreamer, and it is much better, for example, computing replaygain info in my app using the gstreamer plugin was extremely unstable before, with this patch it looks very stable. I also tested connecting to the notify signal of the playbin2 gstreamer element to get a signal when its volume property change (the volume of the playbin2 element is linked to the master volume when using pulseaudio), and though a simple callback as {warn "@_"} seems to work (it crashes when changing the volume without the patch), a more elaborate callback such as : my ($object,$property)=@_; my $name=$property->get_name; warn "$object property $name changed : ".$object->get($name)."\n"; causes the app to hang when it starts playing (it crashes without the patch) after printing a few "property changed" messages, it ends with the "*** GPerl asked to invoke callback from another thread, handing callback over to the main loop" message from the patch. I also got at least 2 similar hanging while testing the visual gstreamer plugin (plugin that does some drawing with the music) So it is a great improvement, though it seems there is still some cases where it will hang. I don't know if it is possible to fix that, but even as is I hope it will be committed soon.
(In reply to comment #8) > I also tested connecting to the notify signal of the playbin2 gstreamer element > to get a signal when its volume property change (the volume of the playbin2 > element is linked to the master volume when using pulseaudio), and though a > simple callback as {warn "@_"} seems to work (it crashes when changing the > volume without the patch), a more elaborate callback such as : > my ($object,$property)=@_; > my $name=$property->get_name; > warn "$object property $name changed : ".$object->get($name)."\n"; > causes the app to hang when it starts playing (it crashes without the patch) > after printing a few "property changed" messages, it ends with the "*** GPerl > asked to invoke callback from another thread, handing callback over to the main > loop" message from the patch. > > I also got at least 2 similar hanging while testing the visual gstreamer plugin > (plugin that does some drawing with the music) Can you provide test programs that exhibit these hangs?
Created attachment 189418 [details] test case exhibing the hanging I talked about in comment #8 I managed to reproduce it in this small hang.pl script that simply play a file given on the command line. I tested this on a xubuntu 11.04, the hanging happens every time with the more elaborated notify callback and with the line setting the volume to 1. If any of these are removed, everything works correctly. example output : GStreamer::Pipeline=HASH(0xa248c00) volume : 1 Use of uninitialized value in concatenation (.) or string at hang.pl line 14. GStreamer::Pipeline=HASH(0xa248c00) uri : Playing: /home/squentin/Music/Miles Davis/Kind of Blue/01 - So What.ogg GStreamer::Pipeline=HASH(0xa248c00) source : Glib::Object::_Unregistered::GstFileSrc=HASH(0xa248f60) *** GPerl asked to invoke callback from another thread, handing callback over to the main loop Without the patch it doesn't hang or crash, but with an earlier and more complex version of this script (that printed the time in a gtk window) I once got a segmentation fault.
(In reply to comment #10) > I managed to reproduce it in this small hang.pl script that simply play a file > given on the command line. > I tested this on a xubuntu 11.04, the hanging happens every time with the more > elaborated notify callback and with the line setting the volume to 1. If any of > these are removed, everything works correctly. I can reproduce the hang, and that's why I've been holding off on this patch for so long. The basic outline of this kind of hang is the following: • There are two threads involved: the main thread running the event loop and all Perl code, and an auxiliary thread constructed by gstreamer to manage the audio stream. • The call "$player->set(volume => 1)" reconfigures part of the pipeline and causes the auxiliary thread to change the "volume" property of the internal sink element while holding a lock. Among other things this triggers the "notify::volume" signal, for which we have a handler. But since this is in a non-main thread, the patch shovels the invocation over to the main loop and /blocks the auxiliary thread until the signal handler has returned/. • The signal handler calls get("volume") which ends up in the sink element's volume accessor, which also tries to acquire the lock mentioned above. Deadlock. A way to avoid the deadlock is to wrap in Glib::Idle->add those parts of the signal handler that access object properties: $player->signal_connect(notify => sub { my ($object,$property)=@_; Glib::Idle->add (sub { my $name=$property->get_name; warn "$object $name : ".$object->get($name)."\n"; Glib::SOURCE_REMOVE; }); }); Since this workaround exists, the patch fixes otherwise unsolvable problems while introducing new but solvable problems. So I'll try to get the patch into Glib today.
Alright, slightly modified patch committed. Please test extensively.
*** Bug 668713 has been marked as a duplicate of this bug. ***