After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 620099 - Make it safe to call into Glib from foreign threads
Make it safe to call into Glib from foreign threads
Status: RESOLVED FIXED
Product: gnome-perl
Classification: Bindings
Component: Glib
unspecified
Other Linux
: Normal normal
: ---
Assigned To: gtk2-perl-bugs
gtk2-perl-bugs
: 668713 (view as bug list)
Depends on:
Blocks: 614650
 
 
Reported: 2010-05-30 15:11 UTC by Torsten Schoenfeld
Modified: 2012-02-19 15:28 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Fix a threading bug in dGPERL_CLOSURE_MARSHAL_ARGS (2.00 KB, patch)
2010-05-30 15:17 UTC, Torsten Schoenfeld
none Details | Review
Clone the master perl interpreter if called from another thread (6.29 KB, patch)
2010-05-30 15:17 UTC, Torsten Schoenfeld
none Details | Review
Turn the macro GPERL_ENSURE_CONTEXT into a function (6.34 KB, patch)
2010-05-30 15:17 UTC, Torsten Schoenfeld
none Details | Review
Free interpreter clones when their threads exit (2.23 KB, patch)
2010-05-30 15:18 UTC, Torsten Schoenfeld
none Details | Review
Make the GObject ↔ SV association thread-safe (6.47 KB, patch)
2010-05-30 15:18 UTC, Torsten Schoenfeld
none Details | Review
Make signal marshalling thread-safe (5.33 KB, patch)
2011-05-15 15:46 UTC, Torsten Schoenfeld
committed Details | Review
test case exhibing the hanging I talked about in comment #8 (761 bytes, application/x-perl)
2011-06-07 16:34 UTC, Quentin Sculo
  Details

Description Torsten Schoenfeld 2010-05-30 15:11:12 UTC
Currently, when a foreign thread calls into Glib (e.g., invoking a signal handler or a callback), we use PERL_SET_CONTEXT to install a pointer to the main thread's perl interpreter into the current thread's local storage.  A few bug reports and some experimentation suggest that this is not safe: one cannot use a single perl interpreter from multiple threads concurrently.

I'll attach a set of patches that work towards fixing this by creating a clone of the master interpreter whenever we are called from a new foreign thread.  In my limited testing, they seem to work fine.  But they are not finished yet.  For example, one of the changes is pthread-specific, and that might be problematic.  But more importantly, I'm not convinced yet that we really want to take this approach, because creating clones is really expensive.

Opinions appreciated.
Comment 1 Torsten Schoenfeld 2010-05-30 15:17:03 UTC
Created attachment 162317 [details] [review]
Fix a threading bug in dGPERL_CLOSURE_MARSHAL_ARGS

Previously, dGPERL_CLOSURE_MARSHAL_ARGS used the dSP macro to declare
the stack pointer.  But dSP involves accessing the my_perl pointer which
might be invalid if we've been called from a thread other than the main
one.  To avoid this, manually declare SV **sp first and let
GPERL_CLOSURE_MARSHAL_INIT initialize it after it has set up my_perl via
PERL_SET_CONTEXT.
Comment 2 Torsten Schoenfeld 2010-05-30 15:17:24 UTC
Created attachment 162318 [details] [review]
Clone the master perl interpreter if called from another thread

Rename GPERL_SET_CONTEXT to GPERL_ENSURE_CONTEXT and make it clone the master
interpreter if we find that the current thread doesn't have an interpreter
associated with it.  Previously, we would just install the master interpreter
itself.  And according to L<perlguts/"Should I do anything special if I call
perl from multiple threads?">, this should be all there is to it.  But
apparently, this is not safe if we actually call into the interpreter
concurrently.  The discussion at
<http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2005-12/msg00452.html>
suggests that we need to create a clone.  And when a thread already has an
interpreter installed, don't replace it.  We cannot use one thread's
interpreter in another thread.

To make this have an effect on closure invocation, change
GPERL_CLOSURE_MARSHAL_INIT to use GPERL_ENSURE_CONTEXT instead of
PERL_SET_CONTEXT.

Also, add a few debug prints to the CLONE machinery in Glib::Object.
Comment 3 Torsten Schoenfeld 2010-05-30 15:17:36 UTC
Created attachment 162320 [details] [review]
Turn the macro GPERL_ENSURE_CONTEXT into a function

Turn the macro GPERL_ENSURE_CONTEXT into the function gperl_ensure_context, and
make it accept NULL as an argument to indicate that the master interpreter
should be used.  This allows us to make the _gperl_set_master_interp and
_gperl_get_master_interp static.

Also, make the thread-safety code conditional on !GPERL_DISABLE_THREADSAFE.
Comment 4 Torsten Schoenfeld 2010-05-30 15:18:06 UTC
Created attachment 162321 [details] [review]
Free interpreter clones when their threads exit
Comment 5 Torsten Schoenfeld 2010-05-30 15:18:21 UTC
Created attachment 162322 [details] [review]
Make the GObject ↔ SV association thread-safe

Instead of associating to every GObject just one SV, use a hash table of
perl interpreter → SV pairs so that we can store and hand out a
thread-specific SV wrapper.

Previously, when running in a new thread, we would still hand out the SV
wrapper that was created in the main thread and which thus belongs to the main
thread's interpreter.
Comment 6 Torsten Schoenfeld 2011-05-15 15:46:35 UTC
Created attachment 187851 [details] [review]
Make signal marshalling thread-safe

When a signal that has Perl handlers is emitted from a non-main non-Perl
thread, we used to simply use PERL_SET_CONTEXT to associated this thread
with the main Perl interpreter.  But since it is not safe to call into
one Perl interpreter concurrently, this does not work.

Instead we now, upon noticing that we are being called from a thread
without asoociated Perl interpreter, hand the marshalling request over
to the main loop which in turn later wakes up the main thread and lets
it handle the request.

dGPERL_CLOSURE_MARSHAL_ARGS also needed to be adjusted not to use dSP
since that dereferences the current thread's Perl interpreter without
NULL check.
Comment 7 Torsten Schoenfeld 2011-05-15 15:52:57 UTC
Alright, I've finally convinced myself that my previous approach cannot work.

So I caved in and implemented what muppet has been suggesting all along: when invoked from a non-main thread, hand the request over to the main loop so that it gets executed on the main thread.  This works fine in the following common case:

* main thread connects handler to a signal, then goes into main loop
* non-main thread gets created somewhere to do some work, then emits said signal
* our marshallers hands the request over to the main loop
* main loop executes the request on the main thread

It does not work, however, if the main thread is busy doing something else.  In this case, the main loop does not run, and so the signal invocation is blocked until the main thread returns control to the main loop.

The attached patch, still containing some debug spew and FIXME question, implements this approach in our normal signal marshaller, gperl_closure_invoke.  It does not fix custom signal marshallers, or stuff that uses direct callbacks.

Comments, criticism, suggestions welcome.
Comment 8 Quentin Sculo 2011-05-18 20:07:43 UTC
Thanks a lot.
I did only a few tests so far, but it seems good enough for using playbin2 and the about_to_finish signal in my app.

I did a few quick tests with other sources of crashes related to gstreamer, and it is much better, for example, computing replaygain info in my app using the gstreamer plugin was extremely unstable before, with this patch it looks very stable.

I also tested connecting to the notify signal of the playbin2 gstreamer element to get a signal when its volume property change (the volume of the playbin2 element is linked to the master volume when using pulseaudio), and though a simple callback as {warn "@_"} seems to work (it crashes when changing the volume without the patch), a more elaborate callback such as :
 my ($object,$property)=@_;
 my $name=$property->get_name;
 warn "$object property $name changed : ".$object->get($name)."\n";
causes the app to hang when it starts playing (it crashes without the patch)
after printing a few "property changed" messages, it ends with the "*** GPerl asked to invoke callback from another thread, handing callback over to the main loop" message from the patch.

I also got at least 2 similar hanging while testing the visual gstreamer plugin (plugin that does some drawing with the music)

So it is a great improvement, though it seems there is still some cases where it will hang. I don't know if it is possible to fix that, but even as is I hope it will be committed soon.
Comment 9 Torsten Schoenfeld 2011-06-05 11:49:13 UTC
(In reply to comment #8)
> I also tested connecting to the notify signal of the playbin2 gstreamer element
> to get a signal when its volume property change (the volume of the playbin2
> element is linked to the master volume when using pulseaudio), and though a
> simple callback as {warn "@_"} seems to work (it crashes when changing the
> volume without the patch), a more elaborate callback such as :
>  my ($object,$property)=@_;
>  my $name=$property->get_name;
>  warn "$object property $name changed : ".$object->get($name)."\n";
> causes the app to hang when it starts playing (it crashes without the patch)
> after printing a few "property changed" messages, it ends with the "*** GPerl
> asked to invoke callback from another thread, handing callback over to the main
> loop" message from the patch.
> 
> I also got at least 2 similar hanging while testing the visual gstreamer plugin
> (plugin that does some drawing with the music)

Can you provide test programs that exhibit these hangs?
Comment 10 Quentin Sculo 2011-06-07 16:34:17 UTC
Created attachment 189418 [details]
test case exhibing the hanging I talked about in comment #8

I managed to reproduce it in this small hang.pl script that simply play a file given on the command line.
I tested this on a xubuntu 11.04, the hanging happens every time with the more elaborated notify callback and with the line setting the volume to 1. If any of these are removed, everything works correctly.

example output :
GStreamer::Pipeline=HASH(0xa248c00) volume : 1
Use of uninitialized value in concatenation (.) or string at hang.pl line 14.
GStreamer::Pipeline=HASH(0xa248c00) uri : 
Playing: /home/squentin/Music/Miles Davis/Kind of Blue/01 - So What.ogg
GStreamer::Pipeline=HASH(0xa248c00) source : Glib::Object::_Unregistered::GstFileSrc=HASH(0xa248f60)
*** GPerl asked to invoke callback from another thread, handing callback over to the main loop

Without the patch it doesn't hang or crash, but with an earlier and more complex version of this script (that printed the time in a gtk window) I once got a segmentation fault.
Comment 11 Torsten Schoenfeld 2012-02-19 14:39:14 UTC
(In reply to comment #10)
> I managed to reproduce it in this small hang.pl script that simply play a file
> given on the command line.
> I tested this on a xubuntu 11.04, the hanging happens every time with the more
> elaborated notify callback and with the line setting the volume to 1. If any of
> these are removed, everything works correctly.

I can reproduce the hang, and that's why I've been holding off on this patch for so long.  The basic outline of this kind of hang is the following:

• There are two threads involved: the main thread running the event loop and all Perl code, and an auxiliary thread constructed by gstreamer to manage the audio stream.

• The call "$player->set(volume => 1)" reconfigures part of the pipeline and causes the auxiliary thread to change the "volume" property of the internal sink element while holding a lock.  Among other things this triggers the "notify::volume" signal, for which we have a handler.  But since this is in a non-main thread, the patch shovels the invocation over to the main loop and /blocks the auxiliary thread until the signal handler has returned/.

• The signal handler calls get("volume") which ends up in the sink element's volume accessor, which also tries to acquire the lock mentioned above.  Deadlock.

A way to avoid the deadlock is to wrap in Glib::Idle->add those parts of the signal handler that access object properties:

  $player->signal_connect(notify => sub {
    my ($object,$property)=@_;
    Glib::Idle->add (sub {
      my $name=$property->get_name;
      warn "$object $name : ".$object->get($name)."\n";
      Glib::SOURCE_REMOVE;
    });
  });

Since this workaround exists, the patch fixes otherwise unsolvable problems while introducing new but solvable problems.  So I'll try to get the patch into Glib today.
Comment 12 Torsten Schoenfeld 2012-02-19 15:25:11 UTC
Alright, slightly modified patch committed.  Please test extensively.
Comment 13 Torsten Schoenfeld 2012-02-19 15:28:30 UTC
*** Bug 668713 has been marked as a duplicate of this bug. ***