GNOME Bugzilla – Bug 624338
[pulsesink] Handle pulse context separate from the ringbuffers and share them
Last modified: 2010-09-13 18:00:30 UTC
Created attachment 165873 [details] stack trace If I use 2 playbin2 in the same process and if PulseAudio is not installed, a double-free exception occurs. It doesn't happen 100% of the time. Try the attached test-case with a URI in parameter.
Created attachment 165874 [details] test-case
Also reported on PA Trac: http://pulseaudio.org/ticket/841
Sometimes I also get this one. Registration of already existing GstPulseSinkRingBuffer: test2.py:18: Warning: cannot register existing type `GstPulseSinkRingBuffer' loop.run() GLib-GObject-CRITICAL **: g_object_new: assertion `G_TYPE_IS_OBJECT (object_type)' failed aborting... Program received signal SIGABRT, Aborted.
+ Trace 222852
Thread 140737152808720 (LWP 20355)
Yes, type registration of the ring buffer is not threadsafe. Patch coming soon
Created attachment 165945 [details] [review] pulsesink: use G_TYPE_DEFINE
Out of curiosity: why doesn't the g_type_class_ref (GST_TYPE_PULSERING_BUFFER) in gst_pulseringbuffer_class_init() take care of this already?
(In reply to comment #6) > Out of curiosity: why doesn't the g_type_class_ref (GST_TYPE_PULSERING_BUFFER) > in gst_pulseringbuffer_class_init() take care of this already? In a normal case get_type() is first called and then class_init() triggers a new call to get_type() via that g_type_class_ref (GST_TYPE_PULSERING_BUFFER), right? I think the race is occurring while the 2 threads are calling the first get_type() so the g_type_class_ref() happens too late. I'm quite new to this, I'm probably all wrong anyway ;)
D'oh. That class_ref() should've been in gst_pulsesink_class_init() of course, not gst_pulseringbuffer_class_init() :)
Created attachment 165956 [details] [review] pulsesink: use G_TYPE_DEFINE The previous get_type() implementation was racy. Also removed the g_type_class_ref() hack which was misplaced anyway ;) See bug 624338
Hi there, Does the above patch fix this bug completely or just the "Registration of already existing GstPulseSinkRingBuffer:" bit of it?
Committed this, but changed the commit message: commit 864a52d8aa938fbc308a5bf027cbc4bd8a23d45e Author: Philippe Normand <pnormand@igalia.com> Date: Thu Jul 15 10:10:31 2010 +0200 pulsesink: use G_TYPE_DEFINE to define ring buffer type The existing get_type() implementation is racy, and the g_type_class_ref() workaround didn't actually work because it was in the wrong function. Since class creation in GObject is thread-safe these days (since 2.16), the class_ref workaround is no longer needed and it is sufficient to ensure the _get_type() function is thread-safe, which G_TYPE_DEFINE does. https://bugzilla.gnome.org/show_bug.cgi?id=624338 The only reason this is actually ok as-is is that now class creation in gobject is thread-safe, which wasn't the case when the g_type_class_ref() was added (IIRC) ;-) Setting NEEDINFO for answer to comment #10, although the stack trace looks like there is something else going on.
Philipe said on IRC that this doesn't fix the real problem
I will check again today if I can reproduce this bug but IIRC the committed patch was only a first step ;)
Created attachment 167933 [details] [review] pulsesink: share the PA context between all clients with the same name Avoid to create a new PA context for each new client by using a hash table containing the list of ring-buffers and the shared PA context for each client. Doing this will improve application memory usage in the cases where multiple pipelines involving multiple pulsesink elements are used. Fixes bug #624338.
Looks good in general but you should protected the hash table with a mutex... everywhere. Then it might make sense to add a property to set the pulse client name. E.g. in webkit you might want to have all pulsesinks of the same tab in the same pulse client but not all pulsesinks of all tabs. Also, gst_pulse_client_name() should maybe be improved and return something like "GStreamer ($PID)" when the other two ways to get a name fail.
(In reply to comment #15) > Looks good in general but you should protected the hash table with a mutex... > everywhere. > Right, I did it only at one place, sorry ;) > Then it might make sense to add a property to set the pulse client name. E.g. > in webkit you might want to have all pulsesinks of the same tab in the same > pulse client but not all pulsesinks of all tabs. > Good idea indeed, could that go in a separate bug/patch? I'd be fine with doing it here but the current patch is already big I think. > Also, gst_pulse_client_name() should maybe be improved and return something > like "GStreamer ($PID)" when the other two ways to get a name fail. Ok, will do that in next iteration of the patch.
Awesome, thanks for your work here guys. I've already tested the first version and I'm quietly hoping it'll help in some KDE Phonon related bugs I have that seem to relate to similar cases... perhaps they wont, but I live in hope and it seemed to help in my test setup. I look forward to the next iteration of the patch :)
(In reply to comment #16) > > Then it might make sense to add a property to set the pulse client name. E.g. > > in webkit you might want to have all pulsesinks of the same tab in the same > > pulse client but not all pulsesinks of all tabs. > > > > Good idea indeed, could that go in a separate bug/patch? I'd be fine with doing > it here but the current patch is already big I think. No, do it as a separate patch. The patch, after fixing the thread safety issue, is already big enough as you said. I always prefer many small patches over one large. > > Also, gst_pulse_client_name() should maybe be improved and return something > > like "GStreamer ($PID)" when the other two ways to get a name fail. > > Ok, will do that in next iteration of the patch. But do it as a separate patch too please. It's related to this change but not really the same
Created attachment 167977 [details] [review] pulsesink: share the PA context between all clients with the same name Avoid to create a new PA context for each new client by using a hash table containing the list of ring-buffers and the shared PA context for each client. Doing this will improve application memory usage in the cases where multiple pipelines involving multiple pulsesink elements are used. Fixes bug #624338.
Hopefully the thread-safety issues are fixed now, please test :)
Created attachment 167980 [details] [review] pulsesink: share the PA context between all clients with the same name Avoid to create a new PA context for each new client by using a hash table containing the list of ring-buffers and the shared PA context for each client. Doing this will improve application memory usage in the cases where multiple pipelines involving multiple pulsesink elements are used. Fixes bug #624338. Added a gst_pulsering_get_context() function to avoid copy/pastes.
Hmm, I'm not sure it's related, but if I apply this patch, then I can trigger the assertion in the gst_pulsesink_change_state() function: switch (transition) { case GST_STATE_CHANGE_NULL_TO_READY: g_assert (pulsesink->mainloop == NULL); By sharing contexts, are mainloops somehow shared too? (or are they meant to be)? If you are on IRC, I'm coling, I'll try and be around this evening if you want to debug (or if this is totally unrelated and just triggered as a coincidence, you can tell me to stop bothering you :D)
Can you please post a test-case triggering this? If not it'd be nice to at least have a stack-trace. I didn't change anything related to the mainloop, only the pa_context should be shared. So I'd be happy to debug this potential regression, tomorrow though ;)
I'll post a proper stack trace in a bit - I'm using a slightly patched up version of 0.10.24 (your two patches on top) so I'll do it from git master to be sure of a clean line number reference etc. The test case is trivial but the loops it jumps through are not! I'm technically using the command: kdialog --error "Foo" & kdialog --error "Bar" to trigger the error. This goes via dbus to knotifyd (a daemon), which then uses KDE's Phonon library which then uses a gstreamer backend which then uses pulsesink.... The bug could easily be in phonon-gstreamer code, but only exposed by this change, but not knowing gstreamer overly well I'm not really qualified to comment on that possibility. On potential issue that does crop up is the fact that I also have another PA context running in the knotifyd daemon. This is my "control" connection to PA that I establish to do all the housekeeping work that is needed to get nice PA integration with Phonon. Obviously your patch is all about reducing the number of contexts, but my control connection will have it's own one which could mess things up a bit. Anyway, like I say I'll get a full stack-trace soon. Thanks again :)
Created attachment 168013 [details] First backtrace when using the latest patch Here is some debug output and a backtrace showing the problem reported. As I said previously, the error may be in phonon-gstreamer code and only working currently due to the alignment of the planets being just right, or it could be indicative of a real problem. The test case (as previously reported) is easy to trigger, but the whole stack is a bit complex if you're not running it all already!
Created attachment 168014 [details] Second backtrace when using the latest patch Here is a second backtrace. Different set of circumstances leading up to it, but same crash point.
In http://gitorious.org/phonon/phonon/blobs/master/gstreamer/devicemanager.cpp#line144 the first set_state call can fail (but the pulsesink mainloop is created and not freed) and the second set_state call triggers the assertion error, I think ;) Can you confirm this?
That was certainly my guess at what was going on, but I hadn't quite managed to work out the place where the mainloop needed to be free'd in such a scenario. I'll try and do some tests to see why the patch makes a difference. Perhaps the old code is such that the first state transition always succeeds and the crash trigger would have always been there, but now, the first transition can fail for some reason and that then means the code that causes the crash is run? As you didn't touch the mainloop code, I'm guessing that this is actually where the issue lies.... but I could be totally wrong :D
Just to confirm this behaviour, I put a filthy great bit assert() right after that initial call, and prior to your patch, it was never hit with my testcase. After applying your patch, it is hit. So my suspicions above are correct, in that your patch in some way makes this state change fail (although I strongly suspect the same actual gst assert could be hit by the phonon code with the current gst-good version in the event that the first state transition would fail - I guess that just hasn't happened before!) So not sure what to do next. Any ideas?
Can you please provide GST_DEBUG=pulse:5 debug output of both runs?
Created attachment 168051 [details] Debug output before the patch
Created attachment 168052 [details] Debug output after the patch
Created attachment 168055 [details] [review] pulsesink: clear the PA mainloop if baseaudiosink failed to open the ring_buffer If the application requests a state-change and pulsesink fails to open the ring_buffer device the mainloop attribute of the sink should be cleaned up to avoid future state-change (NULL->READY) failures. Colin, can you test this patch please? It should probably go in a separate bugreport but I first want to confirm it fixes your issue.
I wonder why the reused pa_context fails, will investigate some more. I think that with the above patch only one sound will be played.
Found another issue with the hash-table patch. The server name is not taken in account so in the same process I can have 2 pulsesinks that will share the same pa_context and try to connect to different PA servers. This is probably not a good idea ;) Will work some more on the patch.
Ahh yes, different servers would likely break it. Not a common case, but still should be supported I guess! As for your other patch, yes, this seems to resolve the issue for me. I get both sounds (or at least I think I do, I guess the first could be stopped by the second - kinda hard to tell from just listening as my test trigger case is just relatively short "bing-bongs") but either way I'm happy :D While testing I did manage to trigger a second bug that is unrelated to this, but which has been causing me major headaches for several months. FYI the other bug is this one https://bugs.kde.org/show_bug.cgi?id=232068 and the crash happens in alsasink, not pulsesink: for some reason PA is not detected as running (something that in this case is done outside of gst for other reasons) but the use of the alsasink with the alsa-pulse plugin ultimately cases a crash... not sure why. Anyway, that is very much off-topic for this bug, but after this is out of the way I'd really appreciate any insights you may have to the issue, but I'll leave that for another day! :) Many thanks again :)
Created attachment 168068 [details] [review] pulsesink: share the PA context between all clients with the same name Avoid to create a new PA context for each new client by using a hash table containing the list of ring-buffers and the shared PA context for each client. Doing this will improve application memory usage in the cases where multiple pipelines involving multiple pulsesink elements are used. Fixes bug #624338. The hash table keys are now context_name@server_name. So this should ensure that we can have shared multiple contexts connected to multiple servers (potentially).
Created attachment 168102 [details] [review] pulsesink: share the PA context between all clients with the same name Avoid to create a new PA context for each new client by using a hash table containing the list of ring-buffers and the shared PA context for each client. Doing this will improve application memory usage in the cases where multiple pipelines involving multiple pulsesink elements are used. Fixes bug #624338. Simpler patch, also fixes invalid memory access when destroy the context.
(In reply to comment #15) > > Then it might make sense to add a property to set the pulse client name. E.g. > in webkit you might want to have all pulsesinks of the same tab in the same > pulse client but not all pulsesinks of all tabs. > See bug 627174 > Also, gst_pulse_client_name() should maybe be improved and return something > like "GStreamer ($PID)" when the other two ways to get a name fail. See bug 627162
Comment on attachment 168102 [details] [review] pulsesink: share the PA context between all clients with the same name There's a small memory leak here, you have to free the GstPulseContext. I've fixed this locally and will push your changes after the release
commit 69a397c32f4baf07a7b2937c610f9e8f383e9ae9 Author: Philippe Normand <pnormand@igalia.com> Date: Mon Aug 16 09:12:04 2010 +0200 pulsesink: share the PA context between all clients with the same name Avoid to create a new PA context for each new client by using a hash table containing the list of ring-buffers and the shared PA context for each client. Doing this will improve application memory usage in the cases where multiple pipelines involving multiple pulsesink elements are used. Fixes bug #624338.
(In reply to comment #38) > Created an attachment (id=168102) [details] [review] > pulsesink: share the PA context between all clients with the same name > > Avoid to create a new PA context for each new client by using a hash > table containing the list of ring-buffers and the shared PA context > for each client. Doing this will improve application memory usage in > the cases where multiple pipelines involving multiple pulsesink > elements are used. > > Fixes bug #624338. > > Simpler patch, also fixes invalid memory access when destroy the context. Have you done any measurements on the memory saving? I am a bit lost what problem the patch actually tried to solve. I have reworked the patch under bug #628996 - please check that the original goals are still covered.
I checked that on PA-side the daemon was reusing the same context and thus not allowing extra shared memory. I will test the latest changes you commited, sorry again for the trouble.