After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 614765 - pulsesink: racy stream status leave implementation
pulsesink: racy stream status leave implementation
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-plugins-good
git master
Other Linux
: Normal critical
: 0.10.36
Assigned To: GStreamer Maintainers
GStreamer Maintainers
: 620341 623591 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2010-04-03 20:17 UTC by Stefan Sauer (gstreamer, gtkdoc dev)
Modified: 2011-07-05 14:41 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
pulsesink: fix racy shutdown (1.01 KB, patch)
2010-04-03 20:45 UTC, Stefan Sauer (gstreamer, gtkdoc dev)
committed Details | Review
First cleanup patch (2.25 KB, patch)
2011-07-04 06:41 UTC, René Stadler
committed Details | Review
Second cleanup patch (2.52 KB, patch)
2011-07-04 06:41 UTC, René Stadler
committed Details | Review
Proposed fix (3.60 KB, patch)
2011-07-04 06:42 UTC, René Stadler
committed Details | Review

Description Stefan Sauer (gstreamer, gtkdoc dev) 2010-04-03 20:17:06 UTC
commit dbb8a331 added stream status support to pulsesink. I probably needs to ref pulsesink before pa_mainloop_api_once() and releasing the ref() there.

GLib-GObject-CRITICAL **: g_object_ref: assertion `object->ref_count > 0' failed
aborting...

Program received signal SIGABRT, Aborted.

Thread 3034962800 (LWP 26577)

  • #0 __kernel_vsyscall
  • #1 raise
    from /lib/libc.so.6
  • #2 abort
    from /lib/libc.so.6
  • #3 g_logv
    from /usr/lib/libglib-2.0.so.0
  • #4 g_log
    from /usr/lib/libglib-2.0.so.0
  • #5 g_return_if_fail_warning
    from /usr/lib/libglib-2.0.so.0
  • #6 g_object_ref
    from /usr/lib/libgobject-2.0.so.0
  • #7 ??
    from /usr/lib/libgobject-2.0.so.0
  • #8 gst_structure_id_set_valist
    at gststructure.c line 595
  • #9 gst_structure_id_new
    at gststructure.c line 638
  • #10 gst_message_new_stream_status
    at gstmessage.c line 1552
  • #11 mainloop_leave_defer_cb
    at pulsesink.c line 1021
  • #12 ??
    from /usr/lib/libpulse.so.0
  • #13 pa_mainloop_dispatch
    from /usr/lib/libpulse.so.0
  • #14 pa_mainloop_iterate
    from /usr/lib/libpulse.so.0
  • #15 pa_mainloop_run
    from /usr/lib/libpulse.so.0
  • #16 ??
    from /usr/lib/libpulse.so.0
  • #17 ??
    from /usr/lib/libpulsecommon-0.9.21.so
  • #18 start_thread
    at pthread_create.c line 297
  • #19 ??
    at pthread_create.c line 216

Comment 1 Stefan Sauer (gstreamer, gtkdoc dev) 2010-04-03 20:45:33 UTC
Created attachment 157837 [details] [review]
pulsesink: fix racy shutdown

Keep a ref of pulsesink for deferred mainloop invocation. Fixes #614765
Comment 2 Sebastian Dröge (slomo) 2010-04-04 18:25:56 UTC
Comment on attachment 157837 [details] [review]
pulsesink: fix racy shutdown

Looks good, please push :)
Comment 3 Wim Taymans 2010-04-06 15:23:32 UTC
commit 7cf9967e0b65fcf264487344dadfdacf33778ad3
Author: Stefan Kost <ensonic@users.sf.net>
Date:   Sat Apr 3 23:39:20 2010 +0300

    pulsesink: fix racy shutdown
    
    Keep a ref of pulsesink for deferred mainloop invocation. Fixes #614765
Comment 4 Benjamin Otte (Company) 2010-05-06 11:32:05 UTC
This is still crashing with 0.10.22 - see https://bugzilla.redhat.com/show_bug.cgi?id=589455 for details and a stack trace. The error is:

Assertion '!in_worker(m)' failed at pulse/thread-mainloop.c:161, function pa_threaded_mainloop_stop(). Aborting.

To reproduce:
while gst-launch-0.10 filesrc location=/usr/share/mail-notification/new-mail.wav ! decodebin ! audioconvert ! pulsesink; do true; done
and wait a bit.
Comment 5 Sebastian Dröge (slomo) 2010-05-06 11:53:44 UTC
commit 5332287e2d8f9dc7b32da0f6618a0730bd98513a
Author: Sebastian Dröge <sebastian.droege@collabora.co.uk>
Date:   Thu May 6 13:51:59 2010 +0200

    pulsesink: Create and free the PA mainloop in NULL->READY/READY->NULL
    
    This fixes a race condition, when stopping the mainloop during finalization
    is done from a mainloop callback.
    
    Fixes bugs #614765 and #590662.
Comment 6 Benjamin Otte (Company) 2010-06-02 10:52:27 UTC
*** Bug 620341 has been marked as a duplicate of this bug. ***
Comment 7 Sebastian Dröge (slomo) 2010-07-05 11:47:14 UTC
*** Bug 623591 has been marked as a duplicate of this bug. ***
Comment 8 Niklas Schnelle 2010-07-17 11:12:18 UTC
Didn't happen to me since the update so it seems indeed to be fixed.
Comment 9 Sebastian Dröge (slomo) 2010-07-28 05:40:44 UTC
*** Bug 625442 has been marked as a duplicate of this bug. ***
Comment 10 gudake 2010-07-28 19:52:48 UTC
I saw the same "Assertion '!in_worker(m)' failed "  that Benjamin found  - bug 625442

It's rare though.

if the pulse callback function is the LAST one being called,  it
will finalize the pulsesink and calling pa_threaded_mainloop_stop() from the
pathread,  which is prohibited, because if you take a look of
pa_threaded_mainloop_stop(),  it calls pthread_join(),  you CANNOT
pthread_join() within the same thread.
Comment 11 René Stadler 2011-07-04 06:40:22 UTC
Re-opening: The shutdown phase is still racy. Passing the reference to the callback is an incomplete solution, as it is still possible that the mainloop is stopped and cleaned up before the callback got a chance to run. In this case, the reference is leaked.

This showed up in Nokia N9 as a shortage of file descriptors, because libpulse keeps the control sockets around.

The correct solution is to wait for the callback(s) to be fired (libpulse has no DestroyNotify mechanism for the userdata argument). As comment #10 points out, the reference passing can be eliminated altogether since an actual cleanup from the callback would just deadlock/assert anyways.

Attaching proposed patches...
Comment 12 René Stadler 2011-07-04 06:41:09 UTC
Created attachment 191210 [details] [review]
First cleanup patch
Comment 13 René Stadler 2011-07-04 06:41:35 UTC
Created attachment 191211 [details] [review]
Second cleanup patch
Comment 14 René Stadler 2011-07-04 06:42:00 UTC
Created attachment 191212 [details] [review]
Proposed fix
Comment 15 Mark Nauwelaerts 2011-07-05 10:06:45 UTC
Looks fine.  Just not clear about the commenting w.r.t. to the possibility that a cleanup could occur from a callback, which seems currently not possible since the pa_mainloop is stopped going to NULL ---and so no more callbacks--- before sink can be finalized.
Comment 16 René Stadler 2011-07-05 10:41:26 UTC
Yeah, I thought that would explain the assertion failure from that other bug.

Either way, the reference passing is worthless since the callbacks are being waited for again.
Comment 17 René Stadler 2011-07-05 10:42:13 UTC
I think it can really happen if you have multiple sinks. The mainloop only gets stopped if the last sink is cleaning up.
Comment 18 Mark Nauwelaerts 2011-07-05 14:40:34 UTC
Indeed. Thanks.

commit ae87731de5dc9a1f39edd2b4dd12db30ad1ff0fd
Author: René Stadler <rene.stadler@nokia.com>
Date:   Wed Jun 29 20:59:26 2011 +0300

    pulsesink: prevent race condition causing ref leak
    
    Since commit 8bfd80, gst_pulseringbuffer_stop doesn't wait for the
    deferred call to be run before returning. This causes a race when
    READY->NULL is executed shortly after, which stops the mainloop. This
    leaks the element reference which is passed as userdata for the callback
    (introduced in commit 7cf996, bug #614765).
    
    The correct fix is to wait in READY->NULL for all outstanding calls to
    be fired (since libpulse doesn't provide a DestroyNotify for the
    userdata). We get rid of the reference passing from 7cf996 altogether,
    since finalization from the callback would anyways lead to a deadlock.
    
    Re-fixes bug #614765.

commit f8456e2a1aaa9ae3a146ffb2a4510bff581eb88d
Author: René Stadler <rene.stadler@nokia.com>
Date:   Mon Jul 4 08:58:14 2011 +0300

    pulsesink: small cleanup of copy-paste code

commit 3589cee762e8bb6bf4da6bb52a58554b61de66a0
Author: René Stadler <rene.stadler@nokia.com>
Date:   Wed Jun 29 19:50:42 2011 +0300

    pulsesink: remove unused member variable and misleading log message
    
    Wim changed it in commit 8bfd80 so that pa_defer_ran is not read
    anywhere.
    
    The log message used to annotate a mainloop_wait call which is gone.