After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 725809 - ghostpad: rare crash because of missing reference count on its target pad
ghostpad: rare crash because of missing reference count on its target pad
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gstreamer (core)
git master
Other Linux
: Normal normal
: 1.2.4
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2014-03-06 11:04 UTC by George Kiagiadakis
Modified: 2014-04-12 13:18 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
proposed patch (1.05 KB, patch)
2014-03-06 11:04 UTC, George Kiagiadakis
committed Details | Review
same fix for gst_proxy_pad_get_target() (973 bytes, patch)
2014-03-06 11:09 UTC, George Kiagiadakis
committed Details | Review

Description George Kiagiadakis 2014-03-06 11:04:51 UTC
Created attachment 271082 [details] [review]
proposed patch

Hello,

GstGhostPad has a bug in gst_ghost_pad_set_target() where it unlinks its target pad from its internal proxy pad without holding a reference to the target pad. This can lead to a crash in the rare case that another thread destroys the target pad at the same time.

This has been observed to happen with rtspsrc in the 0.10 series and I believe it can also be observed in 1.0 before this patch was applied:
http://cgit.freedesktop.org/gstreamer/gst-plugins-good/commit/gst/rtsp/gstrtspsrc.c?id=d9bc48edc977f86b39aabe9e1f125603c82bec6a

Before this patch was applied, rtspsrc would internally create and add elements to itself (being a GstBin) without locking their state, which would lead to the following happening in its set_state(NULL) sequence:

- change_state() gets called with GST_STATE_CHANGE_PAUSED_TO_READY
- rtspsrc sends a command to its thread to close the connections and stop
- change_state() returns immediately and later gets called again with GST_STATE_CHANGE_READY_TO_NULL
- the rtspsrc implementation of change_state() calls the parent's (GstBin's) implementation, which sets all children to NULL state, which causes rtpbin (one of the children) to destroy its dynamic srcpad that is proxied in rtspsrc with a ghost pad
- meanwhile, rtspsrc's other thread is closing the connections and destroying its resources, including the ghost pad that proxies rtpbin's srcpad
- if rtpbin manages to destroy the srcpad while gst_pad_unlink(), called from gst_ghost_pad_set_target(), is running on the other thread -> crash!

Of course, the problem with rtspsrc is no longer relevant. However, the fact that GstGhostPad does not hold a reference to the pad while it is calling a method on it is still relevant.

The attached patch should solve the issue. Note that I am using gst_pad_get_peer(internal) because I believe it is essential to hold the internal pad's lock, NOT the ghost pad's lock. The target pad can still be destroyed with the ghost pad locked, but it cannot be destroyed with the internal pad locked, because the target's dispose() function would have to unlink it and therefore take the lock of both pads (internal & target). For this reason, I believe that gst_proxy_pad_get_target() also suffers from the same bug; it locks the ghost pad, which is irrelevant, and tries to take the peer of the internal pad.
Comment 1 George Kiagiadakis 2014-03-06 11:09:29 UTC
Created attachment 271084 [details] [review]
same fix for gst_proxy_pad_get_target()
Comment 2 Nicolas Dufresne (ndufresne) 2014-03-06 15:04:34 UTC
Can you right a test ?  It looks like a sensible change to me, and writing a test should be much of a problem (no need to involve any rtp stuff here).
Comment 3 Nicolas Dufresne (ndufresne) 2014-03-06 15:05:33 UTC
You may also *write* a test ;-P (and should *not* be a problem)
Comment 4 Sebastian Dröge (slomo) 2014-03-06 19:47:46 UTC
A test would of course still be good

commit 5973f8c2cafc40e1d3d00f6b00e39a2886db9655
Author: George Kiagiadakis <george.kiagiadakis@collabora.com>
Date:   Thu Mar 6 13:01:40 2014 +0200

    ghostpad: use gst_pad_get_peer to acquire a reference to the target pad
    
    This ensures that the lock of the internal pad is held while referencing
    it's peer (= the target pad), which ensures that the peer is not
    going to be unlinked/destroyed in the meantime.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=725809

commit 4fc671e801c64d8ea329bf2a29485240db3f0bc3
Author: George Kiagiadakis <george.kiagiadakis@collabora.com>
Date:   Thu Mar 6 12:40:23 2014 +0200

    ghostpad: hold a reference to the target pad while unlinking it
    
    https://bugzilla.gnome.org/show_bug.cgi?id=725809
Comment 5 George Kiagiadakis 2014-03-07 10:39:14 UTC
I have tried, but a unit test is very difficult to write in a way that it reproduces the problem at least most of the times. To reproduce it, the other thread needs to fully destroy the target pad while the thread running gst_ghost_pad_set_target() is between the acquisition of the target pad pointer and its unlinking from the internal pad. This probably means that the OS scheduler needs to unschedule the thread running gst_ghost_pad_set_target() for a big amount of time in between these two events.

The application that this bug was originally found in would usually crash once every 30 to 40 hours (sometimes even more!) of running a continuous loop that creates and destroys a full pipeline with rtspsrc every 5 seconds.
Comment 6 Nicolas Dufresne (ndufresne) 2014-03-07 13:48:09 UTC
Ok then, understood, If anything trick comes to my mind I'll let you know.