After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 650916 - REGRESSION: ssrcdemux causing FLOW_NOT_LINKED
REGRESSION: ssrcdemux causing FLOW_NOT_LINKED
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-plugins-good
git master
Other All
: Normal blocker
: 0.10.31
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2011-05-23 21:56 UTC by Håvard Graff (hgr)
Modified: 2011-07-28 13:06 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
rtpssrcdemux: Use PADs lock (1.47 KB, patch)
2011-05-24 08:32 UTC, Olivier Crête
none Details | Review
rtpssrcdemux: Keep a ref on the src pad while using it (4.12 KB, patch)
2011-05-24 08:32 UTC, Olivier Crête
none Details | Review
rtpssrcdemux: Make the pads lock recursive and hold it across the signal emit (2.65 KB, patch)
2011-05-24 08:32 UTC, Olivier Crête
committed Details | Review
rtpssrcdemux: Use PADs lock (1.56 KB, patch)
2011-05-24 08:52 UTC, Olivier Crête
none Details | Review
rtpssrcdemux: Use PADs lock (1.31 KB, patch)
2011-05-24 08:54 UTC, Olivier Crête
committed Details | Review
rtpssrcdemux: Keep a ref on the src pad while using it (4.15 KB, patch)
2011-05-24 08:54 UTC, Olivier Crête
none Details | Review
rtpssrcdemux: Keep a ref on the src pad while using it (4.12 KB, patch)
2011-05-24 14:59 UTC, Olivier Crête
none Details | Review

Description Håvard Graff (hgr) 2011-05-23 21:56:46 UTC
Basically the problem is this one:

1bf94a92b04e4b0ff83fd015daa46e880cc4d920
If the lock is not released before emitting a signal, it may cause a deadlock if any other function in the element is called. Also removed an unused timestamp parameter https://bugzilla.gnome.org/show_bug.cgi?id=649617

What happens:

1. One rtp and one rtcp packet arrives at the same time inside the ssrcdemux
2. Both call find_or_create_demux_pad_for_ssrc (), but only the "rtcp-thread" proceeds to create the pad, because of the Lock
3. The "rtcp-thread" that created the pad takes a little break just after releasing the lock.
4. This gives the "rtp-thread" time to take the lock, check for the now existing pad, and go on to push on this pad.
5. However, since the linking of the pad to the rtpjitterbuffer happens inside the g_signal_emit of SIGNAL_NEW_SSRC_PAD, and this has not yet happened in the "rtcp-thread" (remember it is still on a little break), the push will fail with UNLINKED

Also, I am puzzled that the designated GST_PAD_LOCK is swapped for GST_OBJECT_LOCK inside the find_or_create_demux_pad_for_ssrc () function? This means that the demux->srcpads list can have elements added to it while it is being iterated another place = dangerous stuff.

My suggestion would be to swap GST_OBJECT_LOCK for GST_PAD_LOCK and keep holding it until the pad has been linked. (pretty much what the old behavior was)

Is it an actual observed deadlock that has taken place that these changes are trying to fix?

Our tests went blood red after syncing this patch, so I believe this is something that all users of RtpBin will experience sooner or later.
Comment 1 Olivier Crête 2011-05-24 01:27:31 UTC
The problem the original patch wants to fix is that if an event is pushed through the pad while the signal is emitted, then it will deadlock.. Maybe we should hold the STREAM lock of the new pad while emitting the signal. 

That said, you're also clearly right that this should use the PAD lock instead of the OBJECT lock.

While reading the code, I'm always worried that if in the chain functions the clear_ssrc function is called between the time when the lock is released and the time where the pad_push function is called.
Comment 2 Håvard Graff (hgr) 2011-05-24 04:21:55 UTC
(In reply to comment #1)
> The problem the original patch wants to fix is that if an event is pushed
> through the pad while the signal is emitted, then it will deadlock.. Maybe we
> should hold the STREAM lock of the new pad while emitting the signal. 
> 

With the old code, AFAICT, if an event arrived while signal was emitted, the event would be waiting for the PAD_LOCK. Now, unless the event was directly and blockingly triggered by the emitted signal, I don't see that there should be a deadlock?

> 
> While reading the code, I'm always worried that if in the chain functions the
> clear_ssrc function is called between the time when the lock is released and
> the time where the pad_push function is called.

Did not the original implementation hold the PAD_LOCK right through the push?
Comment 3 Olivier Crête 2011-05-24 04:39:32 UTC
No, if one emits an event during the signal, well you still hold the pad lock, and you try to take it again in the chain function..Maybe the pad lock should just be made recursive.
Comment 4 Olivier Crête 2011-05-24 08:32:40 UTC
Created attachment 188438 [details] [review]
rtpssrcdemux: Use PADs lock
Comment 5 Olivier Crête 2011-05-24 08:32:45 UTC
Created attachment 188439 [details] [review]
rtpssrcdemux: Keep a ref on the src pad while using it

Prevent a possible race if clear_ssrc() is called between getting the pad and
doing the push.
Comment 6 Olivier Crête 2011-05-24 08:32:49 UTC
Created attachment 188440 [details] [review]
rtpssrcdemux: Make the pads lock recursive and hold it across the signal emit

We need to keep the lock held because we don't want a push before the "new-ssrc-pad"
handler has completed. But we may want to push an event from inside that handler, hence
the recursive mutex.
Comment 7 Olivier Crête 2011-05-24 08:37:07 UTC
Hopefully these 3 patches should fix the problems.. Sorry for the f*up with the last  patch, please review these 3 times and test them to make sure I got it right this time.
Comment 8 Haakon Sporsheim (ieei) 2011-05-24 08:40:40 UTC
You missed the GST_OBJECT_UNLOCK on line 174!?
Comment 9 Olivier Crête 2011-05-24 08:52:20 UTC
Created attachment 188441 [details] [review]
rtpssrcdemux: Use PADs lock

Thanks, now lets review it 3 times again!
Comment 10 Olivier Crête 2011-05-24 08:54:12 UTC
Created attachment 188442 [details] [review]
rtpssrcdemux: Use PADs lock

Arg, without changing common...
Comment 11 Olivier Crête 2011-05-24 08:54:21 UTC
Created attachment 188443 [details] [review]
rtpssrcdemux: Keep a ref on the src pad while using it

Prevent a possible race if clear_ssrc() is called between getting the pad and
doing the push.
Comment 12 Olivier Crête 2011-05-24 14:59:46 UTC
Created attachment 188467 [details] [review]
rtpssrcdemux: Keep a ref on the src pad while using it

Prevent a possible race if clear_ssrc() is called between getting the pad and
doing the push.
Comment 13 Mark Nauwelaerts 2011-07-28 13:06:11 UTC
Thanks.  Committed the following, where the "ref on pad" uses a (IMHO) somewhat simpler and less invasive approach.

commit e26b5391c2d9259bb85f031e0760f05f037fe9bc
Author: Olivier Crête <olivier.crete@collabora.com>
Date:   Tue May 24 11:17:25 2011 +0300

    rtpssrcdemux: Use PADs lock
    
    https://bugzilla.gnome.org/show_bug.cgi?id=650916

commit e26b5391c2d9259bb85f031e0760f05f037fe9bc
Author: Olivier Crête <olivier.crete@collabora.com>
Date:   Tue May 24 11:17:25 2011 +0300

    rtpssrcdemux: Use PADs lock
    
    https://bugzilla.gnome.org/show_bug.cgi?id=650916

commit 3a98f6f0fdd50d90f4881d096eade2e1d9da61a8
Author: Mark Nauwelaerts <mark.nauwelaerts@collabora.co.uk>
Date:   Thu Jul 28 14:44:57 2011 +0200

    rtpssrcdemux: keep a ref on the src pad while using it
    
    Prevent a possible race if clear_ssrc() is called between getting the pad and
    doing the push.
    
    Based on patch by <olivier.crete@collabora.com>
    
    https://bugzilla.gnome.org/show_bug.cgi?id=650916