GNOME Bugzilla – Bug 650916
REGRESSION: ssrcdemux causing FLOW_NOT_LINKED
Last modified: 2011-07-28 13:06:58 UTC
Basically the problem is this one: 1bf94a92b04e4b0ff83fd015daa46e880cc4d920 If the lock is not released before emitting a signal, it may cause a deadlock if any other function in the element is called. Also removed an unused timestamp parameter https://bugzilla.gnome.org/show_bug.cgi?id=649617 What happens: 1. One rtp and one rtcp packet arrives at the same time inside the ssrcdemux 2. Both call find_or_create_demux_pad_for_ssrc (), but only the "rtcp-thread" proceeds to create the pad, because of the Lock 3. The "rtcp-thread" that created the pad takes a little break just after releasing the lock. 4. This gives the "rtp-thread" time to take the lock, check for the now existing pad, and go on to push on this pad. 5. However, since the linking of the pad to the rtpjitterbuffer happens inside the g_signal_emit of SIGNAL_NEW_SSRC_PAD, and this has not yet happened in the "rtcp-thread" (remember it is still on a little break), the push will fail with UNLINKED Also, I am puzzled that the designated GST_PAD_LOCK is swapped for GST_OBJECT_LOCK inside the find_or_create_demux_pad_for_ssrc () function? This means that the demux->srcpads list can have elements added to it while it is being iterated another place = dangerous stuff. My suggestion would be to swap GST_OBJECT_LOCK for GST_PAD_LOCK and keep holding it until the pad has been linked. (pretty much what the old behavior was) Is it an actual observed deadlock that has taken place that these changes are trying to fix? Our tests went blood red after syncing this patch, so I believe this is something that all users of RtpBin will experience sooner or later.
The problem the original patch wants to fix is that if an event is pushed through the pad while the signal is emitted, then it will deadlock.. Maybe we should hold the STREAM lock of the new pad while emitting the signal. That said, you're also clearly right that this should use the PAD lock instead of the OBJECT lock. While reading the code, I'm always worried that if in the chain functions the clear_ssrc function is called between the time when the lock is released and the time where the pad_push function is called.
(In reply to comment #1) > The problem the original patch wants to fix is that if an event is pushed > through the pad while the signal is emitted, then it will deadlock.. Maybe we > should hold the STREAM lock of the new pad while emitting the signal. > With the old code, AFAICT, if an event arrived while signal was emitted, the event would be waiting for the PAD_LOCK. Now, unless the event was directly and blockingly triggered by the emitted signal, I don't see that there should be a deadlock? > > While reading the code, I'm always worried that if in the chain functions the > clear_ssrc function is called between the time when the lock is released and > the time where the pad_push function is called. Did not the original implementation hold the PAD_LOCK right through the push?
No, if one emits an event during the signal, well you still hold the pad lock, and you try to take it again in the chain function..Maybe the pad lock should just be made recursive.
Created attachment 188438 [details] [review] rtpssrcdemux: Use PADs lock
Created attachment 188439 [details] [review] rtpssrcdemux: Keep a ref on the src pad while using it Prevent a possible race if clear_ssrc() is called between getting the pad and doing the push.
Created attachment 188440 [details] [review] rtpssrcdemux: Make the pads lock recursive and hold it across the signal emit We need to keep the lock held because we don't want a push before the "new-ssrc-pad" handler has completed. But we may want to push an event from inside that handler, hence the recursive mutex.
Hopefully these 3 patches should fix the problems.. Sorry for the f*up with the last patch, please review these 3 times and test them to make sure I got it right this time.
You missed the GST_OBJECT_UNLOCK on line 174!?
Created attachment 188441 [details] [review] rtpssrcdemux: Use PADs lock Thanks, now lets review it 3 times again!
Created attachment 188442 [details] [review] rtpssrcdemux: Use PADs lock Arg, without changing common...
Created attachment 188443 [details] [review] rtpssrcdemux: Keep a ref on the src pad while using it Prevent a possible race if clear_ssrc() is called between getting the pad and doing the push.
Created attachment 188467 [details] [review] rtpssrcdemux: Keep a ref on the src pad while using it Prevent a possible race if clear_ssrc() is called between getting the pad and doing the push.
Thanks. Committed the following, where the "ref on pad" uses a (IMHO) somewhat simpler and less invasive approach. commit e26b5391c2d9259bb85f031e0760f05f037fe9bc Author: Olivier Crête <olivier.crete@collabora.com> Date: Tue May 24 11:17:25 2011 +0300 rtpssrcdemux: Use PADs lock https://bugzilla.gnome.org/show_bug.cgi?id=650916 commit e26b5391c2d9259bb85f031e0760f05f037fe9bc Author: Olivier Crête <olivier.crete@collabora.com> Date: Tue May 24 11:17:25 2011 +0300 rtpssrcdemux: Use PADs lock https://bugzilla.gnome.org/show_bug.cgi?id=650916 commit 3a98f6f0fdd50d90f4881d096eade2e1d9da61a8 Author: Mark Nauwelaerts <mark.nauwelaerts@collabora.co.uk> Date: Thu Jul 28 14:44:57 2011 +0200 rtpssrcdemux: keep a ref on the src pad while using it Prevent a possible race if clear_ssrc() is called between getting the pad and doing the push. Based on patch by <olivier.crete@collabora.com> https://bugzilla.gnome.org/show_bug.cgi?id=650916