GNOME Bugzilla – Bug 596366
proxy_getcaps reverses direction of getcaps
Last modified: 2009-12-11 09:17:19 UTC
Created attachment 144026 [details] backtrace for the lockup
+ Trace 217849
problem is that gst_proxy_pad_get_target() in #4 tries to GST_PROXY_LOCK (0x8188058);, which is already locked by gst_ghost_pad_set_target() in #218 just releasing the lock for the gst_pad_link() is not enough.
Is this a regression? If not, I'd like to punt it and make releases... now.
Stefan, do you see another possibility to fix this other than making the GST_PROXY_LOCK recursive?
It worked before. I hit it when recording a song in buzzard. I tried git bisecting it, but failed as things fail to run for older core versions. I am now trying to write a unit test to narrow it down. If I don't hit it in buzztard for very special reasons, I would consider it as quite critical.
The lokup happens when the ghostpad is linked. This is a workaround: if((peer_pad=gst_pad_get_peer(ghost_pad))) { gst_pad_unlink(peer_pad,ghost_pad); } if(!gst_ghost_pad_set_target(GST_GHOST_PAD(ghost_pad),new_target_pad)) { GST_WARNING("failed to link internal pads"); } if(peer_pad) { gst_pad_link(peer_pad,ghost_pad); gst_object_unref(peer_pad); }
Created attachment 144087 [details] [review] add ghostpad test for setting target again after pad is linked This test doe not prove the bug yet. In my backtrace one sees lots of basetransform elements and e.g. adders. I think the issue is caused by gst_pad_proxy_getcaps() checking the downstream caps again.
(In reply to comment #2) > Stefan, do you see another possibility to fix this other than making the > GST_PROXY_LOCK recursive? It a GMutex and imho there is no recursive variant for that (in glib).
Created attachment 144130 [details] [review] add ghostpad test for setting target again after pad is linked Add a test that actually builds
Created attachment 144131 [details] standalone test that uses adder too core tests cannot use adder. here is a standalone test app that uses adder and it still does not reproduce the issue. I also checked adder changes and they are not causing it.
(In reply to comment #6) > (In reply to comment #2) > > Stefan, do you see another possibility to fix this other than making the > > GST_PROXY_LOCK recursive? > > It a GMutex and imho there is no recursive variant for that (in glib). Right, but we could use a GStaticRecMutex instead. (Edward will love that :) ) But really, why is there no GRecMutex in GLib? That really shouldn't be hard to implement...
Created attachment 144200 [details] [review] possible patch The locking was a bit too eager, we can release them before we attempt to relink the new target.
Stefan, could you test this patch with Buzztard?
Created attachment 144257 [details] backtrace for the deeper lockup I still locks up, but now its a different lock :/
+ Trace 217937
gst_pad_link_prepare() in #216 takes the object lock on the pad and gst_pad_get_caps() in #4 tries to take it again. What I really don't understand is that why is the chain of _get_caps calls 'comming back' to the ghostpad.
Created attachment 144442 [details] test case to reproduce the problem simplified the problem is : src ! tee ! queue ! fx ! adder ! sink ! queue ! tee uses gst_pad_proxy_getcaps() and that send the getcaps back and hits sink again.
tee does not have substantial changes. I will try bisecting adder changes next.
Its not adder either. I bisected all changes of this year. I was trying this in sgstpad.c, but its causing more pain :/ GstCaps * gst_pad_get_caps (GstPad * pad) { GstCaps *result = NULL; g_return_val_if_fail (GST_IS_PAD (pad), NULL); + if (G_UNLIKELY (GST_OBJECT_FLAG_SET (pad, GST_PAD_IN_GETCAPS))) { + return gst_caps_new_empty (); + }
Created attachment 144598 [details] [review] break the getcaps cycle If we allow gst_pad_proxy_getcaps to do getcaps on all other pads, we need a way to flag all pads on our way (GST_PAD_IN_GETCAPS). Then we need to check for that (ideally without taking a lock). If the flag is not set take the lock, if the flag is set return ANY caps. I see no way to do the flag check and taking the lock. We would need another lock for the flag. The attached patch simply does a TRYLOCK on the pad and returns ANY if the lock cannot be taken. It fixes the testapp and also my bigger case. It does not break any tests in core and base for me. Still, is this the right thing to do?
I don't think that's OK, you really want to take the lock there and wait in most situations. Only in case of the cycles you don't want that... so that situation has to be flagged somehow, yes.
Any possible scenario where this could cause problems? Ans any idea how to fix it beside having another lock (e.g. capslock) and lock both with lock order. I mean do a trylock for the capslock and if we got it take the object lock too. If taking the capslock fails, return any caps.
commit 3ac4a08383216a3a0908aaac8b68b21c050fb8e7 Author: Wim Taymans <wim.taymans@collabora.co.uk> Date: Mon Sep 28 22:43:51 2009 +0200 ghostpad: take locks around smaller section We don't need the hold the proxy mutex locked for getting the internal pad and for linking the new target pad when we retarget. So take the lock a little later and release it earlier. Fixes #596366
a getcaps should never reverse the direction. if proxy_getcaps does that then it needs fixing. For the tee case, the caps comming out of the particular pad only depend on the upstream allowed caps, not on the downstream ones of the other branch.
gst_pad_proxy_get_caps() did this before latest GIT, I don't see how gst_proxy_pad_do_getcaps() could do this.
Should be fixed by this, the direction inversion seem to come from tee using gst_pad_proxy_get_caps(), which is now fixed. http://cgit.freedesktop.org/gstreamer/gstreamer/commit/?id=c6f2a9477750be536924bf8e70a830ddec4c1389
Confirmed, its fixed. Many thanks!