GNOME Bugzilla – Bug 752213
tee: Avoid race condition while forwarding sticky events
Last modified: 2018-11-03 12:29:01 UTC
Created attachment 307207 [details] [review] tee: Avoid race condition while forwarding sticky events I have found that in some circumstances a source pad receives data without having a segment event, printing this error: Unexpected critical/warning: gstpad.c:4258:gst_pad_push_data:<tee10:src_1> Got data flow before segment event It is a race condition, because sink pad did not have segment event while adding the new source pad and it is received before the source pad is added to the tee but after the sticky events are forwarded, so this event is not in the new src pad. Previous pads are not failing, just the new one. And this happens rarely 1 on 500 or so. The solution I found is to lock the sink stream while adding the pad and once it is added forward the sticky events. With this patch I can not reproduce the issue.
Comment on attachment 307207 [details] [review] tee: Avoid race condition while forwarding sticky events I'm not much in favour of using the stream lock here. It might block the application or whoever requested the pad for a while. I think the best would be to remember when we last sent sticky events, and check in the chain function if we need to update them. Some counter/cookie thing should work for that too.
I've tried to push sticky events on chain function, using a flag to check if they were synchronized or not, but the patch becomes too complicated, because if the next we receive is not data but an event we have to forward events too (otherwise we can have the warning of receiving sticky events missordered), but I think that is too much overhead to check all the source pads sticky events whenever we receive an event at sink pad. I think that getting the stream mutex is less harmful than have to increase complexity whenever we are receiving events and/or buffers.
I've been reading the code a little bit more and one of the bigger problems I see is that the event function should be re-implemented and gst_pad_forward could not be used. Instead we may need a function like the one pushing the buffers, checking for each pad if it needs the sticky events to be forwarded.
Created attachment 307224 [details] [review] tee: Avoid race condition while forwarding sticky events This patch does not get the stream mutex while requesting the pad, but does not solve the issue completely, just reduces fail to 1 on 700. Do you think if there is something missing in the patch? I still get the error, but this time not in tee source pad pad but on its peer: Unexpected critical/warning: gstpad.c:4013:gst_pad_chain_data_unchecked:<queue15:sink> Got data flow before segment event
Created attachment 307225 [details] [review] tee: Avoid race condition while forwarding sticky events Sorry, wrong patch This patch does not get the stream mutex while requesting the pad, but does not solve the issue completely, just reduces fail to 1 on 700. Do you think if there is something missing in the patch? I still get the error, but this time not in tee source pad pad but on its peer: Unexpected critical/warning: gstpad.c:4013:gst_pad_chain_data_unchecked:<queue15:sink> Got data flow before segment event
Do you think you could whip up a unit test that reproduces the problem (doesn't have to be 100% reliable, but if it fails often enough that'd be useful)?
It's being a while since I found this, I'll try to recover the test I did.
I recover the test and with gstreamer 1.6.0 but there are issues (not checket yet) on 1.7.1
Created attachment 319432 [details] Test I'ts not a unit test but a simple program that shows issues with 1.7.1
Created attachment 324280 [details] [review] tee: fix race condition while forwarding sticky events
The patch uploaded fixes the race condition in a easy and efficient way without changing the code a lot.
Review of attachment 324280 [details] [review]: ::: plugins/elements/gsttee.c @@ +420,3 @@ /* Forward sticky events to the new srcpad */ gst_pad_sticky_events_foreach (tee->sinkpad, forward_sticky_events, srcpad); GST_OBJECT_FLAG_SET (srcpad, GST_PAD_FLAG_PROXY_CAPS); Can we set this flag before entering the critical section (before forwarding sticky events)?
It seems weird to set it afterwards... It's part of the pad setup and should be done before actually doing anything with the pad. So yes, feel free to move it please :)
Created attachment 324415 [details] [review] tee: fix race condition while forwarding sticky events Now, the flag of srcpad is set before entering the critical section, so it is the minimum possible.
Review of attachment 324415 [details] [review]: Please add some more explanation to the commit message about what exactly the problem here is. Why is this needed at all? All sticky events are going to be copied over before the pad is added to the element, once the pad is added to the element all events will be passed through as usual. Where is the race? ::: plugins/elements/gsttee.c @@ +306,3 @@ gst_tee_init (GstTee * tee) { + g_rec_mutex_init (&tee->mutex_events); Why does this need a recursive mutex?
Created attachment 324416 [details] [review] tee: fix race condition while forwarding sticky events Add an explanation of the race condition in the commit message.
Review of attachment 324415 [details] [review]: ::: plugins/elements/gsttee.c @@ +306,3 @@ gst_tee_init (GstTee * tee) { + g_rec_mutex_init (&tee->mutex_events); It is needed because if the app creates a new src pad from the same thread of gst_pad_event_default (in gst_tee_sink_event), a deadlock will happen.
Comment on attachment 324416 [details] [review] tee: fix race condition while forwarding sticky events I think instead of all this it would then be better to add some kind of events cookie (e.g. a simple counter) and before pushing a buffer always check if it's up to date and otherwise update the sticky events before pushing. That works without yet another (recursive!) mutex and should be simpler. The same bug is probably in other elements too, e.g. output-selector
Which is the problem of using a recursive mutex? I think that all these kinds of problems that GStreamer has is because of not using a recursive mutex.
They are expensive and we already have a lot of these, and in this specific case a mutex seems not even like the simpler solution.
OK, so if you think that your solution is easier and cheaper, could you upload a patch with your proposal please? So, we will be able to discuss over the code ;).
The main problem is btw not that it's expensive, but that every unneeded mutex is a liability that might cause your future self a deadlock later.
For now, we are going to use this patch in our own fork. If we detect some deadlock or another problem, we will report it here.
Created attachment 324923 [details] [review] tee: Set GST_PAD_FLAG_PROXY_CAPS before forwarding sticky events
Created attachment 324924 [details] [review] tee: Prevent race condition between adding pads and forwarding sticky events Between copying all sticky events to the new pad and adding it to the element, a new sticky event might arrive that would then never be forwarded to the new pad. To be forwarded automatically it would have to be added to the element already. Instead just remember when we have to recheck sending sticky events and do so in the chain function if needed.
Miguel, can you test this one with your testcase? Thanks
Created attachment 324939 [details] [review] tee: Prevent race condition between adding pads and forwarding sticky events Patch based on the Sebastian's idea but simpler taking into account that the race condition can only take place for new src pads.
Created attachment 324940 [details] [review] tee: Prevent race condition between adding pads and forwarding sticky events I overwritten the author, in this patch I put Sebastian as author because he had the cookie idea.
(In reply to Sebastian Dröge (slomo) from comment #26) > Miguel, can you test this one with your testcase? Thanks It works fine, or at least I haven't found a way of breaking it ;). See the new patch following the idea of cookie please.
Created attachment 324944 [details] [review] tee: force Sticky event misordering (In reply to Miguel París Díaz from comment #29) > (In reply to Sebastian Dröge (slomo) from comment #26) > > Miguel, can you test this one with your testcase? Thanks > > It works fine, or at least I haven't found a way of breaking it ;). > See the new patch following the idea of cookie please. I think that with the solution based on cookies we can have "Sticky event misordering" problems. The race condition to see this is: 1 - New src pad when the sink pad does not have any sticky event 2 - Forward sticky events from sink pad to src pad (but there is not any event) 3 - The 'stream-start' event arrives to the sink pad (but it is not forwarded to the src pad because it has not been added yet). 4 - The src pad is added to the tee element. 5 - The 'caps' event arrives to the sink pad and it is forwarded to the src pad, so we have the 'Sticky event misordering' error (because the 'stream-start' haven't been forwarded from the chain function yet. With the related patch I have forced the problem: gstpad.c:5031:store_sticky_event:<tee3:src_1> Sticky event misordering, got 'caps' before 'stream-start' Does this way of forcing this situation make sense?
Review of attachment 324940 [details] [review]: ::: plugins/elements/gsttee.c @@ +567,3 @@ gboolean res; + res = gst_pad_event_default (pad, parent, event); You can't use gst_pad_event_default() here, that's racy and why I didn't do that :) But in the forward function you will *also* need to check the current pad cookie and if it's wrong you additionally need to forward all events there. That should fix the sticky event misordering problem
OK, I am doing a patch to solve the misordering problem. Moreover, why using gst_pad_event_default is racy?
Actually that shouldn't matter here. But you still can't use it because you need to do checks before forwarding the event ;)
Created attachment 325078 [details] [review] tee: Prevent race condition between adding pads and forwarding sticky events
Created attachment 325079 [details] [review] tee: Prevent race condition between adding pads and forwarding sticky events
Created attachment 325080 [details] [review] tee: do not override query function for sink pad
Review of attachment 325080 [details] [review]: We don't do this Change-Id stuff in the commit messages :) Please also explain why you remove it in the commit message while you're editing it anyway and add a link to the bug report there (e.g. use git-bz)
Review of attachment 325079 [details] [review]: ::: plugins/elements/gsttee.c @@ +141,3 @@ gboolean removed; + + gboolean do_fw_sticky_events; Maybe "initial_sticky_events_pending" is more descriptive. This is not about general forwarding of them, but of the very initial ones. @@ -416,3 @@ GST_OBJECT_FLAG_SET (srcpad, GST_PAD_FLAG_PROXY_CAPS); - /* Forward sticky events to the new srcpad */ - gst_pad_sticky_events_foreach (tee->sinkpad, forward_sticky_events, srcpad); By *not* copying the sticky events here, the tee srcpad is not going to have any caps/segment/etc anymore. This might break some application assumptions.
(In reply to Sebastian Dröge (slomo) from comment #38) > Review of attachment 325079 [details] [review] [review]: > @@ -416,3 @@ > GST_OBJECT_FLAG_SET (srcpad, GST_PAD_FLAG_PROXY_CAPS); > - /* Forward sticky events to the new srcpad */ > - gst_pad_sticky_events_foreach (tee->sinkpad, forward_sticky_events, > srcpad); > > By *not* copying the sticky events here, the tee srcpad is not going to have > any caps/segment/etc anymore. This might break some application assumptions. It is done when the sink pad receives an event (event_forward_func) or a buffer (gst_tee_handle_data). Could be any problem?
Yes, because the pad won't have caps immediately when requesting a pad. But only after some random time.
Yes, but this is the point and the race condition that we want to solve. Forwarding sticky events before adding the src pad does not ensure that this pad has the caps, because the caps can arrive between forwarding sticky events (probably any event) and the pad is added, so we are in the same situation. The only way of ensuring this is that forwarding sticky events and adding the pad was done in a critical section. Anyway, what could happen if the src pad does not have the caps until *just before* a new event or a buffer arrives to the sink pad?
For demuxers, applications generally assume that they can get caps immediately when the pad is available. There might be applications assuming the same for the case when requesting a new pad from an already running pipeline. Those applications would fail now.
OK, I see the point and you are right, in these cases with the last patch it would *always* fail. But with the current implementation and even with the other patches proposed in in this thread (based on flags and cookies), it would *sometimes* fail if the non desirable race condition happens. This race condition is: 1 - New src pad when the sink pad does not have any sticky event 2 - Forward sticky events from sink pad to src pad (but there is not any event) 3 - A new sticky event (eg: caps) arrives to the sink pad (but it is not forwarded to the src pad because it has not been added yet). 4 - The src pad is added to the tee element. 5 - Demuxers or apps fail because there is not any caps in the src pad. To solve this problem the forwarding sticky events and adding the pad must be done in a critical section, the same that the event function of the sink pad. This critical section could be performed with the patch https://bug752213.bugzilla-attachments.gnome.org/attachment.cgi?id=324416, but: > The main problem is btw not that it's expensive, but that every unneeded > mutex is a liability that might cause your future self a deadlock later If I am not wrong, this only can happen if this mutex (mutex_events) is locked into another section locked by a mutex-X && mutex-X is locked into another section locked by this mutex (mutex_events). Where mutex-X only can be: - GST_OBJECT_LOCK (pad) - GST_OBJECT_LOCK (tee) So, can happen that - a new src pad of the tee is requeste - or gst_tee_sink_event is called with mutex-X locked? Other situations that can cause a deadlock? Could we do a better solution taking into account that the cookies and flags ones did not fix the problem?
Not sure why you need a recursive mutex here, or any mutex at all. Why doesn't any of the existing solutions work if you just forward the sticky events that are currently there in request_new_pad() before the pad is added... and forward all pending sticky events on the next serialized event/buffer?
(In reply to Sebastian Dröge (slomo) from comment #44) > Not sure why you need a recursive mutex here, or any mutex at all. Why > doesn't any of the existing solutions work if you just forward the sticky > events that are currently there in request_new_pad() before the pad is > added... and forward all pending sticky events on the next serialized > event/buffer? Because the race condition that I told you in my previous comment #43 causes the situation you told me in the comment #40 > > By *not* copying the sticky events here, the tee srcpad is not going to have > > any caps/segment/etc anymore. This might break some application assumptions. > > Yes, because the pad won't have caps immediately when requesting a pad. But > > only after some random time. These both things can happen if the race condition takes place. Refs [1] https://bugzilla.gnome.org/show_bug.cgi?id=752213#c43 [2] https://bugzilla.gnome.org/show_bug.cgi?id=752213#c40
Created attachment 330260 [details] [review] tee: do not override query function for sink pad
Created attachment 330267 [details] Simple program to force sticky event misordeing in tee src's peer pad
Created attachment 330268 [details] [review] pad: simple patch to force race conditions
Hello, I have found another race condition due not only the tee but the pad too. I have done a simple app (attachment 330267 [details]) and a simple commit in gstpad.c (attachment 330268 [details] [review]) to force this race condition, which provokes a sticky event misordering in the peer pad of the tee src pads (in our case in the sink pad of the second fakesink) There are 2 threads: - streaming_thread: managed by the task of the SrcPad of a FakeSrc - app_thread: performs the creation and linkage of the tee src pads The race condition is: 1 - [app_thread] tee0 and fakesink0 are linked 2 - [streaming_thread] stream-start event arrives to the tee0:sink pad 2.1 - it is forwarded to tee0:src_0 and fakesink0:sink 3 - [app_thread] Just: - after forwarding the event to all tee src pads - and bedore storing the sticky event in tee0:sink pad a new tee src pad is added (tee:src_1) - The stream-start is NOT forwarded because the forwarding iteration has already finished - the stream start is NOT stored because tee0:sink has stored the event yer 4 - [streaming_thread] caps event arrives to the tee0:sink pad 4.1 - it is forwarded to all tee src pads and to fakesink0:sink and fakesink:1:sink pads So, fakesink1:sink receives the caps event without having the stream-start event 5 - [app_thread] 5.1 - fakesink1:sink is unlinked from tee:src_1 5.2 - tee:src_1 is released 5.3 - fakesink1:sink is linked to a new tee src pad (tee:src_2) 5.3.1 - stream-start event is stored in tee:src_2 5.3.2 - stream-start event is tried to store into fakesik1:sink Here we have the misordering error
Created attachment 330537 [details] [review] tee: fix race conditions while forwarding sticky events Fix the problems caused by the race conditions in tee and in pad.
Created attachment 330571 [details] [review] tee: fix race conditions while forwarding sticky events
Comment on attachment 324923 [details] [review] tee: Set GST_PAD_FLAG_PROXY_CAPS before forwarding sticky events Attachment 324923 [details] pushed as 25bf63d - tee: Set GST_PAD_FLAG_PROXY_CAPS before forwarding sticky events
Created attachment 369659 [details] [review] tee: fix race conditions while forwarding sticky events Replace to fix event refcount issue.
Review of attachment 369659 [details] [review]: Does it passes all tests without deadlock ? Did you run gst-validate ? Can you provide a unit test that triggers the race ? This patch is very scary, specially that the is no doc in the commit to explain what it is fixing and why it won't cause issues.
Review of attachment 369659 [details] [review]: This has the same problem as slomo's patches adding the stream lock. ::: plugins/elements/gsttee.c @@ +561,3 @@ + + GST_TEE_EVENTS_LOCK (GST_PAD_PARENT (pad)); + res = gst_pad_event_default (pad, parent, event); I think this could block at the sink on a preroll if it's a GAP event
Hello @stormer and @ocrete, first of all thanks for you feedback. Answering @stormer questions: > Does it passes all tests without deadlock? Yes, unit tests pass in all projects (gstreamer, gst-plugins-XYZ, etc.) > Did you run gst-validate? How I can run it? > Can you provide a unit test that triggers the race? Make a deterministic unit test is quite complex, but if you take a look at my explanation about the race condition, it is quite clear that it exist. > This patch is very scary, specially that the is no doc in the commit to explain what it is fixing and why it won't cause issues. First of all we all need to agree about the solution. For now you can take this issue thread as doc, where all details are discussed. Answering @ocrete: > I think this could block at the sink on a preroll if it's a GAP event Could you please put an example? In addition, I would like to mention that this patch is running in our production environment (since more than one year ago) fixing the race condition and we didn't notice any deadlock. Anyway, it might have a deadlock that is not reproducible in our use case.
Created attachment 373443 [details] Simple program to force deadlock Hi folks!! A rare but real use case has been found that shows that attachment 369659 [details] [review] provokes a deadlock. I have invested some efforts to understand it and I achieved the simple program attached that reproduces the deadlock: There are 3 threads: - app_thread: performs the initial pipeline creation and waits until the expected actions are done - src_streaming_thread: managed by the task of the SrcPad of a FakeSrc - queue_streaming_thread: managed by the task of the SrcPad of a Queue Requirements to force the deadlock: - Make the fakesrc to push 2 buffers and the EOS - Make the queue to be filled with 1 buffer The race condition that provokes the deadlock is: 1 - [app_thread] tee0, queue0 and fakesink0 are linked 1.1 - waits until the buffer from (5.3) is pushed. 2 - [src_streaming_thread] push the first buffer 2.1 - it is forwarded to tee0:src_0 and queue0:sink 3 - [queue_streaming_thread] push the first buffer 3.1 - it is blocked at buffers_probe WAITING to link tee and fakesink1 4 - [src_streaming_thread] push the second buffer 4.1 - it is forwarded to tee0:src_0 and queue0:sink 4.2 - as queue_streaming_thread is blocked, the queue is filled 5 - [src_streaming_thread] push the EOS 5.1 - tee: hold GST_TEE_EVENTS_LOCK and push EOS to tee:src_0 5.2 - SIGNAL queue_streaming_thread to link tee and fakesink1 5.3 - push a new buffer downstream 5.4 - LOCKED because the queue is OVERRUN 6 - [queue_streaming_thread] perform tee and fakesink1 linking 6.1 - LOCKED at gst_tee_request_new_pad because GST_TEE_EVENTS_LOCK is hold by the src_streaming_thread The deadlock happens due to (5.4) and (6.1)
Created attachment 373444 [details] [review] tee: fix race conditions while forwarding sticky events Fix sticky events misordering avoiding deadlock. In other words this new patch satisfies the test apps attachment 330267 [details] and attachment 373443 [details].
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gstreamer/issues/122.