GNOME Bugzilla – Bug 701110
videomixer: sinkpads GSList is not protected for multi-threading
Last modified: 2014-07-02 12:37:06 UTC
While fixing a z-order issue, I realized that the mixer sinkpads GSList is not protected. If a pad is created, a zorder is changed, or a pad is removed while the streaming thread is iterating the list, it may crash due to pointer being modified in parallel. Considering this, there might be many other threading issues in this element.
Is this still the case? The sinkpads list seems to be protected by GST_VIDEO_MIXER2_LOCK (mix) everywhere afaict.
Created attachment 267052 [details] example that causes segmentation fault on the videomixer Talking with Sebastian on the list: http://lists.freedesktop.org/archives/gstreamer-devel/2014-January/045768.html This example have a pipe with videotestsrc ! capsfilter ! videomxier ! xvimagesink. Then on PLAYING_STATE I do this: while (TRUE){ sleep(1); add_mixer_input(); //adds a new bin to the mixer sleep(1); remove_mixer_input(); // removes the previously added bin. } The program causes segfault, and always crashed in the same point. If I run the program with GST_DEBUG=4 the program crashes more quickly. One more thing, if I don't call release_request_pad on mixer, the program seems to doesn't blow up.
I've seen that the videomixer blows up beacuse the function ptr to the function that does the blending is NULL. This pointer was reseted when the caps are reseted. I can try to fix the bug if you give me some help :) I've already made an ugly fix to the videomixer. Basically, if the function's pointer is NULL I reject the buffer. This produces a green frame on the output but the pipeline doesn't blow up. But now the pipeline freezes beacuse a deadlock(I think that is another bug). The deadlock was produced by the chain function of one sink and the relase_request_pad on another sink. Here is the sequence of the deadlock: 1) gst_collect_pads_chain function acquires GST_COLLECT_PADS_STREAM_LOCK. 2) gst_videomixer2_release_pad function acquires GST_VIDEO_MIXER2_LOCK. 2 )gst_collect_pads_chain function calls to gst_videomixer2_collected() function. 3) gst_videomixer2_collected() function tries to acquire GST_VIDEO_MIXER2_LOCK. 4) gst_videomixer2_release_pad() function tries to acquire GST_COLLECT_PADS_STREAM_LOCK. So, gst_videomixer2_release_pad waits on COLLECT_PADS_STREAM_LOCK (acquired by gst_videomixer2_collected) and gst_videomixer2_collected waits on GST_VIDEO_MIXER2_LOCK(acquired by gst_videomixer2_release_pad).
If it's what you have described, it's a locking order issue. The correct order is STREAM_LOCK follow by OBJECT_LOCK. If you have OBJECT_LOCK already and need STREAM_LOCK, you need to unlock the OBJECT_LOCK first, then take the two locks in the right order (in this case you'll need to check your state for changes). Hopes that will help you make a fix, would be really appreciated. gst_videomixer2_release_pad() would be the one in fault.
Created attachment 271223 [details] workaround for the segfault and a patch for the freezing bug. Both are for videomixer Nicolas, here is the patch. I've tested that works in the following way: 1) If I run the example without the patch it freezes after one hour of running aproximately. If I run 3 instances of the example at the same time, one of them freezes after 5 minutes at most. 2) I ran 4 instances of the example with the patch applied and it was running for 5 days without freezing. Apparently seems that the patch works. Please confirm if you want me to change something. If you want to test the patch, first at all you will have to apply another patch that is a workaround for the segfault that I describe on comment 2 and 3. The real patch is for the freezing case. Until now I've not had the time to investigate why the ptr to the mix function was seted to NULL (I think it is because a caps's renegotiation when I remove the pad).
(In reply to comment #5) > Created an attachment (id=271223) [details] [review] > workaround for the segfault and a patch for the freezing bug. Both are for > videomixer Ok, I found why I couldn't review this patch. Patch need to be attach with the patch checkbox, and in raw text, no zip file. In order to produce a patch in the format we expect, so this, assuming 1 patch: git format-patch -1 This will create a 00001-Description.path file, just attach this file. We have a tool on Bugzilla to review and comment about the code. Thanks for your time !
For the reference: >+ GST_VIDEO_MIXER2_UNLOCK (mix); > GST_COLLECT_PADS_STREAM_LOCK (mix->collect); >+ GST_VIDEO_MIXER2_LOCK (mix); This looks appropriate, and match the description I've made. The other patch seems unrelated, maybe file another bug, or just explain why you need this null checking (usually we try and find the source, and avoid NULL check if possible).
Note though, it might not be fixing what is described in this bug, which is about the GSList protection, so don't close until the locking has been reviewed.
(In reply to comment #6) > (In reply to comment #5) > > Created an attachment (id=271223) [details] [details] [review] > > workaround for the segfault and a patch for the freezing bug. Both are for > > videomixer > > Ok, I found why I couldn't review this patch. Patch need to be attach with the > patch checkbox, and in raw text, no zip file. > > In order to produce a patch in the format we expect, so this, assuming 1 patch: > > git format-patch -1 > > This will create a 00001-Description.path file, just attach this file. We have > a tool on Bugzilla to review and comment about the code. Thanks for your time ! Sorry, I didn't know that. Do you want me to upload again the patch in the correct way? (In reply to comment #7) > For the reference: > > >+ GST_VIDEO_MIXER2_UNLOCK (mix); > > GST_COLLECT_PADS_STREAM_LOCK (mix->collect); > >+ GST_VIDEO_MIXER2_LOCK (mix); > > This looks appropriate, and match the description I've made. The other patch > seems unrelated, maybe file another bug, or just explain why you need this null > checking (usually we try and find the source, and avoid NULL check if > possible). I am completly agree with you. If I find the problem's source I will ask you how to continue :).
(In reply to comment #9) > I am completly agree with you. If I find the problem's source I will ask you > how to continue :). Yes please, so I don't have to craft the Author line myself, and while doing do, add a link to this bug on the last line of your commit message.
Created attachment 271268 [details] [review] Fix a locking order issue on videomixer I've attached the patch. If you have any comments please tell me.
Just a small update, yesterday I've ran the test script over night to make sure I could reproduce the crash. That is done, I'll apply the workaround and let it run to see if I can get the deadlock. Then apply the locking fix to see if it fixes it, or at least move the deadlock somewhere else.
Created attachment 271694 [details] [review] Patch to protect a critical variable values change I think that this patch solves the race condition setting src_caps. Sometimes, the function blend_buffers uses fill_color and its value is null because the function src_setcaps set its to NULL out the lock.
(In reply to comment #12) > Just a small update, yesterday I've ran the test script over night to make sure > I could reproduce the crash. That is done, I'll apply the workaround and let it > run to see if I can get the deadlock. Then apply the locking fix to see if it > fixes it, or at least move the deadlock somewhere else. Nicolas, could you get the deadlock issue? if you did, did the patch fix it?
Meanwhile a new mixer has been introduced, called compositor iirc. I wonder what we should do with this ?
compositor won't be in 1.4, so would be nice to still fix this.
Review of attachment 271268 [details] [review]: ::: gst/videomixer/videomixer2.c @@ +2160,2 @@ GST_COLLECT_PADS_STREAM_LOCK (mix->collect); + GST_VIDEO_MIXER2_LOCK (mix); Normally, because we have unlocked, we should be validating the state before continuing. Is there any other calls that should induce a different behaviour here ?
Review of attachment 271694 [details] [review]: Looks good, but the commit message should be more descriptive. Think of someone that hits the same issue as you did and then try to write a message that would help that person understanding this patch will fix what he see.
Created attachment 279223 [details] [review] Patch to avoid a segmentation fault when src caps are configured Hi Nicolas, Thank you for your comment. I have uploaded the patch changing the commit message. I also have added a short description.
Comment on attachment 279223 [details] [review] Patch to avoid a segmentation fault when src caps are configured commit 4ed74d3ab0ffd474d40728922f6277a1d90f69ad Author: David Fernandez <d.fernandezlop@gmail.com> Date: Thu Mar 13 10:35:30 2014 +0100 videomixer2: Solve segmentation fault when src caps are configured Change function pointers to NULL while holding the lock to avoid race conditions https://bugzilla.gnome.org/show_bug.cgi?id=701110
Thanks for the patch! Does this fix all the problems in this bug already?
I think so. I'm using the videomixer2 in a dynamic pipeline and it is more robust using this patch and this one http://lists.freedesktop.org/archives/gstreamer-commits/2014-March/077911.html
@David Fernandez: You should give a try to the new GstAggregator based videomixer called "compositor" in -plugins-bad. It fixes quite a few limitation of GstCollectPads and is going to replace videomixer in the mid term.
@Thibault Saunier: Thank you Thibault. I have taken a look and it seems work fine. The action to add or remove pads is faster than in the videomixer.