GNOME Bugzilla – Bug 755329
Race condition around finish_unprepare
Last modified: 2016-12-05 09:12:40 UTC
Created attachment 311737 [details] [review]
finish_unprepare race condition fix
finish_unprepare could be called either from default_handle_message or from gst_rtsp_media_unprepare. In first case it is not protected by the mutex. The patch seems to fix the problem but I'm not sure that moving gst_rtsp_media_set_status doesn't break anything else.
I am seeing a race condition in the finish_unprepare() function too. I am able to easily reproduce the issue by having multiple clients simultaneously disconnect from my server. Occasionally, this will result in two threads simultaneously entering the finish_unprepare() function, which results in one thread removing priv->rtpbin from the bin and setting it to NULL and the next process will then get errors on the bin_remove.
I think the problem stems from the unlock/lock that happens within finish_unprepare() :
/* release the lock on shutdown, otherwise pad_added_cb might try to
* acquire the lock and then we deadlock */
set_state (media, GST_STATE_NULL);
By unlocking, this gives other threads the ability to progress into the finish_unprepare function as well.
We could either revise the locking strategy, or add a check after the finish_unprepare acquires the the lock on priv->state_lock that could then short-circuit the remainder of the function.
As a quick fix, I've added added a check for the UNPREPARED state right after the g_rec_mutex_lock. Seems to have helped, but I am unsure if it will have any unintended consequences.
I forgot to mention all of my tests were with v1.8.1 on Linux.
Review of attachment 311737 [details] [review]:
Makes sense but does not seem complete yet :) Can you update?
Also please attach it as a "git format-patch" style patch with a proper commit message explaining what happens when and why
@@ +1890,3 @@
+ g_rec_mutex_unlock (&priv->state_lock);
Locking should be needed for the other messages too then with the same reasoning
Review of attachment 311737 [details] [review]:
The locking appears to be redundant as the mutex is already locked in bus_message function. Simply moving gst_rtsp_media_set_status should be sufficient.
Created attachment 333562 [details] [review]
finish_unprepare race condition fix 2
Created attachment 333563 [details] [review]
double rtpbin removing case
I reproduced this bug by sending eos message and launching gst_rtsp_media_unprepare at the same time. I made that case like gstcheck test case.
This is not unit test, but gstcheck system fails on error with it at least on my machine.
(In reply to Nikita Bobkov from comment #5)
> Created attachment 333562 [details] [review] [review]
> finish_unprepare race condition fix 2
Fix the problem, at least in my case.
Created attachment 338497 [details] [review]
GST_RTSP_MEDIA_STATUS_UNPREPARING status check inside finish_unprepare
> (In reply to Nikita Bobkov from comment #5)
> > Created attachment 333562 [details] [review] [review] [review]
> > finish_unprepare race condition fix 2
> Fix the problem, at least in my case.
Ok, it wasn't the solution. The problem of gst_rtsp_media_set_status moving is that it makes the sigfault because of the double source removing.
I suggest to do a status check to GST_RTSP_MEDIA_STATUS_UNPREPARING after lock inside "finish_unprepare".
Why is GST_RTSP_MEDIA_STATUS_UNPREPARING instead of GST_RTSP_MEDIA_STATUS_UNPREPARED like was suggested in comment #1?
Because, besides unprepare functions, we have "gst_rtsp_media_prepare" function. And it is possible that we will have following case:
1) "finish_unprepare" from "default_handle_message" happens and changes status to GST_RTSP_MEDIA_STATUS_UNPREPARED.
2) we go to "gst_rtsp_media_unprepare" and if the status is GST_RTSP_MEDIA_STATUS_UNPREPARED (it is) then go to "finish_unprepare".
3) we have unlock mutex inside "finish_unprepare".
4) we go to "gst_rtsp_media_prepare" and lock the mutex, then we check for GST_RTSP_MEDIA_STATUS_UNPREPARED (still it is) and set status to GST_RTSP_MEDIA_STATUS_PREPARING, unlock the mutex.
5) we lock back the mutex inside "finish_unprepare" (see (3)) and check for GST_RTSP_MEDIA_STATUS_UNPREPARED (it's already GST_RTSP_MEDIA_STATUS_PREPARING (4)) and make some unpreparing things with working media.
6) we have a sigfault and many CRITICAL errors before it.
I forgot to say that the version is 1.8.2
Is the last patch the only thing needed here and the others are obsolete, or are the others still needed in one way or another?
(In reply to Sebastian Dröge (slomo) from comment #10)
> Is the last patch the only thing needed here and the others are obsolete, or
> are the others still needed in one way or another?
The latest patch is sufficient.
Review of attachment 338497 [details] [review]:
@@ +3021,1 @@
AFAIU this has the problem that multiple calls to unprepare will not work well.
What would happen is that the second call would immediately go here and call finish_unprepare()... which is not really what you want (the we might have to wait for EOS first for example).
Maybe if we're currently unpreparing, just do nothing at all here?
Author: Kseniia Vasilchuk <firstname.lastname@example.org>
Date: Tue Oct 25 15:41:28 2016 +0300
media: Fix race condition around finish_unprepare() if called multiple time
It actually makes sense after checking again, thanks! It keeps previous behaviour, while removing the race condition that causes crashes and other interesting effects.