Bug 755329 – Race condition around finish_unprepare

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 755329 - Race condition around finish_unprepare


Summary:	Race condition around finish_unprepare


Status:	RESOLVED FIXED

Product:	GStreamer
Classification:	Platform
Component:	gst-rtsp-server
Version:	1.4.5
Hardware:	Other Windows

Importance:	Normal normal
Target Milestone:	1.10.3
Assigned To:	GStreamer Maintainers
QA Contact:	GStreamer Maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2015-09-21 09:04 UTC by Nikita Bobkov
Modified:	2016-12-05 09:12 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
finish_unprepare race condition fix (1.14 KB, patch) 2015-09-21 09:04 UTC, Nikita Bobkov	none	Details \| Review
finish_unprepare race condition fix 2 (1.32 KB, patch) 2016-08-18 15:20 UTC, Nikita Bobkov	none	Details \| Review
double rtpbin removing case (2.12 KB, patch) 2016-08-18 15:41 UTC, Kseniya Vasilchuk	none	Details \| Review
GST_RTSP_MEDIA_STATUS_UNPREPARING status check inside finish_unprepare (1.24 KB, patch) 2016-10-26 10:34 UTC, Kseniya Vasilchuk	committed	Details \| Review

Description Nikita Bobkov 2015-09-21 09:04:40 UTC

Created attachment 311737 [details] [review]
finish_unprepare race condition fix

finish_unprepare could be called either from default_handle_message or from gst_rtsp_media_unprepare. In first case it is not protected by the mutex. The patch seems to fix the problem but I'm not sure that moving gst_rtsp_media_set_status doesn't break anything else.

Comment 1 Jake Foytik 2016-06-16 13:25:24 UTC

I am seeing a race condition in the finish_unprepare() function too. I am able to easily reproduce the issue by having multiple clients simultaneously disconnect from my server. Occasionally, this will result in two threads simultaneously entering the finish_unprepare() function, which results in one thread removing priv->rtpbin from the bin and setting it to NULL and the next process will then get errors on the bin_remove.

I think the problem stems from the unlock/lock that happens within finish_unprepare() :
  /* release the lock on shutdown, otherwise pad_added_cb might try to
   * acquire the lock and then we deadlock */
  g_rec_mutex_unlock (&priv->state_lock);
  set_state (media, GST_STATE_NULL);
  g_rec_mutex_lock (&priv->state_lock);

By unlocking, this gives other threads the ability to progress into the finish_unprepare function as well.

We could either revise the locking strategy, or add a check after the finish_unprepare acquires the the lock on priv->state_lock that could then short-circuit the remainder of the function.

As a quick fix, I've added added a check for the UNPREPARED state right after the g_rec_mutex_lock. Seems to have helped, but I am unsure if it will have any unintended consequences.

Comment 2 Jake Foytik 2016-06-16 13:47:05 UTC

I forgot to mention all of my tests were with v1.8.1 on Linux.

Comment 3 Sebastian Dröge (slomo) 2016-06-17 10:05:39 UTC

Review of attachment 311737 [details] [review]:

Makes sense but does not seem complete yet :) Can you update?

Also please attach it as a "git format-patch" style patch with a proper commit message explaining what happens when and why

Thanks!

::: gst/rtsp-server/rtsp-media.c
@@ +1890,3 @@
         finish_unprepare (media);
       }
+      g_rec_mutex_unlock (&priv->state_lock);

Locking should be needed for the other messages too then with the same reasoning

Comment 4 Nikita Bobkov 2016-08-18 14:17:40 UTC

Review of attachment 311737 [details] [review]:

The locking appears to be redundant as the mutex is already locked in bus_message function. Simply moving gst_rtsp_media_set_status should be sufficient.

Comment 5 Nikita Bobkov 2016-08-18 15:20:57 UTC

Created attachment 333562 [details] [review]
finish_unprepare race condition fix 2

Comment 6 Kseniya Vasilchuk 2016-08-18 15:41:13 UTC

Created attachment 333563 [details] [review]
double rtpbin removing case

I reproduced this bug by sending eos message and launching gst_rtsp_media_unprepare at the same time. I made that case like gstcheck test case.
This is not unit test, but gstcheck system fails on error with it at least on my machine.

Comment 7 Kseniya Vasilchuk 2016-08-18 15:59:07 UTC

(In reply to Nikita Bobkov from comment #5)
> Created attachment 333562 [details] [review] [review]
> finish_unprepare race condition fix 2

Fix the problem, at least in my case.

Comment 8 Kseniya Vasilchuk 2016-10-26 10:34:24 UTC

Created attachment 338497 [details] [review]
GST_RTSP_MEDIA_STATUS_UNPREPARING status check inside finish_unprepare

> (In reply to Nikita Bobkov from comment #5)
> > Created attachment 333562 [details] [review] [review] [review]
> > finish_unprepare race condition fix 2
> 
> Fix the problem, at least in my case.

Ok, it wasn't the solution. The problem of gst_rtsp_media_set_status moving is that it makes the sigfault because of the double source removing.
I suggest to do a status check to GST_RTSP_MEDIA_STATUS_UNPREPARING after lock inside "finish_unprepare".

Why is GST_RTSP_MEDIA_STATUS_UNPREPARING instead of GST_RTSP_MEDIA_STATUS_UNPREPARED like was suggested in comment #1?
Because, besides unprepare functions, we have "gst_rtsp_media_prepare" function. And it is possible that we will have following case:

1) "finish_unprepare" from "default_handle_message" happens and changes status to GST_RTSP_MEDIA_STATUS_UNPREPARED.
2) we go to "gst_rtsp_media_unprepare" and if the status is GST_RTSP_MEDIA_STATUS_UNPREPARED (it is) then go to "finish_unprepare".
3) we have unlock mutex inside "finish_unprepare".
4) we go to "gst_rtsp_media_prepare" and lock the mutex, then we check for GST_RTSP_MEDIA_STATUS_UNPREPARED (still it is) and set status to GST_RTSP_MEDIA_STATUS_PREPARING, unlock the mutex.
5) we lock back the mutex inside "finish_unprepare" (see (3)) and check for GST_RTSP_MEDIA_STATUS_UNPREPARED (it's already GST_RTSP_MEDIA_STATUS_PREPARING  (4)) and make some unpreparing things with working media.
6) we have a sigfault and many CRITICAL errors before it.

Comment 9 Kseniya Vasilchuk 2016-10-28 14:48:13 UTC

I forgot to say that the version is 1.8.2

Comment 10 Sebastian Dröge (slomo) 2016-10-31 12:08:57 UTC

Is the last patch the only thing needed here and the others are obsolete, or are the others still needed in one way or another?

Comment 11 Kseniya Vasilchuk 2016-10-31 13:50:58 UTC

(In reply to Sebastian Dröge (slomo) from comment #10)
> Is the last patch the only thing needed here and the others are obsolete, or
> are the others still needed in one way or another?


The latest patch is sufficient.

Comment 12 Kseniya Vasilchuk 2016-11-16 16:34:09 UTC

ping

Comment 13 Sebastian Dröge (slomo) 2016-11-17 08:39:29 UTC

Review of attachment 338497 [details] [review]:

::: gst/rtsp-server/rtsp-media.c
@@ +3021,1 @@
     finish_unprepare (media);

AFAIU this has the problem that multiple calls to unprepare will not work well.

What would happen is that the second call would immediately go here and call finish_unprepare()... which is not really what you want (the we might have to wait for EOS first for example).


Maybe if we're currently unpreparing, just do nothing at all here?

Comment 14 Sebastian Dröge (slomo) 2016-12-01 14:39:59 UTC

commit 09e499387d3ffce24a09bf31c365fabb4354013f
Author: Kseniia Vasilchuk <vasilchukkseniia@gmail.com>
Date:   Tue Oct 25 15:41:28 2016 +0300

    media: Fix race condition around finish_unprepare() if called multiple time
    
    https://bugzilla.gnome.org/show_bug.cgi?id=755329

Comment 15 Sebastian Dröge (slomo) 2016-12-01 14:41:34 UTC

It actually makes sense after checking again, thanks! It keeps previous behaviour, while removing the race condition that causes crashes and other interesting effects.