After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 702705 - rtspsrc does not pause properly (race condition)
rtspsrc does not pause properly (race condition)
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-plugins-good
git master
Other Linux
: Normal blocker
: 1.0.8
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2013-06-20 02:28 UTC by Youness Alaoui
Modified: 2013-06-25 18:37 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Youness Alaoui 2013-06-20 02:28:03 UTC
There is a race condition in rtspsrc when you change state from PLAYING to PAUSED, as it will not pause its internal elements correctly.
When going from PLAYING to PAUSED, the rtspsrc must pause the rtpbin and send the PAUSE request to the server, but this does not happen, causing various issues. In my use case, the server will pause the playback and notify the clients that they need to pause as well through a SET_PARAMETER. Without properly doing the PAUSE request, the internal rtpbin to rtspsrc does not change state, and does not stop its RTCP thread which makes it signal the 'on-timeout' error after 30 seconds, since the server stops transmitting, which causes the rtspsrc to error.

The issue is that the way the PAUSE request is sent is by calling gst_rtspsrc_loop_send_cmd (rtspsrc, CMD_PAUSE, CMD_LOOP) to tell the rtspsrc task to send the CMD_PAUSE and handle the pause command, but the rtspsrc is processing the "CMD_LOOP" command (to receive requests from the server), so for that, a CMD_WAIT is first sent in order to interrupt the loop and pause the rtspsrc task, then the CMD_PAUSE is sent. The race condition happens this way :
rtspsrc task : exeucting CMD_LOOP
change_state : send CMD_WAIT
rtspsrc task : CMD_LOOP interrupted (connection is set to flushing)
change_state : send CMD_PAUSE
rtspsrc task : 'pausing task, reason flushing' (connection is now non-flushing)
rtspsrc task : send CMD_WAIT (to pause the task)
send_cmd : cancelling CMD_PAUSE in favor of the newest command CMD_WAIT

and so the CMD_PAUSE is overwritten (pending_cmd) and is never executed.
Another issue with this is that if I do the set_state in the handle-request signal handler, then this has no way of working because the task that needs to be paused (CMD_LOOP that needs to be canceled) is the one calling my signal handler.
I was able to work around this issue temporarily by making the change_state function wait until the rtspsrc task is paused before sending the CMD_PAUSE command to it, but this will cause a deadlock if the state gets changed from within a signal handler that is called from within the task itself, which forced me to create a g_idle_add to do my state change.
Do you have any suggestions on how to fix this properly ?

Here is my hackish workaround (along with state change in an idler in my application) :
@@ -6969,6 +6975,13 @@ gst_rtspsrc_change_state (GstElement * element, GstStateChange transition)
     case GST_STATE_CHANGE_PLAYING_TO_PAUSED:
       /* unblock the tcp tasks and make the loop waiting */
       gst_rtspsrc_loop_send_cmd (rtspsrc, CMD_WAIT, CMD_LOOP);
+
+      GST_OBJECT_LOCK (rtspsrc);
+      while (rtspsrc->busy_cmd != CMD_WAIT) {
+        GST_OBJECT_UNLOCK (rtspsrc);
+        GST_OBJECT_LOCK (rtspsrc);
+      }
+      GST_OBJECT_UNLOCK (rtspsrc);
       break;
     case GST_STATE_CHANGE_PAUSED_TO_READY:
       break;

And here is another patch that was required to restore the CMD_LOOP functionality of the source after the CMD_PAUSE has been executed :
@@ -6833,7 +6839,7 @@ gst_rtspsrc_thread (GstRTSPSrc * src)
 
   GST_OBJECT_LOCK (src);
   cmd = src->pending_cmd;
-  if (cmd == CMD_RECONNECT || cmd == CMD_PLAY || cmd == CMD_LOOP)
+  if (cmd == CMD_RECONNECT || cmd == CMD_PLAY || cmd == CMD_PAUSE || cmd == CMD_LOOP)
     src->pending_cmd = CMD_LOOP;
   else
     src->pending_cmd = CMD_WAIT;
Comment 1 Wim Taymans 2013-06-20 12:49:00 UTC
Your analysis is spot on and the proposed solution can be simplified by simply taking and releasing the stream lock, like how the seek does it.

I don't quite understand what the second patch does, it is needed? when is it needed?

commit b96d931bf4ce5c3c8c5be4fa76f5c75bca85d0c4
Author: Wim Taymans <wim.taymans@collabora.co.uk>
Date:   Thu Jun 20 14:43:47 2013 +0200

    rtspsrc: fix race in state change to paused
    
    When we go to paused, we first flush the connection and then send the pause
    command. As a result of the flushing, the scheduled paused command can get
    lost. Wait until the connection is completely flushed and the rtsp task is
    waiting before issuing the paused or playing request.
    
    Fixes https://bugzilla.gnome.org/show_bug.cgi?id=702705
Comment 2 Youness Alaoui 2013-06-20 20:53:07 UTC
Thanks Wim, that does seem like a much better solution to ensuring the task is stopped!
For the second patch, it's because once the CMD_PAUSE is handled, the task is paused (CMD_WAIT) and it stops receiving requests from the server. The second patch makes it go into CMD_LOOP instead of CMD_WAIT once the CMD_PAUSE is processed, so we can receive requests from the server to start playing again.
Comment 3 Youness Alaoui 2013-06-20 22:22:21 UTC
By the way, I would suggest adding a comment/documentation stating not to change the element's state from the handle-request signal, since it will cause a deadlock.
Comment 4 Wim Taymans 2013-06-25 18:37:05 UTC
commit a4be0c6de36e14fd4cc409d17a98298cead7b1a4
Author: Wim Taymans <wim.taymans@collabora.co.uk>
Date:   Tue Jun 25 20:36:18 2013 +0200

    rtspsrc: add some more docs to handle-request signal
    
    See https://bugzilla.gnome.org/show_bug.cgi?id=702705