GNOME Bugzilla – Bug 736322
Unlimited queue for PAUSE and OPTION response
Last modified: 2018-05-04 09:52:56 UTC
Created attachment 285732 [details] [review] gst-rtsp-server patch

When the "drop-backlog" setting is FALSE and the watch queue is full, there is a deadlock when handling PAUSE and OPTIONS requests. I assume that when "drop-backlog" is TRUE and the watch queue is full, the responses are lost instead; this is not tested. The solution is to temporarily allow an unlimited queue size for the watch while handling PAUSE and OPTIONS requests.
Review of attachment 285732 [details] [review]:

Just to be sure, the problem here is that if the send queue is full then the commands will be dropped? Which leads to the problem that the client will be confused as there's no response?

::: gst/rtsp-server/rtsp-client.c
@@ +1110,3 @@
+  /* Restore limit on watch queue */
+  gst_rtsp_watch_set_send_backlog (priv->watch, 0, 100);

Can you create a #define for this number somewhere at the top of the file?
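For reference, the requested change would look roughly like this (a minimal sketch only; the define name is illustrative and the value is taken from the hunk quoted above, the actual patch may differ):

/* rtsp-client.c: maximum number of messages queued in the client watch
 * (illustrative name for the magic number in the hunk above). */
#define WATCH_BACKLOG_SIZE 100

  /* Restore limit on watch queue, using the define instead of the
   * magic number. */
  gst_rtsp_watch_set_send_backlog (priv->watch, 0, WATCH_BACKLOG_SIZE);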
Created attachment 286189 [details] [review] gst-rtsp-server patch

New patch added.

> Just to be sure, the problem here is that if the send queue is full then the commands will be dropped? Which leads to the problem that the client will be confused as there's no response?

Answer: No. If running with "drop-backlog" set to TRUE, I believe the above is correct. But we are running with "drop-backlog" set to FALSE. Then we get a deadlock when the backlog queue is full and an incoming RTSP request arrives (except TEARDOWN, for which the queue is flushed).
+ Trace 234092 [backtrace attached]
It looks like the deadlock is caused by one of your previous patches that added locking in stream-transport. Can you try without it?
Oh!! I mistakenly pasted the wrong backtrace.
+ Trace 234093 [backtrace attached]
It's the send_lock that is causing trouble. It is taken when sending a message (from the appsink render callback). The callback then blocks because the queue is full. When a PAUSE request comes in, it tries to pause the appsink, which deadlocks because the appsink is still blocked in the render callback. What should happen is that when pausing/stopping etc., we make sure no render method is blocking (by flushing it or unlocking it or so) before doing the state change. Increasing the queue limit is one way of doing that.
If we flush, we lose data; that's OK for TEARDOWN and is already done, but for other requests it is not so good to lose data. Temporarily increasing the limit for ctrl data is, in my mind, like prioritizing ctrl data over other data, and that sounds like a good idea to me. However, there is a theoretical risk that the queue will grow uncontrolled. The patch could maybe be redesigned so there are two limits, one for data and one for ctrl data. I also think this is a problem when "drop-backlog" is TRUE, but then the ctrl data will be lost; temporarily increasing the queue size solves both cases. Note: by ctrl data I mean RTSP requests and their responses.
Even with two limits this will not work. The reason the appsink callback is blocking is that the limit for the data is reached; the client is then trying to do a state change to PAUSED, waiting for the appsink to unblock. It's not even trying to send the PAUSED reply yet; it will do that after the pipeline has paused. So the only thing that can be done is to unblock the wait_backlog function without losing the packet, which you can only do by adding it to the queue, i.e. by increasing the queue length. What would be nice is if we could add only the currently blocking packet to the queue and then make the appsink pause on the next packet, but I don't see how that would work. We'll need to just remove the limit and hope that not too many packets flood the queue (they should be rate limited by their timestamps). In any case, we can restore the old limits again after we have paused, so the queue should not grow unbounded.
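A minimal sketch of that approach in the PAUSE handler, assuming the priv->watch pointer, a sessmedia session-media object and a WATCH_BACKLOG_SIZE define as discussed above (the exact calls in the final patch may differ):

/* Lift the limits so the appsink render callback blocked in
 * wait_backlog() can queue its message and return; a value of 0
 * means no limit. */
gst_rtsp_watch_set_send_backlog (priv->watch, 0, 0);

/* Now the state change can proceed without deadlocking. */
gst_rtsp_session_media_set_state (sessmedia, GST_STATE_PAUSED);

/* Restore the old limits once paused; the data is rate limited by its
 * timestamps, so the queue should not grow unbounded in between. */
gst_rtsp_watch_set_send_backlog (priv->watch, 0, WATCH_BACKLOG_SIZE);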
So if I understand you correctly, you think the attached patch is good enough?
commit 23b9d8fbb04ceff21110086528591849c728ae48
Author: Göran Jönsson <goranjn@axis.com>
Date:   Tue Sep 16 11:41:52 2014 +0200

    client: raise the backlog limits before pausing

    We need to raise the backlog limits before pausing the pipeline or else
    the appsink might be blocking in the render method in wait_backlog()
    and we would deadlock waiting for paused.

    Fixes https://bugzilla.gnome.org/show_bug.cgi?id=736322

commit ebd9be59fe63b242d6c7f3c28088c73392927b3b
Author: Göran Jönsson <goranjn@axis.com>
Date:   Tue Sep 16 11:29:38 2014 +0200

    client: make define for the WATCH_BACKLOG

    See https://bugzilla.gnome.org/show_bug.cgi?id=736322
Yes, however: I did not include the patch for the OPTIONS request yet. How does this deadlock? There is no state change there, and the reply should just be appended to the current data in the backlog and be consumed by the client when it can.
The effect is now that if the client is a slow reader and it issues a PAUSE request, it will first get all the queued RTP packets before it gets the PAUSE reply. I guess that is expected. We could keep separate queues for RTSP and RTP packets and allow RTSP messages to jump in front of the RTP messages. That would make it so you get the PAUSED reply quickly and then some more RTP packets until the RTP queue is drained. Not sure if that is much better.
Options: I have never seen the problem for OPTIONS. I believed the problem was related to the queue being full; I didn't think of the part involving the state change. I was more thinking that it was possible to have a full queue and receive an OPTIONS request, so I added that as well. But for the case when "drop-backlog" is set to TRUE, isn't the OPTIONS case relevant? (We are running with "drop-backlog" set to FALSE.)

Double queues for a quick reply: I believe that can mess things up for clients.
This has been fixed already. The bug can be closed.