GNOME Bugzilla – Bug 703088
rtspsrc parse error race condition
Last modified: 2013-06-26 13:16:15 UTC
(A bit hard to explain, bear with me)
When using rtspsrc which receives requests from the server, the rtsp_connection reads and parses the requests/responses. If rtspsrc suddenly decides to cancel a previous command to send a new one, then it will call rtsp_connection_flush which will set a GCancellable, which cancels the current rtsp_connection_send or rtsp_connection_receive if there was one. The connection's parser state machine gets reset to STATE_START and the next rtsp_connection_send will fail to read the response because it will start parsing data from the state STATE_START but the socket's content was not flushed, so it will actually be receiving bytes from the middle of the previous request that it got cancelled while parsing...
To make it more clear, I've added this line to build_next (inside the while(TRUE) ) in gstrtspconnection.c (from g-p-base) :
g_debug ("State is %d (%d): %s", builder->state, builder->offset, builder->buffer);
and in parse_request_line, I've added this line :
g_debug ("Parse request line : %s", bptr);
And I'm getting these debug messages when I test my app :
** (sa-client:5469): DEBUG: State is 0 (0):
** (sa-client:5469): DEBUG: State is 3 (1): S
** (sa-client:5469): DEBUG: Parse request line : SET_PARAMETER rtsp://127.0.0.1:8554/media RTSP/1.0
** (sa-client:5469): DEBUG: State is 3 (0): SET_PARAMETER rtsp://127.0.0.1:8554/media RTSP/1.0
** (sa-client:5469): DEBUG: State is 3 (0): CSeq
[... add this point, we do something (change state or other) that causes rtspsrc to want to send a request to the server, causing it to 'cancel previous request', enable the connection_flush, and later disable it again...]
0:00:03.767319868 5469 0x7ab770 DEBUG rtspsrc gstrtspsrc.c:4043:gst_rtspsrc_loop_udp:<rtspsrc0> doing receive with timeout 54 seconds
** (sa-client:5469): DEBUG: State is 0 (0):
** (sa-client:5469): DEBUG: State is 3 (1): :
** (sa-client:5469): DEBUG: Parse request line : : text/parameters
** (sa-client:5469): DEBUG: parse error protocol version
The "parse error protocol version" was added to the parse_protocol_version function, and it helped me find which of the "res = GST_RTSP_EPARSE" was causing me to get the parse errors...
This leads to this behavior :
0:00:03.767728667 5469 0x7ab770 WARN rtspsrc gstrtspsrc.c:4083:gst_rtspsrc_loop_udp:<rtspsrc0> warning: Unhandled return value -8.
0:00:03.767754330 5469 0x7ab770 WARN rtspsrc gstrtspsrc.c:4154:gst_rtspsrc_loop_udp:<rtspsrc0> error: Could not receive message. (Parse error)
0:00:03.767769012 5469 0x7ab770 DEBUG rtspsrc gstrtspsrc.c:4417:gst_rtspsrc_loop:<rtspsrc0> pausing task, reason error
0:00:03.767777070 5469 0x7ab770 WARN rtspsrc gstrtspsrc.c:4436:gst_rtspsrc_loop:<rtspsrc0> error: Internal data flow error.
0:00:03.767782291 5469 0x7ab770 WARN rtspsrc gstrtspsrc.c:4436:gst_rtspsrc_loop:<rtspsrc0> error: streaming task paused, reason error (-5)
And it's the cause for my rtspsrc to stop working.
As you can see, the fact that rtspsrc tries to send a command and calls rtsp_connection_flush will cause it to stop parsing a request/response (in this case in the middle of receiving the "Content-type" header, I've seen it happen all over the place, in the middle of the body as well).
With my code which sends SET_PARAMETER to notify clients of GstBuffering messages (which causes them to go to PAUSED state, hence trying to send the PAUSE request), I can get this error in under 5 seconds, very easy to reproduce and happens all the time and makes my application completely unusable.
I'm really not sure on how to best fix this... the way I see it, requests should be queued and should not get cancelled, because even if you cancel a request, the server will get an invalid state if you don't finish sending the entire request.. also, a response must be read completely otherwise you will get parse errors on the client side.. the connection_flush should actually only cancel the receive if it's blocked waiting for the timeout, not if it's currently processing data.
We can't flush the socket's data either in rtsp_connection_flush because we have no guarantee that at that point the socket contains an entire request/response, so we could be flushing half of a request just the same...
How would you suggest fixing this? can it be done without redesigning how rtspsrc works?
> We can't flush the socket's data either in rtsp_connection_flush because we
> have no guarantee that at that point the socket contains an entire
> request/response, so we could be flushing half of a request just the same...
> How would you suggest fixing this? can it be done without redesigning how
> rtspsrc works?
It should not cancel and reset when it's busy reading something. I think it did that but it might be broken again because of the rewrite to gio. I'll have a look.
Can you check if this works?
Author: Wim Taymans <firstname.lastname@example.org>
Date: Wed Jun 26 15:03:05 2013 +0200
rtsp: make read uncancelable when reading a message
When we start to read a message, we need to continue reading until the end of
the message or else we lose track and cause parse errors. Use a variable
may_cancel to avoid cancelation after we read the first byte until we have
the complete message.
Is that enough ? I'm just wondering what happens if you send a request, cancel, send another request. The server will send two replies, do we handle this ?
(In reply to comment #3)
> Is that enough ? I'm just wondering what happens if you send a request, cancel,
> send another request. The server will send two replies, do we handle this ?
Not at the moment, for this it would need to match the Seq number of the request with that of the reply.
It's all a bit theoretical. I'm willing to implement all this if you make a unit-test :)