After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 703088 - rtspsrc parse error race condition
rtspsrc parse error race condition
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-plugins-base
git master
Other Linux
: Normal blocker
: 1.1.2
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2013-06-25 23:31 UTC by Youness Alaoui
Modified: 2013-06-26 13:16 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Youness Alaoui 2013-06-25 23:31:28 UTC
(A bit hard to explain, bear with me)
When using rtspsrc which receives requests from the server, the rtsp_connection reads and parses the requests/responses. If rtspsrc suddenly decides to cancel a previous command to send a new one, then it will call rtsp_connection_flush which will set a GCancellable, which cancels the current rtsp_connection_send or rtsp_connection_receive if there was one. The connection's parser state machine gets reset to STATE_START and the next rtsp_connection_send will fail to read the response because it will start parsing data from the state STATE_START but the socket's content was not flushed, so it will actually be receiving bytes from the middle of the previous request that it got cancelled while parsing...
To make it more clear, I've added this line to build_next (inside the while(TRUE) ) in gstrtspconnection.c (from g-p-base) :
    g_debug ("State is %d (%d): %s", builder->state, builder->offset, builder->buffer);
and in parse_request_line, I've added this line :
  g_debug ("Parse request line : %s", bptr);

And I'm getting these debug messages when I test my app :
** (sa-client:5469): DEBUG: State is 0 (0): 
** (sa-client:5469): DEBUG: State is 3 (1): S
** (sa-client:5469): DEBUG: Parse request line : SET_PARAMETER rtsp://127.0.0.1:8554/media RTSP/1.0
** (sa-client:5469): DEBUG: State is 3 (0): SET_PARAMETER rtsp://127.0.0.1:8554/media RTSP/1.0
** (sa-client:5469): DEBUG: State is 3 (0): CSeq

[... add this point, we do something (change state or other) that causes rtspsrc to want to send a request to the server, causing it to 'cancel previous request', enable the connection_flush, and later disable it again...]

0:00:03.767319868  5469       0x7ab770 DEBUG                rtspsrc gstrtspsrc.c:4043:gst_rtspsrc_loop_udp:<rtspsrc0> doing receive with timeout 54 seconds
** (sa-client:5469): DEBUG: State is 0 (0): 
** (sa-client:5469): DEBUG: State is 3 (1): :
** (sa-client:5469): DEBUG: Parse request line : : text/parameters
** (sa-client:5469): DEBUG: parse error protocol version

The "parse error protocol version" was added to the parse_protocol_version function, and it helped me find which of the "res = GST_RTSP_EPARSE" was causing me to get the parse errors...
This leads to this behavior : 
0:00:03.767728667  5469       0x7ab770 WARN                 rtspsrc gstrtspsrc.c:4083:gst_rtspsrc_loop_udp:<rtspsrc0> warning: Unhandled return value -8.
0:00:03.767754330  5469       0x7ab770 WARN                 rtspsrc gstrtspsrc.c:4154:gst_rtspsrc_loop_udp:<rtspsrc0> error: Could not receive message. (Parse error)
0:00:03.767769012  5469       0x7ab770 DEBUG                rtspsrc gstrtspsrc.c:4417:gst_rtspsrc_loop:<rtspsrc0> pausing task, reason error
0:00:03.767777070  5469       0x7ab770 WARN                 rtspsrc gstrtspsrc.c:4436:gst_rtspsrc_loop:<rtspsrc0> error: Internal data flow error.
0:00:03.767782291  5469       0x7ab770 WARN                 rtspsrc gstrtspsrc.c:4436:gst_rtspsrc_loop:<rtspsrc0> error: streaming task paused, reason error (-5)

And it's the cause for my rtspsrc to stop working.

As you can see, the fact that rtspsrc tries to send a command and calls rtsp_connection_flush will cause it to stop parsing a request/response (in this case in the middle of receiving the "Content-type" header, I've seen it happen all over the place, in the middle of the body as well). 
With my code which sends SET_PARAMETER to notify clients of GstBuffering messages (which causes them to go to PAUSED state, hence trying to send the PAUSE request), I can get this error in under 5 seconds, very easy to reproduce and happens all the time and makes my application completely unusable.

I'm really not sure on how to best fix this... the way I see it, requests should be queued and should not get cancelled, because even if you cancel a request, the server will get an invalid state if you don't finish sending the entire request.. also, a response must be read completely otherwise you will get parse errors on the client side.. the connection_flush should actually only cancel the receive if it's blocked waiting for the timeout, not if it's currently processing data.
We can't flush the socket's data either in rtsp_connection_flush because we have no guarantee that at that point the socket contains an entire request/response, so we could be flushing half of a request just the same...
How would you suggest fixing this? can it be done without redesigning how rtspsrc works?

Thanks.
Comment 1 Wim Taymans 2013-06-26 07:46:30 UTC
> We can't flush the socket's data either in rtsp_connection_flush because we
> have no guarantee that at that point the socket contains an entire
> request/response, so we could be flushing half of a request just the same...
> How would you suggest fixing this? can it be done without redesigning how
> rtspsrc works?

It should not cancel and reset when it's busy reading something. I think it did that but it might be broken again because of the rewrite to gio. I'll have a look.

> 
> Thanks.
Comment 2 Wim Taymans 2013-06-26 13:08:20 UTC
Can you check if this works?

commit 32a1deb404d585a3ffb306371013d831ca095bdd
Author: Wim Taymans <wim.taymans@collabora.co.uk>
Date:   Wed Jun 26 15:03:05 2013 +0200

    rtsp: make read uncancelable when reading a message
    
    When we start to read a message, we need to continue reading until the end of
    the message or else we lose track and cause parse errors. Use a variable
    may_cancel to avoid cancelation after we read the first byte until we have
    the complete message.
    
    Fixes https://bugzilla.gnome.org/show_bug.cgi?id=703088
Comment 3 Nicolas Dufresne (ndufresne) 2013-06-26 13:11:50 UTC
Is that enough ? I'm just wondering what happens if you send a request, cancel, send another request. The server will send two replies, do we handle this ?
Comment 4 Wim Taymans 2013-06-26 13:16:15 UTC
(In reply to comment #3)
> Is that enough ? I'm just wondering what happens if you send a request, cancel,
> send another request. The server will send two replies, do we handle this ?

Not at the moment, for this it would need to match the Seq number of the request with that of the reply.

It's all a bit theoretical. I'm willing to implement all this if you make a unit-test :)