GNOME Bugzilla – Bug 770935
RTP/RTSP/TCP transport RTSP TIMEOUT should start at once when tcpi_ca_state is TCP_CA_Loss
Last modified: 2018-11-03 11:49:35 UTC
This is a bug-report related to bug 770934. The issue is to have a deterministic handle on the timeout on the platform. This works fine in UDP, but in TCP tunneling the timing is at the mercy of the persistent nature of TCP. It stand clear that the non-blocking-write socket will continue to act normally (as it has a buffer) even at network disconnect. Possibly also retransmit schemes, related timeouts, number of retries, and timeout back-off scheme add to this problem as well. The net result is a delay between the actual tcpi_ca_state in the TCP stack being set to TCP_CA_Loss (the first retransmit) and the user of the socket seeing RESOURCE TEMPORARILY UNAVAILABLE (EWOULDBLOCK) that amounts up to several seconds. In actual tests actually >10s (DEBUG mode runs though) The impact is that functionality depending on deterministic values of the time delay between the last frame of the lost stream and the timeout expiration, will have a hard time bridging that gap. (e.g. fail over file save locally) A patch has been prepared that addresses this issue.
Created attachment 334899 [details] [review] RTP/RTSP/TCP transport RTSP TIMEOUT should start at once when tcpi_ca_state is TCP_CA_Loss This patch depends on patch in https://bugzilla.gnome.org/show_bug.cgi?id=770934 Understanding that this likely is a Linux specific fix, I still put it up for review. Should perhaps be set in a platform independent API. Suggestions most welcome. The idea is to make sure that we capture the first possible moment when there is loss to trigger the RTSP TIMEOUT. Having the TCP stack indicating TCP_CA_Loss would be that moment. With this fix together with the previous one in 770934, the time gap is reduced in one case from 14s to 0s, making the TCP tunneling TIMEOUT-wise behave like the UDP case. A desirable feature.
Review of attachment 334899 [details] [review]: ::: gst-libs/gst/rtsp/gstrtspconnection.c @@ +3763,3 @@ + getsockopt(g_socket_get_fd (sock), + IPPROTO_TCP, TCP_INFO, &tcp_info, &optlen); + res = (tcp_info.tcpi_ca_state == TCP_CA_Loss); This all should be conditional in #ifdefs :) @@ +3820,3 @@ + treat same as WOULD_BLOCK. Need this to avoid delay. */ + if (check_tcp_ca_state_loss(watch->conn->write_socket)) + res = GST_RTSP_EINTR; The main difference between WOULD_BLOCK and this however is that WOULD_BLOCK means that no data was written, here all data was written (to the OS internal buffer). Is this handled correctly? Also this can happen basically all the time without there being an actual problem. TCP packets getting lost tell you nothing other than the link being not ideal
Yes, you are right. The intent is simply to leave things as they are, but allow the RTSP to start at the earliest possible moment. But true, if we set it to GST_RTSP_EINTR, then the BACKLOG will be exercised and we do not want that, because if the result of the send to the socket was OK, then the data is either in the window or in the socket buffer, and should be treated as *handled*. It worked out for my disconnect test scenarios for the obvious reason. I will contemplate these comments and get back with an updated patch. cheers.
Created attachment 334978 [details] [review] Updated version with idefs and better handling I used G_OS_UNIX ifdefs and put the use of it in a place where this feature will only be used if the send to socket was OK, and then terminates as if OK, but return generic error, if tcpi_ca_state == CA_STAT_Loss. Tested and proved to work the same for the disconnect/failover case i am working with. On the reasoning if we want to check that state flag, well, the RTSP timeout is by default 60s. The TCP timeout is set to 30-35 seconds, the retransmit timeout is small in the first rounds. If you over the span of 60s (65s as it turns out) does not see another state than Loss, well, then I think it is reasonable to trigger that timeout. The ambition is only to handle the RTSP timeout start more precise, not to otherwise change the handling of sending data.
Just adding a note to my timeout rant. Obviously the socket buffer (also if you add the BACKLOG) will be full long before that 60s has passed.
Weekly PING on this one as well. /Cheers
Weekly PING
Hi. Weekly ping. I know u guys are busy with release work, but it would be peachy if u could find time to review a couple of my patches. /Cheers
Just a remainder for review /Cheers
Please review
Weekly remainder. Please review.
Please review this patch
please review this patch
So if I understand correctly this would call the new check_tcp_ca_state_loss() doing a getsockopt for every single chunk of data sent successfully? I wonder if this has any impact on performance? As for the new ERROR return, the same comments apply as in the other bug :)
It is an external call, but it is a mem copy of a struct as far as I understand. I doubt it is more expensive than the socket send. To lessen the impact I could have a counter that made the call more seldom like #define N 2 /* inverted frequency of getsockopt calls to check TCP_CA_Loss) */ guint cnt; .. cnt++; if ( cnt % N == 0 ) { check_tcp_ca_state_loss(watch->conn->write_socket)); } I will try to do some timing checks with the call enabled and disabled.
Created attachment 345606 [details] [review] Due to change to 770934 this patch change too. Note that the patch in this report depends on the the patch in 770934, and cannot be applied separately.
plz review
review?
Comment on attachment 345606 [details] [review] Due to change to 770934 this patch change too. Was this patch supposed to be attached to bug #770934 instead? >+#ifdef G_OS_UNIX >+static gboolean >+check_tcp_ca_state_loss (GSocket * sock) >+{ >+ gboolean res; >+ struct tcp_info tcp_info; >+ socklen_t optlen = sizeof (tcp_info); >+ >+ getsockopt (g_socket_get_fd (sock), >+ IPPROTO_TCP, TCP_INFO, &tcp_info, &optlen); >+ res = (tcp_info.tcpi_ca_state == TCP_CA_Loss); >+ >+ GST_DEBUG ("tcpi_ca_state=%u, check_tcp_ca_state_loss=%d", >+ tcp_info.tcpi_ca_state, res); >+ >+ return res; >+} >+#endif Will this work on all common Unices? macOS? *BSD? If not, then we need to check what's available. Also, the includes needed for this are the other patch in the other bug (although not needed there I think). > static void > connloss_status (GstRTSPWatch * watch, GstRTSPResult result, > GSocket * write_socket, gpointer user_data) > { > GstRTSPClient *client = GST_RTSP_CLIENT (user_data); > GstRTSPClientPrivate *priv = client->priv; >+ gboolean is_eintr = (result == GST_RTSP_EINTR); > >- priv->connloss = (result == GST_RTSP_EINTR); >+ priv->connloss = is_eintr; >+ >+#ifdef G_OS_UNIX >+ /* If TCP stack state indicate loss, don't reset session TIMEOUT. >+ * Need this to avoid delay. Return generic error. >+ */ This comment sounds like it belongs elsewhere in the other patch that makes use of priv->connloss to skip the timeout reset. I don't understand the bit about 'Return generic error', what does it mean here? The comment here should be more descriptive as to what we're trying to do here and then why, e.g. 'Indicate potential connection loss as soon as we detect a write backlog, so that we stop resetting the timout from the moment any backlog is building up. This makes sure we can honour the actually requested timeout more accurately.' or somesuch. >+ if (!is_eintr && write_socket) >+ priv->connloss = check_tcp_ca_state_loss (write_socket); >+#endif > > GST_LOG ("client %p: connloss_status %s", client, > (priv->connloss ? "CONNECTION LOST" : "OK")); >-- >2.1.4 >
I think that this ticket and the related ticket 770934 shall be closed. Reason: The changes in the patch changed the default return code for gst_rtsp_watch_write_data to ERROR. The result of this change is that there can be unwanted (faulty) session timeouts when the client (or network) is slow in reading. The change to return ERROR when an item is added to the backlog queue is causing the timeout variables only to be reset when there are no items in the backlog queue.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-plugins-base/issues/289.