GNOME Bugzilla – Bug 720393
Unable to Cancel or Abort tcpclientsink Timeout
Last modified: 2013-12-16 17:39:28 UTC
Created attachment 264146: GST_DEBUG=4 log of the bug.

I am working on an application that uses GStreamer to send a Motion JPEG video stream through a tcpclientsink element. The application works fine unless I disrupt the network by switching the connection from wired to wireless or from wireless to wired. When that happens, the tcpclientsink element waits 15 minutes before responding to messages, which becomes a problem if I try to shut down the application during that time.

Here is what I've observed:

1. Start a Motion JPEG media stream with GStreamer using tcpclientsink as the sink. The code pushing the video runs in its own thread.
2. While the media stream is running, disrupt the connection by switching the type of network connection.
3. Start shutting down the application by calling gst_bus_post(bus, gst_message_new_eos(NULL)). This is ignored. The same is true for calls to gst_message_new_error() and for my attempts to set the state of the pipeline to GST_STATE_NULL.
4. Call pthread_join() to wait for the video thread to exit. It does not return for up to 15 minutes.

The GST_DEBUG messages show that tcpclientsink hit an error while writing and kept retrying for 15 minutes before it gave up.
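The roughly 15-minute delay matches the Linux kernel's default limit on TCP retransmissions, assuming the sending machine ran Linux with default settings (the report does not say). With `net.ipv4.tcp_retries2 = 15`, a minimum retransmission timeout (RTO) of 200 ms and a maximum RTO of 120 s, the kernel documentation quotes a hypothetical timeout of 924.6 seconds before a connection with unacknowledged data is abandoned, and only then does a blocked write fail. The C sketch below reproduces that arithmetic: each wait lasts one RTO, and the RTO doubles until it reaches the cap. The function name is hypothetical, and the real timeout also depends on the measured round-trip time.

```c
/* Hypothetical helper: approximate how long Linux keeps retransmitting
 * unacknowledged data before giving up, using the exponential-backoff
 * bound described for net.ipv4.tcp_retries2.
 * rto_min and rto_max are in seconds; the measured RTT is ignored. */
static double tcp_give_up_seconds(int retries, double rto_min, double rto_max)
{
    double total = 0.0;
    double rto = rto_min;

    /* The initial transmission plus `retries` retransmissions each wait
     * one RTO; the RTO doubles each time but never exceeds rto_max. */
    for (int i = 0; i <= retries; i++) {
        total += rto;
        rto = (rto * 2.0 > rto_max) ? rto_max : rto * 2.0;
    }
    return total;
}
```

For `tcp_give_up_seconds(15, 0.2, 120.0)` the sum is 204.6 s for the ten doubling steps (0.2 s up to 102.4 s) plus six capped waits of 120 s, about 924.6 s or roughly 15.4 minutes.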
I was able to recreate this problem using gst-launch-0.10. This might be more complicated than necessary, but it worked for me. Launch three scripts:

1. The following relays the data between the consumer and the producer:

    while [ 1 ]
    do
        gst-launch-0.10 tcpserversrc host=0 port=${PORT_IN} ! jpegdec ! jpegenc ! tcpserversink port=${PORT_OUT}
    done

2. The following is the script for the consumer:

    gst-launch-0.10 tcpclientsrc host=${IP_ADDR} port=${PORT_OUT} ! jpegdec ! ffmpegcolorspace ! ximagesink

3. The following is the script for the producer:

    gst-launch-0.10 ximagesrc ! videoscale ! video/x-raw-rgb,framerate=1/1,width=640,height=320 ! ffmpegcolorspace ! jpegenc ! tcpclientsink host=${IP_ADDR} port=${PORT_IN}

I ran the first two scripts on one machine and the third script on a second machine. When I switched the network connection on the second machine from wired to wireless, it took 15+ minutes for tcpclientsink to report an error.
Could you try with 1.x, please? The network elements were all rewritten more or less completely.

If it still happens with 1.x: Which errors from tcpclientsink do you see in the debug logs before the 15-minute timeout, and where does it then retry over and over again? Does it also fail to stop the pipeline for 15 minutes if you simply set the pipeline state to READY or NULL once the network link is broken, instead of posting messages? (What are you trying to achieve with those messages?) Does it behave more sensibly if you call g_socket_set_timeout() with, e.g., 30 seconds in gsttcpclientsink.c:gst_tcp_client_sink_start(), after the call to g_socket_new() and before the call to g_socket_connect()?
Also note that 0.10 has been unmaintained for a long time now and is generally no longer supported.
Unfortunately, it's not easy for me to try with 1.x. I would have to integrate it into my application, and I'm not sure how much effort that would take. After looking at the source code for 1.2.1, I suspect it would work, though. I fixed the problem for my application by setting the send timeout in the gst_tcp_client_sink_start() function of gsttcpclientsink.c:

----------------------------------------
/* create a socket for sending to remote machine */
static gboolean
gst_tcp_client_sink_start (GstTCPClientSink * this)
{
  int ret;
  gchar *ip;
  struct timeval timeout;

  /* blocking writes on the socket give up after 60 seconds */
  timeout.tv_sec = 60;
  timeout.tv_usec = 0;

  if (GST_OBJECT_FLAG_IS_SET (this, GST_TCP_CLIENT_SINK_OPEN))
    return TRUE;

  /* reset caps_sent flag */
  this->caps_sent = FALSE;

  /* create sending client socket */
  GST_DEBUG_OBJECT (this, "opening sending client socket to %s:%d", this->host,
      this->port);
  if ((this->sock_fd.fd = socket (AF_INET, SOCK_STREAM, 0)) == -1) {
    GST_ELEMENT_ERROR (this, RESOURCE, OPEN_WRITE, (NULL), GST_ERROR_SYSTEM);
    return FALSE;
  }

  /* apply the send timeout so a stalled connection makes the write fail
   * (EAGAIN/EWOULDBLOCK) instead of blocking until TCP gives up */
  setsockopt (this->sock_fd.fd, SOL_SOCKET, SO_SNDTIMEO, (char *) &timeout,
      sizeof (timeout));

  GST_DEBUG_OBJECT (this, "opened sending client socket with fd %d",
      this->sock_fd.fd);
  ...
----------------------------------------

Now my application is able to shut down within approximately one minute, which is acceptable for my situation. I understand 0.10 is no longer maintained and that you probably have no interest in this change, but I thought I would pass it along.
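The patch above boils down to a send timeout on the socket. The following is a minimal standalone POSIX sketch of the same idea, outside GStreamer; the helper names `set_send_timeout` and `send_all` are hypothetical and not part of any GStreamer API. Once `SO_SNDTIMEO` is set, a `send()` that cannot make progress returns -1 with `errno` set to `EAGAIN` or `EWOULDBLOCK` after the timeout, so the caller can report an error and shut down instead of waiting for TCP to give up on the connection.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Hypothetical helper: limit how long a blocking send() may wait.
 * Returns 0 on success, -1 on error (errno set by setsockopt()). */
static int set_send_timeout(int fd, long seconds)
{
    struct timeval timeout;

    timeout.tv_sec = seconds;
    timeout.tv_usec = 0;
    /* POSIX declares optval as const void *, so no (char *) cast is needed. */
    return setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &timeout, sizeof timeout);
}

/* Hypothetical helper: write the whole buffer or fail.
 * Returns 0 on success, -1 on error with errno left as set by send().
 * With SO_SNDTIMEO set, a peer that stops reading makes send() fail with
 * EAGAIN/EWOULDBLOCK after the timeout instead of blocking indefinitely. */
static int send_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = send(fd, buf, len, 0);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;          /* includes EAGAIN/EWOULDBLOCK on timeout */
        }
        /* send() may write only part of the buffer; advance past it */
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}
```

A caller would treat `EAGAIN`/`EWOULDBLOCK` from `send_all()` as "the peer is unreachable", log it, and close the socket, which is roughly what the 60-second timeout in the patch lets tcpclientsink do.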
Thanks for sharing the fix; I suppose people will find it via Google if necessary. If you can reproduce it with gst-launch-0.10, you could probably just have tried gst-launch-1.0 without changing your application (0.10 and 1.0 can be installed in parallel, after all). Let's just close it as obsolete, then. If it's still a problem, someone will file a new bug sooner or later.