GNOME Bugzilla – Bug 786961
rtspclientsink: deadlock bug & possible fix
Last modified: 2018-05-14 14:06:28 UTC
Created attachment 358654
Log file showing the pipeline that got stuck and appears to be deadlocked

We have been chasing this bug for quite some time now; it appears to be a deadlock in rtspclientsink. The bug is very hard to catch, since it only happens on a heavily loaded machine and after a long run time. Moreover, it never revealed itself with a high log level enabled. It is severe: it blocks the pipeline and prevents it from being deallocated, which leads to memory leaks.

For the attached log file (truncated, with the rtspclientsinks that were deallocated successfully removed) we ran about 1000 streams to trigger the bug. The problematic pipeline is number 834. As the log shows, the last line from rtspclientsink834 is:

    Line 15038: gstrtspclientsink.c:3307:gst_rtsp_client_sink_collect_streams:<rtspclientsink834> Waiting for preroll before continuing

Looking at the code, this message comes from:

    g_mutex_lock (&sink->preroll_lock);
    while (!sink->prerolled && !sink->conninfo.flushing) {
      GST_LOG_OBJECT (sink, "Waiting for preroll before continuing");
      g_cond_wait (&sink->preroll_cond, &sink->preroll_lock);
    }
    GST_LOG_OBJECT (sink, "Marking streams as collected");
    sink->streams_collected = TRUE;
    g_mutex_unlock (&sink->preroll_lock);

This means we are waiting on sink->preroll_cond, which is never signalled. Going backwards through the log, we should never have reached this wait at all, since the sink was flushing:

    Line 14996: gstrtspclientsink.c:1756:gst_rtsp_client_sink_connection_flush:<rtspclientsink834> set flushing 1

But the flushing flag was never actually set, because the connection had already been closed (sink->conninfo.connection == NULL). The relevant code in gst_rtsp_client_sink_connection_flush is:
    GST_DEBUG_OBJECT (sink, "set flushing %d", flush);
    g_mutex_lock (&sink->preroll_lock);
    if (sink->conninfo.connection && sink->conninfo.flushing != flush) {
      GST_DEBUG_OBJECT (sink, "connection flush");
      gst_rtsp_connection_flush (sink->conninfo.connection, flush);
      sink->conninfo.flushing = flush;
    }

The connection had been closed earlier, as this log line shows:

    Line 14967: gstrtspclientsink.c:1730:gst_rtsp_conninfo_close:<rtspclientsink834> freeing connection...

The code at that line (info is sink->conninfo, passed in by the caller):

    if (free && info->connection) {
      /* free connection */
      GST_DEBUG_OBJECT (sink, "freeing connection...");
      gst_rtsp_connection_free (info->connection);
      info->connection = NULL;
    }

It was called from:

    if (sink->conninfo.connection) {
      GST_DEBUG_OBJECT (sink, "free connection");
      gst_rtsp_conninfo_close (sink, &sink->conninfo, TRUE);
    }

So sink->conninfo.connection is already NULL when the flush happens, the guard in the flush code is skipped, and sink->conninfo.flushing is never updated. As a result we are stuck in g_cond_wait, waiting for preroll_cond, which will never be signalled.

To fix the problem, we propose adding a check that the connection is non-NULL, so that g_cond_wait is not entered at all once the connection has been closed:

    while (!sink->prerolled && !sink->conninfo.flushing && sink->conninfo.connection) {
      GST_LOG_OBJECT (sink, "Waiting for preroll before continuing");
      g_cond_wait (&sink->preroll_cond, &sink->preroll_lock);
    }

We have tested this solution (and will continue testing it); so far the bug has not reproduced.
Can you attach a patch in "git format-patch" format here for easier reviewing?
Closing this bug report as no further information has been provided. Please feel free to reopen this bug report if you can provide the information that was asked for in a previous comment. Thanks!
Thanks for reporting. Fixed in:

    commit b3a4df7ab8e5ef07e83438cb3ad041bc04253525
    Author: Jan Schmidt <jan@centricular.com>
    Date:   Wed May 9 04:09:02 2018 +1000

        rtspclientsink: Don't deadlock in preroll on early close

        If the connection is closed very early, the flushing marker
        might not get set and rtspclientsink can get deadlocked waiting
        for preroll forever.

        https://bugzilla.gnome.org/show_bug.cgi?id=786961