Bug 786961 - rtspclientsink: deadlock bug & possible fix
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-rtsp-server
Version: 1.12.x
Hardware: Other Linux
Importance: Normal major
Target Milestone: 1.14.1
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2017-08-29 08:09 UTC by EduardS
Modified: 2018-05-14 14:06 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
File shows pipeline which stuck and appears to be in deadlock (2.02 MB, text/plain)
2017-08-29 08:09 UTC, EduardS

Description EduardS 2017-08-29 08:09:26 UTC
Created attachment 358654
File shows pipeline which stuck and appears to be in deadlock

We have been chasing this bug for quite some time; it appears to be a deadlock in rtspclientsink. It is very hard to catch, since it happens only on a highly loaded machine and only after a long time. Moreover, it has never shown up when running with a high log level.

This bug is severe because it blocks the pipeline and prevents it from being de-allocated, which leads to memory leaks.

To produce the attached log file (truncated by removing the rtspclientsinks that were de-allocated successfully), we ran about 1000 streams until the bug appeared. The problematic pipeline is number 834.
The attached log shows that the last message from rtspclientsink834 is:
	Line 15038:  gstrtspclientsink.c:3307:gst_rtsp_client_sink_collect_streams:<rtspclientsink834> Waiting for preroll before continuing

Looking at the code, we see:

  g_mutex_lock (&sink->preroll_lock);
  while (!sink->prerolled && !sink->conninfo.flushing) {
    GST_LOG_OBJECT (sink, "Waiting for preroll before continuing");
    g_cond_wait (&sink->preroll_cond, &sink->preroll_lock);
  }
  GST_LOG_OBJECT (sink, "Marking streams as collected");
  sink->streams_collected = TRUE;
  g_mutex_unlock (&sink->preroll_lock);

This means we are waiting on sink->preroll_cond, which is never signalled. Going backwards in the log, we should never have reached this wait, because flushing had already been requested:
Line 14996: gstrtspclientsink.c:1756:gst_rtsp_client_sink_connection_flush:<rtspclientsink834> set flushing 1

But the flushing flag was never actually set, because the connection had already been closed (sink->conninfo.connection == NULL):

  GST_DEBUG_OBJECT (sink, "set flushing %d", flush);
  g_mutex_lock (&sink->preroll_lock);
  if (sink->conninfo.connection && sink->conninfo.flushing != flush) {
    GST_DEBUG_OBJECT (sink, "connection flush");
    gst_rtsp_connection_flush (sink->conninfo.connection, flush);
    sink->conninfo.flushing = flush;
  }

The connection had been closed, as the log shows at:
Line 14967: gstrtspclientsink.c:1730:gst_rtsp_conninfo_close:<rtspclientsink834> freeing connection...

The code at this line is (info is sink->conninfo, passed in by the caller):

  if (free && info->connection) {
    /* free connection */
    GST_DEBUG_OBJECT (sink, "freeing connection...");
    gst_rtsp_connection_free (info->connection);
    info->connection = NULL;
  }

It was called from:
  if (sink->conninfo.connection) {
    GST_DEBUG_OBJECT (sink, "free connection");
    gst_rtsp_conninfo_close (sink, &sink->conninfo, TRUE);
  }


This means sink->conninfo.connection is already NULL by the time flushing is requested, so the flush is skipped and sink->conninfo.flushing is not set.
Since the flushing flag is not updated, we get stuck in g_cond_wait, waiting for a preroll_cond signal that will never arrive.
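
The order of events described above can be replayed with a small model in plain C. This is only a sketch: SinkModel, model_close, model_set_flushing, model_would_wait and replay_log_sequence are hypothetical names that stand in for the relevant rtspclientsink state and functions, and the connection is reduced to a NULL / non-NULL pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the fields involved; not the real GstRTSPClientSink. */
typedef struct {
  void *connection; /* stands in for sink->conninfo.connection */
  int flushing;     /* stands in for sink->conninfo.flushing */
  int prerolled;    /* stands in for sink->prerolled */
} SinkModel;

/* Mirrors gst_rtsp_conninfo_close() with free == TRUE: the connection
 * pointer is reset to NULL. */
static void model_close(SinkModel *s) {
  assert(s != NULL);
  s->connection = NULL;
}

/* Mirrors the guarded update in gst_rtsp_client_sink_connection_flush():
 * the flag only changes while a connection still exists. */
static void model_set_flushing(SinkModel *s, int flush) {
  assert(s != NULL);
  if (s->connection != NULL && s->flushing != flush)
    s->flushing = flush;
}

/* The original loop condition in gst_rtsp_client_sink_collect_streams(). */
static int model_would_wait(const SinkModel *s) {
  assert(s != NULL);
  return !s->prerolled && !s->flushing;
}

/* Replays the sequence seen in the log for rtspclientsink834.
 * Returns 1 if the waiter would block, 0 otherwise. */
static int replay_log_sequence(void) {
  static int dummy_connection;
  SinkModel s = { &dummy_connection, 0, 0 };

  model_close(&s);           /* "freeing connection..." */
  model_set_flushing(&s, 1); /* "set flushing 1", silently skipped */
  return model_would_wait(&s);
}
```

In this model replay_log_sequence() returns 1: the flush request is dropped because the connection is already gone, so the wait condition stays true with nothing left to clear it.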



To fix the problem, we propose adding a check to the loop condition that the connection is non-NULL, so that g_cond_wait is not entered at all once the connection is gone:
  while (!sink->prerolled && !sink->conninfo.flushing && sink->conninfo.connection) {
    GST_LOG_OBJECT (sink, "Waiting for preroll before continuing");
    g_cond_wait (&sink->preroll_cond, &sink->preroll_lock);
  }
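
A self-contained sketch of the same loop is given here with POSIX threads standing in for GMutex and GCond. SinkModel, sink_model_init and wait_for_preroll are hypothetical names used only for illustration; the loop condition is the one proposed above.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model; pthread types stand in for GMutex / GCond. */
typedef struct {
  pthread_mutex_t preroll_lock;
  pthread_cond_t preroll_cond;
  void *connection; /* stands in for sink->conninfo.connection */
  int flushing;     /* stands in for sink->conninfo.flushing */
  int prerolled;    /* stands in for sink->prerolled */
  int streams_collected;
} SinkModel;

static void sink_model_init(SinkModel *s, void *connection) {
  assert(s != NULL);
  pthread_mutex_init(&s->preroll_lock, NULL);
  pthread_cond_init(&s->preroll_cond, NULL);
  s->connection = connection;
  s->flushing = 0;
  s->prerolled = 0;
  s->streams_collected = 0;
}

/* Waits for preroll, but gives up immediately if the connection is gone. */
static void wait_for_preroll(SinkModel *s) {
  assert(s != NULL);
  pthread_mutex_lock(&s->preroll_lock);
  while (!s->prerolled && !s->flushing && s->connection != NULL) {
    /* Releases the lock while sleeping and re-acquires it on wake-up. */
    pthread_cond_wait(&s->preroll_cond, &s->preroll_lock);
  }
  s->streams_collected = 1;
  pthread_mutex_unlock(&s->preroll_lock);
}
```

With a NULL connection, wait_for_preroll() returns without ever sleeping. Note that the extra check only helps a thread that evaluates the condition after the connection was closed; a thread that is already asleep in the wait still needs a signal on preroll_cond to re-check it.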

We have tested this solution (and will continue testing it); so far the bug has not reproduced.
Comment 1 Sebastian Dröge (slomo) 2017-08-29 10:12:44 UTC
Can you attach a patch in "git format-patch" format here for easier reviewing?
Comment 2 Sebastian Dröge (slomo) 2018-05-07 15:59:34 UTC
Closing this bug report as no further information has been provided. Please feel free to reopen this bug report if you can provide the information that was asked for in a previous comment.
Thanks!
Comment 3 Jan Schmidt 2018-05-08 18:12:02 UTC
Thanks for reporting.

Fixed in commit b3a4df7ab8e5ef07e83438cb3ad041bc04253525

Author: Jan Schmidt <jan@centricular.com>
Date:   Wed May 9 04:09:02 2018 +1000

    rtspclientsink: Don't deadlock in preroll on early close
    
    If the connection is closed very early, the flushing
    marker might not get set and rtspclientsink can get
    deadlocked waiting for preroll forever.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=786961
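
The commit message describes the symptom, and the diff itself is not reproduced in this report. As an illustration only, and not the actual change, one way to make sure the flushing marker always gets set is to record it even when the connection is already NULL and then wake any waiter. SinkModel, sink_model_init, model_set_flushing and model_would_wait are hypothetical names, with POSIX threads standing in for GMutex and GCond.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model; not the real GstRTSPClientSink. */
typedef struct {
  pthread_mutex_t preroll_lock;
  pthread_cond_t preroll_cond;
  void *connection; /* stands in for sink->conninfo.connection */
  int flushing;     /* stands in for sink->conninfo.flushing */
  int prerolled;    /* stands in for sink->prerolled */
} SinkModel;

static void sink_model_init(SinkModel *s, void *connection) {
  assert(s != NULL);
  pthread_mutex_init(&s->preroll_lock, NULL);
  pthread_cond_init(&s->preroll_cond, NULL);
  s->connection = connection;
  s->flushing = 0;
  s->prerolled = 0;
}

/* Assumed approach: always record the flag, and only touch the
 * connection if one still exists. */
static void model_set_flushing(SinkModel *s, int flush) {
  assert(s != NULL);
  pthread_mutex_lock(&s->preroll_lock);
  if (s->flushing != flush) {
    if (s->connection != NULL) {
      /* The real element would call gst_rtsp_connection_flush() here. */
    }
    s->flushing = flush;
  }
  /* Wake any thread sleeping on the preroll condition so it re-checks. */
  pthread_cond_broadcast(&s->preroll_cond);
  pthread_mutex_unlock(&s->preroll_lock);
}

/* The unmodified wait condition from the report. */
static int model_would_wait(SinkModel *s) {
  int wait;
  assert(s != NULL);
  pthread_mutex_lock(&s->preroll_lock);
  wait = !s->prerolled && !s->flushing;
  pthread_mutex_unlock(&s->preroll_lock);
  return wait;
}
```

In this model, a flush request that arrives after an early close still sets the flag, so the original wait condition becomes false without any change to the waiting loop.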