GNOME Bugzilla – Bug 737752
rtsp-client: crash when cleaning up session
Last modified: 2014-10-24 19:43:34 UTC
If the client watch is being destroyed and client_watch_notify is called while client_session_removed (via gst_rtsp_session_pool_cleanup) is also being called we might end with a crash. This is because client_watch_notify is called just after everything in the watch has been freed (mutex, etc.) in gst_rtsp_source_finalize (gstrtspconnection.c). If client_session_removed is executed just at that time client->watch might still have a valid pointer but its internal data might be invalid. This is the error: GLib (gthread-posix.c): Unexpected error from C library during 'pthread_mutex_lock': Invalid argument. Aborting. And this is the backtrace:
+ Trace 234158
Thread 1 (Thread 0x7fcbd24e5700 (LWP 36192))
Created attachment 287546 [details] [review] rtspconnection: call notify before releasing watch resources This is a first attempt to fix this. It is a patch for gstrtspconnection.c (gst-plugins-base). If we call the watch notify function before releasing the watch resources in gst_rtsp_source_finalize, the notify callback can lock and set priv->watch to NULL so other functions can finish properly. In the particular case of this bug, this is what would happen (with this patch): client_session_removed is called with or without priv->watch==NULL. If not NULL, watch backlog will be set with watch being valid. It is valid because client_watch_notify might have been executing at the same time but is blocked in the client lock (notify hasn't finished, so watch resources are still valid). I have to say that I'm not fully sure about this one. It seems to me, someone should keep a reference to the watch, but I don't see who or how.
With patch from bug 737690 I could run my test for more than 12 hours. The test consists of 6 clients connecting to different streams each one and disconnecting after a few random seconds and reconnecting. After the 12 hours more than 70000 connections/disconnections succeeded but I ended up with this crash. Let's see if I can go a bit further with patch in comment 1.
(In reply to comment #2) > With patch from bug 737690 I could run my test for more than 12 hours. The test > consists of 6 clients connecting to different streams each one and > disconnecting after a few random seconds and reconnecting. After the 12 hours > more than 70000 connections/disconnections succeeded but I ended up with this > crash. Let's see if I can go a bit further with patch in comment 1. Sorry, that is 6 streams and 3 clients per stream. So 18 clients in total connecting and disconnecting randomly. Streams have media sharing enabled.
With patch in comment 1 and using queue instead of multiqueue (see bug 737794). The server has been running flawlessly after 20 hours and ~100k connections/disconnections.
commit 66abee92b081a1f74c0e3ca5600b99e78fb3b569 Author: Aleix Conchillo Flaqué <aleix@oblong.com> Date: Wed Oct 1 15:04:09 2014 -0700 rtspconnection: call watch notify before freeing any watch resources This gives control to the notify function allowing it to finish other watch related functionality. https://bugzilla.gnome.org/show_bug.cgi?id=737752