GNOME Bugzilla – Bug 737752
rtsp-client: crash when cleaning up session
Last modified: 2014-10-24 19:43:34 UTC
If the client watch is being destroyed and client_watch_notify is called while client_session_removed (via gst_rtsp_session_pool_cleanup) is also being called we might end with a crash.
This is because client_watch_notify is called just after everything in the watch has been freed (mutex, etc.) in gst_rtsp_source_finalize (gstrtspconnection.c). If client_session_removed is executed just at that time client->watch might still have a valid pointer but its internal data might be invalid.
This is the error:
GLib (gthread-posix.c): Unexpected error from C library during 'pthread_mutex_lock': Invalid argument. Aborting.
And this is the backtrace:
(Thread 0x7fcbd24e5700 (LWP 36192))
Created attachment 287546 [details] [review]
rtspconnection: call notify before releasing watch resources
This is a first attempt to fix this. It is a patch for gstrtspconnection.c (gst-plugins-base).
If we call the watch notify function before releasing the watch resources in gst_rtsp_source_finalize, the notify callback can lock and set priv->watch to NULL so other functions can finish properly.
In the particular case of this bug, this is what would happen (with this patch):
client_session_removed is called with or without priv->watch==NULL. If not NULL, watch backlog will be set with watch being valid. It is valid because client_watch_notify might have been executing at the same time but is blocked in the client lock (notify hasn't finished, so watch resources are still valid).
I have to say that I'm not fully sure about this one. It seems to me, someone should keep a reference to the watch, but I don't see who or how.
With patch from bug 737690 I could run my test for more than 12 hours. The test consists of 6 clients connecting to different streams each one and disconnecting after a few random seconds and reconnecting. After the 12 hours more than 70000 connections/disconnections succeeded but I ended up with this crash. Let's see if I can go a bit further with patch in comment 1.
(In reply to comment #2)
> With patch from bug 737690 I could run my test for more than 12 hours. The test
> consists of 6 clients connecting to different streams each one and
> disconnecting after a few random seconds and reconnecting. After the 12 hours
> more than 70000 connections/disconnections succeeded but I ended up with this
> crash. Let's see if I can go a bit further with patch in comment 1.
Sorry, that is 6 streams and 3 clients per stream. So 18 clients in total connecting and disconnecting randomly. Streams have media sharing enabled.
With patch in comment 1 and using queue instead of multiqueue (see bug 737794). The server has been running flawlessly after 20 hours and ~100k connections/disconnections.
Author: Aleix Conchillo Flaqué <email@example.com>
Date: Wed Oct 1 15:04:09 2014 -0700
rtspconnection: call watch notify before freeing any watch resources
This gives control to the notify function allowing it to finish other
watch related functionality.