GNOME Bugzilla – Bug 790909
Race between gst_rtsp_client_close() and client_watch_notify() leads to undefined behaviour
Last modified: 2018-11-03 15:41:27 UTC
Created attachment 364515 [details] [review] prevent undefined behaviour

The failed preroll triggers client_watch_notify(), which calls rtsp_ctrl_timeout_remove(). That function calls g_main_context_find_source_by_id(), using priv->watch_context to find the source so it can destroy it. But if gst_rtsp_client_close() is called at the same time (e.g. via gst_rtsp_server_client_filter()), it can unref priv->watch_context and set it to NULL before rtsp_ctrl_timeout_remove() runs. In that case, the documentation for g_main_context_find_source_by_id() says that if GMainContext is NULL, the default main context is used. But the RTSP client does not use the default context, so we end up looking for a non-existent source there, and the GLib manual is explicit about that:

//-> from https://developer.gnome.org/glib/stable/glib-The-Main-Event-Loop.html
It is a programmer error to attempt to lookup a non-existent source. More specifically: source IDs can be reissued after a source has been destroyed and therefore it is never valid to use this function with a source ID which may have already been removed. An example is when scheduling an idle to run in another thread with g_idle_add(): the idle may already have run and been removed by the time this function is called on its (now invalid) source ID. This source ID may have been reissued, leading to the operation being performed against the wrong source.
//<-

So we have undefined behaviour here.

P.S. If we are lucky and the default context does not have a source with that id, it only causes "g_source_destroy: assertion 'source != NULL' failed".

I've attached a patch that prevents calling g_main_context_find_source_by_id() with a NULL GMainContext; a minimal sketch of the idea is below. Please review it.
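/* A minimal sketch of the guard idea, NOT the attached patch itself
 * (attachment 364515): only look the source up while priv->watch_context is
 * still set, so we never fall back to the default main context. The
 * priv->rtsp_ctrl_timeout_id field name and the exact function shape are
 * assumptions for illustration. */
static void
rtsp_ctrl_timeout_remove (GstRTSPClientPrivate * priv)
{
  GSource *timer_src;

  if (priv->rtsp_ctrl_timeout_id != 0 && priv->watch_context != NULL) {
    timer_src = g_main_context_find_source_by_id (priv->watch_context,
        priv->rtsp_ctrl_timeout_id);
    if (timer_src != NULL)
      g_source_destroy (timer_src);
    priv->rtsp_ctrl_timeout_id = 0;
  }
}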
Hi,

Thanks for your patch. Do you know if the issue still applies to git master? Is there a way to reproduce the issue?
(In reply to Edward Hervey from comment #1)
> Hi,
>
> Thanks for your patch. Do you know if the issue still applies to git
> master ? Is there a way to reproduce the issue ?

I didn't try to reproduce it on master, but the master code looks unchanged in this area, so the same problem should be there.

I reproduced it this way (a hedged sketch of step 3 follows the list):
1) Create a pipeline with an appsrc that never delivers any data (the camera was broken)
2) Many clients try to connect at the same time. Since they always fail in this case, they retry again and again
3) Simultaneously (from another thread), gst_rtsp_server_client_filter() is called with GST_RTSP_FILTER_REMOVE
4) The result is a crash 9 times out of 10
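/* Hedged sketch of step 3 above: dropping all connected clients from a
 * periodic callback. gst_rtsp_server_client_filter() and
 * GST_RTSP_FILTER_REMOVE are the public gst-rtsp-server API; the timer
 * wiring around them is only illustrative. */
#include <gst/rtsp-server/rtsp-server.h>

static GstRTSPFilterResult
remove_client (GstRTSPServer * server, GstRTSPClient * client,
    gpointer user_data)
{
  /* Tell the server to close and drop this client. */
  return GST_RTSP_FILTER_REMOVE;
}

static gboolean
filter_all_clients (gpointer user_data)
{
  GstRTSPServer *server = GST_RTSP_SERVER (user_data);

  /* Disconnect every current client; if a client's preroll is failing at
   * the same moment, this races with client_watch_notify(). */
  gst_rtsp_server_client_filter (server, remove_client, NULL);
  return G_SOURCE_CONTINUE;
}

/* e.g. installed with: g_timeout_add_seconds (35, filter_all_clients, server); */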
Created attachment 367377 [details] To reproduce bug

An RTSP server based on the appsrc example that provides no buffer on the need-data request.
Created attachment 367378 [details] To reproduce bug (script 1)
Created attachment 367379 [details] To reproduce (script 2)
Created attachment 367380 [details] [review] New fix (mutex added)
OK, it's definitely the same situation in master. I've written a minimal app based on the appsrc example to reproduce it (in the attachments). There are two key points:
1 - No data is sent by appsrc, which leads to a failed preroll.
2 - Every 35 seconds all clients are filtered out.

I've also written two scripts that use gst-launch-1.0 to simulate many simultaneous connections. You just need to compile and launch the app, then start launch.sh on the same machine. That way I reproduced the bug quite easily on my machine (Windows).

I've also rewritten the fix, because I noticed there was no protection against concurrent access; a sketch of that idea is below.
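/* Hedged sketch of the concurrent-access protection the updated fix
 * (attachment 367380) is about: the close path takes the same lock as the
 * timeout-removal path before touching priv->watch_context, so the context
 * cannot be unreffed and cleared between the NULL check and the source
 * lookup. The lock name, helper name and exact placement are assumptions,
 * not the actual patch. */
static void
close_watch_context (GstRTSPClientPrivate * priv)
{
  g_mutex_lock (&priv->watch_lock);
  if (priv->watch_context != NULL) {
    g_main_context_unref (priv->watch_context);
    priv->watch_context = NULL;
  }
  g_mutex_unlock (&priv->watch_lock);
}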
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-rtsp-server/issues/35.