After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 790909 - Race between gst_rtsp_client_close() and client_watch_notify() leads to undefined behaviour
Race between gst_rtsp_client_close() and client_watch_notify() leads to undef...
Status: RESOLVED OBSOLETE
Product: GStreamer
Classification: Platform
Component: gst-rtsp-server
git master
Other Windows
: Normal normal
: git master
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2017-11-27 17:01 UTC by Kseniya Vasilchuk
Modified: 2018-11-03 15:41 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
prevent undefined behaviour (1.15 KB, patch)
2017-11-27 17:01 UTC, Kseniya Vasilchuk
none Details | Review
To reproduce bug (3.69 KB, text/plain)
2018-01-24 14:12 UTC, Kseniya Vasilchuk
  Details
To reproduce bug (script 1) (149 bytes, text/x-sh)
2018-01-24 14:13 UTC, Kseniya Vasilchuk
  Details
To reproduce (script 2) (67 bytes, text/x-sh)
2018-01-24 14:13 UTC, Kseniya Vasilchuk
  Details
New fix (mutex added) (3.92 KB, patch)
2018-01-24 14:14 UTC, Kseniya Vasilchuk
none Details | Review

Description Kseniya Vasilchuk 2017-11-27 17:01:41 UTC
Created attachment 364515 [details] [review]
prevent undefined behaviour

The failed preroll causes client_watch_notify() which calls rtsp_ctrl_timeout_remove() function.
This function calls g_main_context_find_source_by_id() which use priv->watch_context to find the source to destroy it.

But if in the same time gst_rtsp_client_close() was called (e.g. by gst_rtsp_server_client_filter() function) it can unref and set priv->watch_context to NULL before rtsp_ctrl_timeout_remove().
In this case manual on g_main_context_find_source_by_id() says: 

if GMainContext is NULL, the default context will be used. But we don't use default context in rtsp client so we trying to find non-existent source, it means:

//-> from https://developer.gnome.org/glib/stable/glib-The-Main-Event-Loop.html
It is a programmer error to attempt to lookup a non-existent source.
More specifically: source IDs can be reissued after a source has been destroyed and therefore it is never valid to use this function with a source ID which may have already been removed. An example is when scheduling an idle to run in another thread with g_idle_add(): the idle may already have run and been removed by the time this function is called on its (now invalid) source ID. This source ID may have been reissued, leading to the operation being performed against the wrong source.
//<- 

So we have undefined behaviour here.

P.S.
If we are lucky and default context does not have source with such id, it causes "g_source_destroy: assertion 'source != NULL' failed".

I've attached a patch to prevent use g_main_context_find_source_by_id() with NULL GMainContext. Please watch it.
Comment 1 Edward Hervey 2017-11-28 08:26:25 UTC
Hi,

  Thanks for your patch. Do you know if the issue still applies to git master ? Is there a way to reproduce the issue ?
Comment 2 Kseniya Vasilchuk 2017-12-06 15:14:05 UTC
(In reply to Edward Hervey from comment #1)
> Hi,
> 
>   Thanks for your patch. Do you know if the issue still applies to git
> master ? Is there a way to reproduce the issue ?

I didn't try to reproduced it on master, but master code looks like nothing changed. There should be the same problem.

I have reproduced it this way:
1) Create pipeline with appsrc which didn't give any data (camera was broken)
2) Many clients tried to connect at the same time. If they failed (always in this case), they tried again and again
3) Simultaneously (from another thread)  gst_rtsp_server_client_filter() was called with GST_RTSP_FILTER_REMOVE
4) 9 out of 10 crushes as result
Comment 3 Kseniya Vasilchuk 2018-01-24 14:12:43 UTC
Created attachment 367377 [details]
To reproduce bug

RTSP server based on appsrc example with no buffer on need-data request.
Comment 4 Kseniya Vasilchuk 2018-01-24 14:13:11 UTC
Created attachment 367378 [details]
To reproduce bug (script 1)
Comment 5 Kseniya Vasilchuk 2018-01-24 14:13:33 UTC
Created attachment 367379 [details]
To reproduce (script 2)
Comment 6 Kseniya Vasilchuk 2018-01-24 14:14:31 UTC
Created attachment 367380 [details] [review]
New fix (mutex added)
Comment 7 Kseniya Vasilchuk 2018-01-24 14:15:05 UTC
Ok, it definitely the same situation in master. I've wrote a minimal app based on appsrc example to reproduce it (in attachments). There are two key points: 
1 - No data sent by appsrc. It leads to failed preroll.
2 - Every 35 second all clients are filtered out. 
And I've wrote two scripts with gst-launch-1.0 to simulate many simultaneous connections. You need just compile it and launch, then start launch.sh on the same machine. So I reproduced bug pretty easy on my machine (Windows).

I've also rewrote fix, cause I've noticed that there are no protection against the same time access.
Comment 8 GStreamer system administrator 2018-11-03 15:41:27 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-rtsp-server/issues/35.