After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 732226 - Races when sessions time out
Races when sessions time out
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-rtsp-server
git master
Other Linux
: Normal normal
: 1.3.91
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2014-06-25 11:24 UTC by Ognyan Tonchev (redstar_)
Modified: 2014-07-09 14:18 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Keep hard refs to the sessions from the client object (7.66 KB, patch)
2014-06-25 11:27 UTC, Ognyan Tonchev (redstar_)
none Details | Review
Keep hard refs to the sessions from the client object (7.74 KB, patch)
2014-06-25 17:43 UTC, Ognyan Tonchev (redstar_)
none Details | Review
client: keep ref to client for the session removed handler (4.11 KB, patch)
2014-07-09 13:22 UTC, Ognyan Tonchev (redstar_)
committed Details | Review
server tests: send teardown to cleanup session (1.31 KB, patch)
2014-07-09 13:23 UTC, Ognyan Tonchev (redstar_)
committed Details | Review
client tests: send teardown to cleanup session (3.25 KB, patch)
2014-07-09 13:24 UTC, Ognyan Tonchev (redstar_)
committed Details | Review
client: check if watch is set in handle_teardown() (1.34 KB, patch)
2014-07-09 13:25 UTC, Ognyan Tonchev (redstar_)
committed Details | Review

Description Ognyan Tonchev (redstar_) 2014-06-25 11:24:58 UTC
I am having trouble because of a race in gst-rtsp-server when expired sessions are being removed. In my case the GSource responsible for cleaning up the timed-out sessions is attached to the application default context. The GSource is set up the following way:

source = gst_rtsp_session_pool_create_watch ()
g_source_set_callback (source, (GSourceFunc) session_timeout_cb, NULL, NULL);

And the GSource callback looks like this:

session_timeout_cb(pool) {
  gst_rtsp_session_pool_filter (pool, remove_expired_session, NULL);
}

remove_expired_session() {
  if (gst_rtsp_session_is_expired (session)) {
    return GST_RTSP_FILTER_REMOVE;
  }
  return GST_RTSP_FILTER_KEEP;
}

And this code introduces a race in rtsp-client.c. The race is between the week reference callback client_session_finalized () run in the main context and the rest of the functions in RTSPClient manipulating the internal sessions list (the code does not even keep hard references to the sessions) from clients own context.

The result I am getting is segmentation faults of different kind when the server is under high load and sessions start to expire.
Comment 1 Ognyan Tonchev (redstar_) 2014-06-25 11:27:48 UTC
Created attachment 279215 [details] [review]
Keep hard refs to the sessions from the client object

Here I have implemented a solution where client keeps hard references to the session objects, session are removed from the internal list when they expire (session::expired signal emitted) or the client side sends teardown.
Comment 2 Ognyan Tonchev (redstar_) 2014-06-25 17:43:54 UTC
Created attachment 279239 [details] [review]
Keep hard refs to the sessions from the client object

Slightly more correct version of the patch.

I should also mention that due to the extra client ref owned by the session expired handler, the RTSPClient object will only be freed after either the client sends TEARDOWN or the session expires. In both cases the expired handler will be removed and the extra client ref will be dropped.

But if there is no session cleanup mechanism enabled for the server and if a client never sends TEARDOWN then the RTSPClient object will be leaked. Which is not very good...
Comment 3 Wim Taymans 2014-07-01 12:50:46 UTC
I reworked it slightly. The signal is now on the session, because that's the one emitting the signal. The signal is emitted when the session is removed from the pool (not only when it expired) so you get the same effect when filtering the sessions on the client.

If the session was shared between multiple client objects, they would override the "gst-rtsp-expired-handler" data on the session. I changed it so that there is only one handler on the configured session pool.

Please check if that also works for you.

commit 5e2afcefdda5066641c01d243444c4f670054de0
Author: Wim Taymans <wtaymans@redhat.com>
Date:   Tue Jul 1 14:41:14 2014 +0200

    client: protect sessions with lock
    
    Protect the list of sessions with the lock.
    
    Fixes https://bugzilla.gnome.org/show_bug.cgi?id=732226

commit fe081e73015dc269d585f4be3efbc0d93ead128b
Author: Wim Taymans <wtaymans@redhat.com>
Date:   Tue Jul 1 12:13:47 2014 +0200

    Client: keep a ref to the session
    
    Don't just keep a weak ref to the session objects but use a hard ref. We
    will be notified when a session is removed from the pool (expired) with
    the new session-removed signal.
    Don't automatically close the RTSP connection when all the sessions of
    a client are removed, a client can continue to operate and it can create
    a new session if it wants. If you want to remove the client from the
    server, you have to use gst_rtsp_server_client_filter() now.
    
    Based on patch from Ognyan Tonchev <ognyan.tonchev at axis.com>
    
    See https://bugzilla.gnome.org/show_bug.cgi?id=732226

commit 964ca3c98842c5dd7d640b81bc018c0a6687c4ef
Author: Wim Taymans <wtaymans@redhat.com>
Date:   Mon Jun 30 15:14:34 2014 +0200

    session-pool: add session-removed signal
    
    Add a signal to be notified when a session is removed from the pool.
Comment 4 Ognyan Tonchev (redstar_) 2014-07-09 13:21:02 UTC
There is still a race which i actually manage to trigger when stressing the server a bit. Session remove handler can be called after the client object has been freed. We should probably use g_signal_connect_data() instead and introduce a circular reference which we can break in teardown/timeout/disconnect.
Comment 5 Ognyan Tonchev (redstar_) 2014-07-09 13:22:38 UTC
Created attachment 280261 [details] [review]
client: keep ref to client for the session removed handler

How about this patch which. We basically keep a ref to the client object for the session removed handler.

This extra ref will be dropped when all client sessions have been removed. A session is removed when a client sends teardown, closes its endpoint of the TCP connection or the sessions expires.
Comment 6 Ognyan Tonchev (redstar_) 2014-07-09 13:23:46 UTC
Created attachment 280262 [details] [review]
server tests: send teardown to cleanup session
Comment 7 Ognyan Tonchev (redstar_) 2014-07-09 13:24:18 UTC
Created attachment 280263 [details] [review]
client tests: send teardown to cleanup session
Comment 8 Ognyan Tonchev (redstar_) 2014-07-09 13:25:18 UTC
Created attachment 280264 [details] [review]
client: check if watch is set in handle_teardown()

This is an already existing bug which i am hitting now when running the unit tests.
Comment 9 Wim Taymans 2014-07-09 14:17:23 UTC
commit 6543082d2b6eb75da4bddc5b43aa24e759123fae
Author: Ognyan Tonchev <ognyan@axis.com>
Date:   Wed Jul 9 15:16:08 2014 +0200

    client: check if watch is set in handle_teardown()
    
    The unit tests run without a watch

commit bfd498585a7fba9c9da91b18da8058204ce84afc
Author: Ognyan Tonchev <ognyan@axis.com>
Date:   Wed Jul 9 14:19:10 2014 +0200

    client tests: send teardown to cleanup session

commit f78886e7cb6992a5d25098978e6edc62baeee32e
Author: Ognyan Tonchev <ognyan@axis.com>
Date:   Wed Jul 9 14:17:46 2014 +0200

    server tests: send teardown to cleanup session

commit e0bc97e40cdf49de2911038c92275b6181113ba6
Author: Ognyan Tonchev <ognyan@axis.com>
Date:   Wed Jul 9 15:01:31 2014 +0200

    client: keep ref to client for the session removed handler
    
    This extra ref will be dropped when all client sessions have been
    removed. A session is removed when a client sends teardown, closes its
    endpoint of the TCP connection or the sessions expires.
    
    Fixes https://bugzilla.gnome.org/show_bug.cgi?id=732226