After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 764744 - Crashes when multiple udpsrc are created for each client on a shared media, misses tracking and cleanup
Crashes when multiple udpsrc are created for each client on a shared media, m...
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-rtsp-server
1.8.0
Other Linux
: Normal critical
: 1.8.2
Assigned To: GStreamer Maintainers
GStreamer Maintainers
: 765132 (view as bug list)
Depends on: 764679
Blocks:
 
 
Reported: 2016-04-07 16:58 UTC by Jake Foytik
Modified: 2016-07-19 14:02 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Stack Trace for crash (2.26 KB, text/plain)
2016-04-07 17:10 UTC, Jake Foytik
  Details
Stack trace of all threads (21.12 KB, text/plain)
2016-04-07 17:27 UTC, Jake Foytik
  Details
gst logs (3.37 MB, application/gzip)
2016-04-12 11:54 UTC, Jake Foytik
  Details
initial patch (14.70 KB, patch)
2016-04-14 19:57 UTC, Jake Foytik
none Details | Review
Updated patch with recommended changes (16.31 KB, patch)
2016-04-18 12:27 UTC, Jake Foytik
none Details | Review
Unit tests (14.78 KB, patch)
2016-04-18 17:33 UTC, Jake Foytik
none Details | Review
Updated patch (30.75 KB, patch)
2016-04-21 11:53 UTC, Jake Foytik
none Details | Review
Final Patch (29.42 KB, patch)
2016-04-25 13:18 UTC, Jake Foytik
committed Details | Review

Description Jake Foytik 2016-04-07 16:58:37 UTC
In our application we are seeing a couple of issues when using rtsp-server in TLS mode with a media-factory that is set as 'shared'. These issues can be reproduced using a modified version of the test-video example code. In our tests, we enable the WITH_AUTH and WITH_TLS flags and set the media factory to shared ( gst_rtsp_media_factory_set_shared(factory, TRUE); ).

Issue 1:
When streaming multiple rtsp streams from the same factory, occasionally a stream will connect but not be able to decode the SRTP packets. Steps to reproduce :
 - Run test-video (with TLS, Auth, and shared factory)
 - Connect a client using gst-launch-1.0 rtspsrc location=.... 
 - Wait around 6 minutes.
 - Connect another client using gst-launch-1.0 --gst-debug=srtpdec:5 rtspsrc location=... . (Now there will be 2 clients connected to the same session)
 - Observe : The second client will connect with server and will not be able to decode the SRTP packets. srtpdec will post a warnings "Unable to unprotect buffer (unprotect failed code 7)" and "Error authentication packet, dropping".

It seems as if there is some timeout, or mismanagement of the the key used between the SRTP elements. This behavior was observed for both gstreamer version 1.6.3 and 1.8.0.

Issue 2:
When streaming multiple rtsp streams from the same factory, the example program will crash when stopping all streams. Steps to reproduce :
 - Run test-video (with TLS, Auth, and shared factory)
 - Connect a client using gst-launch-1.0 rtspsrc location=.... 
 - Connect another client using gst-launch-1.0 rtspsrc location=... . (Now there will be 2 clients connected to the same session)
 - Stop both clients.
 - Observe : The test-video program will abort with the error "g_mutex_clear() called on uninitialised or locked mutex"

This behavior is only seen in gstreamer 1.8.0. I could not reproduce this in version 1.6.3.
Comment 1 Jake Foytik 2016-04-07 17:10:23 UTC
Created attachment 325557 [details]
Stack Trace for crash
Comment 2 Sebastian Dröge (slomo) 2016-04-07 17:16:06 UTC
Can you get a backtrace of all threads when it crashes, and the output printed to the terminal from that assertion?
Comment 3 Jake Foytik 2016-04-07 17:24:03 UTC
Here is the console output. I'll post the backtrace once I get it.

$ ./test-video
stream ready at rtsps://127.0.0.1:8554/test

(lt-test-video:6682): GStreamer-CRITICAL **: 
Trying to dispose element udpsrc23, but it is in PLAYING (locked) instead of the NULL state.
You need to explicitly set elements to the NULL state before
dropping the final reference, to allow them to clean up.
This problem may also be caused by a refcounting bug in the
application or some element.

g_mutex_clear() called on uninitialised or locked mutex
Aborted (core dumped)
Comment 4 Jake Foytik 2016-04-07 17:27:19 UTC
Created attachment 325558 [details]
Stack trace of all threads
Comment 5 Sebastian Dröge (slomo) 2016-04-11 07:44:36 UTC
Can you check why the udpsrc is not set to NULL state and destroyed in gst_rtsp_stream_leave_bin() in rtsp-stream.c? Is that function ever called for you?
Comment 6 Jake Foytik 2016-04-11 13:20:24 UTC
Thanks for providing some direction. I've been debugging the gst_rtsp_stream_leave_bin() function. It is being called, and I have verified that it is actually setting the elements to the NULL state. It looks like it may be a udpsrc not managed within rtsp-stream. In my debug test, I can see:
  - udpsrc4 --> STATE_NULL
  - udpsrc5 --> STATE_NULL
  - udpsrc6 --> STATE_NULL
  - udpsrc7 --> STATE_NULL
Then, after the leave_bin function has been called I get the crash which provides the error that udpsrc3 was not in the NULL state when unreffing. Any idea where this other udpsrc is managed?
Comment 7 Sebastian Dröge (slomo) 2016-04-12 06:56:50 UTC
It should be managed exactly there, that's already a useful information :) Can you get a GST_DEBUG=6 debug log, and for the exactly same run the exact text of the assertion (i.e. what the name of the problematic udpsrc is)?
Comment 8 Jake Foytik 2016-04-12 11:54:12 UTC
Created attachment 325791 [details]
gst logs

Here are the gstreamer logs for the issue. The Gstreamer-Critical is contained in the logs, and in this case "udpsrc3" was the problem element.
Comment 9 Jake Foytik 2016-04-12 20:28:09 UTC
Sebastian, I think I may have narrowed in on the problem.

When connecting the client to the server, the gst_rtsp_stream_allocate_udp_sockets() function is called which in our case allocates two udpsrc elements in stream->priv->udpsrc_v4[0] and stream->priv->udpsrc_v4[1]. When another client connects to the server and we're dealing with a media configured to be shared, gst_rtsp_stream_allocate_udp_sockets() is called with the same stream. Thus, two new udpsrc elements are created and the pointers stored in stream->priv->udpsrc_v4[0] and stream->priv->udpsrc_v4[1]. The previous pointers were overwritten and now it looks like we have leaked the old udpsrc_v4[0] and the old udpsrc_v4[1] is now managed/owned by a bin element. Furthermore, it looks like udpsrc_v4[1] is configured to be locked with the function gst_element_set_locked_state(), so now when the bin is set to NULL state this element will remain in its PLAYING state which eventually becomes a problem when unreffing the bin.
Comment 10 Sebastian Dröge (slomo) 2016-04-13 06:17:30 UTC
That's probably related to bug #757488 then. Even if it was not caused by that, that commit refactored all the relevant code :)

Not sure how this is supposed to work: for shared media you use the same multiudpsink, but probably want to have new udpsrcs and funnel them into the single rtpbin. I don't think this did the right thing before.
Comment 11 Jake Foytik 2016-04-13 14:27:16 UTC
Yea, it seems like it was incorrect before as well. 

I'm able to get around the crash by removing line rtsp-stream.c:1120 (gst_element_set_locked_state()). I would expect this to have some repercussions, but the streams have been playing fine. Do you know why the element is set to locked?

I think the correct solution would be to have rtsp-stream maintain a handle to all of the udpsrcs created for the shared media and have them properly cleaned up when a client disconnects. In the current implementation, it is possible to have a lot of udpsrcs needlessly hanging around if a shared media has been open for a long time with multiple clients connecting and disconnecting. It looks like the sending side of the pipeline is cleaned up in the update_transport() function where the "remove" signal is called on the multiudpsink. Should we not clean up the udpsrcs here too?

I have confirmed this crash is not related to the TLS configuration. Looking back at my first comment, I originally thought the two issues were related but that appears to be incorrect. Should I create another bug for the SRTP issue?
Comment 12 Sebastian Dröge (slomo) 2016-04-13 14:43:03 UTC
Yes, please open a new bug for the SRTP issue.

And you're right, it should keep around all udpsrc of all clients and remove them as needed when clients disconnect. All udpsrc should go to the funnel. Do you want to try to write a patch for this?
Comment 13 Jake Foytik 2016-04-13 17:59:10 UTC
Sure, I'll work on the patch.
Comment 14 Sebastian Dröge (slomo) 2016-04-14 06:52:01 UTC
Great, let me know if you need some help :) Thanks!
Comment 15 Jake Foytik 2016-04-14 19:57:22 UTC
Created attachment 326051 [details] [review]
initial patch

Sebastian, here is an initial patch with the changes. I've provided some notes below and there are a couple sections I would like your input on.

Notes:
 -- I've assumed that there can exist multiple unicast udpsrcs but only a single set of multicast udpsrcs (one for IPV4 and one for IPV6).
   - The multicast udpsrcs remain the same.
   - The unicast udpsrcs are now maintained in a hash table that uses the RTSPTransport as a key. It didn't look like there was any special logic required for handling IPV4 v. IPV6 except for the creation of the sockets, so the hash table can contain both types.
   - When unicast sockets are allocated, the udpsrcs are stored in the hash table. See the gst_rtsp_stream_allocate_udp_sockets() function.
   - When a client disconnects and the connection is unicast, we now cleanup the udpsrcs related to the RTSPTransport.  See update_transport() function.
   - In the gst_rtsp_stream_leave_bin() function, all udpsrcs are now cleaned up.
 -- I've made some fundamental changes to the gst_rtsp_stream_allocate_udp_sockets() function that I think are correct, but you should verify. They include:
    - IPV6 unicast sockets were only allowed to be allocated one time, in the same manner as multicast. This is different for IPV4 unicast. I assumed that we wanted to allow multiple IPV6 unicast connections, so I removed the 'have_ipv6' check that would skip allocation.
    - I changed the logic for calculating the return 'result'. Before, it performed 'result = have_ipv4 || have_ipv4_mcast || have_ipv6 || have_ipv6_mcast'. This would return TRUE if a client was trying to create an IPV4 unicast connection and it failed to allocate the unicast udpsrcs, but a client had previously connected with IPV6 multicast successfully.
 -- There are two sections of the code where the code has been commented out with '#ifdef NOTHING'. Please review these portions. I think they are no longer needed since we now postpone the creation of udpsrc elements to after the join_bin function, but there may be a scenario I am missing.

Please review the patch and make recommendations. I'll make changes as needed.
Comment 16 Sebastian Dröge (slomo) 2016-04-15 06:27:53 UTC
(In reply to Jake Foytik from comment #15)
>  -- I've assumed that there can exist multiple unicast udpsrcs but only a
> single set of multicast udpsrcs (one for IPV4 and one for IPV6).
>    - The multicast udpsrcs remain the same.

Correct

>    - The unicast udpsrcs are now maintained in a hash table that uses the
> RTSPTransport as a key. It didn't look like there was any special logic
> required for handling IPV4 v. IPV6 except for the creation of the sockets,
> so the hash table can contain both types.

No special handling, correct

>    - When unicast sockets are allocated, the udpsrcs are stored in the hash
> table. See the gst_rtsp_stream_allocate_udp_sockets() function.
>    - When a client disconnects and the connection is unicast, we now cleanup
> the udpsrcs related to the RTSPTransport.  See update_transport() function.
>    - In the gst_rtsp_stream_leave_bin() function, all udpsrcs are now
> cleaned up.

Makes sense

>  -- I've made some fundamental changes to the
> gst_rtsp_stream_allocate_udp_sockets() function that I think are correct,
> but you should verify. They include:
>     - IPV6 unicast sockets were only allowed to be allocated one time, in
> the same manner as multicast. This is different for IPV4 unicast. I assumed
> that we wanted to allow multiple IPV6 unicast connections, so I removed the
> 'have_ipv6' check that would skip allocation.

Makes sense and is probably just an overlook. IPv4/6 should behave exactly the same.

>     - I changed the logic for calculating the return 'result'. Before, it
> performed 'result = have_ipv4 || have_ipv4_mcast || have_ipv6 ||
> have_ipv6_mcast'. This would return TRUE if a client was trying to create an
> IPV4 unicast connection and it failed to allocate the unicast udpsrcs, but a
> client had previously connected with IPV6 multicast successfully.

That part looked weird to me last time I looked too, so that fix is probably fine :)

> Please review the patch and make recommendations. I'll make changes as
> needed.

A unit test would be nice to have
Comment 17 Sebastian Dröge (slomo) 2016-04-15 06:36:29 UTC
Review of attachment 326051 [details] [review]:

For a unit test, adding/removing clients and checking if things still work afterwards would be a good test I guess... also checking if all but the first client properly work

Check the existing tests for ideas :)

::: gst/rtsp-server/rtsp-stream.c
@@ +264,3 @@
   priv->ptmap = g_hash_table_new_full (NULL, NULL, NULL,
       (GDestroyNotify) gst_caps_unref);
+  priv->udpsrcs = g_hash_table_new (g_direct_hash, g_direct_equal);

You probably want to create the hash table with a free function, so that you don't have to manually free the items when they get removed

@@ +1467,3 @@
+        g_slice_new0 (GstRTSPStreamUDPSrcs);
+    transport_udpsrcs->udpsrc[0] = NULL;
+    transport_udpsrcs->udpsrc[1] = NULL;

They are already NULL because you allocate them with new0()

@@ +1480,3 @@
+      /* TODO: Seems like this should be the same as the IPV4 unicast case, so 
+       * I've commented out this check that prevented multiple IPV6 unicast udpsrcs. 
+       * This could be incorrect. */

It seems correct, they should behave the same

@@ +1494,3 @@
+     * Otherwise, add it to the hash table */
+    if (transport_udpsrcs->udpsrc[0] == NULL
+        && transport_udpsrcs->udpsrc[1] == NULL)

I think udpsrc[1] could be NULL if RTCP is disabled, or not?

@@ +2517,3 @@
+ * postponed.
+ * See https://bugzilla.gnome.org/show_bug.cgi?id=757488 */
+#ifdef NOTHING

I think you're right, also for the other one below. This code can disappear, it's now done in play_udpsources_one_family()

@@ +3305,3 @@
 
+/* Properly dispose udpsrcs that were created for a given transport. */
+/* Must be called with a lock. */

Which lock? :) Better to be explicit here in the comments
Comment 18 Sebastian Dröge (slomo) 2016-04-16 09:04:32 UTC
*** Bug 765132 has been marked as a duplicate of this bug. ***
Comment 19 Jake Foytik 2016-04-18 12:27:41 UTC
Created attachment 326247 [details] [review]
Updated patch with recommended changes

This is the original patch with the following changes:
 - Added hash table free function.
 - Removed unneeded initialization of udpsrcs.
 - Removed commented code.
 - Better comments.

This addresses all of the suggestions in the review except for:

> I think udpsrc[1] could be NULL if RTCP is disabled, or not?

That is fine if RTCP is disabled. No udpsrcs will be created and an entry will not be stored in the hash table.
Comment 20 Jake Foytik 2016-04-18 17:33:35 UTC
Created attachment 326288 [details] [review]
Unit tests

Here is a separate patch for the changes to the Unit tests. I can squash this commit with the other changes if you would prefer.

Since I changed the return logic of gst_rtsp_stream_allocate_udp_sockets(), two of the unit tests were now failing. The problem seems to be with allocating sockets for Multicast IPV6. I think this issue was hidden in there before. The problematic sections of the test are commented with '#ifdef ENABLE_FAILING_TESTS'.

I also added a test to rtsp-server to add and remove clients as suggested.
Comment 21 Sebastian Dröge (slomo) 2016-04-19 07:00:20 UTC
Review of attachment 326247 [details] [review]:

Generally looks good, thanks! Just some minor things

How does the mcast ipv6 test fail exactly?

::: gst/rtsp-server/rtsp-stream.c
@@ +314,3 @@
+  /* We expect all udpsrcs to be cleaned up by this point. */
+  if (g_hash_table_size (priv->udpsrcs) > 0)
+    GST_ERROR ("Unreffing udpsrcs hash table that contains elements.");

This should probably be more like a g_critical() :)

@@ +1492,3 @@
+    if (transport_udpsrcs->udpsrc[0] == NULL
+        && transport_udpsrcs->udpsrc[1] == NULL)
+      g_slice_free (GstRTSPStreamUDPSrcs, transport_udpsrcs);

The problem here is that you free the transport if udpsrc[1] == NULL. No matter if udpsrc[0] is NULL or not. Doesn't this fail for streams without RTCP now because you will free the data and not keep track of it?

@@ +2704,3 @@
 }
 
+/* Must be called with a lock. */

Which lock? :)
Comment 22 Sebastian Dröge (slomo) 2016-04-19 07:03:39 UTC
Review of attachment 326288 [details] [review]:

::: tests/check/gst/stream.c
@@ +141,3 @@
   pool = gst_rtsp_address_pool_new ();
   fail_unless (gst_rtsp_address_pool_add_range (pool, "192.168.1.1",
+          "192.168.1.1", 6000, 6001, 0));

I assume this will also fail if you don't have an interface in the network when binding the socket?

@@ +325,3 @@
+/* Having an issue with g_socket_bind() for the IPV6 multicast address. With the
+ * return logic in gst_rtsp_stream_allocate_udp_sockets(), this portion now 
+ * fails. */

Is it https://bugzilla.gnome.org/show_bug.cgi?id=764679 ?
Comment 23 Jake Foytik 2016-04-19 14:25:07 UTC
> @@ +1492,3 @@
> +    if (transport_udpsrcs->udpsrc[0] == NULL
> +        && transport_udpsrcs->udpsrc[1] == NULL)
> +      g_slice_free (GstRTSPStreamUDPSrcs, transport_udpsrcs);
>
> The problem here is that you free the transport if udpsrc[1] == NULL. No matter 
> if udpsrc[0] is NULL or not. Doesn't this fail for streams without RTCP now    
> because you will free the data and not keep track of it?

The transport_udpsrcs struct will only be freed if both udpsrc[0] AND udpsrc[1] are NULL. If they are both NULL, there is no need to add an empty struct to the hash table. If only one udpsrc is NULL, then the struct is added to the hash table. This can be changed to the following if you think it makes it clearer:

    if (transport_udpsrcs->udpsrc[0] != NULL
        || transport_udpsrcs->udpsrc[1] != NULL)
      g_hash_table_insert (priv->udpsrcs, ct, transport_udpsrcs);
    else
      g_slice_free (GstRTSPStreamUDPSrcs, transport_udpsrcs);
      
I'll create an updated path to include the recommended changes.
Comment 24 Jake Foytik 2016-04-21 11:53:17 UTC
Created attachment 326483 [details] [review]
Updated patch

Made the recommended changes and combined the changes into a single patch.
Comment 25 Sebastian Dröge (slomo) 2016-04-21 12:06:55 UTC
Review of attachment 326483 [details] [review]:

Generally looks good now, we just need to solve the unit test failure :)

::: tests/check/gst/stream.c
@@ +326,3 @@
+ * return logic in gst_rtsp_stream_allocate_udp_sockets(), this portion now 
+ * fails. */
+#ifdef ENABLE_FAILING_TESTS

Are these here now the same as https://bugzilla.gnome.org/show_bug.cgi?id=764679 ?
Comment 26 Jake Foytik 2016-04-22 13:21:47 UTC
> Are these here now the same as https://bugzilla.gnome.org/show_bug.cgi?id=764679 ?

It seems like it is the same. I can get past the unit test errors if I bind to ANY for IPV6 multicast (as it is currently done for Windows only). See rtsp-stream.c:1294.
Comment 27 Sebastian Dröge (slomo) 2016-04-22 16:22:25 UTC
Thanks, I'll try to get all the patches in together then. This one here is definitely good to go, we only have to find a proper solution for the UDP problem in the other bug first :)
Comment 28 Sebastian Dröge (slomo) 2016-04-25 06:37:45 UTC
Comment on attachment 326483 [details] [review]
Updated patch

Can you attach a new patch that a) doesn't have the changes in "common", and b) keeps the failing tests enabled? Thanks :)
Comment 29 Jake Foytik 2016-04-25 13:18:28 UTC
Created attachment 326687 [details] [review]
Final Patch

Removed the changes in common and removed the #ifdef statements around failing tests.
Comment 30 Sebastian Dröge (slomo) 2016-04-29 08:53:28 UTC
commit fe5f8077c1523206147c746cc40364ea16da669f
Author: Jake Foytik <jake.foytik@ipconfigure.com>
Date:   Mon Apr 25 08:55:25 2016 -0400

    rtsp-stream: Fix crash on cleanup with shared media and multiple udpsrc
    
     - Unicast udpsrcs are now managed in a hash table. This allows for proper cleanup in with shared streams and fixes a memory leak.
     - Unicast udpsrcs are now properly cleaned up when shared connections exit. See the update_transport() function.
     - Create unit test for shared media.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=764744
Comment 31 Xavier Claessens 2016-07-19 14:00:46 UTC
Actually I think this is the wrong fix, we shouldn't have more than one udpsrc in the first place. That's another regression introduced by:

commit f62a9a7eb9fda096fb88c013d6787c28090037a2
Author: Patricia Muscalu <patricia@axis.com>
Date:   Tue Feb 23 14:59:32 2016 +0100

    rtsp-stream: postpone UDP socket allocation until SETUP
    
    Postpone the allocation of the UDP sockets until we know
    what transport has been chosen by the client.
    Both unicast and multicast UDP sources are created in one
    function.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=757488
Comment 32 Xavier Claessens 2016-07-19 14:02:22 UTC
I'm working in a proper fix as part of bug #766612