After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 768249 - rtspsrc lockup on gst_rtspsrc_stop
rtspsrc lockup on gst_rtspsrc_stop
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-plugins-base
1.8.2
Other Linux
: Normal major
: 1.8.3
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2016-06-30 14:48 UTC by Sergio Torres Soldado (zenx)
Modified: 2016-07-25 10:27 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Fix for hanging GstRTSPConnection (930 bytes, patch)
2016-07-04 16:24 UTC, Sergio Torres Soldado (zenx)
committed Details | Review

Description Sergio Torres Soldado (zenx) 2016-06-30 14:48:36 UTC
I am getting a situation that is quite hard to reproduce which essentially locks up a pipeline making it unrecoverable.

Thread 2 (Thread 27966.27968)

  • #0 __lll_lock_wait
    at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S line 135
  • #1 __GI___pthread_mutex_lock
    at ../nptl/pthread_mutex_lock.c line 116
  • #2 g_rec_mutex_lock
    at .././../glib/gthread-posix.c line 389
  • #3 gst_rtspsrc_stop
    at ../.././../gst/rtsp/gstrtspsrc.c line 7717
  • #4 gst_rtspsrc_change_state
    at ../.././../gst/rtsp/gstrtspsrc.c line 7799
  • #5 gst_element_change_state
    at .././../gst/gstelement.c line 2648
  • #6 gst_element_set_state_func
    at .././../gst/gstelement.c line 2602
  • #7 gst_element_set_state
    at .././../gst/gstelement.c line 2503
  • #8 gst_bin_element_set_state
  • #9 gst_bin_change_state_func
    at .././../gst/gstbin.c line 2756
  • #10 gst_element_change_state
    at .././../gst/gstelement.c line 2648
  • #11 gst_element_set_state_func
    at .././../gst/gstelement.c line 2602
  • #12 gst_element_set_state
    at .././../gst/gstelement.c line 2503
  • #13 gst_bin_element_set_state
    at .././../gst/gstbin.c line 2414
  • #14 gst_bin_change_state_func
    at .././../gst/gstbin.c line 2756
  • #15 gst_pipeline_change_state
    at .././../gst/gstpipeline.c line 499
  • #16 gst_element_change_state
    at .././../gst/gstelement.c line 2648
  • #17 gst_element_continue_state
    at .././../gst/gstelement.c line 2356
  • #18 gst_element_change_state
    at .././../gst/gstelement.c line 2687
  • #19 gst_element_continue_state
    at .././../gst/gstelement.c line 2356
  • #20 gst_element_change_state
    at .././../gst/gstelement.c line 2694
  • #21 gst_element_set_state_func
    at .././../gst/gstelement.c line 2602
  • #22 gst_element_set_state
    at .././../gst/gstelement.c line 2503

Comment 1 Sergio Torres Soldado (zenx) 2016-06-30 14:49:44 UTC
It seems that the socket is being created as blocking with an infinite timeout.
Comment 2 Sebastian Dröge (slomo) 2016-06-30 20:29:25 UTC
There are two things here: a) the reading happens without timeout (there should be one, GstRTSPConnection has one and that should've been configured there), b) the reading happens without a GCancellable too (it's 0x0) and ideally there should be one which would be triggered exactly before gst_rtspsrc_stop() takes the mutex so that everything just shuts down asap.
Comment 3 Sergio Torres Soldado (zenx) 2016-07-01 15:04:32 UTC
Looking at the backtrace g_socket_input_stream_read() doesn't take into account any timeout, it calls g_socket_receive_with_blocking() which calls g_socket_receive_with_timeout() with timeout=-1 always. I have noticed that gstreamer sometimes sets canceable to false, probably from build_next() in gstrtspconnection.c
Comment 4 Nicolas Dufresne (ndufresne) 2016-07-01 23:17:49 UTC
I think I provided some wip patch addressing this in another bug no?
Comment 5 Sebastian Dröge (slomo) 2016-07-04 07:51:30 UTC
That would be https://bugzilla.gnome.org/show_bug.cgi?id=744209 I guess? That seems to target a different problem, but the general problem here really is the same (GstRTSPConnection needing some bigger refactoring to prevent all such problems and make the code simpler).
Comment 6 Sergio Torres Soldado (zenx) 2016-07-04 12:41:58 UTC
Thanks Nicolas, I will soon try out that patch but first I would like to understand why this is happening.

Why is conn->may_cancel being set to FALSE in build_next() what consequence does removing this have? It seems like it could resolve the issue or not setting the socket to block forever.
Comment 7 Sergio Torres Soldado (zenx) 2016-07-04 16:24:51 UTC
Created attachment 330857 [details] [review]
Fix for hanging GstRTSPConnection

If conn->may_cancel isn't reset to TRUE the next entrance in build_next may hang if there is no data received through read_bytes since the socket is blocking with an infinite timeout.
Comment 8 Sergio Torres Soldado (zenx) 2016-07-06 13:03:06 UTC
I have been using this patch for several days now and it seems to resolve the issue.
Comment 9 Sebastian Dröge (slomo) 2016-07-07 07:27:43 UTC
Review of attachment 330857 [details] [review]:

::: gst-libs/gst/rtsp/gstrtspconnection.c
@@ +2162,3 @@
   }
 done:
+  conn->may_cancel = TRUE;

This seems to defeat the purpose of this may_cancel flag. If you end up here because STATE_START and then STATE_DATA_HEADER/READ_LINES, and then no more data is there until the next iteration, it would go out here.

The intention of the flag seems to be to not allow interruptions in the middle of RTSP messages (why?!) and at the end of the message it will go to STATE_END where may_cancel is set to TRUE again.


Where does it end up for you, what's the events that lead to reading being called without timeout/cancellable?

@@ +2168,3 @@
 invalid_body_len:
   {
+    conn->may_cancel = TRUE;

This one probably makes sense, but after this the connection is unusable anyway, right?
Comment 10 Sergio Torres Soldado (zenx) 2016-07-07 09:09:27 UTC
Hi Sebastian, thank you for your help.

It looks like the may_cancel flag is used only inside the while(true) loop in build_next() and NOT re-establishing it's state on exit looks odd. If an error inside the loop occurs (maybe because of lost data, lost synchronisation because the remote endpoint sent invalid data, ...) it seems important to re-establish the cancelability, i.e. not doing that only upon receiving a whole, valid rtsp message.

If the remote endpoint randomly disappears and somehow the read functions return an error the next build_next() re-entrance read of the first byte of the header hangs forever when it should return 0 and let gst_rtspsrc_loop_udp() call gst_rtsp_conninfo_reconnect(). In that case you can't change the state of the rtspsrc element to NULL for instance, resulting in a deadlock.
Comment 11 Sebastian Dröge (slomo) 2016-07-07 09:11:28 UTC
But isn't this while loop just running until all data was read from the connection? That is, isn't it stopping when currently no new data is available in the middle of a message... and would then later continue from the middle of the message if data becomes available again?
Comment 12 Sergio Torres Soldado (zenx) 2016-07-07 09:38:55 UTC
Yes it seems to try and complete a whole message. But if the rtsp server looses it's state and responds with invalid data that triggers an error on our side, and after that stops responding won't try and establish a new rtsp connection. If this makes any sense.
Comment 13 Sebastian Dröge (slomo) 2016-07-07 09:43:08 UTC
Yes if state is lost, things should reset.

However you would now also make reading in the middle of a message interruptible if reading a single messages is split over multiple calls. Or not?
Comment 14 Sergio Torres Soldado (zenx) 2016-07-07 10:04:13 UTC
I don't think so because fill_raw_bytes() calls g_input_stream_read() and handles it "NULL" if conn->may_cancel is FALSE. This calls g_socket_input_stream_read() which calls g_socket_receive_with_blocking() with blocking=TRUE BUT with a NULL cancellable, which ultimately means the rtspsrc holds the task mutex until data arrives, which might be never.
Comment 15 Sebastian Dröge (slomo) 2016-07-07 15:28:50 UTC
Right, so in that case problems are even worse and we would block there potentially forever :) I guess the patch is ok then, just that there are far more problems.
Comment 16 Sebastian Dröge (slomo) 2016-07-07 16:15:33 UTC
commit 75f43da4c8f1531f714f17c1674dde21d09fc43d
Author: Sergio Torres Soldado <torres.soldado@gmail.com>
Date:   Mon Jul 4 17:19:08 2016 +0100

    rtspconnection: Fix potential deadlock caused by blocking read forever
    
    Reset the connection "may_cancel" property to avoid a potential deadlock
    if there is no data to read and the socket stays blocked forever.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=768249
Comment 17 Sergio Torres Soldado (zenx) 2016-07-07 16:32:04 UTC
cool, thank you Sebastian :)