GNOME Bugzilla – Bug 760584
adaptivedemux: tests: hls seek tests sometimes time out
Last modified: 2016-01-20 11:52:51 UTC
HLS seek tests (testSeekSnapAfterPosition, testReverseSeekSnapBeforePosition, testSeekPosition) sometimes time out; the reproduction rate is around once in 5-10 runs. From my investigation they block in the AppSink preroll, waiting for a switch-to-PLAYING event that never comes.

When the AppSink changes state from PAUSED to PLAYING on the first buffer received, it triggers a pipeline state change from PAUSED to PLAYING that is performed on a separate thread. This races with the thread performing the seek, which flushes the AppSink, making it lose state and drop from PLAYING back to PAUSED. Changing the pipeline to PLAYING means first changing the AppSink element and then the adaptivedemux element.

I identified 3 different scenarios for how the AppSink behaves when it receives the first buffer after the seek:

1. The AppSink has the pending state PLAYING when doing the preroll. I noticed this happens when the pipeline's change from PAUSED to PLAYING has already finished. In this case gst_base_sink_do_preroll changes state to PLAYING and does not call gst_base_sink_wait_preroll. The test finishes successfully.

2. The AppSink has the pending state PAUSED when doing the preroll. In this case it waits for a switch to PLAYING. This event comes, probably because the pipeline had not yet changed the AppSink's state and is doing it now. The test finishes successfully.

3. Similar to 2, but the switch-to-PLAYING event never comes. The test blocks in gst_base_sink_wait_preroll. It seems the pipeline changed the AppSink's state to PLAYING, but this was lost during the flush. Changing the adaptivedemux state blocks during the seek and resumes after it, so in this scenario the pipeline change managed to change the AppSink's state, was interrupted by the seek, and finished changing the adaptivedemux's state after the seek.
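Scenario 3 can be illustrated with a toy model (this is not GStreamer code; all names here are made up for illustration): a sink thread waits for a PLAYING notification, while a flush resets the state. If the pipeline's PAUSED-to-PLAYING change lands before the flush, the notification is effectively consumed and lost, and the preroll wait never returns:

```python
# Hypothetical, simplified model of the lost-state race (scenario 3).
# Not GStreamer API; ToySink and its methods are invented for illustration.
import threading

class ToySink:
    def __init__(self):
        self.cond = threading.Condition()
        self.state = "PAUSED"

    def set_state(self, new):                 # pipeline thread calls this
        with self.cond:
            self.state = new
            self.cond.notify_all()

    def flush(self):                          # seek thread: causes lost-state
        with self.cond:
            self.state = "PAUSED"             # PLAYING set earlier is lost

    def wait_preroll(self, timeout):          # streaming thread, first buffer
        with self.cond:
            return self.cond.wait_for(lambda: self.state == "PLAYING",
                                      timeout=timeout)

sink = ToySink()
sink.set_state("PLAYING")   # pipeline finishes its PAUSED->PLAYING change
sink.flush()                # seek flushes the sink: the PLAYING state is lost
# Nobody sets PLAYING again (the pipeline thinks it is done), so the preroll
# wait times out here; in the real test it blocks forever.
stuck = not sink.wait_preroll(timeout=0.2)
print("preroll stuck:", stuck)
```

The timeout exists only so the demo terminates; the real gst_base_sink_wait_preroll waits indefinitely.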
Got this today:
+ Trace 235903
(backtrace frames not captured here; the trace lists threads 6 through 2, LWPs 18001-18004 and 18006)
This is what happens:

1) The pipeline is doing its change from PAUSED to PLAYING in gstbin.c:2764:gst_bin_change_state_func.
2) The appsink (src_0) is set from PAUSED to PLAYING successfully.
3) Immediately after that, the appsink does a lost-state and posts an async-start message.
4) The pipeline finishes its state change with all returns == SUCCESS, does this check:

   if (ret != GST_STATE_CHANGE_ASYNC && !bin->priv->pending_async_done)

   and commits its state change to VOID_PENDING.
5) The appsink posts its async-done and waits for its state to be set to PLAYING, but:

   GST_STATES gstbin.c:3326:bin_handle_async_done:<pipeline> nothing pending

   The message is ignored and the appsink stays stalled.
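The bookkeeping failure in steps 4-5 can be sketched as a toy model (the class and field names here are illustrative, not GStreamer's actual internals): once the bin has committed its state change to VOID_PENDING, a later async-done finds nothing pending and is dropped:

```python
# Hypothetical toy model of the bin's pending-async bookkeeping.
# Invented names; only the control flow mirrors the steps above.
class ToyBin:
    def __init__(self):
        self.pending = "PLAYING"          # bin is mid PAUSED->PLAYING
        self.pending_async_done = False

    def finish_state_change(self, child_returns):
        # mirrors: if (ret != ASYNC && !priv->pending_async_done) -> commit
        if (all(r == "SUCCESS" for r in child_returns)
                and not self.pending_async_done):
            self.pending = "VOID_PENDING"  # bin considers itself done

    def handle_async_start(self):
        self.pending_async_done = True     # processed too late to matter

    def handle_async_done(self):
        if self.pending == "VOID_PENDING":
            return "ignored: nothing pending"  # sink stays stalled
        return "continue state change"

b = ToyBin()
b.finish_state_change(["SUCCESS", "SUCCESS"])  # step 4: check passes
b.handle_async_start()                         # step 3's message, raced late
result = b.handle_async_done()                 # step 5
print(result)
```

The point of the sketch is the ordering: the async-start posted in step 3 is only processed after the bin has already passed its check and committed, so the matching async-done has nothing to attach to.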
wonder if this is the same as I'm seeing in bug #760532.
(In reply to Tim-Philipp Müller from comment #3) > wonder if this is the same as I'm seeing in bug #760532. Looks very similar, yes
I would say it's exactly the same thing. We have two state changes happening at the same time in different threads. The other bug seems to be fixed by reverting the commit below (as sending the next seek event is then blocked until the state change to PLAYING is finished):

commit b427997119a2b6aacbeb550f729936f8b963e24b
Author: Sebastian Dröge <sebastian@centricular.com>
Date:   Thu Dec 10 11:35:05 2015 +0200

    element: Don't hold state lock all the time while sending an event

    This lock seems to exist only to prevent elements from changing states
    while events are being processed. However, events are going to be
    processed in those elements nonetheless if sent directly via pads, so
    protection must already be implemented inside the elements for event
    handling if it is needed. As such, having the lock here is not very
    useful and is actually causing various deadlocks in different
    situations as described in
    https://bugzilla.gnome.org/show_bug.cgi?id=744040

That only seems like a workaround though, specific to seek events. There seems to be a generic race condition somewhere between finishing asynchronous state changes and loss of state at the same time.
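What the reverted lock bought us can be sketched abstractly (assumed names, not the GStreamer API): if the state change and the seek-event dispatch take the same lock, the two threads serialize and the seek can never flush the sink in the middle of the pipeline's PAUSED-to-PLAYING change:

```python
# Illustrative sketch of serializing a state change and a SEEK via one lock,
# as the pre-revert GstElement code effectively did. Names are invented.
import threading

state_lock = threading.Lock()
log = []

def set_state_playing():
    with state_lock:               # the state change holds the lock...
        log.append("state-change-start")
        log.append("state-change-done")

def send_seek_event():
    with state_lock:               # ...and so does sending the event
        log.append("seek")

t = threading.Thread(target=set_state_playing)
t.start()
send_seek_event()
t.join()
# Whichever thread wins the lock, the other cannot interleave with it:
assert log.index("state-change-done") == log.index("state-change-start") + 1
print("serialized:", log)
```

As noted above, this only serializes SEEK events sent to the element; events sent directly to pads bypass the lock entirely, which is why it is a workaround rather than a fix.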
Also, the lock there would only catch it when actually sending a SEEK event to an element, as compared to sending it to pads. It all seems like a bad workaround :)
This is the reason for everything in bug #760532, including the gst-rtsp-server hangs. Let's track it there then. *** This bug has been marked as a duplicate of bug 760532 ***