Bug 731015 – v4l2src: deadlock on shutdown

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 731015 - v4l2src: deadlock on shutdown


Summary:	v4l2src: deadlock on shutdown


Status:	RESOLVED FIXED

Product:	GStreamer
Classification:	Platform
Component:	gst-plugins-good
Version:	git master
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	1.3.91
Assigned To:	Nicolas Dufresne (ndufresne)
QA Contact:	GStreamer Maintainers

URL:
Whiteboard:

Duplicates:	725886 (view as bug list)
Depends on:
Blocks:

Reported:	2014-05-30 20:46 UTC by Thiago Sousa Santos
Modified:	2014-07-04 12:57 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
[PATCH] v4l2bufferpool: Cleanup poll method and retry on EINTR/EAGAIN (1.68 KB, patch) 2014-05-30 23:46 UTC, Nicolas Dufresne (ndufresne)	committed	Details \| Review
[PATCH] v4l2bufferpool: Keep the GstPollFD around and check errors (2.53 KB, patch) 2014-05-30 23:46 UTC, Nicolas Dufresne (ndufresne)	rejected	Details \| Review

Description Thiago Sousa Santos 2014-05-30 20:46:03 UTC

When testing the pipeline:

gst-launch-1.0 v4l2src ! videoconvert ! queue ! motioncells ! queue ! videoconvert ! autovideosink

the video freezes at startup and trying to shutdown the pipeline deadlocks. I couldn't figure out much from the bt and using GST_DEBUG makes it work. Not sure why qbuf/dqbuf aren't running as dqbuf should be waiting for buffers to be available but qbuf is waiting for some lock to be released.

+ Trace 233650

Thread 11 (Thread 0x7fffc4bb2700 (LWP 10267))

#0 syscall
at ../sysdeps/unix/sysv/linux/x86_64/syscall.S line 38
#1 ??
from /usr/lib/libtbb.so.2
#2 ??
from /usr/lib/libtbb.so.2
#3 start_thread
at pthread_create.c line 312
#4 clone
at ../sysdeps/unix/sysv/linux/x86_64/clone.S line 111

Thread 10 (Thread 0x7fffc47b1700 (LWP 10268))

#0 syscall
at ../sysdeps/unix/sysv/linux/x86_64/syscall.S line 38
#1 ??
from /usr/lib/libtbb.so.2
#2 ??
from /usr/lib/libtbb.so.2
#3 start_thread
at pthread_create.c line 312
#4 clone
at ../sysdeps/unix/sysv/linux/x86_64/clone.S line 111

Thread 9 (Thread 0x7fffc4fb3700 (LWP 10266))

#0 syscall
at ../sysdeps/unix/sysv/linux/x86_64/syscall.S line 38
#1 ??
from /usr/lib/libtbb.so.2
#2 ??
from /usr/lib/libtbb.so.2
#3 start_thread
at pthread_create.c line 312
#4 clone
at ../sysdeps/unix/sysv/linux/x86_64/clone.S line 111

Thread 6 (Thread 0x7fffde463700 (LWP 10263))

#0 syscall
at ../sysdeps/unix/sysv/linux/x86_64/syscall.S line 38
#1 ??
from /usr/lib/x86_64-linux-gnu/libv4lconvert.so.0
#2 ??
from /usr/lib/x86_64-linux-gnu/libv4l2.so.0
#3 v4l2_ioctl
from /usr/lib/x86_64-linux-gnu/libv4l2.so.0
#4 gst_v4l2_allocator_dqbuf
at gstv4l2allocator.c line 1283
#5 gst_v4l2_buffer_pool_dqbuf
at gstv4l2bufferpool.c line 1027
#6 gst_v4l2_buffer_pool_acquire_buffer
at gstv4l2bufferpool.c line 1132
#7 gst_buffer_pool_acquire_buffer
at gstbufferpool.c line 1206
#8 gst_v4l2src_create
at gstv4l2src.c line 624
#9 gst_base_src_get_range
at gstbasesrc.c line 2445
#10 gst_base_src_loop
at gstbasesrc.c line 2721
#11 gst_task_func
at gsttask.c line 317
#12 g_thread_pool_thread_proxy
at /tmp/buildd/glib2.0-2.40.0/./glib/gthreadpool.c line 307
#13 g_thread_proxy
at /tmp/buildd/glib2.0-2.40.0/./glib/gthread.c line 764
#14 start_thread
at pthread_create.c line 312
#15 clone
at ../sysdeps/unix/sysv/linux/x86_64/clone.S line 111

Thread 4 (Thread 0x7fffdf465700 (LWP 10261))

#0 __lll_lock_wait
at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S line 135
#1 _L_lock_913
from /lib/x86_64-linux-gnu/libpthread.so.0
#2 __GI___pthread_mutex_lock
at ../nptl/pthread_mutex_lock.c line 79
#3 v4l2_ioctl
from /usr/lib/x86_64-linux-gnu/libv4l2.so.0
#4 gst_v4l2_allocator_qbuf
at gstv4l2allocator.c line 1235
#5 gst_v4l2_buffer_pool_qbuf
at gstv4l2bufferpool.c line 991
#6 gst_v4l2_buffer_pool_release_buffer
at gstv4l2bufferpool.c line 1235
#7 gst_buffer_pool_release_buffer
at gstbufferpool.c line 1284
#8 _gst_buffer_dispose
at gstbuffer.c line 541
#9 gst_mini_object_unref
at gstminiobject.c line 446
#10 gst_buffer_unref
at ../../../gst/gstbuffer.h line 360
#11 gst_base_transform_handle_buffer
at gstbasetransform.c line 2128
#12 gst_base_transform_chain
at gstbasetransform.c line 2224
#13 gst_pad_chain_data_unchecked
at gstpad.c line 3827
#14 gst_pad_push_data
at gstpad.c line 4060
#15 gst_pad_push
at gstpad.c line 4171
#16 gst_queue_push_one
at gstqueue.c line 1118
#17 gst_queue_loop
at gstqueue.c line 1247
#18 gst_task_func
at gsttask.c line 317
#19 g_thread_pool_thread_proxy
at /tmp/buildd/glib2.0-2.40.0/./glib/gthreadpool.c line 307
#20 g_thread_proxy
at /tmp/buildd/glib2.0-2.40.0/./glib/gthread.c line 764
#21 start_thread
at pthread_create.c line 312
#22 clone
at ../sysdeps/unix/sysv/linux/x86_64/clone.S line 111

Thread 1 (Thread 0x7ffff7fcd700 (LWP 10226))

#0 __lll_lock_wait
at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S line 135
#1 _L_lock_1086
from /lib/x86_64-linux-gnu/libpthread.so.0
#2 __GI___pthread_mutex_lock
at ../nptl/pthread_mutex_lock.c line 134
#3 g_mutex_lock
at /tmp/buildd/glib2.0-2.40.0/./glib/gthread-posix.c line 209
#4 gst_base_src_set_playing
at gstbasesrc.c line 3627
#5 gst_base_src_change_state
at gstbasesrc.c line 3802
#6 gst_v4l2src_change_state
at gstv4l2src.c line 598
#7 gst_element_change_state
at gstelement.c line 2600
#8 gst_element_set_state_func
at gstelement.c line 2556
#9 gst_bin_element_set_state
at gstbin.c line 2323
#10 gst_bin_change_state_func
at gstbin.c line 2660
#11 gst_pipeline_change_state
at gstpipeline.c line 469
#12 gst_element_change_state
at gstelement.c line 2600
#13 gst_element_set_state_func
at gstelement.c line 2556
#14 main
at gst-launch.c line 1135

Comment 1 Nicolas Dufresne (ndufresne) 2014-05-30 21:05:25 UTC

I get no deadlock here, tested with 3 different cams, git master (no trace). Do you have better steps to reproduce ?

Comment 2 Thiago Sousa Santos 2014-05-30 21:13:26 UTC

Not really, all I do is run that pipeline and it locks. It works if I remove motioncells, but also happens if I use facedetect. I'm assuming this just happens with any element that adds a huge latency? Maybe latency enough for the bufferpool to deplete before preroll?

If you have any debugging tip I can try here.

Comment 3 Nicolas Dufresne (ndufresne) 2014-05-30 23:36:12 UTC

Ok, this only occurs with libv4l2, due to a bug in libv4l2, but also a bug in GStreamer, which is that we don't check polling errors and endup blocking  in dqbuf.

Now, interesting fact:

This works more often:

gst-launch-1.0 v4l2src ! video/x-raw,format=YUY2 ! videoconvert ! ...

In both cases, our pool size is 2, but in the default case, a conversion to RGB is being done in libv4l2, holding the lock ! The driver goes low on buffers, and then report an error during poll, most likely to say, hey, I've lost a frame.

Having more buffers helps, but waste a lot of memory for other uses cases, and on embedded devices. Loosing frames is most likely fine if too slow, we should send gaps. While doing so, some buffers should come back and as we will block on the stream for one frame, it will give time for the device to recover. I have a first patch where are simply spin in the poll on error, not ideal but works. Posting here, I'll finish it later.

Comment 4 Nicolas Dufresne (ndufresne) 2014-05-30 23:46:17 UTC

Created attachment 277589 [details] [review]
[PATCH] v4l2bufferpool: Cleanup poll method and retry on EINTR/EAGAIN


https://bugzilla.gnome.org/show_bug.cgi?id=731015
---
 sys/v4l2/gstv4l2bufferpool.c | 30 +++++++++++++++++++-----------
 1 file changed, 19 insertions(+), 11 deletions(-)

Comment 5 Nicolas Dufresne (ndufresne) 2014-05-30 23:46:22 UTC

Created attachment 277590 [details] [review]
[PATCH] v4l2bufferpool: Keep the GstPollFD around and check errors


Thsi is not yet fully correct yet, as we simply retry on error, which can
spin, until the driver has recovered. The pool will be in error if the signal
is lost, or if there is not enough buffers. I see EINVAL in errno for the later
but haven't find anything int he V4L2 spec that would confirm. A GAP
should be send instead of retrying now.

https://bugzilla.gnome.org/show_bug.cgi?id=731015
---
 sys/v4l2/gstv4l2bufferpool.c | 17 ++++++++++++-----
 sys/v4l2/gstv4l2bufferpool.h |  1 +
 2 files changed, 13 insertions(+), 5 deletions(-)

Comment 6 Nicolas Dufresne (ndufresne) 2014-05-31 02:00:12 UTC

*** Bug 725886 has been marked as a duplicate of this bug. ***

Comment 7 Nicolas Dufresne (ndufresne) 2014-05-31 02:55:46 UTC

Ok, went back into the spec, my interpretation was not correct.

"When the application did not call VIDIOC_QBUF or VIDIOC_STREAMON yet the poll() function succeeds, but sets the POLLERR flag in the revents field."

I didn't expect that this was also in the case the application have dequeued all the buffers. So basically this error mean that there is no buffers currently queued. The default acquire function in GstBufferPool can wait for buffers to come back, though it's looping around the alloc method. If a drivers had support for CREATE_BUFS, this would be a case where we'd want a new buffer to be allocated and queued, but I don't see right now how to do that with the way we implement the buffer pool. In capture, buffers are never put back in the pool internal queue, instead they are put in V4L2 queue, hence the implementation of default_acquire_buffer() is not straight forward for us to use.

Comment 8 Nicolas Dufresne (ndufresne) 2014-06-04 19:16:28 UTC

Comment on attachment 277589 [details] [review]
[PATCH] v4l2bufferpool: Cleanup poll method and retry on EINTR/EAGAIN

commit 1648e46f8649edd131a4b67cf86afcf7798344ea
Author: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Date:   Fri May 30 19:37:57 2014 -0400

    v4l2bufferpool: Cleanup poll method and retry on EINTR/EAGAIN
    
    https://bugzilla.gnome.org/show_bug.cgi?id=731015

Comment 9 Nicolas Dufresne (ndufresne) 2014-06-04 19:17:43 UTC

Comment on attachment 277590 [details] [review]
[PATCH] v4l2bufferpool: Keep the GstPollFD around and check errors

We need to block instead of spinning. Note that the a kernel patch have been sent, so poll simply block on underrun. But considering all drivers are affected, we have to implement a compatible workaround.

Comment 10 Nicolas Dufresne (ndufresne) 2014-07-03 19:34:38 UTC

Please test as much as possible. I did a lot already, but it's a bit tricky. Note that there is no concurrent dqbuf calls.

commit 652ed3bceb4e31fcd5dea9fdc9525fafb3350968
Author: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Date:   Thu Jul 3 15:28:45 2014 -0400

    v4l2bufferpool: Wait before polling if queue is empty
    
    In kernel before 3.17, polling during queue underrun would unblock right
    away and trigger POLLERR. As we are not handling POLLERR, we would endup
    blocking in DQBUF call, which won't be unblocked correctly when going
    to NULL state. A deadlock at start caused by locking error in libv4l2 was
    also seen before this patch. Instead, we wait until the queue is no longer
    empty before polling.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=731015