GNOME Bugzilla – Bug 731015
v4l2src: deadlock on shutdown
Last modified: 2014-07-04 12:57:35 UTC
When testing the pipeline: gst-launch-1.0 v4l2src ! videoconvert ! queue ! motioncells ! queue ! videoconvert ! autovideosink the video freezes at startup and trying to shutdown the pipeline deadlocks. I couldn't figure out much from the bt and using GST_DEBUG makes it work. Not sure why qbuf/dqbuf aren't running as dqbuf should be waiting for buffers to be available but qbuf is waiting for some lock to be released.
+ Trace 233650
Thread 11 (Thread 0x7fffc4bb2700 (LWP 10267))
Thread 10 (Thread 0x7fffc47b1700 (LWP 10268))
Thread 9 (Thread 0x7fffc4fb3700 (LWP 10266))
Thread 6 (Thread 0x7fffde463700 (LWP 10263))
Thread 4 (Thread 0x7fffdf465700 (LWP 10261))
Thread 1 (Thread 0x7ffff7fcd700 (LWP 10226))
I get no deadlock here, tested with 3 different cams, git master (no trace). Do you have better steps to reproduce ?
Not really, all I do is run that pipeline and it locks. It works if I remove motioncells, but also happens if I use facedetect. I'm assuming this just happens with any element that adds a huge latency? Maybe latency enough for the bufferpool to deplete before preroll? If you have any debugging tip I can try here.
Ok, this only occurs with libv4l2, due to a bug in libv4l2, but also a bug in GStreamer, which is that we don't check polling errors and endup blocking in dqbuf. Now, interesting fact: This works more often: gst-launch-1.0 v4l2src ! video/x-raw,format=YUY2 ! videoconvert ! ... In both cases, our pool size is 2, but in the default case, a conversion to RGB is being done in libv4l2, holding the lock ! The driver goes low on buffers, and then report an error during poll, most likely to say, hey, I've lost a frame. Having more buffers helps, but waste a lot of memory for other uses cases, and on embedded devices. Loosing frames is most likely fine if too slow, we should send gaps. While doing so, some buffers should come back and as we will block on the stream for one frame, it will give time for the device to recover. I have a first patch where are simply spin in the poll on error, not ideal but works. Posting here, I'll finish it later.
Created attachment 277589 [details] [review] [PATCH] v4l2bufferpool: Cleanup poll method and retry on EINTR/EAGAIN https://bugzilla.gnome.org/show_bug.cgi?id=731015 --- sys/v4l2/gstv4l2bufferpool.c | 30 +++++++++++++++++++----------- 1 file changed, 19 insertions(+), 11 deletions(-)
Created attachment 277590 [details] [review] [PATCH] v4l2bufferpool: Keep the GstPollFD around and check errors Thsi is not yet fully correct yet, as we simply retry on error, which can spin, until the driver has recovered. The pool will be in error if the signal is lost, or if there is not enough buffers. I see EINVAL in errno for the later but haven't find anything int he V4L2 spec that would confirm. A GAP should be send instead of retrying now. https://bugzilla.gnome.org/show_bug.cgi?id=731015 --- sys/v4l2/gstv4l2bufferpool.c | 17 ++++++++++++----- sys/v4l2/gstv4l2bufferpool.h | 1 + 2 files changed, 13 insertions(+), 5 deletions(-)
*** Bug 725886 has been marked as a duplicate of this bug. ***
Ok, went back into the spec, my interpretation was not correct. "When the application did not call VIDIOC_QBUF or VIDIOC_STREAMON yet the poll() function succeeds, but sets the POLLERR flag in the revents field." I didn't expect that this was also in the case the application have dequeued all the buffers. So basically this error mean that there is no buffers currently queued. The default acquire function in GstBufferPool can wait for buffers to come back, though it's looping around the alloc method. If a drivers had support for CREATE_BUFS, this would be a case where we'd want a new buffer to be allocated and queued, but I don't see right now how to do that with the way we implement the buffer pool. In capture, buffers are never put back in the pool internal queue, instead they are put in V4L2 queue, hence the implementation of default_acquire_buffer() is not straight forward for us to use.
Comment on attachment 277589 [details] [review] [PATCH] v4l2bufferpool: Cleanup poll method and retry on EINTR/EAGAIN commit 1648e46f8649edd131a4b67cf86afcf7798344ea Author: Nicolas Dufresne <nicolas.dufresne@collabora.com> Date: Fri May 30 19:37:57 2014 -0400 v4l2bufferpool: Cleanup poll method and retry on EINTR/EAGAIN https://bugzilla.gnome.org/show_bug.cgi?id=731015
Comment on attachment 277590 [details] [review] [PATCH] v4l2bufferpool: Keep the GstPollFD around and check errors We need to block instead of spinning. Note that the a kernel patch have been sent, so poll simply block on underrun. But considering all drivers are affected, we have to implement a compatible workaround.
Please test as much as possible. I did a lot already, but it's a bit tricky. Note that there is no concurrent dqbuf calls. commit 652ed3bceb4e31fcd5dea9fdc9525fafb3350968 Author: Nicolas Dufresne <nicolas.dufresne@collabora.com> Date: Thu Jul 3 15:28:45 2014 -0400 v4l2bufferpool: Wait before polling if queue is empty In kernel before 3.17, polling during queue underrun would unblock right away and trigger POLLERR. As we are not handling POLLERR, we would endup blocking in DQBUF call, which won't be unblocked correctly when going to NULL state. A deadlock at start caused by locking error in libv4l2 was also seen before this patch. Instead, we wait until the queue is no longer empty before polling. https://bugzilla.gnome.org/show_bug.cgi?id=731015