Bug 761940 – v4l2: deadlock when using dmabuf-import (zero-copy) between two V4L2 devices

Summary:	v4l2: deadlock when using dmabuf-import (zero-copy) between two V4L2 devices


Status:	RESOLVED OBSOLETE

Product:	GStreamer
Classification:	Platform
Component:	gst-plugins-good
Version:	1.4.5
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	git master
Assigned To:	GStreamer Maintainers
QA Contact:	GStreamer Maintainers

Reported:	2016-02-12 13:16 UTC by Carlos Alberto Lopez Perez
Modified:	2016-06-10 15:56 UTC

Attachments
gst-inspect-1.0 v4l2video2h264enc (4.31 KB, text/plain) 2016-02-12 13:17 UTC, Carlos Alberto Lopez Perez
gst-inspect-1.0 v4l2src (9.16 KB, text/plain) 2016-02-12 13:18 UTC, Carlos Alberto Lopez Perez
Patch that makes zero-copy work between two V4L2 devices. (1.42 KB, patch) 2016-02-12 13:25 UTC, Carlos Alberto Lopez Perez (needs-work)

Description Carlos Alberto Lopez Perez 2016-02-12 13:16:53 UTC

Hi,

I'm working on a Freescale i.MX6 board that has a hardware video encoder (CODA),
which is exposed as a V4L2 device (/dev/video2).

The board also has a camera connected, which is exposed as another V4L2 device
(/dev/video0).

I'm using GStreamer to encode and stream video from the camera.
The pipelines I'm using are these:

1. On the board: To record, encode and stream the video:
$ gst-launch-1.0 v4l2src ! video/x-raw,width=1280,height=720,framerate=30/1 ! v4l2video2h264enc ! h264parse ! rtph264pay pt=96 config-interval=5 ! udpsink host=${my.local.ip.addr} port=5555 


2. On my PC to receive the video and measure the FPS at the same time:
$ gst-launch-1.0 -e -v udpsrc port=5555 ! application/x-rtp, payload=96 ! rtpjitterbuffer ! rtph264depay ! avdec_h264 ! fpsdisplaysink sync=false text-overlay=false 


Everything works fine: on my PC I get video playback at 30 FPS without problems.


The issue is that this takes around 45% CPU on the board, due to the memory copies of raw 720p video from v4l2src0:pool:src to v4l2video2h264enc0:pool:sink.
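
To give a rough sense of how much data those copies move, here is a small back-of-the-envelope calculation. The pixel format is not stated in the caps above, so the two formats used here (I420 at 12 bits per pixel, YUY2 at 16 bits per pixel) are assumptions for illustration only. The figure is the payload copied per second; a memcpy both reads and writes it, so the actual memory traffic is roughly double.

```python
def copy_bytes_per_second(width, height, fps, bits_per_pixel):
    """Bytes of raw video that must be copied each second."""
    bytes_per_frame = width * height * bits_per_pixel // 8
    return bytes_per_frame * fps

# Assumed formats: the report does not say which raw format was negotiated.
i420 = copy_bytes_per_second(1280, 720, 30, 12)  # 12 bits per pixel
yuy2 = copy_bytes_per_second(1280, 720, 30, 16)  # 16 bits per pixel

print(f"I420: {i420 / 1e6:.1f} MB/s")
print(f"YUY2: {yuy2 / 1e6:.1f} MB/s")
```

Under these assumptions the copy handles somewhere around 41 to 55 MB every second, which is plausibly significant on a small embedded CPU.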

To fix that, I want to use zero-copy (via dmabuf-import).

So I changed the pipeline on the board to:


$ gst-launch-1.0 v4l2src io-mode=dmabuf ! video/x-raw,width=1280,height=720,framerate=30/1 ! v4l2video2h264enc output-io-mode=dmabuf-import ! h264parse ! rtph264pay pt=96 config-interval=5 ! udpsink host=${my.local.ip.addr} port=5555

The issue is that it does not work: I get a deadlock.

There are 3 bufferpools at play here:


  - v4l2src0:pool:src             -> The source of the video from the camera V4L2 device /dev/video0.
  - v4l2video2h264enc0:pool:sink  -> The input of the encoder (the sink pad of the V4L2 hardware encoder at /dev/video2). It matches the output-io-mode property of v4l2video2h264enc.
  - v4l2video2h264enc0:pool:src   -> The output of the encoder (the src pad of the V4L2 hardware encoder at /dev/video2). It matches the capture-io-mode property of v4l2video2h264enc.

  See the attached files properties_of_v4l2src.txt and properties_of_v4l2video2h264enc.txt for more info.
  

Some comments about the versions used:
 - I'm using gstreamer and gstreamer-plugins-good 1.4.5
 - For the v4l2video encoder element of gstreamer-plugins-good I'm using the patches available in bug 728438 <https://gitlab.com/veo-labs/gst-plugins-good>

Comment 1 Carlos Alberto Lopez Perez 2016-02-12 13:17:46 UTC

Created attachment 320972
gst-inspect-1.0 v4l2video2h264enc

Comment 2 Carlos Alberto Lopez Perez 2016-02-12 13:18:15 UTC

Created attachment 320973
gst-inspect-1.0 v4l2src

Comment 3 Carlos Alberto Lopez Perez 2016-02-12 13:21:22 UTC

I have been debugging this, and I have observed that if I raise GST_V4L2_MIN_BUFFERS from 2 to 8 I get video playback working for ~1 second and then it deadlocks again.

After more debugging, I think that the deadlock happens when the bufferpool is full.

Comment 4 Carlos Alberto Lopez Perez 2016-02-12 13:22:58 UTC

Finally, I have applied a small patch that doesn't let the bufferpool get full, and suddenly everything works as expected.

Video playback is working fine and stable (30 FPS without issue) using zero-copy (dmabuf-import).

CPU usage drops to roughly 3-15%, which is much better than without zero-copy.


I'm attaching the patch for comments.

Comment 5 Carlos Alberto Lopez Perez 2016-02-12 13:25:03 UTC

Created attachment 320975
Patch that makes zero-copy work between two V4L2 devices.

Comment 6 Nicolas Dufresne (ndufresne) 2016-02-12 17:02:44 UTC

(In reply to Carlos Alberto Lopez Perez from comment #0)
> Some comments about the versions used:
>  - I'm using gstreamer and gstreamer-plugins-good 1.4.5

Please try with 1.6 or master, so we know whether this has been fixed in a recent version.

>  - For the v4l2video encoder element of gstreamer-plugins-good I'm using the
> patches available in bug 728438
> <https://gitlab.com/veo-labs/gst-plugins-good>

Obviously, if there is a bug in that element, we can only guide you. There is quite a large list of TODOs in the review bug for this element. I started addressing them in my free time, but my free time has been limited lately.

Comment 7 Nicolas Dufresne (ndufresne) 2016-02-12 17:04:16 UTC

(In reply to Carlos Alberto Lopez Perez from comment #3)
> I have been debugging this, and I have observed that if I raise
> GST_V4L2_MIN_BUFFERS from 2 to 8 I get video playback working for ~1 second
> and then it deadlocks again.
> 
> After more debugging, I think that the deadlocks happens when the bufferpool
> is full.

We call this a stall, not a deadlock, btw. It happens if you exhaust a buffer pool's capacity. This would indicate that the encoder is not requesting enough buffers from upstream (see propose_allocation) or is not releasing those buffers properly.
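
As a rough illustration of pool exhaustion, here is a toy model in Python. It is not the GStreamer code: the names frames_before_stall, pool_size and hold_threshold are made up for this sketch, and it assumes the consumer only hands a buffer back once it holds more than hold_threshold of them. It does not try to reproduce the timing seen in this report.

```python
from collections import deque

def frames_before_stall(pool_size, hold_threshold, frames):
    """Toy model: count frames delivered before the source pool runs dry.

    pool_size      -- buffers owned by the upstream (source) pool
    hold_threshold -- buffers the consumer keeps before returning the oldest
    frames         -- how many frames we try to push through
    """
    free = deque(range(pool_size))  # buffers the source can still acquire
    held = deque()                  # buffers currently held by the consumer
    delivered = 0
    for _ in range(frames):
        if not free:
            # A blocking acquire would wait here forever: this is the stall.
            return delivered
        held.append(free.popleft())
        delivered += 1
        # The consumer returns the oldest buffer only once it holds
        # more than hold_threshold buffers (an assumption of this model).
        if len(held) > hold_threshold:
            free.append(held.popleft())
    return delivered

print(frames_before_stall(pool_size=2, hold_threshold=2, frames=100))
print(frames_before_stall(pool_size=8, hold_threshold=4, frames=100))
```

In this model the pipeline stalls as soon as the consumer is allowed to hold at least as many buffers as the upstream pool owns, and runs indefinitely once the pool is larger than what the consumer retains.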

Comment 8 Nicolas Dufresne (ndufresne) 2016-06-06 21:30:59 UTC

Review of attachment 320975:

I think a better solution would be to implement a try_dqbuf: just like dqbuf, except that it would be non-blocking. This way, for every queued buffer, we'd run a loop to release anything no longer in use. This would reduce the amount of buffer waste that we have at the moment.
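
The idea can be sketched in Python as follows. This is only a model of the control flow, not the eventual implementation: qbuf_then_reclaim, in_flight, done and release are hypothetical names, and a queue.Queue stands in for the device's list of finished buffers. On a real V4L2 device the equivalent non-blocking behaviour comes from VIDIOC_DQBUF on a descriptor opened with O_NONBLOCK, which fails with EAGAIN when no buffer is ready.

```python
import queue

def try_dqbuf(done):
    """Return one finished buffer, or None right away if none is ready."""
    try:
        return done.get_nowait()
    except queue.Empty:
        return None

def qbuf_then_reclaim(buf, in_flight, done, release):
    """Queue buf, then release every buffer the device has finished with.

    in_flight -- list of buffers currently owned by the device (stand-in)
    done      -- queue.Queue of buffers the device no longer needs (stand-in)
    release   -- callback that gives a buffer back to its pool
    """
    in_flight.append(buf)  # stand-in for the real queue operation
    reclaimed = 0
    while True:
        finished = try_dqbuf(done)
        if finished is None:
            break  # nothing more to reclaim; never wait here
        in_flight.remove(finished)
        release(finished)
        reclaimed += 1
    return reclaimed

# Example: two buffers are already finished when a third is queued.
done = queue.Queue()
in_flight = ["a", "b"]
done.put("a")
done.put("b")
released = []
count = qbuf_then_reclaim("c", in_flight, done, released.append)
print(count, released, in_flight)
```

Because the loop never blocks, buffers go back to upstream as soon as the device is done with them, rather than only when a blocking dequeue happens to be issued.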

Comment 9 Nicolas Dufresne (ndufresne) 2016-06-07 18:24:33 UTC

Is there still any interest here?