After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 775014 - v4l2src buffer leak, race condition sending EOS from another thread
v4l2src buffer leak, race condition sending EOS from another thread
Product: GStreamer
Classification: Platform
Component: gst-plugins-good
git master
Other Linux
: Normal normal
: 1.8.4
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks: 772521
Reported: 2016-11-24 14:58 UTC by Philipp Zabel
Modified: 2016-11-24 21:14 UTC
See Also:
GNOME target: ---
GNOME version: ---

v4l2src: release buffer if create fails (830 bytes, patch)
2016-11-24 14:58 UTC, Philipp Zabel
committed Details | Review
gstv4l2bufferpool: remove critical error message when process is called on an inactive pool (1.21 KB, patch)
2016-11-24 14:59 UTC, Philipp Zabel
committed Details | Review

Description Philipp Zabel 2016-11-24 14:58:47 UTC
Created attachment 340686 [details] [review]
v4l2src: release buffer if create fails

If a v4l2src is sent an EOS event from another thread, the buffer pool is set to flushing/inactive.
If at the same time the v4l2src thread is blocking on a DQBUF poll in gst_v4l2src_create, it continues to run and may acquire a buffer. In that case it will call gst_v4l2_buffer_pool_process with the pool already inactive.
gst_v4l2_buffer_pool_process throws a critical error message and returns GST_FLOW_ERROR, and then gst_v4l2src_create just returns that error without decrementing the buffer refcount. The returned buffer is then ignored by gst_base_src_get_range due to the error return value.

The observed result is that when repeatedly starting and stopping a pipeline this way, sometimes gst_v4l2src_stop is not called due to the outstanding buffer that is never returned, the v4l2 queue doesn't free its buffers, and during the next stream start the REQBUFS fails with -EBUSY.
Comment 1 Philipp Zabel 2016-11-24 14:59:33 UTC
Created attachment 340687 [details] [review]
gstv4l2bufferpool: remove critical error message when process is called on an inactive pool

If the pool is inactive, it is guaranteed to also be flushing, so the
following check will return GST_FLOW_FLUSHING anyway.
This can happen if a v4l2src is blocking on DQBUF in create and is sent
an EOS event on another thread. In that case the pool is set to
flushing/inactive without locking, the v4l2src is unblocked, and may
call pool_process with a valid buffer on the already inactive pool.
Comment 2 Nicolas Dufresne (ndufresne) 2016-11-24 15:29:59 UTC
Review of attachment 340687 [details] [review]:

::: sys/v4l2/gstv4l2bufferpool.c
@@ +1706,2 @@
     return GST_FLOW_FLUSHING;

Would it also work to move this assertion after the flushing check ? Otherwise, I'm fine with this change. I have notice that this assertion was popping in various error conditions too (including unplugging a USB camera).
Comment 3 Nicolas Dufresne (ndufresne) 2016-11-24 15:39:54 UTC
Review of attachment 340686 [details] [review]:

Will also be backported.
Comment 4 Nicolas Dufresne (ndufresne) 2016-11-24 15:52:39 UTC
Review of attachment 340686 [details] [review]:

Comment 5 Nicolas Dufresne (ndufresne) 2016-11-24 15:57:34 UTC
Could also cause bug #772521 from time to time, we notice two behaviours.
Comment 6 Philipp Zabel 2016-11-24 15:58:03 UTC
gst_base_src_send_event (GST_EVENT_EOS) calls gst_base_src_activate_pool (FALSE) -> gst_buffer_pool_set_active (FALSE), which calls do_set_flushing (TRUE) and potentially do_stop () in that order and only then sets priv->active = FALSE.
I think gstbufferpool priv->active can never be FALSE unless pool->flushing is already set to 1, so moving that assertion after the GST_FLOW_FLUSHING return should not be a problem. It should never be hit.

Here is the relevant part of a log in the error case:

First EOS is sent from a thread:

0:00:52.897536007  2797 0x552bbe00 DEBUG       GST_ELEMENT_PADS gstelement.c:1592:gst_element_send_event: send eos event on element v4l2src0
0:00:52.897605007  2797 0x552bbe00 DEBUG                basesrc gstbasesrc.c:1752:gst_base_src_send_event:<v4l2src0> handling event 0x7320bdb8 eos event: 0x7320bdb8, time 99:99:99.999999999, seq-num 2776, (NULL)
0:00:52.897678007  2797 0x552bbe00 DEBUG                basesrc gstbasesrc.c:1839:gst_base_src_send_event:<v4l2src0> EOS marked, calling unlock
0:00:52.897729007  2797 0x552bbe00 DEBUG                basesrc gstbasesrc.c:3027:gst_base_src_activate_pool:<v4l2src0> gst_base_src_activate_pool pool=0x5538dd38 active=0
0:00:52.897792673  2797 0x552bbe00 LOG               bufferpool gstbufferpool.c:473:gst_buffer_pool_set_active:<v4l2src0:pool:src> active 0
0:00:52.897845340  2797 0x552bbe00 DEBUG         v4l2bufferpool gstv4l2bufferpool.c:933:gst_v4l2_buffer_pool_flush_start:<v4l2src0:pool:src> start flushing
0:00:52.897901340  2797 0x552bbe00 LOG               bufferpool gstbufferpool.c:505:gst_buffer_pool_set_active:<v4l2src0:pool:src> outstanding buffers 1
0:00:52.897962673  2797 0x552bbe00 DEBUG                basesrc gstbasesrc.c:1849:gst_base_src_send_event:<v4l2src0> unlock called, waiting for LIVE_LOCK

then ->alloc() returns with a buffer and gst_v4l2src_create continues:

0:00:52.896312673  2797 0x74701f80 DEBUG               GST_POLL gstpoll.c:1066:gst_poll_fd_has_error: 0x55389050: fd (fd:22, idx:1) 0
0:00:52.898107673  2797 0x74701f80 LOG           v4l2bufferpool gstv4l2bufferpool.c:1162:gst_v4l2_buffer_pool_dqbuf:<v4l2src0:pool:src> dequeueing a buffer
0:00:52.898275340  2797 0x74701f80 LOG            v4l2allocator gstv4l2allocator.c:1331:gst_v4l2_allocator_dqbuf:<v4l2src0:pool:src:allocator> dequeued buffer 4 (flags 0x2005)
0:00:52.898386673  2797 0x74701f80 DEBUG          v4l2allocator gstv4l2allocator.c:1335:gst_v4l2_allocator_dqbuf:<v4l2src0:pool:src:allocator> driver pretends buffer is queued even if dequeue succeeded
0:00:52.898482673  2797 0x74701f80 LOG           v4l2bufferpool gstv4l2bufferpool.c:1194:gst_v4l2_buffer_pool_dqbuf:<v4l2src0:pool:src> dequeued buffer 0x7320a850 seq:14 (ix=4), mem 0x5537f568 used 829440, plane=0, flags 00002001, ts 1:04:25.730204000, pool-queued=4, buffer=0x7320a850
0:00:52.898690673  2797 0x74701f80 DEBUG         v4l2bufferpool gstv4l2bufferpool.c:1722:gst_v4l2_buffer_pool_process:<v4l2src0:pool:src> process buffer 0x726d2c5c

** (videosrc_dualstream_scaler:2797): CRITICAL **: gst_v4l2_buffer_pool_process: assertion 'gst_buffer_pool_is_active (bpool)' failed
0:00:52.902650007  2797 0x74701f80 DEBUG                v4l2src gstv4l2src.c:912:gst_v4l2src_create:<v4l2src0> error processing buffer -5 (error)
Comment 7 Nicolas Dufresne (ndufresne) 2016-11-24 16:40:52 UTC
I think I'll just leave the patch as-is. I must admit the assert was a bit of a debugging left over, trying to chase issues. I'm been testing those patch in the last few hours and they definitely improve the state.
Comment 8 Nicolas Dufresne (ndufresne) 2016-11-24 16:47:27 UTC
Attachment 340686 [details] pushed as 65826f8 - v4l2src: release buffer if create fails
Attachment 340687 [details] pushed as 6b58076 - gstv4l2bufferpool: remove critical error message when process is called on an inactive pool
Comment 9 Nicolas Dufresne (ndufresne) 2016-11-24 17:21:32 UTC
Comment 10 Nicolas Dufresne (ndufresne) 2016-11-24 21:12:15 UTC