GNOME Bugzilla – Bug 775014
v4l2src buffer leak, race condition sending EOS from another thread
Last modified: 2016-11-24 21:14:43 UTC
Created attachment 340686 [details] [review]
v4l2src: release buffer if create fails
If a v4l2src is sent an EOS event from another thread, the buffer pool is set to flushing/inactive.
If at the same time the v4l2src thread is blocking on a DQBUF poll in gst_v4l2src_create, it continues to run and may acquire a buffer. In that case it will call gst_v4l2_buffer_pool_process with the pool already inactive.
gst_v4l2_buffer_pool_process throws a critical error message and returns GST_FLOW_ERROR, and then gst_v4l2src_create just returns that error without decrementing the buffer refcount. The returned buffer is then ignored by gst_base_src_get_range due to the error return value.
The observed result is that when repeatedly starting and stopping a pipeline this way, sometimes gst_v4l2src_stop is not called due to the outstanding buffer that is never returned, the v4l2 queue doesn't free its buffers, and during the next stream start the REQBUFS fails with -EBUSY.
Created attachment 340687 [details] [review]
gstv4l2bufferpool: remove critical error message when process is called on an inactive pool
If the pool is inactive, it is guaranteed to also be flushing, so the
following check will return GST_FLOW_FLUSHING anyway.
This can happen if a v4l2src is blocking on DQBUF in create and is sent
an EOS event on another thread. In that case the pool is set to
flushing/inactive without locking, the v4l2src is unblocked, and may
call pool_process with a valid buffer on the already inactive pool.
Review of attachment 340687 [details] [review]:
@@ +1706,2 @@
if (GST_BUFFER_POOL_IS_FLUSHING (pool))
Would it also work to move this assertion after the flushing check ? Otherwise, I'm fine with this change. I have notice that this assertion was popping in various error conditions too (including unplugging a USB camera).
Review of attachment 340686 [details] [review]:
Will also be backported.
Review of attachment 340686 [details] [review]:
Could also cause bug #772521 from time to time, we notice two behaviours.
gst_base_src_send_event (GST_EVENT_EOS) calls gst_base_src_activate_pool (FALSE) -> gst_buffer_pool_set_active (FALSE), which calls do_set_flushing (TRUE) and potentially do_stop () in that order and only then sets priv->active = FALSE.
I think gstbufferpool priv->active can never be FALSE unless pool->flushing is already set to 1, so moving that assertion after the GST_FLOW_FLUSHING return should not be a problem. It should never be hit.
Here is the relevant part of a log in the error case:
First EOS is sent from a thread:
0:00:52.897536007 2797 0x552bbe00 DEBUG GST_ELEMENT_PADS gstelement.c:1592:gst_element_send_event: send eos event on element v4l2src0
0:00:52.897605007 2797 0x552bbe00 DEBUG basesrc gstbasesrc.c:1752:gst_base_src_send_event:<v4l2src0> handling event 0x7320bdb8 eos event: 0x7320bdb8, time 99:99:99.999999999, seq-num 2776, (NULL)
0:00:52.897678007 2797 0x552bbe00 DEBUG basesrc gstbasesrc.c:1839:gst_base_src_send_event:<v4l2src0> EOS marked, calling unlock
0:00:52.897729007 2797 0x552bbe00 DEBUG basesrc gstbasesrc.c:3027:gst_base_src_activate_pool:<v4l2src0> gst_base_src_activate_pool pool=0x5538dd38 active=0
0:00:52.897792673 2797 0x552bbe00 LOG bufferpool gstbufferpool.c:473:gst_buffer_pool_set_active:<v4l2src0:pool:src> active 0
0:00:52.897845340 2797 0x552bbe00 DEBUG v4l2bufferpool gstv4l2bufferpool.c:933:gst_v4l2_buffer_pool_flush_start:<v4l2src0:pool:src> start flushing
0:00:52.897901340 2797 0x552bbe00 LOG bufferpool gstbufferpool.c:505:gst_buffer_pool_set_active:<v4l2src0:pool:src> outstanding buffers 1
0:00:52.897962673 2797 0x552bbe00 DEBUG basesrc gstbasesrc.c:1849:gst_base_src_send_event:<v4l2src0> unlock called, waiting for LIVE_LOCK
then ->alloc() returns with a buffer and gst_v4l2src_create continues:
0:00:52.896312673 2797 0x74701f80 DEBUG GST_POLL gstpoll.c:1066:gst_poll_fd_has_error: 0x55389050: fd (fd:22, idx:1) 0
0:00:52.898107673 2797 0x74701f80 LOG v4l2bufferpool gstv4l2bufferpool.c:1162:gst_v4l2_buffer_pool_dqbuf:<v4l2src0:pool:src> dequeueing a buffer
0:00:52.898275340 2797 0x74701f80 LOG v4l2allocator gstv4l2allocator.c:1331:gst_v4l2_allocator_dqbuf:<v4l2src0:pool:src:allocator> dequeued buffer 4 (flags 0x2005)
0:00:52.898386673 2797 0x74701f80 DEBUG v4l2allocator gstv4l2allocator.c:1335:gst_v4l2_allocator_dqbuf:<v4l2src0:pool:src:allocator> driver pretends buffer is queued even if dequeue succeeded
0:00:52.898482673 2797 0x74701f80 LOG v4l2bufferpool gstv4l2bufferpool.c:1194:gst_v4l2_buffer_pool_dqbuf:<v4l2src0:pool:src> dequeued buffer 0x7320a850 seq:14 (ix=4), mem 0x5537f568 used 829440, plane=0, flags 00002001, ts 1:04:25.730204000, pool-queued=4, buffer=0x7320a850
0:00:52.898690673 2797 0x74701f80 DEBUG v4l2bufferpool gstv4l2bufferpool.c:1722:gst_v4l2_buffer_pool_process:<v4l2src0:pool:src> process buffer 0x726d2c5c
** (videosrc_dualstream_scaler:2797): CRITICAL **: gst_v4l2_buffer_pool_process: assertion 'gst_buffer_pool_is_active (bpool)' failed
0:00:52.902650007 2797 0x74701f80 DEBUG v4l2src gstv4l2src.c:912:gst_v4l2src_create:<v4l2src0> error processing buffer -5 (error)
I think I'll just leave the patch as-is. I must admit the assert was a bit of a debugging left over, trying to chase issues. I'm been testing those patch in the last few hours and they definitely improve the state.
Attachment 340686 [details] pushed as 65826f8 - v4l2src: release buffer if create fails
Attachment 340687 [details] pushed as 6b58076 - gstv4l2bufferpool: remove critical error message when process is called on an inactive pool