GNOME Bugzilla – Bug 727100
v4l2bufferpool: Race condition on qbuf/dqbuf in a multithreaded context
Last modified: 2014-09-17 15:04:37 UTC
Created attachment 273006 [details] [review]
Protects some of the pool's variables in case of multithreaded access to qbuf/dqbuf

There is currently no protection in v4l2bufferpool for concurrent access to a bufferpool on qbuf/dqbuf. A race condition can occur when a v4l2 element shares its output bufferpool with a multithreaded downstream element. When the downstream element releases a buffer back into the v4l2 bufferpool (making a call to qbuf), the v4l2 element might acquire a buffer at the same time (making a call to dqbuf). If the DQBUF ioctl returns before the QBUF one has updated the pool's bookkeeping, the pool's buffers array is not up to date and the output buffer can be NULL, as seen below:

Thread v4l2 element       : GstV4l2scale:using pool alloc
Thread v4l2 element       : GstV4l2BufferPool:acquire
Thread v4l2 element       : GstV4l2BufferPool:doing DQBUF
Thread downstream element : GstV4l2BufferPool:release buffer 0x73c776f0
Thread downstream element : GstV4l2BufferPool:enqueue buffer 0x73c776f0, index:3, queued:0, flags:00000003 mem:0x6dbbf000 used:663552
Thread downstream element : GstV4l2BufferPool:doing QBUF
Thread v4l2 element       : GstV4l2BufferPool:No free buffer found in the pool at index 3.

Here is a patch that corrects this race condition.
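For readers not familiar with the pool internals, here is a minimal, hypothetical sketch of the idea behind the patch. The SketchPool type, the function names and the exact ordering of the updates are illustrative assumptions, not a copy of the attached patch; the point is that both the qbuf and dqbuf paths touch pool->buffers[] and pool->num_queued, so those updates need to be protected:

/* Hypothetical, simplified model of the qbuf/dqbuf bookkeeping; the real code
 * is in sys/v4l2/gstv4l2bufferpool.c and uses GstBuffer, GstMeta, etc.
 * g_mutex_init (&pool->lock) is assumed to have been done at pool creation. */
#include <glib.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

#define N_BUFFERS 4

typedef struct {
  int video_fd;
  GMutex lock;              /* protects buffers[] and num_queued */
  void *buffers[N_BUFFERS]; /* stand-in for the GstBuffer * slots */
  guint num_queued;
} SketchPool;

/* Downstream releases a buffer: queue it back to the driver. */
static gboolean
sketch_qbuf (SketchPool * pool, void *buf, guint index)
{
  struct v4l2_buffer vbuf = { 0 };

  vbuf.index = index;
  vbuf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
  vbuf.memory = V4L2_MEMORY_MMAP;

  g_mutex_lock (&pool->lock);
  pool->buffers[index] = buf;   /* publish the slot before the driver can hand
                                 * this index back to a concurrent DQBUF */
  pool->num_queued++;
  g_mutex_unlock (&pool->lock);

  return ioctl (pool->video_fd, VIDIOC_QBUF, &vbuf) == 0;
}

/* The v4l2 element acquires a buffer: dequeue one from the driver. */
static void *
sketch_dqbuf (SketchPool * pool)
{
  struct v4l2_buffer vbuf = { 0 };
  void *outbuf;

  vbuf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
  vbuf.memory = V4L2_MEMORY_MMAP;

  if (ioctl (pool->video_fd, VIDIOC_DQBUF, &vbuf) < 0)
    return NULL;

  g_mutex_lock (&pool->lock);
  outbuf = pool->buffers[vbuf.index];   /* no longer NULL if the slot was
                                         * published before VIDIOC_QBUF */
  pool->buffers[vbuf.index] = NULL;
  pool->num_queued--;
  g_mutex_unlock (&pool->lock);

  return outbuf;
}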
Review of attachment 273006 [details] [review]:

::: sys/v4l2/gstv4l2bufferpool.h
@@ +61,3 @@
   guint num_queued;     /* number of buffers queued in the driver */
   guint copy_threshold; /* when our pool runs lower, start handing out copies */
+  GMutex mutex;

Only a detail: the object lock is unused, so maybe you could use GST_OBJECT_LOCK/UNLOCK instead of creating a new mutex?
Generally speaking it's a good thing to fix, and I see this as a bug fix. The buffer pool implementation needs more than that to actually be safe to use, though. So maybe this could be a 1.2 fix, but for 1.4 we need an allocator that recycles the lost memories ;-P
Created attachment 273068 [details] [review]
New patch - uses GST_OBJECT_LOCK / UNLOCK instead of a new mutex (1.2.3 ?)

You are right, I could have used GST_OBJECT_LOCK / UNLOCK directly - here is the modified patch.
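For reference, every GstObject (and thus every GstBufferPool) already carries a lock usable via GST_OBJECT_LOCK/GST_OBJECT_UNLOCK, so no new GMutex field is needed. Here is a rough, hypothetical illustration of the critical section on the dqbuf side; the exact placement in the real function is an assumption, the actual hunks are in attachment 273068:

/* Rough illustration only; reuses the GstObject lock instead of a dedicated
 * mutex to take the buffer out of the pool's slot array after VIDIOC_DQBUF
 * has returned 'index'. */
#include <gst/gst.h>

static GstBuffer *
take_queued_buffer (GstBufferPool * bpool, GstBuffer ** slots,
    guint * num_queued, guint index)
{
  GstBuffer *outbuf;

  GST_OBJECT_LOCK (bpool);
  outbuf = slots[index];
  slots[index] = NULL;
  (*num_queued)--;
  GST_OBJECT_UNLOCK (bpool);

  return outbuf;
}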
Comment on attachment 273068 [details] [review]
New patch - uses GST_OBJECT_LOCK / UNLOCK instead of a new mutex (1.2.3 ?)

This patch does not apply on git master. Is this patch on top of the 1.2 branch? Would it be possible to provide and test a patch for git master too?
I switched to the master branch and my patch applies on HEAD - are you sure you have no other local commits? As it requires updating other libraries, I won't be able to test it that soon, but I'll give it a try.
(In reply to comment #5)
> I switched to the master branch and my patch applies on HEAD - are you sure
> you have no other local commits? As it requires updating other libraries,
> I won't be able to test it that soon, but I'll give it a try.

git am still fails for me, HEAD is 39fc394254.
Review of attachment 273068 [details] [review]:

::: sys/v4l2/gstv4l2bufferpool.c
@@ +812,3 @@
       buf, index, pool->num_queued, meta->vbuffer.flags,
       meta->mem, meta->vbuffer.bytesused);

In git master there is a } here.
Created attachment 273167 [details] [review] Patch for Master branch
(In reply to comment #7)
> Review of attachment 273068 [details] [review]:
>
> ::: sys/v4l2/gstv4l2bufferpool.c
> @@ +812,3 @@
>        buf, index, pool->num_queued, meta->vbuffer.flags,
>        meta->mem, meta->vbuffer.bytesused);
>
> In git master there is a } here.

Ok, I see - I didn't use git am. Here is the patch for the master branch.
Review of attachment 273167 [details] [review]:

::: sys/v4l2/gstv4l2bufferpool.c
@@ +852,3 @@
+      pool->buffers[index] = NULL;
+      pool->num_queued--;
+      GST_OBJECT_UNLOCK (pool);

Maybe we should also handle a QBUF ioctl error in the 'queue_failed' block.
(In reply to comment #10)
> Review of attachment 273167 [details] [review]:
>
> ::: sys/v4l2/gstv4l2bufferpool.c
> @@ +852,3 @@
> +      pool->buffers[index] = NULL;
> +      pool->num_queued--;
> +      GST_OBJECT_UNLOCK (pool);
>
> Maybe we should also handle a QBUF ioctl error in the 'queue_failed' block.

It's a small change, but it could indeed be nicer. Btw, there is a big refactoring I'm working on that adds an allocator and more. When that is done (really soon now), I'll have a look at whether there is a lockless way to handle this.
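A hedged sketch of what handling the failure in a 'queue_failed' block could look like, reusing the hypothetical SketchPool type from the earlier sketch (the label and surrounding code are assumptions about the real function, not the actual patch):

/* Hypothetical qbuf path that reverts the bookkeeping when VIDIOC_QBUF fails,
 * so buffers[] and num_queued never claim a buffer the driver rejected. */
static gboolean
sketch_qbuf_checked (SketchPool * pool, void *buf, guint index)
{
  struct v4l2_buffer vbuf = { 0 };

  vbuf.index = index;
  vbuf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
  vbuf.memory = V4L2_MEMORY_MMAP;

  g_mutex_lock (&pool->lock);
  pool->buffers[index] = buf;
  pool->num_queued++;
  g_mutex_unlock (&pool->lock);

  if (ioctl (pool->video_fd, VIDIOC_QBUF, &vbuf) < 0)
    goto queue_failed;

  return TRUE;

queue_failed:
  {
    /* The driver never took the buffer: undo the state change under the
     * same lock so the pool stays consistent. */
    g_mutex_lock (&pool->lock);
    pool->buffers[index] = NULL;
    pool->num_queued--;
    g_mutex_unlock (&pool->lock);
    return FALSE;
  }
}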
Looking at that race, couldn't we solve this locklessly, by setting the queued buffer before calling qbuf? The ioctl is a good synchronization point by itself.
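One possible reading of that suggestion, again on the hypothetical sketch from above: because the device is opened in blocking mode, VIDIOC_DQBUF can only return an index that has already gone through VIDIOC_QBUF, so publishing the slot before the QBUF ioctl means the dqbuf side should never observe a NULL slot, even without a lock around buffers[]:

/* Lockless ordering idea (sketch): the QBUF ioctl itself acts as the
 * synchronization point for the buffers[] array. num_queued is deliberately
 * left out here; see the discussion below. */
static gboolean
sketch_qbuf_lockless (SketchPool * pool, void *buf, guint index)
{
  struct v4l2_buffer vbuf = { 0 };

  vbuf.index = index;
  vbuf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
  vbuf.memory = V4L2_MEMORY_MMAP;

  /* 1. Publish the slot first ... */
  pool->buffers[index] = buf;

  /* 2. ... then hand the buffer to the driver. A blocking DQBUF in another
   * thread cannot return this index before this call has queued it. */
  return ioctl (pool->video_fd, VIDIOC_QBUF, &vbuf) == 0;
}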
(In reply to comment #12)
> Looking at that race, couldn't we solve this locklessly, by setting the queued
> buffer before calling qbuf? The ioctl is a good synchronization point by itself.

Since the device is opened in blocking mode, I think that would do the job in this case, i.e. when we are blocked in the DQBUF ioctl. However, I'm thinking about another scenario with two threads: the first qbufs one buffer and the second dqbufs another buffer. So we have:

Thread 1                      Thread 2
gst_qbuf(pool, buf1)          gst_dqbuf(pool, buf2)
  |                             |
  |                             |
  pool->buffers[1] = buf1       |
  pool->num_queued++            |
  |                             ioctl(DQBUF)
  |                             outbuf = pool->buffers[2]
  |                             pool->buffers[2] = NULL
  |                             pool->num_queued--
  ioctl(QBUF)                   |

The result could be an inconsistent pool->num_queued, because pool->num_queued++ may not be atomic: Thread 2 could run just after Thread 1 has fetched the value of pool->num_queued. That said, this case is also related to the fact that in 1.2, v4l2bufferpool is not really thread-safe.
Well, it's not as bad as it looks, since qbuf/dqbuf only occur on writable buffers. Hence the buffers[] array will stay consistent (as long as we fix the ordering issue). The real problem is the num_queued counter and its usage for the copy threshold. I'll give it more thought, but I think we can keep it lockless (as the GstBufferPool design suggests). I think some work is needed even in git master.
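If the counter is the only remaining shared state, GLib's atomic integer helpers are one hedged way to keep it consistent without taking a lock (using gint rather than guint, since the g_atomic_int_* API works on gint; this is an illustration, not the change that eventually landed in master):

/* Sketch: maintain num_queued with atomic operations so concurrent qbuf/dqbuf
 * cannot lose an increment or a decrement. */
#include <glib.h>

typedef struct {
  volatile gint num_queued;   /* counted atomically, no pool lock needed */
  guint copy_threshold;
} SketchCounters;

static void
sketch_mark_queued (SketchCounters * c)
{
  g_atomic_int_inc (&c->num_queued);
}

static gboolean
sketch_mark_dequeued_and_check_copy (SketchCounters * c)
{
  /* Returns TRUE when the pool is running low and a copy should be handed
   * out instead of the real buffer (the copy_threshold heuristic mentioned
   * in the header hunk above). g_atomic_int_add returns the old value, so
   * the new count is that value minus one. */
  gint left = g_atomic_int_add (&c->num_queued, -1) - 1;

  return left < (gint) c->copy_threshold;
}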
This should not happen in master any more, but some testing would be nice. It's written in a lockless manner. We still need to decide what to do with 1.2.
Is there still interest in fixing this for 1.2 ? I don't think I'll have time myself, hence I'm tempted to mark this one as obsolete. Any comment ?
For 1.2 we still don't have the lockless fix. However, as far as I am concerned, I agree to mark this issue as obsolete.
So if anyone is interested in improving this in 1.2, feel free to re-open; otherwise, this is implemented differently in 1.4 and should be fine.