Bug 747270 - avfvideosrc: BUFFER_QUEUE_SIZE too small
Status: RESOLVED OBSOLETE
Product: GStreamer
Classification: Platform
Component: gst-plugins-bad
Version: git master
Hardware: Other
OS: All
Importance: Normal (priority), normal (severity)
Target Milestone: git master
Assigned To: Ilya Konstantinov
QA Contact: GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2015-04-02 20:25 UTC by Ilya Konstantinov
Modified: 2018-11-03 13:33 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
avfvideosrc: Change NSMutableArray to GQueue (8.57 KB, patch) - 2015-04-03 22:07 UTC, Ilya Konstantinov
avfvideosrc.m: Get rid of the buffer queue (6.92 KB, patch) - 2015-04-03 23:00 UTC, Ilya Konstantinov
avfvideosrc: Get rid of the buffer queue (6.86 KB, patch) - 2015-04-03 23:29 UTC, Ilya Konstantinov
avfvideosrc: Get rid of the buffer queue (8.71 KB, patch) - 2015-04-04 03:19 UTC, Ilya Konstantinov
applemedia: gst_core_media_buffer_init (5.46 KB, patch) - 2015-04-04 03:19 UTC, Ilya Konstantinov
avfvideosrc: Use buffer pool (5.40 KB, patch) - 2015-04-04 03:21 UTC, Ilya Konstantinov

Description Ilya Konstantinov 2015-04-02 20:25:27 UTC
In avfvideosrc.m, we have:

  #define BUFFER_QUEUE_SIZE     2

This buffer queue is used in a producer-consumer fashion between the AV capture callbacks (on AVFoundation's thread) and the element (on the main thread).

With such a small queue, we were constantly:

 a) losing performance due to the main thread going to sleep constantly

 b) periodically dropping frames in this code:

  if ([bufQueue count] == BUFFER_QUEUE_SIZE)
    [bufQueue removeLastObject];

Increasing the queue size to e.g. 32 improves performance significantly. On an iPhone 5s, an improvement of ~7% is seen in the following pipeline:

  avfvideosrc device-index=0 ! video/x-raw,width=1280,height=720,framerate=30/1 ! vtenc_h264 bitrate=400 ! fakesink
Comment 1 Reynaldo H. Verdejo Pinochet 2015-04-02 20:52:56 UTC
How did you come up with 32? Also, are you attaching a patch?
Comment 2 Nicolas Dufresne (ndufresne) 2015-04-02 21:01:59 UTC
V4L2 uses 4 in cases where we don't know. I don't know if this element is pushing buffers downstream, but if so, it should be negotiated.
Comment 3 Ilya Konstantinov 2015-04-02 21:30:03 UTC
(In reply to Reynaldo H. Verdejo Pinochet from comment #1)
> How did you come up with 32?

I said "e.g. 32", meaning it's just an arbitrary number I chose that's not too small. :) I think it should be the smallest one that works.

> also, are you attaching a patch?

Not yet, I'm still looking for feedback.

> I don't know if this element is pushing buffers downstream, but if so, it should be negotiated.

avfvideosrc is obviously pushing buffers downstream -- it's a video capture source (e.g. from the iPhone camera).

What do you mean by negotiating? Isn't it a private implementation detail of this element? The existence of the AV capture thread is an AVFoundation implementation detail.
Comment 4 Nicolas Dufresne (ndufresne) 2015-04-02 22:06:22 UTC
(In reply to Ilya Konstantinov from comment #3)
> avfvideosrc is obviously pushing buffers downstream -- it's a video capture
> source (e.g. from the iPhone camera).
> 
> What do you mean by negotiating? Isn't it a private implementation detail of
> this element? The existence of the AV capture thread is an AVFoundation
> implementation detail.

I'm asking since many stacks have limitations and simply copy the data and push. When you push your own data downstream and have a fixed number of buffers, it is important to provide at least the minimum number of buffers needed for the pipeline to work. This value can be obtained through the allocation query.

For sources, this is done by implementing the virtual method decide_allocation(). This method isn't trivial to implement, though. You can refer to v4l2src for inspiration. Ideally you should wrap this up in a BufferPool (and maybe an Allocator) if you want to properly track the memory.
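
For readers unfamiliar with the allocation query, here is a minimal, hypothetical sketch (not v4l2src's or avfvideosrc's actual code) of a decide_allocation() that takes the buffer count from the query instead of hard-coding it; the function and variable names are illustrative:

  #include <gst/gst.h>
  #include <gst/base/gstbasesrc.h>

  static gboolean
  my_src_decide_allocation (GstBaseSrc * bsrc, GstQuery * query)
  {
    GstBufferPool *pool = NULL;
    guint size = 0, min = 0, max = 0;

    if (gst_query_get_n_allocation_pools (query) > 0) {
      /* Downstream proposed a pool; use its buffer counts as the baseline. */
      gst_query_parse_nth_allocation_pool (query, 0, &pool, &size, &min, &max);
    } else {
      /* Nothing proposed: fall back to a small default, as v4l2 does. */
      min = 4;
      max = 0;  /* 0 means "unlimited" */
      gst_query_add_allocation_pool (query, NULL, size, min, max);
    }

    /* Size the element's internal queue from `min` instead of a constant. */
    if (pool)
      gst_object_unref (pool);

    return TRUE;
  }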
Comment 5 Cecill Etheredge (ijsf) 2015-04-02 22:35:22 UTC
(In reply to Reynaldo H. Verdejo Pinochet from comment #1)
> How did you come up with 32? Also, are you attaching a patch?

I'd like to know how you guys came up with 2.
Comment 6 Nicolas Dufresne (ndufresne) 2015-04-02 22:50:37 UTC
(In reply to Cecill Etheredge (ijsf) from comment #5)
> (In reply to Reynaldo H. Verdejo Pinochet from comment #1)
> > How did you come up with 32? Also, are you attaching a patch?
> 
> I'd like to know how you guys came up with 2.

Two makes sense if you don't share the data: basically one for the driver to fill, and one for user space to copy to some other memory location, and then it loops like that. But since the code now pushes the buffers downstream, this is just wrong.

Looking at the code, it's a bit bad. It wraps the memory into a buffer, but uses a GstMeta to actually track the memory lifetime. This is not completely safe. It also sets the NO_SHARE flag, which will end up causing copies all over. So yes, there is a lot of work to do.
Comment 7 Cecill Etheredge (ijsf) 2015-04-02 22:53:51 UTC
Thanks, that makes sense for a buffered system.

On a different note... why use a buffer at all and not push directly to wherever it needs to go?
Comment 8 Ilya Konstantinov 2015-04-02 23:04:01 UTC
Right now I'm working on a patch to eliminate the buffer queue completely. Queueing is GstQueue's job. And it's perfectly fine for the avfvideosrc to push buffers from the dispatch thread.

As for GstMeta and all that business, please see my work on bug 747216. Though as far as safety goes, I don't see a problem -- the CVPixelBuffer's lifetime is exactly like the GstBuffer's lifetime.

NO_SHARE has indeed been causing perf trouble for OpenWebRTC with the intervideo* components. Though my current patch for 747216 is "no-share" too, I hope to introduce sharing.
Comment 9 Nicolas Dufresne (ndufresne) 2015-04-02 23:34:53 UTC
(In reply to Cecill Etheredge (ijsf) from comment #7)
> Thanks, that makes sense for a buffered system.
> 
> On a different note.. why use a buffer at all and not push directly to
> wherever it needs to go?

Queues are useful to ensure parallelism between the driver processing and the GStreamer pipeline: while a buffer is being pushed downstream, the driver should be preparing the next buffer (assuming you mean why use queues). AVF has quite some code, and I suppose it's not entirely needed. It is likely done this way to fit the push src model (which is the easiest way to write a source in GST).
Comment 10 Ilya Konstantinov 2015-04-03 17:08:20 UTC
(In reply to Nicolas Dufresne (stormer) from comment #9)
> AVF has quite some code, and I suppose it's not entirely needed.

That's the main point here, I believe. AVF does things like pooling sample buffers and already takes care of a lot of this. Also, Darwin's dispatch queue is already a producer-consumer queue.

> is likely done this way to fit the push src model (which is the easiest way
> to write a source in GST).

We might replace NSMutableArray+NSConditionLock with GAsyncQueue, but to ensure the best performance, I think we should learn to push buffers from the AVF dispatch queue.

I remember either you or Tim said that it's basically re-doing GstBaseSrc, but perhaps it's not as complicated, seeing that a lot of what GstBaseSrc does deals with non-live streams.
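
As a rough illustration of the GAsyncQueue idea mentioned above (not the element's actual code; the type, field and function names are invented), the capture callback would push buffers and create() would block on the pop:

  #include <gst/gst.h>
  #include <gst/base/gstpushsrc.h>

  typedef struct
  {
    GstPushSrc parent;
    GAsyncQueue *queue;   /* holds GstBuffer *, created with g_async_queue_new() */
  } GstMySrc;

  /* Producer side: called from the AVF dispatch queue. */
  static void
  enqueue_captured_buffer (GstMySrc * self, GstBuffer * buf)
  {
    /* One reference is transferred to the queue. */
    g_async_queue_push (self->queue, buf);
  }

  /* Consumer side: the create() vmethod on the streaming thread. */
  static GstFlowReturn
  my_src_create (GstPushSrc * src, GstBuffer ** outbuf)
  {
    GstMySrc *self = (GstMySrc *) src;

    /* Blocks until the capture callback pushes a buffer. */
    *outbuf = g_async_queue_pop (self->queue);
    return GST_FLOW_OK;
  }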
Comment 11 Nicolas Dufresne (ndufresne) 2015-04-03 17:28:31 UTC
I don't get the point; GstBaseSrc does not force having queues, btw. As we said, create() can wait for an event. The context switch has nothing to do with the performance cost, I am sure.

And to be clear, I will not sponsor work that moves this object away from GstBaseSrc. It's already barely maintained; that would only make the situation worse.

One of the things I notice is that we force AVFoundation into doing conversion to BGRA all the time, even if a texture is being used downstream. That's probably where we get the biggest hit. Implement AVF specific GstMemory, make sure we don't request memory pointers unless map() is called, and you will probably already have enough performance for "truly embedded" iOS apps.
Comment 12 Nicolas Dufresne (ndufresne) 2015-04-03 17:30:55 UTC
(Oh, and obviously, negotiate the allocation and use the obtained number when setting up the queue instead of hard-coding it, so we don't starve AVF's internal queues.)
Comment 13 Ilya Konstantinov 2015-04-03 18:40:59 UTC
(In reply to Nicolas Dufresne (stormer) from comment #11)
> I don't get the point; GstBaseSrc does not force having queues, btw. As we
> said, create() can wait for an event.

The "point" I referred to was: if a queue facility is needed between the driver and userspace, AVF already provides it. Our own queueing facility would be redundant, as far as driver-userspace communication goes.

> The context switch has nothing to do with the performance cost, I am sure.

I have a rudimentary avfvideosrc which pushes from the dispatch queue. It (kinda, sorta) works, enough to get performance metrics.

If we can have e.g. a GAsyncQueue-based avfvideosrc with the same performance, I'll gladly back off my proposal. I like my patches small.

(If I could lower CPU usage by 2% through a solution that doesn't exactly fit with GstBaseSrc, I'd do it, even as a fork.)
 
> And to be clear, I will not sponsor work that moves this object away from
> GstBaseSrc. It's already barely maintained; that would only make the
> situation worse.

I understand that, but do understand too that if performance is worse than a pure AVF solution, it'll continue to be underused and under-maintained.

> ... we force AVFoundation into doing conversion to BGRA all the time.

Are you sure? I see avfvideosrc supports NV12, UYVY, YUY2 and BGRA.

(In any case, my test is avfvideosrc ! video/x-raw,format=BGRA,width=1280,height=720,framerate=30/1 ! fakesink, and I didn't notice any difference when I changed format to NV12.)

> Implement AVF specific GstMemory, make sure we don't request memory
> pointers unless map() is called

In an "avfvideosrc ! vtenc_h264" pipeline, the current code *already* passes CVPixelBuffer directly.

About GstMemory and mapping, that's what I'm doing in bug 747216. Surprisingly, it didn't bring much performance improvement.
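
For context, a very rough sketch (not the applemedia code, and ignoring planar formats) of what wrapping the CVPixelBuffer with NO_SHARE amounts to; the function names are invented, and the deferred-map approach of bug 747216 would move the locking into a custom GstMemory's map():

  #include <gst/gst.h>
  #include <CoreVideo/CoreVideo.h>

  static void
  release_pixel_buffer (gpointer data)
  {
    CVPixelBufferRef pixbuf = (CVPixelBufferRef) data;

    CVPixelBufferUnlockBaseAddress (pixbuf, kCVPixelBufferLock_ReadOnly);
    CVPixelBufferRelease (pixbuf);
  }

  static GstBuffer *
  wrap_pixel_buffer (CVPixelBufferRef pixbuf)
  {
    gsize size;

    CVPixelBufferRetain (pixbuf);
    CVPixelBufferLockBaseAddress (pixbuf, kCVPixelBufferLock_ReadOnly);
    size = CVPixelBufferGetDataSize (pixbuf);

    /* NO_SHARE keeps the memory from being sub-buffered or shared, which is
     * what tends to force copies downstream. */
    return gst_buffer_new_wrapped_full (GST_MEMORY_FLAG_NO_SHARE,
        CVPixelBufferGetBaseAddress (pixbuf), size, 0, size,
        (gpointer) pixbuf, (GDestroyNotify) release_pixel_buffer);
  }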
Comment 14 Ilya Konstantinov 2015-04-03 22:07:02 UTC
Created attachment 300916 [details] [review]
avfvideosrc: Change NSMutableArray to GQueue

EXPERIMENT 1

Use a proven GLib implementation, taken almost verbatim
from GstDecklinkVideoSrc (Sebastian Dröge's code).

CPU usage was 19% and remained 19%.
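
The GQueue pattern borrowed from GstDecklinkVideoSrc looks roughly like the following (an illustrative condensation with invented names, not the attachment itself):

  #include <gst/gst.h>

  #define MAX_QUEUED 2    /* the contested BUFFER_QUEUE_SIZE */

  typedef struct
  {
    GMutex lock;          /* init with g_mutex_init() */
    GCond cond;           /* init with g_cond_init() */
    GQueue queue;         /* of GstBuffer *, init with g_queue_init() */
    gboolean flushing;
  } CaptureQueue;

  /* Producer: called from the capture thread. */
  static void
  capture_queue_push (CaptureQueue * q, GstBuffer * buf)
  {
    g_mutex_lock (&q->lock);
    if (g_queue_get_length (&q->queue) >= MAX_QUEUED) {
      /* Full: drop a queued buffer so the capture thread never blocks. */
      gst_buffer_unref (g_queue_pop_head (&q->queue));
    }
    g_queue_push_tail (&q->queue, buf);
    g_cond_signal (&q->cond);
    g_mutex_unlock (&q->lock);
  }

  /* Consumer: called from create() on the streaming thread. */
  static GstBuffer *
  capture_queue_pop (CaptureQueue * q)
  {
    GstBuffer *buf = NULL;

    g_mutex_lock (&q->lock);
    while (g_queue_is_empty (&q->queue) && !q->flushing)
      g_cond_wait (&q->cond, &q->lock);
    if (!q->flushing)
      buf = g_queue_pop_head (&q->queue);
    g_mutex_unlock (&q->lock);
    return buf;
  }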
Comment 15 Ilya Konstantinov 2015-04-03 22:58:16 UTC
Clarifying:

Increasing BUFFER_QUEUE_SIZE from 2 (to whatever) reduces CPU usage from 32% to 26% only on this pipeline:
  
  avfvideosrc device-index=0 !
  video/x-raw,width=1280,height=720,framerate=30/1 !
  vtenc_h264 bitrate=400 !
  fakesink

On a simpler:

  avfvideosrc device-index=0 !
  video/x-raw,width=1280,height=720,framerate=30/1 !
  fakesink

the CPU usage was 19% and remains 19% after the BUFFER_QUEUE_SIZE increase (probably because the frame-drop scenario never occurs).
Comment 16 Ilya Konstantinov 2015-04-03 23:00:51 UTC
Created attachment 300920 [details] [review]
avfvideosrc.m: Get rid of the buffer queue

EXPERIMENT 2

Keep a single copy of the sample buffer at a time -- no queues,
no mallocs. (If downstream cannot keep up, GstQueues can be used.)

BENCHMARK: On iPhone 5s (iOS 8.0), the following pipeline:

  avfvideosrc device-index=0 !
  video/x-raw,format=BGRA,width=1280,height=720,framerate=30/1 !
  fakesink enable-last-sample=0

is down from 19% to 15%.
Comment 17 Ilya Konstantinov 2015-04-03 23:29:18 UTC
Created attachment 300921 [details] [review]
avfvideosrc: Get rid of the buffer queue

My previous patch was bogus. This patch actually works.

BENCHMARK: Seeing a more modest improvement, from 19% to 18%, on:

  avfvideosrc device-index=0 !
  video/x-raw,format=BGRA,width=1280,height=720,framerate=30/1 !
  fakesink enable-last-sample=0
Comment 18 Ilya Konstantinov 2015-04-03 23:40:51 UTC
A more realistic workload:

  avfvideosrc device-index=0 !
  video/x-raw,width=1280,height=720,framerate=30/1 !
  queue !
  vtenc_h264 bitrate=400 !
  fakesink enable-last-sample=0

performs at ~26% (while pre-patch was 32%).

(Without the "queue !" we're getting constant frame drops in the AV capture thread, as anticipated.)
Comment 19 Ilya Konstantinov 2015-04-04 03:19:01 UTC
Created attachment 300932 [details] [review]
avfvideosrc: Get rid of the buffer queue

CHANGES

* Really let AVFoundation do its job

  Instead of dropping frames in the delegate,
  now I simply hog the delegate until the main
  thread finishes its job. AVFoundation is built
  to handle this by queueing and dropping.

* Protect against spurious wakeups

* currentFrame -> delegateFrame

* Consolidate variables into the CaptureFrame
Comment 20 Ilya Konstantinov 2015-04-04 03:19:49 UTC
Created attachment 300933 [details] [review]
applemedia: gst_core_media_buffer_init

Change gst_core_media_buffer_new into gst_core_media_buffer_init,
to allow e.g. acquiring the buffer from a buffer pool.

(Prelude for next patch.)
Comment 21 Ilya Konstantinov 2015-04-04 03:21:44 UTC
Created attachment 300934 [details] [review]
avfvideosrc: Use buffer pool

We configure a buffer pool to create empty GstBuffers without
any backing memory, since we'll be filling them ourselves.

We discard buffer pools from upstream.
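
The shape of that, as a hedged sketch rather than the attachment's code (the function name is invented), would be something like:

  #include <gst/gst.h>

  /* A pool configured with size 0 hands out GstBuffers carrying no payload;
   * the element then appends its own wrapped CoreMedia-backed memory. */
  static GstBufferPool *
  make_empty_buffer_pool (GstCaps * caps, guint min_buffers, guint max_buffers)
  {
    GstBufferPool *pool = gst_buffer_pool_new ();
    GstStructure *config = gst_buffer_pool_get_config (pool);

    gst_buffer_pool_config_set_params (config, caps, 0, min_buffers, max_buffers);
    gst_buffer_pool_set_config (pool, config);
    gst_buffer_pool_set_active (pool, TRUE);
    return pool;
  }

  /* In create():
   *   gst_buffer_pool_acquire_buffer (pool, &buf, NULL);
   *   gst_buffer_append_memory (buf, wrapped_memory);
   */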
Comment 22 Ilya Konstantinov 2015-04-04 04:59:45 UTC
Stop the presses!
"avfvideosrc: Get rid of the buffer queue" is definitely not working as expected.

I know it's crazy but I actually didn't test with a simple glimagesink, and it really sucks.
Comment 23 GStreamer system administrator 2018-11-03 13:33:12 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/230.