Bug 746834 - v4l2sink: driver is not queried for minimum number of buffers when propose_allocation is not called
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-plugins-good
Version: git master
OS: Other Linux
Priority: Normal
Severity: normal
Target Milestone: 1.5.1
Assigned To: GStreamer Maintainers
QA Contact: GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2015-03-26 17:11 UTC by Tobias Modschiedler
Modified: 2015-04-02 21:35 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Proposed solution. (4.38 KB, patch) - 2015-03-26 17:11 UTC, Tobias Modschiedler (review status: none)
Proposed patch (updated). (4.27 KB, patch) - 2015-03-26 19:13 UTC, Tobias Modschiedler (review status: none)
Patch for different approach (REQBUFS increases min_buffers). (1.38 KB, patch) - 2015-03-27 18:49 UTC, Tobias Modschiedler (review status: none)
Proposed patch (updated). (4.70 KB, patch) - 2015-04-01 08:48 UTC, Tobias Modschiedler (review status: committed)
Requested GStreamer log. (29 bytes, text/plain) - 2015-04-01 15:50 UTC, Tobias Modschiedler

Description Tobias Modschiedler 2015-03-26 17:11:14 UTC
Created attachment 300372 [details] [review]
Proposed solution.

When propose_allocation() is not passed through to a v4l2sink, the driver is never queried for its requirements regarding the minimum number of buffers to use. Thus the buffer pool does not have this information.

I am pretty new to GStreamer, but my expectation would be that the driver is always asked about this. The attached patch enables exactly this behavior: It calls gst_v4l2_get_attribute() directly after the device is opened, to get either V4L2_CID_MIN_BUFFERS_FOR_OUTPUT or V4L2_CID_MIN_BUFFERS_FOR_CAPTURE. This number is then used as a default value for min_buffers when creating the buffer pool.
Comment 1 Nicolas Dufresne (ndufresne) 2015-03-26 17:25:47 UTC
Review of attachment 300372 [details] [review]:

Sorry, but that will break decoders. The minimum depends on the input stream, so it cannot be cached. I'm sure you can find a better way though ;-P
Comment 2 Tobias Modschiedler 2015-03-26 17:56:41 UTC
Thanks for looking into this.

Can you give me some more concrete hints on how this will break things, so I can find a better place to do the query?
Comment 3 Tobias Modschiedler 2015-03-26 18:12:37 UTC
More concrete question: Do you mean that setting v4l2obj->min_buffers at this point will cause breakage; or initializing the buffer pool config's min_buffers with it?
Comment 4 Tobias Modschiedler 2015-03-26 19:12:23 UTC
Please correct me if I'm wrong: I don't think setting v4l2obj->min_buffers as in my original patch can cause any problems.

What do you think about my updated patch? It checks the driver's requirements in gst_v4l2_buffer_pool_set_config() instead of blindly setting it during the initialization.
Comment 5 Tobias Modschiedler 2015-03-26 19:13:38 UTC
Created attachment 300387 [details] [review]
Proposed patch (updated).
Comment 6 Nicolas Dufresne (ndufresne) 2015-03-26 19:23:51 UTC
Let me explain. With v4l2 M2M decoder drivers, the value of MIN_BUFFERS_FOR_CAPTURE will depend on the format and the first buffer passed to the OUTPUT device. At this point the v4l2object is already open. With your patch, this value will be set once, and never be updated.

Also, there is nothing that prevents a v4l2sink driver from requiring S_FMT to be called in order to get the right value. For this reason, I think you should simply read that value before activating the pool, in case it's not activated yet (i.e. propose_allocation was not called). To avoid code duplication, I would suggest adding a method in v4l2object to update the value.
Comment 7 Nicolas Dufresne (ndufresne) 2015-03-26 19:30:04 UTC
Another option could be to do nothing and fix the driver instead. It would seem that drivers should enforce their minimum at REQBUFS time by updating the structure. In this case the pool is internal and we don't need extras on top of that number.
Comment 8 Tobias Modschiedler 2015-03-27 09:01:41 UTC
Thanks for the explanation, now I understand.

I agree that it seems more reasonable to let the driver choose in REQBUFS. This is what I actually tried first, before trying to modify the GStreamer plugin ;) 

But I still had some problems with the buffer pool in this case. I'll have to try this again and will report my findings. Should I then open a new bug (as it won't really fit this bug's summary) or should I just continue here?
Comment 9 Nicolas Dufresne (ndufresne) 2015-03-27 13:07:48 UTC
You can keep this bug, we'll update the title as needed (or close it if we eventually agree on that). One ambiguity in the V4L2 spec is whether MIN_BUFFERS_FOR_* includes a buffer for userspace. Though, I think it's quite clear drivers should provide at least one buffer that userspace can use while the driver is operating. When this isn't the case, the pipeline stalls, because the pool has no more buffers to offer. It's also why, for decoders, if MIN_BUFFERS_FOR_* isn't supported, we always run out of buffers.

That's why I'm not surprised you are having issues, and that's why I'd like to go through this with you and figure out what we need to clarify.
Comment 10 Tobias Modschiedler 2015-03-27 18:36:37 UTC
OK, so when I just use the v4l2sink from git master and let the driver increase the buffer count in REQBUFS as needed, playback doesn't really work. The execution time (as reported by gst-launch) is about double to triple the expected playback time. It seems that DQBUF is called much too early, blocking the execution.

I suspect that gst_v4l2_buffer_pool_start() does not handle a changed buffer count correctly. It seems that it only expects count to be smaller than min_buffers, not larger. My latest patch changes its behavior to re-calculate min_latency when the count has changed. Also, copy_threshold is only enabled when the number of buffers became smaller.

I'm not sure what effect setting copy_threshold actually has, so does this make any sense? The most important thing is to re-calculate min_latency, as obj->min_buffers is always zero in my case (propose_allocation not called).
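The adjustment described in this comment can be sketched as a pure function. The names `pool_params` and `adjust_after_reqbufs` are made up for illustration, and the exact recalculation is only one plausible reading of the patch; the real change lives inside gst_v4l2_buffer_pool_start() in gst-plugins-good.

```c
/* Illustrative-only types and names; not the actual GStreamer code. */
struct pool_params {
    unsigned int min_latency;            /* expressed in buffers */
    int          copy_threshold_enabled;
};

/* After VIDIOC_REQBUFS the driver may have adjusted the buffer count in
 * either direction. Per the approach in this comment: recompute
 * min_latency from what was actually allocated, and only enable the
 * copy threshold when the driver handed back fewer buffers than asked. */
static struct pool_params adjust_after_reqbufs(unsigned int requested,
                                               unsigned int allocated)
{
    struct pool_params p;
    p.min_latency = allocated;
    p.copy_threshold_enabled = (allocated < requested);
    return p;
}
```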
Comment 11 Tobias Modschiedler 2015-03-27 18:49:08 UTC
Created attachment 300481 [details] [review]
Patch for different approach (REQBUFS increases min_buffers).

Here's the patch. I'm not sure which min_latency value should be used for copy_threshold: original or re-calculated.
Comment 12 Nicolas Dufresne (ndufresne) 2015-03-27 20:08:42 UTC
(In reply to Tobias Modschiedler from comment #10)
> OK, so when I just use the v4l2sink from git master and let the driver
> increase the buffer count in REQBUFS as needed, playback doesn't really
> work. The execution time (as reported by gst-launch) is about double to
> triple the expected playback time. It seems that DQBUF is called much too
> early, blocking the execution.

Are you sure it isn't the copy that slows everything down? When propose_allocation() isn't called, it means that buffers will arrive at the sink and will need to be copied into V4L2 memory. You can use perf to profile it.

> 
> I suspect that gst_v4l2_buffer_pool_start() does not handle a changed buffer
> count correctly. It seems that it only expects count to be smaller than
> min_buffers, not larger. My latest patch changes it's behavior to
> re-calculate min_latency when the count has changed. Also, copy_threshold is
> only enabled when the number of buffers became smaller.
> 
> I'm not sure what setting copy_threshold actually causes, so does this make
> any sense? The most important thing is to re-calculate min_latency, as
> obj->min_buffers is always zero in my case (propose_allocation not called).

It's true that the latency "without doing anything" won't be right at the moment. As for the copy threshold, it's probably only implemented for capture devices.
Comment 13 Nicolas Dufresne (ndufresne) 2015-03-27 20:11:05 UTC
Review of attachment 300481 [details] [review]:

That change does not make much sense. Some drivers need more buffers to operate without adding latency. I think you really have to get the MIN_BUFFERS_FOR_OUTPUT value (even if not proposing an allocation).
Comment 14 Tobias Modschiedler 2015-03-30 09:33:52 UTC
(In reply to Nicolas Dufresne (stormer) from comment #12)
> (In reply to Tobias Modschiedler from comment #10)
> > The execution time (as reported by gst-launch) is about double to
> > triple the expected playback time. It seems that DQBUF is called much too
> > early, blocking the execution.
> 
> Are you sure it isn't the copy the slow down everything ? When
> propose_allocation() isn't called, it means that buffer will arrive to the
> sink and will need to be copied to V4L2 memory. You can use perf to profile
> it.

I didn't profile it, but the driver's debug output shows that DQBUF is already called after two QBUFs. At no time are there more than two buffers queued. Also, the posted patch (even if it's incorrect for the general case) solves the problem.

(In reply to Nicolas Dufresne (stormer) from comment #13)
> Review of attachment 300481 [details] [review] [review]:
> 
> That change does not make much sense. Some driver need more buffers to
> operate without adding latency. I think you really have to get the
> MIN_FOR_OUTPUT value (even if not proposing an allocation).

OK, I see, thanks for clarifying this. I'll try to come up with a more sensible solution.
Comment 15 Nicolas Dufresne (ndufresne) 2015-03-30 15:44:17 UTC
Additionally, if you could share a trace with GST_DEBUG="v4l2*:7", that way I'd be on the same page as you. Sorry for this little juggling; it's always like this when hacking V4L2, since it's a very multi-purpose kernel interface.
Comment 16 Tobias Modschiedler 2015-04-01 08:48:39 UTC
Created attachment 300727 [details] [review]
Proposed patch (updated).

Here is a patch which asks the driver about MIN_BUFFERS_FOR_* before setting up the buffer pool. Hopefully, this time nothing else is broken ;)

Is it OK to increase min_buffers in gst_v4l2_buffer_pool_set_config() like I did?
Comment 17 Tobias Modschiedler 2015-04-01 15:50:56 UTC
Created attachment 300767 [details]
Requested GStreamer log.

Well, with the latest patch (actually with any of the patches), playback does somewhat work for me. But now I see that the CPU load is quite high: 40% to 80% just for playing back without decoding. So I might still be having problems with slow copying (as you suspected). I noticed that if I increase the bitrate (or just enable extensive GStreamer logging), a lot of video data is lost.

My pipeline might be a bit unusual/strange/shitty:
filesrc location="test.ts" !
"video/mpegts, systemstream=(boolean)true, packetsize=(int)188" !
tee name=t !
queue ! tsdemux ! fakesink sync=true t. !
queue ! rndbuffersize min=188 max=188 !
v4l2sink device=/dev/video0 io-mode=2 sync=true

Basically, I just want to put the "raw" MPEG-TS data into the v4l2 device, but with the right timing from the stream. That's why there's the tsdemux in the pipeline which throws away the data.

The rndbuffersize just forces a buffer size of 188 bytes, because the driver does not support anything else. When I omit it, I get flooded with the message "gst_buffer_resize_range: assertion 'bufmax >= bufoffs + offset + size' failed". I don't know if this is a driver bug or just caused by the strange pipeline.

Anyway, I attached the log with GST_DEBUG="v4l2*:7" after applying this latest patch.
Comment 18 Nicolas Dufresne (ndufresne) 2015-04-01 20:45:07 UTC
OK, the patch is nearly correct; I'd get the min buffers (if zero) in set_config() of the pool instead of set_default(). This is because if your driver had multiple formats, it would have multiple minimums.

Now, the driver makes a questionable decision, and GstV4L2 lacks some support for MPEG-TS. Let me explain.

Because GstV4l2 wants to be agnostic, we let the driver decide on the proper size. So we set sizeimage to 0 and leave it to the driver to decide. The driver in this case has a simple restriction: this value must be a multiple of 188 bytes. It dumbly returns 188, because it's a valid value. But 188 is extremely small; the overhead in GStreamer and of all the ioctls involved will burn your CPU.

GstV4l2 also lacks some support here. If the buffer is bigger than the v4l2 buffer size, it simply fails. That's fine for raw data, but for encoded data it should probably spread the data over multiple buffers and properly set the size of the last one. For MPEG-TS, the buffer size will always be a multiple of 188 anyway.

I think we could have MPEG-TS-specific code to select a decent default size (N * 188), or change the driver to pick a better default. Then, if a buffer is bigger and we have encoded data, we should loop (copy and queue) until all data has been consumed. If you take the time to improve this, then this simple pipeline will become sufficient:

filesrc location="test.ts" ! tsparse ! identity sync=1 ! v4l2sink

But that should be filed in separate bugs. If I find that time, I might just do myself the little adjustment. Thanks for reporting.
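The "loop (copy and queue)" idea from comment 18 can be sketched independently of GStreamer: split one large input buffer into driver-sized packets and hand each one off. The `queue_packet_fn` callback (standing in for "copy into a v4l2 buffer and QBUF it") and both helper names are hypothetical; the real change would go into gst_v4l2_buffer_pool_process().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback standing in for "copy into a v4l2 buffer and QBUF it". */
typedef void (*queue_packet_fn)(const uint8_t *data, size_t len, void *ctx);

/* Spread one large buffer over multiple fixed-size driver buffers.
 * For MPEG-TS the input size is always a multiple of 188, so every
 * packet is full; the short-last-packet case is kept for generality.
 * Returns the number of packets queued. */
static size_t queue_in_packets(const uint8_t *data, size_t size, size_t pkt_size,
                               queue_packet_fn queue, void *ctx)
{
    size_t off = 0, n = 0;

    while (off < size) {
        size_t len = (size - off < pkt_size) ? size - off : pkt_size;
        queue(data + off, len, ctx);
        off += len;
        n++;
    }
    return n;
}

/* Example callback: just totals the bytes seen, for demonstration. */
static void count_bytes(const uint8_t *data, size_t len, void *ctx)
{
    (void) data;
    *(size_t *) ctx += len;
}
```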
Comment 19 Tobias Modschiedler 2015-04-02 08:43:33 UTC
(In reply to Nicolas Dufresne (stormer) from comment #18)
> Ok, the patch is nearly correct, I'd get the min buffer (if zero) in
> set_config() of the pool instead of set_default(). This is because if you
> driver had multiple formats, it would have had multiple minimum.

In this patch, it's not done in set_defaults(), that was in a previous patch. Now I moved it to object_setup_pool(), just before the call to buffer_pool_new(), because then the format should be decided on (or can it still change?).

> The
> driver in this case has a simple restriction, this value must be a multiple
> of 188 bytes. It dumbly return 188, because it's a valid value.

> GstV4l2 also kind of lack some support. If the buffer is bigger then the
> v4l2 buffer size, it simply fails. It's fine for raw data, but for encoded
> data it should probably spread the data over multiple buffers and properly
> set the size for the last one.

> I think we could have mpegts specific code to select a decent default size
> (N * 188), or change the driver to pick a better default. Then if a buffer
> is bigger and we have encoded data, we should loop (copy and queue), until
> all data has been consumed.

OK, I understand. Would it be possible to reasonably use larger buffers (N*188) with an unmodified GstV4L2 right now? (If the driver chooses this size.)

> Thanks for reporting.
Thanks for helping me understand things better!
Comment 20 Nicolas Dufresne (ndufresne) 2015-04-02 13:11:53 UTC
Review of attachment 300727 [details] [review]:

You're right. I got confused by the diff's lack of context.
Comment 21 Nicolas Dufresne (ndufresne) 2015-04-02 13:24:14 UTC
(In reply to Tobias Modschiedler from comment #19)
> OK, I understand. Would it be possible to reasonably use larger buffers
> (N*188) with an unmodified GstV4L2 right now? (If the driver chooses this
> size.)

Looking again at the code, it is requesting ENCODED_BUFFER_SIZE (1 MB) from the driver. Even then, the driver chooses 188 bytes. Maybe this is how the HW works, even though splitting in the driver with fixed-size packets is trivial and requires no parsing.

That means the option to improve performance on the GStreamer side is to add a loop in gst_v4l2_buffer_pool_process(). You'd then be submitting 188 bytes per v4l2 buffer, but from GstBuffers of a much larger size, using tsparse to ensure proper framing. I suggest we open a new bug to track this, though.
Comment 22 Nicolas Dufresne (ndufresne) 2015-04-02 21:35:18 UTC
Thanks.