GNOME Bugzilla – Bug 735660
v4l2: fix new v4l2 code not working with certain devices (regression)
Last modified: 2014-09-09 22:48:51 UTC
Created attachment 284798 [details] [review] [PATCH 1/3] v4l2: get_nearest_size: Always reinit all struct fields on retry Hi, Here is a patch-set with 3 patches which I've written to get gstreamer-1.4's v4l2 code to work with my bttv card. All the details of what and why are in the commit messages. Regards, Hans
Created attachment 284799 [details] [review] [PATCH 2/3] v4l2: get_nearest_size: Fix "Unsupported field type" errors
Created attachment 284800 [details] [review] [PATCH 3/3] v4l2: Not all drivers support requesting 0 buffers
Quick note on the 3th patch, I just realized that it may be better in the probe path to first try with 0 buffers, and only if that fails try again with 1 buffer. Then we save a request + free on devices where the driver does support request_bufs with a count of 0, and we can drop the free (as that won't work anyways on devices which don't accept count == 0).
Review of attachment 284800 [details] [review]: If a driver refuse to free buffers when there is no buffers allocator, why would it work when there is 1 buffer ? Also, if it already support freeing the 1 buffer, it can't be that hard to fix upstream driver to comply with the spec. I'd like a specific list of drivers that expose this bug, and would prefer if we fix them first.
(In reply to comment #4) > Review of attachment 284800 [details] [review]: > > If a driver refuse to free buffers when there is no buffers allocator, why > would it work when there is 1 buffer ? These drivers always refuse to free buffers, using count=0 to free buffer is an extension to the V4L2 requestbufs ioctl which got added much later then the introduction of the requestbufs ioctl itself. These drivers only free buffers when the fd get closed. > Also, if it already support freeing the > 1 buffer, it can't be that hard to fix upstream driver to comply with the spec. Fixing these drivers means moving them over to the videobuf2 internal kernel-api, problem is that that API does not yet support cards capable of hardware video overlays, which these cards are. This is not going to get fixed anytime soon, and all userspace apps know how to deal with these cards. This used to work in gstreamer too until the recent v4l2 support rework. This is a regression in gstreamer which needs to be fixed in gstreamer.
Review of attachment 284800 [details] [review]: ::: sys/v4l2/gstv4l2allocator.c @@ +489,3 @@ + breq.count = 0; + breq.memory = memory; + v4l2_ioctl (allocator->video_fd, VIDIOC_REQBUFS, &breq); Just spoke with Hans. 18 drivers are affected, 16 in 3.17. Though he said:à <hverkuil> Realistically, if reqbufs(0) is not supported by the driver, then the only safe alternative is to close and reopen the device node. So this patch isn't correct.
Note the other patch seems incomplete fixed around bug 726195 and 726194. We need to write proper interlacing detection code, it's not true that for 1 resolution driver only do mixed or progressive. Would it be possible to move patches to proper bugs instead of having all unrelated issue in the same bug ?
(In reply to comment #5) > (In reply to comment #4) > > Review of attachment 284800 [details] [review] [details]: > > > > If a driver refuse to free buffers when there is no buffers allocator, why > > would it work when there is 1 buffer ? > > These drivers always refuse to free buffers, using count=0 to free buffer is an > extension to the V4L2 requestbufs ioctl which got added much later then the > introduction of the requestbufs ioctl itself. These drivers only free buffers > when the fd get closed. Yep, that's why the patch isn't correct. > > > Also, if it already support freeing the > > 1 buffer, it can't be that hard to fix upstream driver to comply with the spec. > > Fixing these drivers means moving them over to the videobuf2 internal > kernel-api, problem is that that API does not yet support cards capable of > hardware video overlays, which these cards are. This is not going to get fixed > anytime soon, and all userspace apps know how to deal with these cards. This > used to work in gstreamer too until the recent v4l2 support rework. This is a > regression in gstreamer which needs to be fixed in gstreamer. As long as we make it explicit what it is a workaround, and what is real code. This has grown into a large mess with the "transparent" workaround attitude. Note that the reason your patch didn't work is that later S_FMT will fail if you have requested 1 buffer (may only happen on M2M device btw). If REQBUFS(0) does not work when no buffer are allocated, it won't work more if 1 is allocated. Unless I'm mistaken, this method would only work if you made sure S_FMT is not called afterward. I don't want this approach since it will cause further issue for memory type like DMABUF and USERPTR, as it will prevent any attempt to adapt memory alignment.
(In reply to comment #6) > Review of attachment 284800 [details] [review]: > > ::: sys/v4l2/gstv4l2allocator.c > @@ +489,3 @@ > + breq.count = 0; > + breq.memory = memory; > + v4l2_ioctl (allocator->video_fd, VIDIOC_REQBUFS, &breq); > > Just spoke with Hans. 18 drivers are affected, 16 in 3.17. Though he said:à > > <hverkuil> Realistically, if reqbufs(0) is not supported by the driver, then > the only safe alternative is to close and reopen the device node. > > So this patch isn't correct. (In reply to comment #8) > Unless I'm mistaken, this method would only work if you made sure > S_FMT is not called afterward. I don't want this approach since it will cause > further issue for memory type like DMABUF and USERPTR, as it will prevent any > attempt to adapt memory alignment. <sigh>, strictly speaking Hans V. is correct, the only 101% safe way is freeing the buffers, but in practice most drivers which are affected to allow changing the amount of buffers, and doing a set_fmt after buffers have been requested, as long as no streaming is being done, as is clearly mentioned in the commit message of the patch. Without this patch gstreamer apps do not work with my bttv cards, where as with 1.0 (don't know if I ever got around to testing this with 1.2) they did work. This clearly is a regression, and this clearly needs to be fixed. As I've already said in comment #3 before you even started registering your objections, it may be better to only try with 1 buffer if the request with 0 buffers fails. And yes that code path should have a comment explaining this is a workaround for certain devices. Or even better why do the probe if the device supports normal mmap buffers at all? AFAIK if the device claims V4L2_CAP_STREAMING it must at least support the standard mmap type buffers, so we can just assume that. We should probably double check this with Hans V.
(In reply to comment #8) > As long as we make it explicit what it is a workaround, and what is real code. > This has grown into a large mess with the "transparent" workaround attitude. I agree the transparant workaround attitude is a problem, as a v4l2 kernel developer I'm a big fan of fixing kernel bugs where possible. At a minimum any driver issues which need workarounds should be actively reported upstream, so that they can be fixed in newer kernels. Talking about this, I was a bit baffled by the s_fmt fallback path in get_nearest_size, do you know if any devices which actually need that ? AFAIK ALL drivers should support try_fmt just fine.
(In reply to comment #7) > Note the other patch seems incomplete fixed around bug 726195 and 726194. We > need to write proper interlacing detection code, it's not true that for 1 > resolution driver only do mixed or progressive. > Would it be possible to move patches to proper bugs instead of having all unrelated issue in the same bug ? Note the subject of this bug: "v4l2: fix new v4l2 code not working with certain devices (regression)", so if you wonder why all the patches are together in a single bug, because the entire set is needed to get things work again / to fix the regression. Frankly, I'm quite unhappy about all the flack I'm getting here, don't shoot the messenger and all that. I just spend a couple of hours this morning to fix a regression I did not cause, and instead of a thank you I'm getting a lot of push-back, and no indication of any willingness to actually care about and fix the regression. Not a great way to win harts and minds IMHO. No about the technical argument you make: > Note the other patch seems incomplete fixed around bug 726195 and 726194. We > need to write proper interlacing detection code, it's not true that for 1 > resolution driver only do mixed or progressive. Yes some devices may support both progressive and interlaced for a single resolution. But currently the code is not structured to deal with this, dealing with this requires at least some refactoring. My patch is not about this, my patch is a bug fix to the current code, which follows a model where only one of progessive or interlaced can be supported, and tries to get progressive first. This current code, which is currently shipping in a stable release, is broken not at a design level which you're talking about now (it could use improvements there too). but in the implementation of the chosen design. Even following the chosen design of a mode being either progressive or interlaced it get things wrong, with as end result that things do not work at all. So rather then talking about design issues, which seems to me to be something for 1.5.x, lets first focus on fixing the regression (ah there is that word from the summary again), and get the current code to at least work properly for what it does do, and that is *exactly* what my patch does.
> (In reply to comment #8) > As I've already said in comment #3 before you even started registering your > objections, it may be better to only try with 1 buffer if the request with 0 > buffers fails. And yes that code path should have a comment explaining this is > a workaround for certain devices. > > Or even better why do the probe if the device supports normal mmap buffers at > all? AFAIK if the device claims V4L2_CAP_STREAMING it must at least support the > standard mmap type buffers, so we can just assume that. We should probably > double check this with Hans V. He already stated that in theory even old videobuf base devices could have CAP_STREAMING and USERPTR only, but said in practice, all the 18 affected drivers support MMAP (very few will also do USERPTR). So yes, I think this is a plan, and this will simply bring back the old behavior, which is to assume that CAP_STREAM mean that MMAP is supported without the need to alloc/free any buffers. Note also that we found that MFC driver has this bug where you can only free buffers through REQBUFS(0), so to reallocate you need to REQBUFS(0) -> REQBUFS(N). I blindly thought this was normal initially, hence the code doing that. Harmless in general, but I'll try and add comment so we remember this is also a hack. (In reply to comment #10) > Talking about this, I was a bit baffled by the s_fmt fallback path in > get_nearest_size, do you know if any devices which actually need that ? AFAIK > ALL drivers should support try_fmt just fine. About this one, we blindly removed it in 1.4 if I remember correctly, as it was causing important performance issue. Might be a bit rough, but I think remove some hack when they cause issues is the only way to progress and get new stuff in. Thanks for reporting btw.
(In reply to comment #11) > (In reply to comment #7) > > Note the other patch seems incomplete fixed around bug 726195 and 726194. We > > need to write proper interlacing detection code, it's not true that for 1 > > resolution driver only do mixed or progressive. > > Would it be possible to move patches to proper bugs instead of having all unrelated issue in the same bug ? > > Note the subject of this bug: "v4l2: fix new v4l2 code not working with certain > devices (regression)", so if you wonder why all the patches are together in a > single bug, because the entire set is needed to get things work again / to fix > the regression. > > Frankly, I'm quite unhappy about all the flack I'm getting here, don't shoot > the messenger and all that. I just spend a couple of hours this morning to fix > a regression I did not cause, and instead of a thank you I'm getting a lot of > push-back, and no indication of any willingness to actually care about and fix > the regression. Not a great way to win harts and minds IMHO. > > No about the technical argument you make: > > > Note the other patch seems incomplete fixed around bug 726195 and 726194. We > > need to write proper interlacing detection code, it's not true that for 1 > > resolution driver only do mixed or progressive. > > Yes some devices may support both progressive and interlaced for a single > resolution. But currently the code is not structured to deal with this, dealing > with this requires at least some refactoring. My patch is not about this, my > patch is a bug fix to the current code, which follows a model where only one of > progessive or interlaced can be supported, and tries to get progressive first. > > This current code, which is currently shipping in a stable release, is broken > not at a design level which you're talking about now (it could use improvements > there too). but in the implementation of the chosen design. Even following the > chosen design of a mode being either progressive or interlaced it get things > wrong, with as end result that things do not work at all. > > So rather then talking about design issues, which seems to me to be something > for 1.5.x, lets first focus on fixing the regression (ah there is that word > from the summary again), and get the current code to at least work properly for > what it does do, and that is *exactly* what my patch does. I'm not making an argument against your work, I really appreciate that you take the time to report and propose patches that fixes issue you have. This is blind review, and request for comment. I only rejected the third patch because it breaks (yes literally) other use case I'm aware of. The two other patches may be merged when I'm done going over all use cases I know (that includes trying to see if they help with related bugs, or would lead to more re-write later). The reason it takes a lot of time and effort to get stuff in for v4l2 is that we really need to dig all patches deeply as everyone writes patch for their single HW and assume it's the absolute solution.
(In reply to comment #12) > (In reply to comment #10) > > Talking about this, I was a bit baffled by the s_fmt fallback path in > > get_nearest_size, do you know if any devices which actually need that ? AFAIK > > ALL drivers should support try_fmt just fine. > > About this one, we blindly removed it in 1.4 if I remember correctly, as it was > causing important performance issue. No the s_fmt fallback is still there in 1.4 and master, at least in gst_v4l2_object_get_nearest_size, could be it has been removed from some other path. Note I'm all in favor of removing this s_fmt fallback.
(In reply to comment #13) > I'm not making an argument against your work, I really appreciate that you take > the time to report and propose patches that fixes issue you have. This is blind > review, and request for comment. I only rejected the third patch because it > breaks (yes literally) other use case I'm aware of. The two other patches may > be merged when I'm done going over all use cases I know (that includes trying > to see if they help with related bugs, or would lead to more re-write later). Ok, fair enough. As I already indicated in comment #3 I've had my own doubts about the 3th patch (note I did test it with a uvc webcam too, so it was not tested with a single device, I've written and maintain quite a few webcam drivers). So I think that we seem to be in agreement that the best way to deal with the problem the 3th patch tries to address is to simply assume that if reqbufs(0) fails for plain mmap buffers, that we're then dealing with one of the "troublesome" devices, and just assume reqbufs mmap capability, (and no createbufs capability). This should be easy enough to write a patch for (with a big fat comment explaining this workaround in the code), assuming that V4L2_CAP_STREAMING has already been tested for when gst_v4l2_allocator_new gets called. Is this assumption correct, and do you agree that this will be the best way to fix this ? If so then I can write and test a patch using this method.
(In reply to comment #15) > So I think that we seem to be in agreement that the best way to deal with the > problem the 3th patch tries to address is to simply assume that if reqbufs(0) > fails for plain mmap buffers, that we're then dealing with one of the > "troublesome" devices, and just assume reqbufs mmap capability, (and no > createbufs capability). > > This should be easy enough to write a patch for (with a big fat comment > explaining this workaround in the code), assuming that V4L2_CAP_STREAMING has > already been tested for when > gst_v4l2_allocator_new gets called. > > Is this assumption correct, and do you agree that this will be the best way to > fix this ? > > If so then I can write and test a patch using this method. Actually there is a subtle difference in what I had in mind, I would probe REQBUFS(0) for all case first (as we do now), then if the flags are 0 (no capabilities), I'd warn and set the MMAP capability. As we are assuming an old videobuf driver, assuming no createbuf support (don't know who had a driver with that implemented to be honest). It's up to you, I'm compiling/sanity checking the two others before merging right now, I can deal with that third patch if you prefer.
Comment on attachment 284798 [details] [review] [PATCH 1/3] v4l2: get_nearest_size: Always reinit all struct fields on retry commit 69fe66f4e1e69d17a691dbd73bed3f6286bd962f Author: Hans de Goede <hdegoede@redhat.com> Date: Fri Aug 29 10:57:20 2014 +0200 v4l2: get_nearest_size: Always reinit all struct fields on retry They may have been modified by the ioctl even if it failed. This also makes the S_FMT fallback path try progressive first, making it consistent with the preferred TRY_FMT path. https://bugzilla.gnome.org/show_bug.cgi?id=735660
Comment on attachment 284799 [details] [review] [PATCH 2/3] v4l2: get_nearest_size: Fix "Unsupported field type" errors commit 355c12bec3b58cb8110eb8b45e9bb4998218b94d Author: Hans de Goede <hdegoede@redhat.com> Date: Fri Aug 29 12:01:27 2014 +0200 v4l2: get_nearest_size: Fix "Unsupported field type" errors Most V4L2 ioctls like try_fmt will adjust input fields to match what the hardware can do rather then returning -EINVAL. As is docmented here: http://linuxtv.org/downloads/v4l-dvb-apis/vidioc-g-fmt.html EINVAL is only returned if the buffer type field is invalid or not supported. So upon requesting V4L2_FIELD_NONE devices which can only do interlaced mode will change the field value to e.g. V4L2_FIELD_BOTTOM as only returning half the lines is the closest they can do to progressive modes. In essence this means that we've failed to get a (usable) progessive mode and should fall back to interlaced mode. This commit adds a check for having gotten a usable field value after the first try_fmt, to force fallback to interlaced mode even if the try_fmt succeeded, thereby fixing get_nearest_size failing on these devices. https://bugzilla.gnome.org/show_bug.cgi?id=735660
Created attachment 284854 [details] [review] v4l2allocator: Workaround driver that don't support REQBUFS(0) Here's what I'm proposing, let me know if that sounds right for you (and if it works).
Created attachment 284882 [details] [review] [PATCH] v4l2allocator: Workaround driver that don't support REQBUFS(0) (In reply to comment #16) > Actually there is a subtle difference in what I had in mind, I would probe > REQBUFS(0) for all case first (as we do now), then if the flags are 0 (no > capabilities), I'd warn and set the MMAP capability. Right, that is exactly what I had in mind too, but I could have worded it better. Thanks for the patch, I've slightly modified it making 2 changes: 1) Move the GST_OBJECT_FLAG_SET call to after fixing up the flags 2) Add the last chunk from my original patch, which makes reqbufs(0) failing in gst_v4l2_allocator_stop non fatal With these 2 fixes the patch works fine (tested on a bttv card), I'm attaching the updated version now.
Review of attachment 284882 [details] [review]: ::: sys/v4l2/gstv4l2allocator.c @@ +651,1 @@ + GST_OBJECT_FLAG_SET (allocator, flags); Thanks good catch @@ +783,3 @@ "error releasing buffers buffers: %s", g_strerror (errno)); + /* Not all drivers support freeing buffers by requesting 0 buffers, so + we don't set ret to an error here. */ This has the side effect, as we won't be able to restart the allocator. Maybe we should empty the atomic queue and deactivate the allocator even if freeing buffers didn't work ? Note, for all this to be useful, we'd need S_FMT to succeed even if there is buffers allocated. Hans V. seemed to think this won't work. He said said the only way we could change the format after having allocated buffers would be to close and re-open. Would it be possible to double check that ? If this is the case, we'd keep the code has is, I would simply like to add something in the warning, so developers will know that re-negotiation isn't possible for this device. They would then have the choice between trying to port to VB2, or implement a close/re-open work around (both are fair amount of work).
(In reply to comment #21) > Review of attachment 284882 [details] [review]: > > ::: sys/v4l2/gstv4l2allocator.c > @@ +651,1 @@ > + GST_OBJECT_FLAG_SET (allocator, flags); > > Thanks good catch > > @@ +783,3 @@ > "error releasing buffers buffers: %s", g_strerror (errno)); > + /* Not all drivers support freeing buffers by requesting 0 buffers, so > + we don't set ret to an error here. */ > > This has the side effect, as we won't be able to restart the allocator. Why not? Calling reqbufs multiple times, even with different buffer counts on old videobuf drivers works fine, as long as the previous buffers are not busy at that time (streaming needs to be off, and they need to be unmapped, but that is the case after a gst_v4l2_allocator_stop even if the reqbufs(0) fails. > Maybe we should empty the atomic queue and deactivate the allocator even if freeing > buffers didn't work ? Right, that is what my patch does by not setting ret to an error (AFAIK). > Note, for all this to be useful, we'd need S_FMT to succeed even if there is > buffers allocated. Hans V. seemed to think this won't work. He said said the > only way we could change the format after having allocated buffers would be to > close and re-open. Would it be possible to double check that ? I've just checked the bttv driver, and s_fmt will work fine there with mapped buffers, as long as no streaming is in progress. If the size of the needed buffers gets bigger after a s_fmt, and you don't re-request the buffers, then streamon will fail with -EINVAL. But since the gst code will re-request the buffers all should be fine, which matches my testing where I can change resolution in cheese while streaming. > If this is the case, we'd keep the code has is, I would simply like to add > something in the warning, so developers will know that re-negotiation isn't > possible for this device. May not be possible, as said for bttv it works, I've not checked any others. > They would then have the choice between trying to port to VB2, or implement a close/re-open work > around (both are fair amount of work). Or fixing the old videobuf code to support reqbufs(0), now that I've taken a closer look to answer your questions I think that may actually be pretty doable. I'll discuss this with Hans V. still we need some kind of workaround to get things to work with current kernels, as any fix won't be suitable for Cc: stable, that much is certain.
(In reply to comment #22) > > Maybe we should empty the atomic queue and deactivate the allocator even if freeing > > buffers didn't work ? > > Right, that is what my patch does by not setting ret to an error (AFAIK). To be a bit more clear here, any queued buffers revert there ownership to userspace on a streamoff, and they can be unmmap-ed at this time too. Actually the un-mmap must be done before the reqbufs(0), otherwise it will fail with -EBUSY even if supported. All the reqbufs(0) call does is free underlying kernel memory, which is why it was not part of the original v4l2 api design, the thinking being that that could wait till fd close.
Erm, should have waited a second before hitting commit. So with old drivers all reqbufs(0) does is free kernel memory. But once it got introduced, new drivers started requiring it to be done before doing a s_fmt, as that makes resource management easier (no need to wait with returning an error because the buffers are to small till the first qbuf)l, so it is very much necessary for userspace to this now a days if it wants to re-negotiate the fmt. For old drivers which use the old videobuf, I would expect them to handle things like bttv, as the whole notion of requiring userspace to explicitly freeing buffers before changing fmt through a s_fmt call did not exist in the old model, while re-negotiation was a normal thing to do back then too.
Seems fair, what I had in mind is that we should still do: > allocator->count = 0; > g_atomic_int_set (&allocator->active, FALSE); If the reqbufs(0) failed. There is no reason other then having an old driver for that call to fail, hence we can expect that these drivers to behave like bttv. This will allow the pre-conditions to be met when _start() is called later. Pre-conditions from gst_v4l2_allocator_start() are: > g_return_val_if_fail (count != 0, 0); > GST_OBJECT_LOCK (allocator); > if (g_atomic_int_get (&allocator->active)) > goto already_active;
Created attachment 284926 [details] [review] [PATCH] v4l2allocator: Workaround driver that don't support REQBUFS(0) (In reply to comment #25) > Seems fair, what I had in mind is that we should still do: > > > allocator->count = 0; > > g_atomic_int_set (&allocator->active, FALSE); Ah right, I see. Ok this updated version of the patch should do the trick then.
Comment on attachment 284926 [details] [review] [PATCH] v4l2allocator: Workaround driver that don't support REQBUFS(0) master: commit 5c933f 1.4: commit af1576d1e9432029d6a971dbfa050c65e3c6338a Author: Nicolas Dufresne <nicolas.dufresne@collabora.com> Date: Fri Aug 29 17:09:30 2014 -0400 v4l2allocator: Workaround driver that don't support REQBUFS(0) There is still around 18 drivers not yet ported to videobuf2. These driver don't support freeing buffetrs through REQBUFS(0) hence for these the memory type probing fails. In order to gain back our previous behaviour in presence of these, we implement a workaround that assuming MMAP is supported. Note that an allocator is only created for device with STREAMING support in the device capabilities. In such case one of MMAP, USERPTR and DMABUF is required. Though DMABUF came afterward, so is not an option and in practice none of these drivers will only do USERPTR. https://bugzilla.gnome.org/show_bug.cgi?id=735660 Also-by: Hans de Goede <hdegoede@redhat.com>
Thanks for your contribution.