After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 772521 - v4l2src: Regression bug with consecutive recordings
v4l2src: Regression bug with consecutive recordings
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-plugins-good
1.8.2
Other Linux
: Normal critical
: 1.8.4
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on: 775014 775015
Blocks:
 
 
Reported: 2016-10-06 16:45 UTC by Alfonso
Modified: 2017-02-09 12:29 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Simple c program to replicate the bug. (2.05 KB, text/plain)
2016-10-06 16:45 UTC, Alfonso
Details
Log from script with GST_DEBUG="v4l2*:7" (814.42 KB, text/plain)
2016-10-07 11:02 UTC, Alfonso
Details
strace output (15.88 KB, text/plain)
2016-10-07 11:03 UTC, Alfonso
Details
Log of GST_DEBUG="v4l2*:7" on branch master. (1.03 MB, text/x-log)
2016-11-17 10:17 UTC, Alfonso
Details
strace output on branch master (198.43 KB, text/x-log)
2016-11-18 09:47 UTC, Alfonso
Details

Description Alfonso 2016-10-06 16:45:05 UTC
Created attachment 337081 [details]
Simple c program to replicate the bug.

We are using GStreamer to record video on our opensource software called Galicaster, and we had an issue when starting several recordings in fast succession. We narrowed it down to the GStreamer pipeline in C, and we discovered what seems to be an issue with the gstv4l2allocator when a recording closes. When this issue happens, the device is 'busy' for the next pipeline.

The script we used for testing is the following:
http://paste.ubuntu.com/23284568/

It executes a simple one second pipeline over and over using the same device until it fails.

We executed the script with GST_DEBUG=3 and we got the following error (full output in http://paste.ubuntu.com/23284592/):

0:00:27.492185197 11963      0x25bc830 ERROR          v4l2allocator gstv4l2allocator.c:1245:gst_v4l2_allocator_qbuf:<gc-v4l2-src:pool:src:allocator> failed queueing buffer 1: Invalid argument
0:00:27.492207147 11963      0x25bc830 ERROR         v4l2bufferpool gstv4l2bufferpool.c:1127:gst_v4l2_buffer_pool_qbuf:<gc-v4l2-src:pool:src> could not queue a buffer 1


This was reproduced on an Ubuntu 16.04.1 machine with GStreamer version 1.8.2 using the Logitech HD Webcam C910 and the HD Pro Webcam C920. We could also reproduce this behaviour with lower quality Logitech cameras, but it would take longer to reproduce. It seems that the chance to reproduce the issue is proportional to the video bandwidth.

We think it's a regression bug, because we were not able to reproduce said behavior on the GStreamer 0.10 and 1.2 versions.

I attached a gist with more information about the tests we did:
https://gist.github.com/Alfro/d94deab6b5b5314f2cee8167019c7239


If you need any extra data to be added here, please let me know.
Comment 1 Nicolas Dufresne (ndufresne) 2016-10-06 18:10:51 UTC
Hello, I've had a run of your script on git master, and could not reproduce. For me, the uvcdriver failed after 110 tries.

  uvcvideo: Failed to resubmit video URB (-1).

That caused an assertion in GStreamer due to a minor bug, though GStreamer then failed cleanly without leaks afteward. I haven't had a change to try and reproduce with a C920, or on 1.8.2. That will unfortunatly have to wait after the GStreamer conference. Meanwhile, you can help your case, by providing a trace of when the leak occures. Use GST_DEBUG="v4l2*:7" and attach. You may want to strip trace from working cases. You should also test over 1.8.3 (the most recent stable), and if you have a chance test over git master (or 1.10 when out). There is too much changes between 1.2 and 1.8 to be helpful ranges, though you could also run git bisect if you have the time.
Comment 2 Alfonso 2016-10-07 11:02:58 UTC
Created attachment 337154 [details]
Log from script with GST_DEBUG="v4l2*:7"

Stripped log from the failing pipeline (and the previous one) when executing the script with GST_DEBUG="v4l2*:7"
Comment 3 Alfonso 2016-10-07 11:03:14 UTC
Created attachment 337155 [details]
strace output

Log when executing the script with strace:
strace -e open,close ./ko 1

It can be seen how there is no close() call before the failing open('dev/video0') class.
Comment 4 Alfonso 2016-10-07 11:05:18 UTC
I added the stripped gst trace with GST_DEBUG="v4l2*:7". I also added the strace output.

I'll try later with the master branch. If I have enough time, I will also try to narrow the version down more.
Comment 5 Nicolas Dufresne (ndufresne) 2016-10-08 14:47:17 UTC
For strace, you probably want to match VIDIOC_EXPUF against close() of the same FD.
Comment 6 Nicolas Dufresne (ndufresne) 2016-10-08 14:48:42 UTC
(In reply to Nicolas Dufresne (stormer) from comment #5)
> For strace, you probably want to match VIDIOC_EXPUF against close() of the
> same FD.

Nevermind, that is not applicable to that, sorry.
Comment 7 Nicolas Dufresne (ndufresne) 2016-10-08 14:54:17 UTC
And could you explain why we get EINVAL from a QBUF ? This should never happens unless something broke on kernel side ...
Comment 8 Alfonso 2016-10-10 14:02:00 UTC
By EINVAL, do you mean this error?

0:01:09.024938320 25419      0x16c8f20 ERROR          v4l2allocator gstv4l2allocator.c:1245:gst_v4l2_allocator_qbuf:<gc-v4l2-src:pool:src:allocator> failed queueing buffer 1: Invalid argument

If it helps, I tested the script on a Ubuntu 16.04.1 LTS
Kernel: Linux 4.4.0-38-generic x86_64

Which kernel version do you recommend us to test it with?

If you need other info, I'll be happy to provide it. I couldn't yet take the time to test the issue further with the master branch.
Comment 9 Nicolas Dufresne (ndufresne) 2016-10-10 21:58:02 UTC
Yes "Invalid Argument" is the outcome of your kernel returning EINVAL where it should not. So I need to know why the kernel replied that in order to figure-out if you have hit a kernel bug or if it's something else. You need to check the kernel logs.

I'm testing on 4.8 of course ;-P
Comment 10 Alfonso 2016-11-17 10:16:32 UTC
Hello again.
I tested using master and on Ubuntu 16.10, with kernel 4.8.0-27 and the same behaviour occur. After executing a few pipelines one after the other, a "Resource is busy" appears and it fails to execute.
However, the error on master is different as there is no INTVAL.

I have added the GST_DEBUG=v4l2*:7 output for master.
Comment 11 Alfonso 2016-11-17 10:17:15 UTC
Created attachment 340119 [details]
Log of GST_DEBUG="v4l2*:7" on branch master.
Comment 12 Alfonso 2016-11-17 10:34:56 UTC
Also, I have previously tried the 1.8.2 Gstreamer tag on Ubuntu 14.04 with Kernel 3.13.0-100 and it worked fine. It was still going after close to 400 pipelines.
Comment 13 Nicolas Dufresne (ndufresne) 2016-11-17 16:51:46 UTC
You didn't provide the kernel trace. If it crashed in the kernel, you'll have to report to the kernel project.
Comment 14 Alfonso 2016-11-18 09:47:32 UTC
Created attachment 340226 [details]
strace output on branch master

strace -C -e open,close ./ko 18
Comment 15 Alfonso 2016-11-18 10:28:01 UTC
Here is the strace output.
It does not seem to have crashed in the kernel this time. Nothing shows up when doing dmesg.
Comment 16 Nicolas Dufresne (ndufresne) 2016-11-18 19:16:16 UTC
You could have told us that there was a critical assertion. Having to dig that out from your log it a pain:

** (ko:31351): CRITICAL **: gst_v4l2_buffer_pool_process: assertion 'gst_buffer_pool_is_active (bpool)' failed

That need to be fixed, after an assertion don't expect anything to go well. After this assert the device is not properly closed, hence the next open() not giving back the same FD number.
Comment 17 Nicolas Dufresne (ndufresne) 2016-11-18 20:39:39 UTC
I've given it a shot here. No kernel crash either and failure sometimes. What I understand is that whenever a buffer could not be queued (QBUF fail) on a capture pool, the buffer endup being leaked. I'm not sure how to fix this. I have a major rework in mind, but didn't find the time yet.

Though, this case should never occur. I have enable some kernel traces:
  echo 0xff > /sys/module/videobuf2_core/parameters/debug 

And find out that it fails to queue becaue:
  vb2-core: vb2_core_qbuf: invalid buffer state 5

 * @VB2_BUF_STATE_ACTIVE:	buffer queued in driver and possibly used
 *				in a hardware operation

I believe there is a race in the sequence of streamon/off. Also there is an extra routrip from on to off and on again which I didn't notice before.
Comment 18 Nicolas Dufresne (ndufresne) 2016-11-24 21:14:43 UTC
This is now fixed through bugs:
https://bugzilla.gnome.org/show_bug.cgi?id=775015
https://bugzilla.gnome.org/show_bug.cgi?id=775014

All patches backported to 1.8 and 1.10, will be part of the next release.
Comment 19 Alfonso 2017-02-09 09:44:09 UTC
Hello! Any idea about when the next release will have place? This issue is affecting a number of our users and it would be great to have a new 1.8 release for Ubuntu 16 so they can enjoy the fix :)
Comment 20 Nicolas Dufresne (ndufresne) 2017-02-09 12:29:10 UTC
There is no schedule for bug fix release. This issue is already released in current stable branch (1.10.3). For further assistance, contact your distribution. There may not be any new 1.8 release. For this version, you should package the patches.