Bug 775564 – v4l2 1.10 Regression: white pixels with camera on Raspberry Pi & gstgl




Summary:	v4l2 1.10 Regression: white pixels with camera on Raspberry Pi & gstgl


Status:	RESOLVED FIXED

Product:	GStreamer
Classification:	Platform
Component:	gst-plugins-good
Version:	1.10.x
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	1.10.4
Assigned To:	Nicolas Dufresne (ndufresne)
QA Contact:	GStreamer Maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2016-12-03 10:26 UTC by gohai
Modified:	2017-02-22 09:17 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Trace with GST_DEBUG="v4l*:7" (776.37 KB, text/plain) 2016-12-19 18:28 UTC, gohai
WIP: v4l2object: Update image size when extrapolating (1.28 KB, patch) 2017-01-03 00:36 UTC, Nicolas Dufresne (ndufresne)	committed

Description gohai 2016-12-03 10:26:04 UTC

We're seeing a regression on the Raspberry Pi (w/ the vc4 GLES2 driver) between versions 1.8.3 and 1.10.1, but I believe I already saw the same issue in a 1.9.x build. With the new version, retrieving a GL texture from the Raspberry Pi camera only shows a white rectangle, while the same code worked before. The camera is accessed using the bcm2835-v4l2 driver, and instantiated using gst_device_create_element.

All other functionality (playback etc.) seems to work just fine. Our JNI library was compiled against 1.10.1 headers.

The pipeline is something like this: v4l2src ! capsfilter [here just: video/x-raw] ! glupload ! glcolorconvert ! capsfilter [video/x-raw(memory:GLMemory),format=RGBA,texture-target=2D] ! fakesink

The negotiated caps between 1.8.3 and 1.10.1 look the same (just the order is slightly different): video/x-raw(memory:GLMemory), width=(int)320, height=(int)200, framerate=(fraction)90/1, format=(string)RGBA, pixel-aspect-ratio=(fraction)1/1, interlace-mode=(string)progressive, texture-target=(string)2D


To reproduce:

1. Download https://github.com/processing/processing/releases/download/processing-0255-3.2.3/processing-3.2.3-linux-raspbian.zip
2. Unpack and write the image onto an SD card
3. Connect a CSI camera to the Raspberry Pi
4. Boot the prepared SD card
5. Launch Processing via the "Development" menu
6. In Processing: File > Examples... > Contributed Libraries > GL Video > SimpleCapture
7. Hit play

This will yield a working display of the camera image. The GStreamer version bundled is 1.8.3.

8. Quit Processing
9. Remove ~/sketchbook/libraries/glvideo
10. Download http://sukzessiv.net/~gohai/gstreamer/processing-glvideo-1.10.1.zip
11. Unpack into ~/sketchbook/libraries
12. Launch Processing again
13. In Processing: File > Examples... > Contributed Libraries > GL Video > SimpleCapture
14. Hit play

This will result in a white rectangle being displayed. No apparent error messages in the console.


Source:

The gist of the JNI code can be found here: https://github.com/gohai/processing-glvideo/blob/wip-gstreamer-1.10/src/native/impl.c

The call-chain for camera playback:
* Java_gohai_glvideo_GLVideo_gstreamer_1openDevice
** calls getDeviceSrcElement, which calls gst_device_create_element
** calls createGlPipeline, which calls init_device_player to set up the pipeline
* the handle_buffer function is called through callbacks to retrieve the texture id
* the Java_gohai_glvideo_GLVideo_gstreamer_1getFrame function is called from Java to retrieve the texture id, and give up ownership of the previous one

Comment 1 Nicolas Dufresne (ndufresne) 2016-12-03 16:22:17 UTC

It is very unlikely that I will have time to redo such a complex setup, especially since it requires specialized hardware. I can provide information to help investigate.

1) Are you able to reproduce with glimagesink?

  v4l2src ! glimagesink

Does the problem go away if you add a tee? Something like this, but in your app.

  v4l2src ! tee ! glimagesink

If it does, that would indicate a stride error in your renderer.

Have you run with GST_DEBUG=3 and seen any warnings?

Comment 2 gohai 2016-12-18 17:50:27 UTC

@Nicolas (In reply to Nicolas Dufresne (stormer) from comment #1)

> Have you run with GST_DEBUG=3 and seen any warnings?

I checked, and indeed - I am continuously seeing the warnings below. I believe this must be it!

0:00:12.177571786  1879 0x67960d80 WARN          v4l2bufferpool gstv4l2bufferpool.c:1741:gst_v4l2_buffer_pool_process:<v4l2src0:pool:src> Invalid buffer size, this is likely due to a bug in your driver, dropping
0:00:12.177657775  1879 0x67960d80 WARN          v4l2bufferpool gstv4l2bufferpool.c:1958:gst_v4l2_buffer_pool_process:<v4l2src0:pool:src> Dropping corrupted buffer without payload


> 1) Are you able to reproduce with glimagesink?
> 
>   v4l2src ! glimagesink

This works.

> Does the problem go away if you add a tee? Something like this, but in your app.
> 
>   v4l2src ! tee ! glimagesink

Adding a tee to my application does not make it work there.

Comment 3 Nicolas Dufresne (ndufresne) 2016-12-19 16:47:16 UTC

The next step would be to share a trace with GST_DEBUG="v4l*:7". The bug is probably happening with a specific resolution and format.

As the warning mentions, it is likely a driver bug. I should be able to confirm with the trace.

Comment 4 gohai 2016-12-19 18:27:41 UTC

(In reply to Nicolas Dufresne (stormer) from comment #3)
> The next step would be to share a trace with GST_DEBUG="v4l*:7". The bug is
> probably happening with a specific resolution and format.

Please find the requested trace attached below.

Comment 5 gohai 2016-12-19 18:28:24 UTC

Created attachment 342231 [details]
Trace with GST_DEBUG="v4l*:7"

Comment 6 Nicolas Dufresne (ndufresne) 2017-01-03 00:08:21 UTC

Here's the result of my analysis. From the trace we have:

> Got format of 320x200, format YUYV, nb planes 1, colorspace 1
> stride 640, sizeimage 133120

The S_FMT implementation in this driver is weird, since the required image size should be stride * height = 640 * 200 = 128000.

In the most recent code, we can notice that we copy that size into the video info structure:

> info->size = format->fmt.pix.sizeimage;

That value is later used for a sanity check. So what happens is that when we dequeue, the driver reports the proper size:

> dequeued buffer 0x63038f20 seq:0 (ix=1), mem 0x630552f0 used 128000

But the sanity check fails.

> Invalid buffer size, this is likely due to a bug in your driver, dropping

Now, this check was added to detect short images produced upon corruption (when the system can no longer keep up in real time). Normally this is not an issue, but there is apparently a bug in this driver. There is also a relationship with the size set in the buffer pool. This was all meant to prevent userspace from using a buffer that the driver allocated too short (which can crash the userspace app).

In this case, the driver seems to suggest that extra space is being allocated, so maybe we can find a way to survive this driver bug.
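
The mismatch described above can be reproduced with a minimal sketch. It is written in Python for brevity rather than the C used by GStreamer, and `buffer_size_ok` is a hypothetical stand-in for the sanity check, assumed here to reject any buffer shorter than the expected size; the real check in gstv4l2bufferpool.c may differ in detail.

```python
# Values taken from the trace quoted in this comment.
HEIGHT = 200
STRIDE = 640          # bytes per line
SIZEIMAGE = 133120    # size the driver reported with the format
BYTESUSED = 128000    # size the driver reported on dequeue


def buffer_size_ok(bytesused, expected_size):
    """Hypothetical model of the check: reject buffers shorter than expected."""
    return bytesused >= expected_size


required = STRIDE * HEIGHT
print(required)                              # 128000
print(buffer_size_ok(BYTESUSED, SIZEIMAGE))  # False: the buffer gets dropped
print(buffer_size_ok(BYTESUSED, required))   # True: the payload itself is fine
```

Under these assumptions, every dequeued buffer looks truncated only because the expected size was copied from sizeimage.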

Comment 7 Nicolas Dufresne (ndufresne) 2017-01-03 00:36:24 UTC

Created attachment 342739 [details] [review]
WIP: v4l2object: Update image size when extrapolating

Update the image size according to the amount of data we are going to
read/write. This works around bugs in drivers where the sizeimage provided
by TRY/S_FMT represents the buffer length (maximum size) rather than the expected
bytesused (buffer size).
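
A rough sketch of the idea in this description, in Python for brevity instead of C. The helper name `extrapolated_size` and the choice to take the smaller of the two values are assumptions made for illustration only; the attached patch is the authoritative version.

```python
def extrapolated_size(plane_strides, plane_heights, sizeimage):
    """Hypothetical helper: expect no more bytes than the planes occupy.

    plane_strides and plane_heights hold one entry per plane; sizeimage is
    the value the driver returned from TRY/S_FMT.
    """
    occupied = sum(s * h for s, h in zip(plane_strides, plane_heights))
    return min(occupied, sizeimage)


# Single-plane frame from the trace in comment 6: stride 640, height 200.
size = extrapolated_size([640], [200], 133120)
print(size)  # 128000
```

With the expected size lowered to 128000, a dequeued buffer with bytesused 128000 would no longer look truncated.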

Comment 8 Nicolas Dufresne (ndufresne) 2017-01-03 00:37:08 UTC

I'll need someone with a Raspberry Pi and a PiCAM to test this workaround please.

Comment 9 gohai 2017-01-03 01:30:15 UTC

Thanks for looking into this, I'll give the workaround a try!

Comment 10 gohai 2017-01-03 13:39:51 UTC

The patch from Comment 7 indeed fixes the issue for me. (Tested on a Raspberry Pi, on top of GStreamer 1.10.1.) Thanks again.

Comment 11 gohai 2017-01-29 18:42:36 UTC

Nicolas, are you planning on incorporating your patch into future 1.10.x releases?

I've pointed the driver developers to the issue. Here's the response that I got. (I believe the current behavior is here to stay.)

"It's not a bug. The GPU has a restriction that the height must be a multiple of 16. Your height of 200 is not. The buffer size has to reflect the padded height of a frame to 640x208, otherwise you'll corrupt the memory after your image. 640*208 = 133120 which is the result you have seen.
I've not seen a mechanism within V4L2 to allow a driver to specify vertical padding requirements except by requesting a bigger buffer than is required. This doesn't work on formats such as V4L2_PIX_FMT_YUV420 where you want to specify padding between the planes, so the GPU has to do some further munging of the data on those formats."
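
The arithmetic in the quoted response can be checked with a few lines of Python. `align_up` is a hypothetical helper introduced here for illustration; the stride of 640 and the sizes come from the trace in comment 6.

```python
def align_up(value, multiple):
    """Round value up to the next multiple of `multiple`."""
    return (value + multiple - 1) // multiple * multiple


padded_height = align_up(200, 16)
print(padded_height)        # 208
print(640 * padded_height)  # 133120, the sizeimage reported by the driver
print(640 * 200)            # 128000, the bytesused reported on dequeue
```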

Comment 12 Nicolas Dufresne (ndufresne) 2017-01-29 20:08:26 UTC

Well, this guy didn't understand the bug. If it were a vertical padding issue, the U and V planes would be 16 pixels off, and you would notice the colors being slightly wrong. In this case it's just that bytesused does not match sizeimage. It could be legitimate in userptr or dmabuf-import mode, but that I don't know. Other than that, V4L has mplane formats that cover all this. With these formats, each plane is specified separately. This gives full control over padding between planes, and even allows a separate allocation per plane.

Other than that, I think this patch can land. If something worked before and it's not dangerous to work around, then let's go.

Comment 13 Nicolas Dufresne (ndufresne) 2017-02-22 08:54:43 UTC

Committed as 0b83e4ceaf7d9a5cc655ff96bd288b61ef106347

Comment 14 Nicolas Dufresne (ndufresne) 2017-02-22 09:16:02 UTC

In 1.10 as 58c3b8861c00e68e0b39f3a1c2cd72b213aa97b3

Comment 15 Nicolas Dufresne (ndufresne) 2017-02-22 09:17:20 UTC

Wrong ref, 1.10 as 38af12081e90cb9b7124b911d4efa2a6e5dc66b6