Bug 780682 – compositor: corrupts output when input caps change while running

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 780682 - compositor: corrupts output when input caps change while running


Summary:	compositor: corrupts output when input caps change while running


Status:	RESOLVED FIXED

Product:	GStreamer
Classification:	Platform
Component:	gst-plugins-bad
Version:	git master
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	1.13.1
Assigned To:	GStreamer Maintainers
QA Contact:	GStreamer Maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2017-03-29 12:34 UTC by George Kiagiadakis
Modified:	2017-05-20 14:24 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
videoaggregator: delay using new caps from a sink pad until the next buffer in the queue is taken (11.05 KB, patch) 2017-04-04 08:51 UTC, George Kiagiadakis	none	Details \| Review
videoaggregator: delay using new caps from a sink pad until the next buffer in the queue is taken (11.18 KB, patch) 2017-04-04 12:06 UTC, George Kiagiadakis	none	Details \| Review
videoaggregator: delay using new caps from a sink pad until the next buffer in the queue is taken (11.17 KB, patch) 2017-04-20 16:17 UTC, George Kiagiadakis	committed	Details \| Review

Description George Kiagiadakis 2017-03-29 12:34:53 UTC

Consider the following pipeline:

gst-launch-1.0 -v videotestsrc is-live=true ! video/x-raw,width=10,height=20,format=I420  ! compositor name=c ! glimagesink  multifilesrc location=%04d.bmp loop=true caps=image/bmp,framerate=10/1 ! avdec_bmp ! videoconvert ! videorate ! c.

...where the .bmp files have been created with:
gst-launch-1.0 videotestsrc num-buffers=10 ! video/x-raw,width=320,height=240 ! avenc_bmp ! multifilesink location=%04d.bmp
gst-launch-1.0 videotestsrc num-buffers=10 ! video/x-raw,width=640,height=480 ! avenc_bmp ! multifilesink location=%04d.bmp index=10
(so there should be a total of 20 images, 10 at 320x240 and 10 at 640x480)

Now at the beginning, the caps of avdec_bmp::src have width=320,height=240 and therefore the output of compositor is configured to be at 320x240 as well. As soon as the 11th image is loaded, the caps now have width=640,height=480 and compositor is reconfigured to output @ 640x480. As soon as this happens, the output becomes corrupt (!) momentarily, while valgrind shows a couple of invalid reads around video-converter code:

==26291== Thread 4 c:src:
==26291== Invalid read of size 16
==26291==    at 0x4028AC0: ??? (in /run/user/1000/orcexec.x2M1gv (deleted))
==26291==    by 0x72DAA68: do_unpack_lines (video-converter.c:2785)
==26291==    by 0x72DABF5: gst_line_cache_get_lines (video-converter.c:617)
==26291==    by 0x72DAD03: convert_generic_task (video-converter.c:3072)
==26291==    by 0x72D6658: gst_parallelized_task_runner_run (video-converter.c:297)
==26291==    by 0x72D93E6: video_converter_generic (video-converter.c:3170)
==26291==    by 0x7A290D5: gst_compositor_pad_prepare_frame (compositor.c:572)
==26291==    by 0x7E439BA: gst_aggregator_iterate_sinkpads (gstaggregator.c:380)
==26291==    by 0x7C382A9: gst_video_aggregator_do_aggregate (gstvideoaggregator.c:1364)
==26291==    by 0x7C382A9: gst_video_aggregator_aggregate (gstvideoaggregator.c:1582)
==26291==    by 0x7E46784: gst_aggregator_aggregate_func (gstaggregator.c:824)
==26291==    by 0x4EE29F8: gst_task_func (gsttask.c:335)
==26291==    by 0x5B3DAED: g_thread_pool_thread_proxy (gthreadpool.c:307)

The problem seems to be that compositor chooses input buffers from its queue based on timestamps and never checks if the buffer it has chosen actually matches the configured format, so momentarily it picks a 320x240 frame, thinking that it is a 640x480 one, and sometimes also vice versa when the resolution drops.

Comment 1 Sebastian Dröge (slomo) 2017-03-29 12:38:50 UTC

That would also be solved by https://bugzilla.gnome.org/show_bug.cgi?id=779945

Comment 2 George Kiagiadakis 2017-03-29 16:55:21 UTC

Hmm, theoretically it should be solved, you are right. I just tested, though, with the patches from Olivier's branch and it doesn't seem to make any difference...

Comment 3 George Kiagiadakis 2017-03-30 14:47:07 UTC

So, it looks like the actual problem in my case is not that the caps event is not being processed in the src pad thread. The race in bug 779945 happens when the queue is empty, which is not the case here.

The actual problem is that the framerate of the image stream in my example is smaller than the framerate of the videotestsrc stream and this is what happens:

The last 320x240 buffer in c:sink_1 has pts=0.8s, while the first 640x480 one has pts=0.9s
Compositor generates an output buffer (320x240) with t=0.8s, using the last 320x240 input buffer in the queue.
Right after that, it processes the caps event that was serialized and changes the format to 640x480.
Next, it tries to generate the next output frame, which must have pts=0.833s, since the output framerate is 30/1.
It considers the 640x480 frame, but notices that its PTS is 0.9s, so it keeps it in the queue for later and continues using the previous frame (320x240).
Boom.

Comment 4 George Kiagiadakis 2017-03-30 15:56:55 UTC

Doesn't seem very easy to fix. Basically we need to make sure that event processing doesn't happen until the next buffer can be picked, but since the events don't have timestamps, it's not trivial. Also, event processing happens in GstAggregator, while the frame picking logic is in GstVideoAggregator, which complicates things. Ideas welcome.

Comment 5 Olivier Crête 2017-03-30 16:49:14 UTC

Seems like a bug in GstVideoAggregator, which should keep the event until it picks the relevant buffer.

Comment 6 George Kiagiadakis 2017-04-04 08:51:50 UTC

Created attachment 349218 [details] [review]
videoaggregator: delay using new caps from a sink pad until the next buffer in the queue is taken

Comment 7 George Kiagiadakis 2017-04-04 12:06:05 UTC

Created attachment 349231 [details] [review]
videoaggregator: delay using new caps from a sink pad until the next buffer in the queue is taken

Added an additional check to fix a regression caught by the compositor unit test.

Comment 8 Olivier Crête 2017-04-05 20:17:39 UTC

I wonder if this shouldnt be somehow done by the code in bug #776931 ..

Comment 9 George Kiagiadakis 2017-04-06 08:08:50 UTC

(In reply to Olivier Crête from comment #8)
> I wonder if this shouldnt be somehow done by the code in bug #776931 ..

No, it isn't done there. The attachment in bug 776931 basically moves code around, the way I see it. I did try it, but it didn't make any difference on this issue.

Comment 10 George Kiagiadakis 2017-04-20 16:17:20 UTC

Created attachment 350148 [details] [review]
videoaggregator: delay using new caps from a sink pad until the next buffer in the queue is taken

I noticed that the previous version of the patch had a bug. If _fill_queues() returned GST_FLOW_NEEDS_DATA and the timeout variable of gst_video_aggregator_aggregate() was TRUE, then reconfiguration would not happen.

This is a fixed version of the patch. I simply don't use the return value from _fill_queues() to determine if there is need for reconfiguration, I check the src pad flags instead.

Comment 11 Sebastian Dröge (slomo) 2017-04-21 10:51:52 UTC

Not sure if we should get this in before 1.12 now, Olivier what do you think? It should probably also happen in combination with bug #776931, which is about similar problems.

Comment 12 Olivier Crête 2017-04-24 18:38:48 UTC

I'd keen on pushing this, but #776931 seems like it needs another iteration to fix the comments. I'd also be keen to push my agg-neg-2 branch, which I just attached as #781673.

Comment 13 Sebastian Dröge (slomo) 2017-04-25 09:25:36 UTC

I'd say now is a bit late, I'm planning 1.11.92 in two days, and then only really important bugfixes and 1.12.0 hopefully next week.

Comment 14 Olivier Crête 2017-04-25 15:13:58 UTC

Thinking about this again, I think this one should go in before 1.12 as it fixes a real observable bug, the others are more future looking.

Comment 15 Sebastian Dröge (slomo) 2017-05-09 12:48:56 UTC

Olivier, please merge this now then if it looks all ok to you. Not for 1.12 yet, let's give it some testing time in master first.

Comment 16 Olivier Crête 2017-05-20 14:24:26 UTC

Merged as:

commit 39d2526c34d2185c9d42a95ede8171b2d47fed18
Author: George Kiagiadakis <george.kiagiadakis@collabora.com>
Date:   Tue Apr 4 11:25:43 2017 +0300

    videoaggregator: delay using new caps from a sink pad until the next buffer in the queue is taken
    
    When caps changes while streaming, the new caps was getting processed
    immediately in videoaggregator, but the next buffer in the queue that
    corresponds to this new caps was not necessarily being used immediately,
    which resulted sometimes in using an old buffer with new caps. Of course
    there used to be a separate buffer_vinfo for mapping the buffer with its
    own caps, but in compositor the GstVideoConverter was still using wrong
    info and resulted in invalid reads and corrupt output.
    
    This approach here is more safe. We delay using the new caps
    until we actually select the next buffer in the queue for use.
    This way we also eliminate the need for buffer_vinfo, since the
    pad->info is always in sync with the format of the selected buffer.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=780682