GNOME Bugzilla – Bug 780682
compositor: corrupts output when input caps change while running
Last modified: 2017-05-20 14:24:48 UTC
Consider the following pipeline: gst-launch-1.0 -v videotestsrc is-live=true ! video/x-raw,width=10,height=20,format=I420 ! compositor name=c ! glimagesink multifilesrc location=%04d.bmp loop=true caps=image/bmp,framerate=10/1 ! avdec_bmp ! videoconvert ! videorate ! c. ...where the .bmp files have been created with: gst-launch-1.0 videotestsrc num-buffers=10 ! video/x-raw,width=320,height=240 ! avenc_bmp ! multifilesink location=%04d.bmp gst-launch-1.0 videotestsrc num-buffers=10 ! video/x-raw,width=640,height=480 ! avenc_bmp ! multifilesink location=%04d.bmp index=10 (so there should be a total of 20 images, 10 at 320x240 and 10 at 640x480) Now at the beginning, the caps of avdec_bmp::src have width=320,height=240 and therefore the output of compositor is configured to be at 320x240 as well. As soon as the 11th image is loaded, the caps now have width=640,height=480 and compositor is reconfigured to output @ 640x480. As soon as this happens, the output becomes corrupt (!) momentarily, while valgrind shows a couple of invalid reads around video-converter code: ==26291== Thread 4 c:src: ==26291== Invalid read of size 16 ==26291== at 0x4028AC0: ??? (in /run/user/1000/orcexec.x2M1gv (deleted)) ==26291== by 0x72DAA68: do_unpack_lines (video-converter.c:2785) ==26291== by 0x72DABF5: gst_line_cache_get_lines (video-converter.c:617) ==26291== by 0x72DAD03: convert_generic_task (video-converter.c:3072) ==26291== by 0x72D6658: gst_parallelized_task_runner_run (video-converter.c:297) ==26291== by 0x72D93E6: video_converter_generic (video-converter.c:3170) ==26291== by 0x7A290D5: gst_compositor_pad_prepare_frame (compositor.c:572) ==26291== by 0x7E439BA: gst_aggregator_iterate_sinkpads (gstaggregator.c:380) ==26291== by 0x7C382A9: gst_video_aggregator_do_aggregate (gstvideoaggregator.c:1364) ==26291== by 0x7C382A9: gst_video_aggregator_aggregate (gstvideoaggregator.c:1582) ==26291== by 0x7E46784: gst_aggregator_aggregate_func (gstaggregator.c:824) ==26291== by 0x4EE29F8: gst_task_func (gsttask.c:335) ==26291== by 0x5B3DAED: g_thread_pool_thread_proxy (gthreadpool.c:307) The problem seems to be that compositor chooses input buffers from its queue based on timestamps and never checks if the buffer it has chosen actually matches the configured format, so momentarily it picks a 320x240 frame, thinking that it is a 640x480 one, and sometimes also vice versa when the resolution drops.
That would also be solved by https://bugzilla.gnome.org/show_bug.cgi?id=779945
Hmm, theoretically it should be solved, you are right. I just tested, though, with the patches from Olivier's branch and it doesn't seem to make any difference...
So, it looks like the actual problem in my case is not that the caps event is not being processed in the src pad thread. The race in bug 779945 happens when the queue is empty, which is not the case here. The actual problem is that the framerate of the image stream in my example is smaller than the framerate of the videotestsrc stream and this is what happens: The last 320x240 buffer in c:sink_1 has pts=0.8s, while the first 640x480 one has pts=0.9s Compositor generates an output buffer (320x240) with t=0.8s, using the last 320x240 input buffer in the queue. Right after that, it processes the caps event that was serialized and changes the format to 640x480. Next, it tries to generate the next output frame, which must have pts=0.833s, since the output framerate is 30/1. It considers the 640x480 frame, but notices that its PTS is 0.9s, so it keeps it in the queue for later and continues using the previous frame (320x240). Boom.
Doesn't seem very easy to fix. Basically we need to make sure that event processing doesn't happen until the next buffer can be picked, but since the events don't have timestamps, it's not trivial. Also, event processing happens in GstAggregator, while the frame picking logic is in GstVideoAggregator, which complicates things. Ideas welcome.
Seems like a bug in GstVideoAggregator, which should keep the event until it picks the relevant buffer.
Created attachment 349218 [details] [review] videoaggregator: delay using new caps from a sink pad until the next buffer in the queue is taken
Created attachment 349231 [details] [review] videoaggregator: delay using new caps from a sink pad until the next buffer in the queue is taken Added an additional check to fix a regression caught by the compositor unit test.
I wonder if this shouldnt be somehow done by the code in bug #776931 ..
(In reply to Olivier Crête from comment #8) > I wonder if this shouldnt be somehow done by the code in bug #776931 .. No, it isn't done there. The attachment in bug 776931 basically moves code around, the way I see it. I did try it, but it didn't make any difference on this issue.
Created attachment 350148 [details] [review] videoaggregator: delay using new caps from a sink pad until the next buffer in the queue is taken I noticed that the previous version of the patch had a bug. If _fill_queues() returned GST_FLOW_NEEDS_DATA and the timeout variable of gst_video_aggregator_aggregate() was TRUE, then reconfiguration would not happen. This is a fixed version of the patch. I simply don't use the return value from _fill_queues() to determine if there is need for reconfiguration, I check the src pad flags instead.
Not sure if we should get this in before 1.12 now, Olivier what do you think? It should probably also happen in combination with bug #776931, which is about similar problems.
I'd keen on pushing this, but #776931 seems like it needs another iteration to fix the comments. I'd also be keen to push my agg-neg-2 branch, which I just attached as #781673.
I'd say now is a bit late, I'm planning 1.11.92 in two days, and then only really important bugfixes and 1.12.0 hopefully next week.
Thinking about this again, I think this one should go in before 1.12 as it fixes a real observable bug, the others are more future looking.
Olivier, please merge this now then if it looks all ok to you. Not for 1.12 yet, let's give it some testing time in master first.
Merged as: commit 39d2526c34d2185c9d42a95ede8171b2d47fed18 Author: George Kiagiadakis <george.kiagiadakis@collabora.com> Date: Tue Apr 4 11:25:43 2017 +0300 videoaggregator: delay using new caps from a sink pad until the next buffer in the queue is taken When caps changes while streaming, the new caps was getting processed immediately in videoaggregator, but the next buffer in the queue that corresponds to this new caps was not necessarily being used immediately, which resulted sometimes in using an old buffer with new caps. Of course there used to be a separate buffer_vinfo for mapping the buffer with its own caps, but in compositor the GstVideoConverter was still using wrong info and resulted in invalid reads and corrupt output. This approach here is more safe. We delay using the new caps until we actually select the next buffer in the queue for use. This way we also eliminate the need for buffer_vinfo, since the pad->info is always in sync with the format of the selected buffer. https://bugzilla.gnome.org/show_bug.cgi?id=780682