GNOME Bugzilla – Bug 755782
compositor: Segmentation fault
Last modified: 2017-03-06 15:36:44 UTC
Basically in compositor.c:433 GST_VIDEO_INFO_FORMAT (&cpad->conversion_info) The problem is that this macro expands in: ((&cpad->conversion_info)->finfo)->format and in this case: (&cpad->conversion_info)->finfo is NULL I don't see where this structure has to be initialized. This is gdb backtrace:
+ Trace 235496
$30 = (const GstVideoFormatInfo *) 0x0
Do you have an easy way of reproducing the issue (such as gst-launch-1.0 or test app)?
Is not easy, it happens when we use compositor inside a Kurento pipeline and not all the times. I didn't have a gst-launch command that makes it fail.
Will some logs be useful for debugging?
Yes, a log would be useful. GST_DEBUG=*aggregat*:6,*video*:6,*compositor*:6 or so
This is the stack trace of the failure with local data, I'm attaching the log.
+ Trace 235498
Created attachment 312359 [details] Execution log
Are you still able to reproduce this with 1.8.x or git master?
We made changes to workaround this, so I can't tell you if it is fixed or not.
I am able to reproduce this reliably. If you start a shmsink gst-launch-1.0 videotestsrc ! video/x-raw,width=640,height=480,framerate=30/1,format=BGRA ! shmsink socket-path=/tmp/foobar sync=true and then run the attached code, I get the same exact segfault. Jose, did you discover a workaround that still allowed you to use the compositor?
Created attachment 331845 [details] shmsrc + compositor that crashes with my example gst-launch
Also, this only crashes for me ~1/5 times, you may have to kill the process and try again a few times until you get a crash. The segfault occurs right as the autovideosink is being displayed so if you get video try again.
I have seen this crash as well and after some debugging, I also know why it happens. It is a race condition caused by a questionable mutex unlock in gst_video_aggregator_src_setcaps() (which originates in the fix for bug 735042). The case I have been debugging runs a complex live pipeline that involves a compositor with 10 sink pads. It looks like this compositor starts immediately after some of the sink pads are configured, so we run into a case where the aggregate() function runs for the first time but with only 3 out of the 10 sink pads being configured at the time. * Since aggregate() is running for the first time, it proceeds to configure the src pad, which runs set_info() for the 3 fully linked sink pads and sets their conversion_info (in GstCompositorPad) * Around the same time, another one of the sink pads (sink_1) is being sent a CAPS event (on another thread). This calls gst_video_aggregator_pad_sink_setcaps() and blocks on the video aggregator mutex. * During the src pad configuration process on the aggregate thread, at some point execution reaches the GST_VIDEO_AGGREGATOR_UNLOCK() call in gst_video_aggregator_src_setcaps() where the video aggregator mutex is being temporarily unlocked * The other thread that was blocked in gst_video_aggregator_pad_sink_setcaps() now takes the lock and configures sink_1. * Configuration of this sink pad completes, the lock is released and taken by the aggregate thread again, while this thread continues and chains one buffer on this sink pad. * aggregate() now reaches the point where it tries to prepare the images from the sink pads that are configured and have a buffer; this includes sink_1 at this time, as it has managed to chain a buffer before execution of this thread reached here! * ... well, unfortunately, though, set_info() has not run for sink_1 so it has no conversion_info and the code crashes as explained in the description above.
Created attachment 347045 [details] [review] compositor: fix potential crash due to race condition This patch should fix the crash, although it doesn't solve the race. I am a bit worried about other potential side effects it could have either on compositor or other elements based on GstVideoAggregator.
Created attachment 347083 [details] [review] videoaggregator: prevent running prepare_frame() on late configured pads Here is a better patch, this should prevent the problem at the videoaggregator level, hence also preventing similar problems in other elements based on videoaggregator. I am still not sure about the theoretical case where a sink pad *changes* caps in this racy way. In this case, set_info() will run with old format info and prepare_frame will run with a buffer containing new format info. This will probably result in a single-frame glitch somewhere in the picture.
(In reply to George Kiagiadakis from comment #14) > > This will probably result in a single-frame glitch somewhere in the picture. Or crash? :)
Created attachment 347135 [details] [review] videoaggregator: redo src caps negotiation if a sink pad's caps have changed in the meantime And this is another approach which should fix all cases. We simply redo the whole src pad negotiation if a sink pad's caps change while the negotiation was in progress.
Review of attachment 347135 [details] [review]: That fix makes sense to me. It definitely should fix the described race without introducing new risk afaict.
I have pushed the patch, slightly modified though. Apparently the code was in the wrong else branch and still crashed, obviously. commit 08c52c931e5145a569da1efcd53d77e0090ec898 Author: George Kiagiadakis <george.kiagiadakis@collabora.com> Date: Fri Mar 3 16:20:15 2017 +0200 videoaggregator: redo src caps negotiation if a sink pad's caps have changed in the meantime https://bugzilla.gnome.org/show_bug.cgi?id=755782