After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 755782 - compositor: Segmentation fault
compositor: Segmentation fault
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-plugins-bad
git master
Other Linux
: Normal normal
: 1.11.90
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2015-09-29 10:22 UTC by Jose Antonio Santos Cadenas
Modified: 2017-03-06 15:36 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Execution log (1.40 MB, application/gzip)
2015-09-29 13:00 UTC, Jose Antonio Santos Cadenas
  Details
shmsrc + compositor that crashes with my example gst-launch (2.39 KB, text/x-c++src)
2016-07-20 19:58 UTC, Sean-Der
  Details
compositor: fix potential crash due to race condition (1.04 KB, patch)
2017-03-02 11:05 UTC, George Kiagiadakis
none Details | Review
videoaggregator: prevent running prepare_frame() on late configured pads (2.64 KB, patch)
2017-03-02 16:00 UTC, George Kiagiadakis
none Details | Review
videoaggregator: redo src caps negotiation if a sink pad's caps have changed in the meantime (1.60 KB, patch)
2017-03-03 14:24 UTC, George Kiagiadakis
accepted-commit_now Details | Review

Description Jose Antonio Santos Cadenas 2015-09-29 10:22:02 UTC
Basically in compositor.c:433

GST_VIDEO_INFO_FORMAT (&cpad->conversion_info)

The problem is that this macro expands in:

((&cpad->conversion_info)->finfo)->format

and in this case:

(&cpad->conversion_info)->finfo is NULL

I don't see where this structure has to be initialized.

This is gdb backtrace:

  • #0 gst_compositor_pad_prepare_frame
    at compositor.c line 433
  • #1 gst_aggregator_iterate_sinkpads
    at gstaggregator.c line 377
  • #2 gst_videoaggregator_aggregate
    at gstvideoaggregator.c line 1338
  • #3 gst_videoaggregator_aggregate
    at gstvideoaggregator.c line 1547
  • #4 gst_aggregator_aggregate_func
    at gstaggregator.c line 814
  • #5 gst_task_func
    at gsttask.c line 343
  • #6 g_thread_pool_thread_proxy
    at /build/buildd/glib2.0-2.44.1/./glib/gthreadpool.c line 307
  • #7 g_thread_proxy
    at /build/buildd/glib2.0-2.44.1/./glib/gthread.c line 764
  • #8 start_thread
    at pthread_create.c line 333
  • #9 clone
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S line 109
$30 = (const GstVideoFormatInfo *) 0x0
Comment 1 Tim-Philipp Müller 2015-09-29 12:16:13 UTC
Do you have an easy way of reproducing the issue (such as gst-launch-1.0 or test app)?
Comment 2 Jose Antonio Santos Cadenas 2015-09-29 12:36:58 UTC
Is not easy, it happens when we use compositor inside a Kurento pipeline and not all the times. I didn't have a gst-launch command that makes it fail.
Comment 3 Jose Antonio Santos Cadenas 2015-09-29 12:38:47 UTC
Will some logs be useful for debugging?
Comment 4 Tim-Philipp Müller 2015-09-29 12:47:21 UTC
Yes, a log would be useful. GST_DEBUG=*aggregat*:6,*video*:6,*compositor*:6 or so
Comment 5 Jose Antonio Santos Cadenas 2015-09-29 12:58:20 UTC
This is the stack trace of the failure with local data, I'm attaching the log.


  • #0 gst_compositor_pad_prepare_frame
    at compositor.c line 433
  • #1 gst_aggregator_iterate_sinkpads
    at gstaggregator.c line 377
  • #2 gst_videoaggregator_aggregate
    at gstvideoaggregator.c line 1338
  • #3 gst_videoaggregator_aggregate
    at gstvideoaggregator.c line 1547
  • #4 gst_aggregator_aggregate_func
    at gstaggregator.c line 814
  • #5 gst_task_func
    at gsttask.c line 343
  • #6 g_thread_pool_thread_proxy
    at /build/buildd/glib2.0-2.44.1/./glib/gthreadpool.c line 307
  • #7 g_thread_proxy
    at /build/buildd/glib2.0-2.44.1/./glib/gthread.c line 764
  • #8 start_thread
    at pthread_create.c line 333
  • #9 clone
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S line 109

Comment 6 Jose Antonio Santos Cadenas 2015-09-29 13:00:29 UTC
Created attachment 312359 [details]
Execution log
Comment 7 Tim-Philipp Müller 2016-04-23 15:47:06 UTC
Are you still able to reproduce this with 1.8.x or git master?
Comment 8 Jose Antonio Santos Cadenas 2016-04-25 07:21:56 UTC
We made changes to workaround this, so I can't tell you if it is fixed or not.
Comment 9 Sean-Der 2016-07-20 19:57:52 UTC
I am able to reproduce this reliably. If you start a shmsink

gst-launch-1.0 videotestsrc ! video/x-raw,width=640,height=480,framerate=30/1,format=BGRA ! shmsink socket-path=/tmp/foobar sync=true

and then run the attached code, I get the same exact segfault. Jose, did you discover a workaround that still allowed you to use the compositor?
Comment 10 Sean-Der 2016-07-20 19:58:36 UTC
Created attachment 331845 [details]
shmsrc + compositor that crashes with my example gst-launch
Comment 11 Sean-Der 2016-07-20 20:09:41 UTC
Also, this only crashes for me ~1/5 times, you may have to kill the process and try again a few times until you get a crash. The segfault occurs right as the autovideosink is being displayed so if you get video try again.
Comment 12 George Kiagiadakis 2017-03-02 10:58:42 UTC
I have seen this crash as well and after some debugging, I also know why it happens. It is a race condition caused by a questionable mutex unlock in gst_video_aggregator_src_setcaps() (which originates in the fix for bug 735042).

The case I have been debugging runs a complex live pipeline that involves a compositor with 10 sink pads. It looks like this compositor starts immediately after some of the sink pads are configured, so we run into a case where the aggregate() function runs for the first time but with only 3 out of the 10 sink pads being configured at the time.

* Since aggregate() is running for the first time, it proceeds to configure the src pad, which runs set_info() for the 3 fully linked sink pads and sets their conversion_info (in GstCompositorPad)

* Around the same time, another one of the sink pads (sink_1) is being sent a CAPS event (on another thread). This calls gst_video_aggregator_pad_sink_setcaps() and blocks on the video aggregator mutex.

* During the src pad configuration process on the aggregate thread, at some point execution reaches the GST_VIDEO_AGGREGATOR_UNLOCK() call in gst_video_aggregator_src_setcaps() where the video aggregator mutex is being temporarily unlocked

* The other thread that was blocked in gst_video_aggregator_pad_sink_setcaps() now takes the lock and configures sink_1.

* Configuration of this sink pad completes, the lock is released and taken by the aggregate thread again, while this thread continues and chains one buffer on this sink pad.

* aggregate() now reaches the point where it tries to prepare the images from the sink pads that are configured and have a buffer; this includes sink_1 at this time, as it has managed to chain a buffer before execution of this thread reached here!

* ... well, unfortunately, though, set_info() has not run for sink_1 so it has no conversion_info and the code crashes as explained in the description above.
Comment 13 George Kiagiadakis 2017-03-02 11:05:12 UTC
Created attachment 347045 [details] [review]
compositor: fix potential crash due to race condition

This patch should fix the crash, although it doesn't solve the race. I am a bit worried about other potential side effects it could have either on compositor or other elements based on GstVideoAggregator.
Comment 14 George Kiagiadakis 2017-03-02 16:00:50 UTC
Created attachment 347083 [details] [review]
videoaggregator: prevent running prepare_frame() on late configured pads

Here is a better patch, this should prevent the problem at the videoaggregator level, hence also preventing similar problems in other elements based on videoaggregator.

I am still not sure about the theoretical case where a sink pad *changes* caps in this racy way. In this case, set_info() will run with old format info and prepare_frame will run with a buffer containing new format info. This will probably result in a single-frame glitch somewhere in the picture.
Comment 15 Sebastian Dröge (slomo) 2017-03-02 16:08:04 UTC
(In reply to George Kiagiadakis from comment #14)
> 
> This will probably result in a single-frame glitch somewhere in the picture.

Or crash? :)
Comment 16 George Kiagiadakis 2017-03-03 14:24:17 UTC
Created attachment 347135 [details] [review]
videoaggregator: redo src caps negotiation if a sink pad's caps have changed in the meantime

And this is another approach which should fix all cases. We simply redo the whole src pad negotiation if a sink pad's caps change while the negotiation was in progress.
Comment 17 Thibault Saunier 2017-03-03 14:44:05 UTC
Review of attachment 347135 [details] [review]:

That fix makes sense to me. It definitely should fix the described race without introducing new risk afaict.
Comment 18 George Kiagiadakis 2017-03-06 15:24:46 UTC
I have pushed the patch, slightly modified though. Apparently the code was in the wrong else branch and still crashed, obviously.


commit 08c52c931e5145a569da1efcd53d77e0090ec898
Author: George Kiagiadakis <george.kiagiadakis@collabora.com>
Date:   Fri Mar 3 16:20:15 2017 +0200

    videoaggregator: redo src caps negotiation if a sink pad's caps have changed in the meantime
    
    https://bugzilla.gnome.org/show_bug.cgi?id=755782