GNOME Bugzilla – Bug 782812
gst_element_factory_make: Program received signal SIGSEGV, Segmentation fault. g_slice_alloc (mem_size=mem_size@entry=368)
Last modified: 2017-07-14 13:44:32 UTC
When attempting to add encodebin followed by decodebin into a bin element, the second call to gst_element_factory_make() crashes as below. This is happening on Raspbian Jessie with AddressSanitizer enabled, which doesn't pick up any memory problems.

ASAN:DEADLYSIGNAL
=================================================================
==7169==ERROR: AddressSanitizer: SEGV on unknown address 0x00000006 (pc 0x7654dbc4 bp 0x00000168 sp 0x7e9e46b8 T0)
    #0 0x7654dbc3 in g_slice_alloc (/lib/arm-linux-gnueabihf/libglib-2.0.so.0+0x67bc3)
    #1 0x76b120d3 in __asan::AsanOnDeadlySignal(int, void*, void*) ../../../../libsanitizer/asan/asan_posix.cc:79
    #2 0x7635619f (/lib/arm-linux-gnueabihf/libc.so.6+0x2f19f)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/lib/arm-linux-gnueabihf/libglib-2.0.so.0+0x67bc3) in g_slice_alloc
==7169==ABORTING

Running the code through gdb allows us to get the following backtrace:

Program received signal SIGSEGV, Segmentation fault.
g_slice_alloc (mem_size=mem_size@entry=368) at /build/glib2.0-tTvduh/glib2.0-2.42.1/./glib/gslice.c:998
998	/build/glib2.0-tTvduh/glib2.0-2.42.1/./glib/gslice.c: No such file or directory.
(gdb) bt
+ Trace 237495
$1 = (const gchar *) 0x6eeffc00 "sink"
(gdb) print dir
$2 = GST_PAD_SINK
(gdb) print templ
$3 = (GstPadTemplate *) 0x74241df8
(gdb) print *templ
$4 = {object = {object = {g_type_instance = {g_class = 0x724486b0}, ref_count = 2, qdata = 0x74c882a2}, lock = {p = 0x0, i = {0, 0}}, name = 0x746466d0 "sink", parent = 0x0, flags = 1, control_bindings = 0x0, control_rate = 100000000, last_sync = 18446744073709551615, _gst_reserved = 0x0}, name_template = 0x74646690 "sink", direction = GST_PAD_SINK, presence = GST_PAD_ALWAYS, caps = 0x74089028, _gst_reserved = {0x0, 0x0, 0x0, 0x0}}

Is there anything inside GstPadTemplate that might trigger a segfault by accessing NULL+6?
This suggests memory corruption some time before. Can you run with valgrind to see if something else goes wrong before?
Alas, no luck with valgrind:

==22184== Memcheck, a memory error detector
==22184== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==22184== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==22184== Command: gst-launch-1.0 --gst-debug-no-color --gst-debug=1,GST_TRACER:7,GST_REFCOUNTING:0,GST_PADS:1,GST_STATES:5,tdttsparse:5,tsdemux:1,omx:1,omxvideo:1,omxvideodec:1,omxh264dec:1,omxvideoenc:1,omxh264enc:1,videoencoder:1,videorate:1,videoscale:1,mpegtsmux:4,decodebin:1,encodebin:1,h264parse:1,transcoder:5,hlssink:5 udpsrc multicast-iface=eth0 uri=udp://239.106.0.6:1234 caps=application/x-rtp,media=(string)video,clock-rate=(int)90000 ! rtpbin ! rtpmp2tdepay ! progressreport update-freq=5 ! transcoder name=transcoder ! capsfilter caps=video/x-raw ! transcoder. transcoder. ! capsfilter caps=video/mpegts ! hlssink target-duration=0
==22184==
disInstr(arm): unhandled instruction: 0xE734F817
                 cond=14(0xE) 27:20=115(0x73) 4:4=1 3:0=7(0x7)
==22184== valgrind: Unrecognised instruction at address 0x535bbc0.
==22184==    at 0x535BBC0: __udivmoddi4 (libgcc2.c:1078)
==22184==    by 0x5359777: ??? (bpabi.S:258)
==22184==    by 0x491D857: __sanitizer::AppendNumber(char**, char const*, unsigned long long, unsigned char, unsigned char, bool, bool) (sanitizer_printf.cc:59)
==22184== Your program just tried to execute an instruction that Valgrind
==22184== did not recognise. There are two possible reasons for this.
==22184== 1. Your program has a bug and erroneously jumped to a non-code
==22184==    location. If you are running Memcheck and you just saw a
==22184==    warning about a bad jump, it's probably your program's fault.
==22184== 2. The instruction is legitimate but Valgrind doesn't handle it,
==22184==    i.e. it's Valgrind's fault. If you think this is the case or
==22184==    you are not sure, please let us know and we'll try to fix it.
==22184== Either way, Valgrind will now raise a SIGILL signal which will
==22184== probably kill your program.
==22184==
==22184== Process terminating with default action of signal 4 (SIGILL)
==22184==  Illegal opcode at address 0x535BBC0
==22184==    at 0x535BBC0: __udivmoddi4 (libgcc2.c:1078)
==22184==    by 0x5359777: ??? (bpabi.S:258)
==22184==    by 0x491D857: __sanitizer::AppendNumber(char**, char const*, unsigned long long, unsigned char, unsigned char, bool, bool) (sanitizer_printf.cc:59)
==22184==  Jump to the invalid address stated on the next line
==22184==    at 0x0: ???
==22184==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==22184==
==22184==
==22184== Process terminating with default action of signal 11 (SIGSEGV)
==22184==  Bad permissions for mapped region at address 0x0
==22184==    at 0x0: ???
==22184==
==22184== HEAP SUMMARY:
==22184==     in use at exit: 0 bytes in 0 blocks
==22184==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==22184==
==22184== All heap blocks were freed -- no leaks are possible
==22184==
==22184== For counts of detected and suppressed errors, rerun with: -v
==22184== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault

The crash reported inside valgrind appears to be a crash inside AddressSanitizer. It is not clear whether AddressSanitizer itself has crashed or whether valgrind genuinely doesn't understand the instructions.
With the source code for glib2.0 in place, we see the following full stacktrace (fingers crossed bugzilla won't trash the trace):

Program received signal SIGSEGV, Segmentation fault.
g_slice_alloc (mem_size=mem_size@entry=368) at /build/glib2.0-tTvduh/glib2.0-2.42.1/./glib/gslice.c:998
998	mem = thread_memory_magazine1_alloc (tmem, ix);
(gdb) bt full
+ Trace 237498
Can you provide code that reproduces this problem? Also which version of GStreamer is that? In any case, if it crashes in g_slice_alloc(), that usually means that some time before that memory corruption happened.
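To illustrate why a crash inside an allocator usually points at earlier corruption, here is a minimal sketch in C of a toy free-list allocator. It is not GLib's gslice code: the arena layout and every name in it (ToyArena, toy_alloc, toy_write, toy_demo) are hypothetical, and links are stored as indices so the demonstration stays well-defined. A one-byte overrun from one chunk overwrites the free-list link stored in the neighbouring chunk, and the damage only shows up two allocations later.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy arena (hypothetical, not GLib's gslice): NUM_CHUNKS chunks of
 * CHUNK_SIZE bytes. A free chunk stores the index of the next free chunk
 * in its first byte; NO_CHUNK terminates the list. */
#define CHUNK_SIZE 8
#define NUM_CHUNKS 4
#define NO_CHUNK   0xFF

typedef struct {
    unsigned char bytes[NUM_CHUNKS * CHUNK_SIZE];
    unsigned char free_head;   /* index of the first free chunk */
} ToyArena;

void toy_init(ToyArena *a) {
    memset(a->bytes, 0, sizeof a->bytes);
    for (int i = 0; i < NUM_CHUNKS; i++) {
        a->bytes[i * CHUNK_SIZE] =
            (i + 1 < NUM_CHUNKS) ? (unsigned char)(i + 1) : NO_CHUNK;
    }
    a->free_head = 0;
}

/* Pop the head of the free list. Returns the chunk index, -1 when the list
 * is empty, or -2 when the stored link is out of range (the point where a
 * pointer-based allocator would dereference a wild address and crash). */
int toy_alloc(ToyArena *a) {
    unsigned char ix = a->free_head;
    if (ix == NO_CHUNK) return -1;
    if (ix >= NUM_CHUNKS) return -2;
    a->free_head = a->bytes[ix * CHUNK_SIZE];
    return ix;
}

/* Buggy caller: fills len bytes of a chunk without checking CHUNK_SIZE.
 * The assert only keeps the write inside the arena for this demo. */
void toy_write(ToyArena *a, int ix, unsigned char value, size_t len) {
    size_t start = (size_t)ix * CHUNK_SIZE;
    assert(start + len <= sizeof a->bytes);
    memset(a->bytes + start, value, len);
}

/* Returns the result of the allocation that finally trips over the damage. */
int toy_demo(void) {
    ToyArena a;
    toy_init(&a);

    int first = toy_alloc(&a);                   /* chunk 0 */
    toy_write(&a, first, 0x06, CHUNK_SIZE + 1);  /* 1 byte too many: clobbers chunk 1's link */
    int second = toy_alloc(&a);                  /* chunk 1, still looks fine */
    int third = toy_alloc(&a);                   /* follows the corrupted link */

    printf("first=%d second=%d third=%d\n", first, second, third);
    return third;
}
```

Calling toy_demo() shows the failing allocation is far removed from the write that caused it, which is why the backtrace at the crash site says little about the culprit.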
I can only provide the code privately unfortunately. This is gstreamer git master, brought up to date and rebuilt again today to eliminate known bugs.

Google's AddressSanitizer out of the box on Raspbian's default gcc is broken and crashes. AddressSanitizer built from source as part of gcc v6.2.0 doesn't crash, but doesn't show up any problem with gstreamer, which runs fine until the segfault.

Valgrind out of the box on Raspbian is broken. Valgrind v3.12.0 built from source runs but crashes due to a clash with AddressSanitizer as documented here:
https://github.com/google/sanitizers/issues/810

AddressSanitizer is impossible to disable in gcc v6.2.0, and so the valgrind clash appears impossible to work around, as documented here:
https://github.com/google/sanitizers/issues/811

Are there any other memory debugging tools to try that have a hope of finding memory corruption?
Turns out I had a stray LD_PRELOAD in the testing session that triggers the AddressSanitizer bug even when not compiled in. Removing that had me colliding headlong with this:
https://bugs.kde.org/show_bug.cgi?id=372794

The assertion failure is listed as fixed in trunk, so rebuilding valgrind from trunk then had me colliding headlong with this:
https://bugs.kde.org/show_bug.cgi?id=322935
Further progress:

- By removing /etc/ld.so.preload (and thus disabling the RPi's memcpy optimisation that ultimately breaks valgrind) I can get valgrind to run.
- With valgrind in place, I cannot reproduce the crash. Without valgrind in place, I can reliably reproduce the crash.

I do see many entries like the following, however I am assuming they're caused by a non-valgrind malloc being used:

==1370== Conditional jump or move depends on uninitialised value(s)
==1370==    at 0x6AC7A4C: gst_omx_component_add_port (gstomx.c:959)
==1370==    by 0x6ADCDAF: gst_omx_video_enc_open (gstomxvideoenc.c:258)
==1370==    by 0x5D52EF3: gst_video_encoder_change_state (gstvideoencoder.c:1468)
==1370==    by 0x6ADDE2F: gst_omx_video_enc_change_state (gstomxvideoenc.c:522)
==1370==    by 0x48CF7EF: gst_element_change_state (gstelement.c:2749)
==1370==    by 0x48CF4C7: gst_element_set_state_func (gstelement.c:2703)
==1370==    by 0x48CF00B: gst_element_set_state (gstelement.c:2604)
==1370==    by 0x48CDBC7: gst_element_sync_state_with_parent (gstelement.c:2131)
==1370==    by 0x68D8D2F: _create_stream_group (gstencodebin.c:1583)
==1370==    by 0x68D5C03: request_pad_for_stream (gstencodebin.c:738)
==1370==    by 0x68D5FE3: gst_encode_bin_request_pad_signal (gstencodebin.c:816)
==1370==    by 0x4DCBC77: ffi_call_VFP (in /usr/lib/arm-linux-gnueabihf/libffi.so.6.0.2)
==1370==  Uninitialised value was created by a heap allocation
==1370==    at 0x484550C: malloc (vg_replace_malloc.c:299)
==1370==    by 0x6C0FE93: vcos_generic_mem_alloc_aligned (vcos_mem_from_malloc.c:56)
==1370==    by 0x6C281FB: completion_thread (in /opt/vc/lib/libvchiq_arm.so)
==1370==
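Since both a stray LD_PRELOAD and /etc/ld.so.preload changed the behaviour of the tools in this thread, a small check before each debugging run can save time. The following is only a sketch in C: the helper names (file_has_content, preload_active) are hypothetical, and it simply treats any non-whitespace content in the preload file as active.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if the file exists and contains at least one non-whitespace
 * character, 0 otherwise (including when it cannot be opened). */
int file_has_content(const char *path) {
    FILE *f = fopen(path, "r");
    if (f == NULL) return 0;

    int c;
    int found = 0;
    while ((c = fgetc(f)) != EOF) {
        if (c != ' ' && c != '\t' && c != '\n' && c != '\r') {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}

/* ld_preload_value: the value of the LD_PRELOAD environment variable, or
 * NULL when it is unset. preload_file: path of the system-wide preload
 * list. Returns 1 if either source would inject a library. */
int preload_active(const char *ld_preload_value, const char *preload_file) {
    assert(preload_file != NULL);
    if (ld_preload_value != NULL && ld_preload_value[0] != '\0') return 1;
    return file_has_content(preload_file);
}
```

A caller would typically pass getenv("LD_PRELOAD") (from stdlib.h) and "/etc/ld.so.preload", and refuse to start the valgrind or ASan run when the result is 1.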
Leaving valgrind running, I eventually stumbled on this. Not sure if it is breaking something:

==1574== Conditional jump or move depends on uninitialised value(s)
==1574==    at 0x4BDFA74: vfprintf (vfprintf.c:1641)
==1574==    by 0x4C80607: __vasprintf_chk (vasprintf_chk.c:66)
==1574==    by 0x4ADB1F3: vasprintf (stdio2.h:210)
==1574==    by 0x4ADB1F3: g_vasprintf (gprintf.c:316)
==1574==    by 0x4AB27AB: g_strdup_vprintf (gstrfuncs.c:507)
==1574==    by 0x4AB280B: g_strdup_printf (gstrfuncs.c:533)
==1574==    by 0x4A1627F: value_transform_int_string (gvaluetransform.c:156)
==1574==    by 0x4A15437: g_value_transform (gvalue.c:613)
==1574==    by 0x496FEC7: gst_value_serialize_int (gstvalue.c:3368)
==1574==    by 0x497909B: gst_value_serialize (gstvalue.c:6106)
==1574==    by 0x493A14F: priv_gst_structure_append_to_gstring (gststructure.c:1811)
==1574==    by 0x48AEE63: gst_caps_to_string (gstcaps.c:2248)
==1574==    by 0x48E0C23: gst_debug_print_object (gstinfo.c:818)
==1574==  Uninitialised value was created by a heap allocation
==1574==    at 0x484550C: malloc (vg_replace_malloc.c:299)
==1574==    by 0x4A9824F: g_malloc (gmem.c:97)
==1574==    by 0x4A00977: g_signal_newv (gsignal.c:1660)
==1574==    by 0x4A01533: g_signal_new_valist (gsignal.c:1840)
==1574==    by 0x4A015B7: g_signal_new (gsignal.c:1395)
==1574==    by 0x5B36487: gst_rtp_ssrc_demux_class_init (gstrtpssrcdemux.c:385)
==1574==    by 0x5B35ADB: gst_rtp_ssrc_demux_class_intern_init (gstrtpssrcdemux.c:102)
==1574==    by 0x4A0C053: type_class_init_Wm (gtype.c:2217)
==1574==    by 0x4A0C053: g_type_class_ref (gtype.c:2932)
==1574==    by 0x49F0EB3: g_object_newv (gobject.c:1869)
==1574==    by 0x49F156F: g_object_new (gobject.c:1614)
==1574==    by 0x48D306F: gst_element_factory_create (gstelementfactory.c:372)
==1574==    by 0x48D349B: gst_element_factory_make (gstelementfactory.c:445)

The code at gstrtpssrcdemux.c:385 looks like this:

  gst_rtp_ssrc_demux_signals[SIGNAL_NEW_SSRC_PAD] =
      g_signal_new ("new-ssrc-pad",
      G_TYPE_FROM_CLASS (klass), G_SIGNAL_RUN_LAST,
      G_STRUCT_OFFSET (GstRtpSsrcDemuxClass, new_ssrc_pad),
      NULL, NULL, g_cclosure_marshal_generic,
      G_TYPE_NONE, 2, G_TYPE_UINT, GST_TYPE_PAD);

Not sure if there is anything uninitialised in there?
The code that triggers valgrind above does this:

0:10:39.141107388 1574 0x1136d968 DEBUG transcoder gsttranscoder.c:783:gst_transcoder_pad_added_cb:<decodebin0> Pad added from decodebin, caps: video/x-raw(memory:GLMemory), format=(string)RGBA, width=(int)544, height=(int)576, interlace-mode=(string)mixed, multiview-mode=(string)mono, multiview-flags=(GstVideoMultiviewFlagsSet)0:ffffffff:/right-view-first/left-flipped/left-flopped/right-flipped/right-flopped/half-aspect/mixed-mono, pixel-aspect-ratio=(fraction)64/33, colorimetry=(string)sRGB, framerate=(fraction)25/1

It appears that the above debug message prints the caps, with some part of the caps having been referenced from the signal created on the following line:

==1574==    by 0x5B36487: gst_rtp_ssrc_demux_class_init (gstrtpssrcdemux.c:385)

Not sure which part of this is uninitialised?
With all debug logging switched off, the crash moves to here: Caught SIGSEGV
+ Trace 237506
Searching for crashes inside magazine_chain_pop_head() we find this crash, seemingly when gstreamer is used to generate an MPG thumbnail: https://bugs.launchpad.net/ubuntu/+source/tumbler/+bug/1290041
The crash is triggered by a function called "gst_transcoder_request_new_pad". Even though it uses the gst_ namespace, I can't find this code in any of the official repos. I thought it was the PiTiVi project's gsttranscodebin, but the function didn't match. It's also a function unknown to Google. Please provide additional information, otherwise we'll have to close this bug as not-gnome.
gst_transcoder_request_new_pad is our code, it wires up some ghost pads in a bin and calls gst_ghost_pad_new_no_target_from_template() a number of times for each src and sink attached to the bin. We crash on the 6th attempt to create a ghostpad, which is the second attempt to create a sink pad.

Breakpoint 3, gst_transcoder_request_new_pad (element=0x18e6c8, templ=0x18ccc8, name_templ=0x0, caps=0x0) at gsttranscoder.c:1120
1120	pad = gst_ghost_pad_new_no_target_from_template (name, templ);
(gdb) cont
Continuing.

Breakpoint 2, gst_transcoder_request_new_pad (element=0x18e6c8, templ=0x18cc68, name_templ=0x0, caps=0x0) at gsttranscoder.c:1090
1090	pad = gst_ghost_pad_new_no_target_from_template (name, templ);
(gdb)
Continuing.

Breakpoint 2, gst_transcoder_request_new_pad (element=0x18e6c8, templ=0x18cc68, name_templ=0x0, caps=0x0) at gsttranscoder.c:1090
1090	pad = gst_ghost_pad_new_no_target_from_template (name, templ);
(gdb)
Continuing.

Breakpoint 2, gst_transcoder_request_new_pad (element=0x18e6c8, templ=0x18cc68, name_templ=0x0, caps=0x0) at gsttranscoder.c:1090
1090	pad = gst_ghost_pad_new_no_target_from_template (name, templ);
(gdb)
Continuing.

Breakpoint 2, gst_transcoder_request_new_pad (element=0x18e6c8, templ=0x18cc68, name_templ=0x0, caps=0x0) at gsttranscoder.c:1090
1090	pad = gst_ghost_pad_new_no_target_from_template (name, templ);
(gdb)
Continuing.

Breakpoint 3, gst_transcoder_request_new_pad (element=0x18e6c8, templ=0x18ccc8, name_templ=0x0, caps=0x0) at gsttranscoder.c:1120
1120	pad = gst_ghost_pad_new_no_target_from_template (name, templ);
(gdb)
Continuing.

Program received signal SIGSEGV, Segmentation fault.
g_slice_alloc (mem_size=mem_size@entry=368) at /build/glib2.0-tTvduh/glib2.0-2.42.1/./glib/gslice.c:998
998	mem = thread_memory_magazine1_alloc (tmem, ix);
As said, this is all just distraction. Somewhere before the crash, memory corruption happened and you need to use a tool like valgrind or asan to find that, or closely look at all the code in between. Ideally please provide a testcase to reproduce this problem here, otherwise you'll have to do the debugging yourself.
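When valgrind and asan are not usable, one way to "closely look at all the code in between" is to put guard bytes behind suspect buffers and check them at intervals. The sketch below is plain C and not part of GStreamer or GLib; the names (guarded_alloc, guard_intact, guard_demo) and the guard pattern are hypothetical, and it only catches writes that run past the end of a buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_LEN  8      /* number of guard bytes placed after the buffer */
#define GUARD_BYTE 0xA5   /* arbitrary pattern, unlikely to be written by accident */

/* Allocate `size` usable bytes followed by GUARD_LEN guard bytes.
 * Returns NULL on allocation failure or size overflow. */
void *guarded_alloc(size_t size) {
    if (size > SIZE_MAX - GUARD_LEN) return NULL;
    unsigned char *p = malloc(size + GUARD_LEN);
    if (p == NULL) return NULL;
    memset(p + size, GUARD_BYTE, GUARD_LEN);
    return p;
}

/* Returns 1 while the guard is untouched, 0 once something has written
 * past the first `size` bytes of the buffer. */
int guard_intact(const void *ptr, size_t size) {
    const unsigned char *guard = (const unsigned char *)ptr + size;
    for (size_t i = 0; i < GUARD_LEN; i++) {
        if (guard[i] != GUARD_BYTE) return 0;
    }
    return 1;
}

/* Returns 1 if the guard is intact after an in-bounds write and broken
 * after an off-by-one write. */
int guard_demo(void) {
    size_t size = 16;
    unsigned char *buf = guarded_alloc(size);
    assert(buf != NULL);

    memset(buf, 0, size);                      /* in bounds */
    int ok_before = guard_intact(buf, size);

    memset(buf, 0, size + 1);                  /* off by one, lands in the guard */
    int ok_after = guard_intact(buf, size);

    free(buf);
    return ok_before == 1 && ok_after == 0;
}
```

Checking guard_intact() after each suspicious call narrows down which step performed the overrun, at the cost of editing the allocation sites by hand.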
Not sure what we can do about this with the information we have, sorry. Please feel free to reopen this bug report if you can provide more information. Thanks!