GNOME Bugzilla – Bug 768518
crash in resampler_basic_direct_single
Last modified: 2016-11-11 14:43:23 UTC
My apps are crashing with the following stack trace. My first guess is that it started to happen after upgrading from 1.8.1 to 1.8.2. More details soon. (gdb) thr a a bt
+ Trace 236451
Thread 7 (Thread 0x7fc43b7fe700 (LWP 21732))
Thread 3 (Thread 0x7fc44b04f700 (LWP 21730))
Thread 1 (Thread 0x7fc43affd700 (LWP 21733))
There were no changes in the resampler between 1.8.1 and 1.8.2
In my app I force resampling to audio/x-raw,rate=48000,channels=2,format=F32LE,layout=interleaved before I send the audio to shmsink. According to the stack trace it crashes in resample_float_resampler_process_interleaved_float, so maybe it's related.
What's the pipeline / elements before the resampler? Did you try running things through valgrind? Is it easy/reliable to reproduce for you? Is there a way for us to reproduce the issue?
I have a build and a machine on which I can reproduce this 100% of the time. However, it happens only there, on the target production server, where I am building GStreamer via cerbero. I am using cerbero cd2ebc2051b3eab797849d2cb16bbd6883375905. I have never encountered this locally when using GStreamer from Ubuntu 16.04. The pipeline is
souphttpsrc ! decodebin ! [ audioconvert ! audioresample ! audiorate ! level ! queue ! shmsink shm-size=1048576 sync=true wait-for-connection=false ]
where the square brackets mark my custom bin. I will try to create a reproducible test case. BTW the app crashes with a bus error, not a segmentation fault.
When I run it via valgrind, I get
==25600==
==25600== Process terminating with default action of signal 7 (SIGBUS)
==25600==  Non-existent physical address at address 0x403B000
==25600==    at 0xD237B05: resampler_basic_direct_single (resample.c:507)
==25600==    by 0xD2378B5: speex_resampler_process_native (resample.c:1118)
==25600==    by 0xD2390EE: resample_float_resampler_process_float (resample.c:1197)
==25600==    by 0xD23961B: resample_float_resampler_process_interleaved_float (resample.c:1315)
==25600==    by 0xD2358D9: gst_audio_resample_process (gstaudioresample.c:1110)
==25600==    by 0xD2358D9: gst_audio_resample_transform (gstaudioresample.c:1240)
==25600==    by 0xCB5EE70: default_generate_output (gstbasetransform.c:2180)
==25600==    by 0xCB5EA63: gst_base_transform_chain (gstbasetransform.c:2333)
==25600==    by 0x62532BD: gst_pad_chain_data_unchecked (gstpad.c:4192)
==25600==    by 0x62532BD: gst_pad_push_data (gstpad.c:4429)
==25600==    by 0x62582D2: gst_pad_push (gstpad.c:4548)
==25600==    by 0xCB5EA40: gst_base_transform_chain (gstbasetransform.c:2369)
==25600==    by 0x62532BD: gst_pad_chain_data_unchecked (gstpad.c:4192)
==25600==    by 0x62532BD: gst_pad_push_data (gstpad.c:4429)
==25600==    by 0x62582D2: gst_pad_push (gstpad.c:4548)
==25600==
==25600== HEAP SUMMARY:
==25600==     in use at exit: 3,151,718 bytes in 25,702 blocks
==25600==   total heap usage: 622,344 allocs, 596,642 frees, 116,659,491 bytes allocated
==25600==
==25600== LEAK SUMMARY:
==25600==    definitely lost: 16,480 bytes in 3 blocks
==25600==    indirectly lost: 136 bytes in 3 blocks
==25600==      possibly lost: 6,356 bytes in 68 blocks
==25600==    still reachable: 2,955,362 bytes in 24,857 blocks
==25600==                       of which reachable via heuristic:
==25600==                         length64           : 3,920 bytes in 80 blocks
==25600==                         newarray           : 2,112 bytes in 52 blocks
==25600==         suppressed: 0 bytes in 0 blocks
==25600== Rerun with --leak-check=full to see details of leaked memory
==25600==
==25600== For counts of detected and suppressed errors, rerun with: -v
==25600== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Indeed, the same with 1.8.1
Can you provide a testcase? I guess otherwise you'll have to debug that yourself, and e.g. start by adding some printfs in resample.c and gstaudioresample.c around that code to see what goes so completely wrong with the memory pointers.
I am trying to rebuild the server and packages from scratch to ensure that I am working in a clean environment. I can reproduce this on only one server; I cannot reproduce it on any other machine with the same setup (all are configured using the same scripts).
I've rebuilt everything, and changed format to S32LE.
+ Trace 236452
Thread 9 (Thread 0x7f344bfff700 (LWP 4546))
Thread 8 (Thread 0x7f34622a3700 (LWP 4543))
Thread 7 (Thread 0x7f346a303700 (LWP 4540))
Thread 5 (Thread 0x7f3469b02700 (LWP 4541))
Thread 3 (Thread 0x7f34612a1700 (LWP 4545))
Thread 2 (Thread 0x7f346e9b1700 (LWP 4538))
Thread 1 (Thread 0x7f3461aa2700 (LWP 4544))
I've found out that it always crashes on the attempt to call gst_buffer_extract in shmsink. I am starting to doubt whether this is related to the resampler. I have added some debugging messages as in https://bugzilla.gnome.org/show_bug.cgi?id=768530 and before it crashes I see "Copying 4800 bytes into map of size 4800 bytes". So there's no overflow there. Here's the stack trace:
+ Trace 236453
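As a sanity check on the 4800-byte copy mentioned above, the arithmetic below shows what that size corresponds to. It is only a sketch: it assumes the stream is still 48000 Hz stereo as in the earlier caps, that S32LE (like F32LE) uses 4 bytes per sample, and it ignores any bookkeeping overhead inside shmsink's shared memory area. The variable names are illustrative.

```python
# Assumed stream parameters, taken from the caps quoted earlier in this report.
RATE_HZ = 48000
CHANNELS = 2
BYTES_PER_SAMPLE = 4      # S32LE and F32LE are both 32-bit formats

BUFFER_BYTES = 4800       # size reported by the debug message
SHM_SIZE = 1048576        # shm-size property from the pipeline

# One frame holds one sample for every channel.
bytes_per_frame = CHANNELS * BYTES_PER_SAMPLE
frames, remainder = divmod(BUFFER_BYTES, bytes_per_frame)
duration_ms = 1000 * frames / RATE_HZ

# Upper bound on how many such buffers fit, ignoring allocator overhead.
max_buffers = SHM_SIZE // BUFFER_BYTES

print(f"{frames} frames per buffer, remainder {remainder} bytes")
print(f"{duration_ms} ms of audio per buffer")
print(f"at most {max_buffers} buffers fit in shm-size")
```

Under these assumptions the buffer is a whole number of frames, so the size itself does not look suspicious, which is consistent with the "no overflow" observation.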
A bus error could be a misaligned access. What architecture is this machine? Could it also be running a GStreamer build made for a CPU with more capabilities than the one it's running on?
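As background to the bus-error question: on Linux, misaligned access is not the only source of SIGBUS. Touching a page of a file-backed or shared-memory mapping that no longer has backing store also raises it, and valgrind's "Non-existent physical address" wording is how it reports that kind of fault. Whether this is what happened in this report is not established in the thread. The sketch below only demonstrates the mechanism with a temporary file; it does not use GStreamer, and all names in it are illustrative.

```python
import signal
import subprocess
import sys
import textwrap

# The faulting code runs in a child process, because SIGBUS kills
# the interpreter that triggers it.
CHILD_SOURCE = textwrap.dedent("""
    import mmap, os, tempfile

    fd, path = tempfile.mkstemp()
    os.unlink(path)                    # the file lives on only through fd
    size = 2 * mmap.PAGESIZE
    os.ftruncate(fd, size)             # give the file two pages of backing store
    view = mmap.mmap(fd, size)         # shared, writable mapping of both pages
    view[0] = 1                        # fine: the page is backed by the file
    os.ftruncate(fd, 0)                # backing store disappears, mapping stays
    view[mmap.PAGESIZE] = 1            # page has no backing store any more
""")


def run_child():
    """Run the child and return its exit status (negative means a signal)."""
    proc = subprocess.run([sys.executable, "-c", CHILD_SOURCE])
    return proc.returncode


returncode = run_child()
if returncode < 0:
    print("child killed by", signal.Signals(-returncode).name)
else:
    print("child exited with status", returncode)
```

The write stays within the mapped length, just as the 4800-byte copy stayed within its 4800-byte map, so a bounds check in the application would not catch this kind of fault. A shared memory area on a full tmpfs can behave in a similar way, so checking free space on /dev/shm on the affected machine may be worthwhile.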
This is the broken machine:
processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz
stepping : 7
microcode : 0x28
cpu MHz : 1599.890
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 4
apicid : 6
initial apicid : 6
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts
bugs :
bogomips : 6185.76
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

This is a machine with the exact same config and OS (Ubuntu 16.04) that works fine:
processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Core(TM) i3-2130 CPU @ 3.40GHz
stepping : 7
microcode : 0x28
cpu MHz : 2612.421
cache size : 3072 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 2
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer xsave avx lahf_lm epb tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm arat pln pts
bugs :
bogomips : 6784.20
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

Both run the same kernel:
Linux 001-can1-plumber 4.4.0-24-generic #43-Ubuntu SMP Wed Jun 8 19:27:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
I use a Vagrant virtual machine with 64-bit Ubuntu 16.04 to build the code using cerbero. I pass -c config/linux.config to cerbero.
Wait, for these builds I used a DigitalOcean VM, which AFAIK uses qemu internally. Unfortunately I no longer have access to it, so I cannot check its capabilities.
I'm not really sure what to do with this. I'm sure you're running into an actual bug, but as far as I can see there's not enough actionable information for us to debug it. A small test program that reproduces the issue would be good. The audioresample code has been rewritten since, and the function it crashes in doesn't even exist any longer. Please feel free to re-open if you can provide a way to reproduce or have more information, thanks!