GNOME Bugzilla – Bug 774428
qtdemux: Outputting unaligned raw audio/video buffers
Last modified: 2016-11-24 11:41:40 UTC
You can find an example test here: https://ci.gstreamer.net/job/GStreamer-master-validate/lastCompletedBuild/testReport/validate.http.transcode/to_vorbis_and_theora_in_ogg/raw_video_mov/

This bug only happens when orc is enabled and only when running the GstValidate HTTP server ported to python3 (if you run the old version in python2 it seems to never happen). This issue seems to only happen when transcoding raw video (UYVY) inside MOV files.

It is very simple to reproduce on master by running:

gst-validate-launcher -t validate.http.transcode.to_vorbis_and_vp8_in_webm.raw_video_mov -f

(it is somewhat racy, but it still fails almost all the time).

Another way of reproducing it, with a noticeably smaller pipeline:

gst-validate-launcher --http-only --http-server-port=8000 && gst-validate-1.0 uridecodebin uri=http://127.0.0.1:8000/defaults/mp4/raw_video.mov ! videoconvert ! video/x-raw,format=I420 ! autovideosink

---

The stack trace:

PID: 937 (gst-validate-tr)
UID: 1001 (jenkins)
GID: 1001 (jenkins)
Signal: 11 (SEGV)
Timestamp: Mon 2016-11-14 18:18:02 UTC (15s ago)
Command Line: /home/jenkins/workspace/GStreamer-master-validate/gst-devtools/validate/tools/gst-validate-transcoding-1.0-debug -o application/ogg:video/x-theora http://127.0.0.1:8039/defaults/mp4/raw_video.mov file:///home/jenkins/workspace/GStreamer-master-validate/validate-output/rendered/validate/http/to_vorbis_and_theora_in_ogg/raw_video_mov
Executable: /home/jenkins/workspace/GStreamer-master-validate/gst-devtools/validate/tools/gst-validate-transcoding-1.0-debug
Control Group: /
Slice: -.slice
Boot ID: b8064adcba4347fbb05f72f8a3318336
Machine ID: 4d2c5bb8d53a4138bff9fd3a6cfe9d8f
Hostname: london.bilboed.com
Coredump: /var/lib/systemd/coredump/core.gst-validate-tr.1001.b8064adcba4347fbb05f72f8a3318336.937.1479147482000000.xz
Message: Process 937 (gst-validate-tr) of user 1001 dumped core.
Stack trace of thread 1058:
#0 0x00007f7dea104180 n/a (n/a)

Thread apply all bt:

GNU gdb (GDB) Fedora 7.10.1-31.fc23
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/jenkins/workspace/GStreamer-master-validate/gst-devtools/validate/tools/gst-validate-transcoding-1.0-debug...done.
warning: core file may not match specified executable file.
[New LWP 1058]
[New LWP 937]
[New LWP 1029]
[New LWP 1041]
[New LWP 1046]
[New LWP 980]
[New LWP 981]
[New LWP 1059]
[New LWP 1132]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/home/jenkins/workspace/GStreamer-master-validate/gst-devtools/validate/tools/g'.
Program terminated with signal SIGSEGV, Segmentation fault.
Trace 236863
Thread 9 (Thread 0x7f7dc0ece700 (LWP 1132))
Thread 8 (Thread 0x7f7dc16cf700 (LWP 1059))
Thread 7 (Thread 0x7f7dde518700 (LWP 981))
Thread 5 (Thread 0x7f7dc3147700 (LWP 1046))
Thread 4 (Thread 0x7f7dc3fff700 (LWP 1041))
Thread 3 (Thread 0x7f7dc8c93700 (LWP 1029))
Thread 1 (Thread 0x7f7dc1ed0700 (LWP 1058))
Can you also get a valgrind log (with --track-origins=yes) of it? What are the exact caps on both sides of the videoconvert here, the following? > /GstPipeline:pipeline0/GstVideoConvert:videoconvert0.GstPad:src: caps = video/x-raw, width=(int)320, height=(int)240, interlace-mode=(string)progressive, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction)30/1, format=(string)I420 > /GstPipeline:pipeline0/GstVideoConvert:videoconvert0.GstPad:sink: caps = video/x-raw, format=(string)UYVY, width=(int)320, height=(int)240, interlace-mode=(string)progressive, pixel-aspect-ratio=(fraction)1/1, chroma-site=(string)jpeg, colorimetry=(string)bt601, framerate=(fraction)30/1
in caps: video/x-raw, format=(string)UYVY, width=(int)320, height=(int)240, interlace-mode=(string)progressive, pixel-aspect-ratio=(fraction)1/1, chroma-site=(string)jpeg, colorimetry=(string)bt601, framerate=(fraction)30/1 out caps: video/x-raw, width=(int)320, height=(int)240, interlace-mode=(string)progressive, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction)30/1, format=(string)I420 And I confirm it's been failing on ci.gst since the switch to python3 (and not before). I can also reproduce it locally (so not specific to ci.gst machine).
Created attachment 340182: Pipeline dump

Output of valgrind --track-origins=yes (does not sound useful to me, am I missing something?):

==7862== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==7862== General Protection Fault
==7862==    at 0x9AAD1C0: ??? (in /run/user/1000/orcexec.BpSdB0 (deleted))
==7862==    by 0x7BC80F2: convert_UYVY_I420 (video-converter.c:3074)
==7862==    by 0x7BC51BC: gst_video_converter_frame (video-converter.c:2374)
==7862==    by 0xB5E8A9B: gst_video_convert_transform_frame (gstvideoconvert.c:692)
==7862==    by 0x7BAF45A: gst_video_filter_transform (gstvideofilter.c:271)
==7862==    by 0x7E5F866: default_generate_output (gstbasetransform.c:2183)
==7862==    by 0x7E5FF1E: gst_base_transform_chain (gstbasetransform.c:2336)
==7862==    by 0x4E56AF6: gst_validate_pad_monitor_chain_func (gst-validate-pad-monitor.c:2121)
==7862==    by 0x512CABD: gst_pad_chain_data_unchecked (gstpad.c:4206)
==7862==    by 0x512D68C: gst_pad_push_data (gstpad.c:4458)
==7862==    by 0x512DDB4: gst_pad_push (gstpad.c:4577)
==7862==    by 0x510D698: gst_proxy_pad_chain_default (gstghostpad.c:126)
==7862==
==7862== HEAP SUMMARY:
==7862==     in use at exit: 34,386,589 bytes in 62,430 blocks
==7862==   total heap usage: 265,164 allocs, 202,734 frees, 135,261,758 bytes allocated
==7862==
==7862== LEAK SUMMARY:
==7862==    definitely lost: 16,472 bytes in 4 blocks
==7862==    indirectly lost: 1,573 bytes in 57 blocks
==7862==      possibly lost: 19,635 bytes in 258 blocks
==7862==    still reachable: 34,176,229 bytes in 61,295 blocks
==7862==                       of which reachable via heuristic:
==7862==                         length64           : 4,448 bytes in 86 blocks
==7862==                         newarray           : 2,000 bytes in 45 blocks
==7862==         suppressed: 0 bytes in 0 blocks
==7862== Rerun with --leak-check=full to see details of leaked memory
==7862==
==7862== For counts of detected and suppressed errors, rerun with: -v
==7862== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
[1] 7862 segmentation fault (core dumped)
That looks like valgrind did not work well for you. Usually it a) prints the address that was tried to be accessed, and b) with --track-origins=yes tells you things like "which was freed here and allocated here", or "which is 1000 bytes into an area of size 1234 allocated here".
Does it also crash when using ORC_CODE=backup?
(In reply to Sebastian Dröge (slomo) from comment #4)
> That looks like valgrind did not work well for you. Usually it a) prints the
> address that was tried to be accessed, and b) with --track-origins=yes tells
> you things like "which was freed here and allocated here", or "which is 1000
> bytes into an area of size 1234 allocated here".

Yes, let me check why (coming back to you :))

(In reply to Sebastian Dröge (slomo) from comment #5)
> Does it also crash when using ORC_CODE=backup?

No, it only fails when actually using orc.
Looks like I am not able to get information from valgrind, any idea why?
Doesn't crash here for me on Debian. Python 3.5.2 python-gi 3.22.0 Intel Core i7-4790K
I spoke too fast. It crashes when running in valgrind, but not otherwise.
This is probably an alignment problem. The src pointers passed in convert_UYVY_I420() to video_orc_convert_UYVY_I420() are required to be 32-bit aligned (we give them to ORC as 32-bit words!), but at the time of the crash they are even at odd addresses here. Similarly, the first two dest pointers (the two Y destinations) must be 2-byte aligned, but that is the case unless the width is odd (or someone decides to use an odd stride... we don't prevent that, do we?).
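The alignment condition described in this comment can be expressed as a small check in C. This is only a minimal sketch, not code from GStreamer or ORC: `is_aligned` is a hypothetical helper name, and the 4-byte (32-bit) and 2-byte values are the ones stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of GStreamer or ORC): returns true when
 * the address of ptr is a multiple of alignment. */
static bool
is_aligned (const void *ptr, size_t alignment)
{
  /* The mask trick below only works for power-of-two alignments. */
  assert (alignment != 0 && (alignment & (alignment - 1)) == 0);

  return ((uintptr_t) ptr & (uintptr_t) (alignment - 1)) == 0;
}
```

With such a helper, a source pointer at an odd address fails `is_aligned (src, 4)`, which is the situation reported at the time of the crash.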
The problem here is that qtdemux does not ensure any alignment of the output buffers. For raw audio and video we require that, though.
I can reproduce this quite easily on Debian too by serving the file with twistd -n web --path .... and then doing: gst-launch-1.0 uridecodebin uri=http://127.0.0.1:8080/raw_video.mov ! videoconvert ! video/x-raw,format=I420 ! autovideosink I'm quite certain it's an alignment issue.
commit bb35f15d44e6a881f2c64fe731345c6d840fe789
Author: Sebastian Dröge <sebastian@centricular.com>
Date: Sun Nov 20 13:08:27 2016 +0200

    qtdemux: Ensure that raw audio and video have properly aligned buffers

    That is, aligned to the basic type for audio and to 32 bytes for video.
    Fixes crashes if the raw buffers are passed to SIMD processing
    functions.

    https://bugzilla.gnome.org/show_bug.cgi?id=774428
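The general idea behind such a fix (copy the data to an aligned location only when it is not aligned already) can be sketched in plain C. This is not the code from the commit: it uses `malloc` and raw pointers as a stand-in for the GstBuffer handling, and `align_copy` and its `owned` out-parameter are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to `size` bytes with the same
 * contents as `data`, located at a multiple of `alignment` (a power of
 * two). If `data` is already aligned it is returned as-is and *owned is
 * set to NULL. Otherwise the data is copied into a new allocation and
 * *owned is set to the block the caller must free(). Returns NULL if
 * the allocation fails. */
static const uint8_t *
align_copy (const uint8_t *data, size_t size, size_t alignment, void **owned)
{
  uintptr_t mask = (uintptr_t) alignment - 1;
  uint8_t *raw, *aligned;

  assert (alignment != 0 && (alignment & (alignment - 1)) == 0);

  *owned = NULL;
  if (((uintptr_t) data & mask) == 0)
    return data;                /* already aligned, no copy needed */

  /* Over-allocate by alignment - 1 bytes so that an aligned start
   * address always fits inside the block. */
  raw = malloc (size + mask);
  if (raw == NULL)
    return NULL;

  /* Round the start address up to the next multiple of alignment. */
  aligned = (uint8_t *) (((uintptr_t) raw + mask) & ~mask);
  memcpy (aligned, data, size);

  *owned = raw;                 /* free this, not the returned pointer */
  return aligned;
}
```

For raw video, the commit message gives an alignment of 32 bytes, so a call would look like `align_copy (data, size, 32, &owned)`. The copy is only paid for buffers that actually arrive unaligned.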
It's correct in matroskademux, we should check how it is in other demuxers.
commit b8265e95a7c75f7a932a421419d6eeda67645d8e
Author: Sebastian Dröge <sebastian@centricular.com>
Date: Sun Nov 20 13:14:08 2016 +0200

    avidemux: Ensure that raw video have properly aligned buffers

    That is, aligned to 32 bytes for video. Fixes crashes if the raw
    buffers are passed to SIMD processing functions.

    https://bugzilla.gnome.org/show_bug.cgi?id=774428
On top of that, I think we should run an allocation query and re-use the allocation params from it. Even better would be using a downstream pool when one is available.
There are/were problems with having demuxers do allocation queries because of the blocking nature of the allocation queries.
Yes, see

commit b001da292626c16c3cfa995585673380f65a9f4f
Author: Sebastian Dröge <slomo@circular-chaos.org>
Date: Wed Jun 19 11:06:37 2013 +0200

    qtdemux: Disable usage of allocation queries

    This can only reliably work if demuxers have a separate
    streaming thread per srcpad. This should be done in a
    demuxer base class, which integrates parts of multiqueue

    https://bugzilla.gnome.org/show_bug.cgi?id=701856