GNOME Bugzilla – Bug 773546
vaapiencoders: race condition when multiple encoders
Last modified: 2016-12-01 14:46:34 UTC
Created attachment 338533 [details] test program Generate a sample file: $ gst-launch-1.0 videotestsrc num-buffers=292 pattern=smpte ! video/x-raw\,\ format\=\(string\)NV12\,\ width\=\(int\)1920\,\ height\=\(int\)1080\,\ framerate\=\(fraction\)30/1 ! filesink location=/tmp/buf.raw Run the following in a loop $ gst-launch-1.0 -f filesrc location=/tmp/buf.raw blocksize=3110400 ! videoparse width=1920 height=1080 framerate=30 format=nv12 ! tee name=encoder ! queue name=enc_1 max-size-buffers=1 ! vaapih264enc rate-control=2 bitrate=20000 keyframe-period=30 ! fakesink encoder. ! queue name=enc_2 max-size-buffers=1 ! vaapih264enc rate-control=2 bitrate=20000 keyframe-period=30 ! fakesink encoder. ! queue name=enc_3 max-size-buffers=1 ! vaapih264enc rate-control=2 bitrate=20000 keyframe-period=30 ! fakesink Unrepairable overflow! Unrepairable overflow! Unrepairable overflow! Unrepairable overflow! After some time: *** Error in `gst-launch-1.0': double free or corruption (fasttop): 0x00007f5b380011e0 *** Also noticed with vaapivp8enc. To reproduce, run attached script
Created attachment 338534 [details] sample backtrace
The crash happens after 10 to 20 runs on a nuc6i3 and nuc6i5 (right after the boot). Once it has happened, it crashes after 2 (unless you close the ssh session).
Note that it does not always crash that fast, the first time it can take more time (138 loops for instance). There is a leak somewhere. Note that i tested with and without the libva-intel-driver patch from https://bugs.freedesktop.org/show_bug.cgi?id=97872
It is possible for you to compile gstreamer-vaapi with symbols? Because that backtrace is not meaningful :S
Created attachment 338612 [details] sample backtrace (with debug symbols)
Is that better ?
Actually i can't even complete my benchmark on most hardware because of that: https://github.com/UbiCastTeam/gstreamer-benchmarks $ ./run_tests.py -t tests/encoding-h264-nv12.py
Comment on attachment 338612 [details] sample backtrace (with debug symbols) That's happening internally at g_array_append_vals(), when the array is resizing itself with g_realloc. Perhaps we are leaking and the heap gets corrupted.
Could you reproduce the issue ?
A proper stack trace from gdb would be more useful, especially one that includes all active threads (thread apply all bt) :) Perhaps valgrind could also help track down what's wrong here. IIRC there are some environment variables which make glibc abort and/or check harder for malloc inconsistencies.
(In reply to Florent Thiéry from comment #9) > Could you reproduce the issue ? I can test it right now. Busy with other stuff. (In reply to Tim-Philipp Müller from comment #10) > A proper stack trace from gdb would be more useful, especially one that > includes all active threads (thread apply all bt) :) yes, please! :)
> I can test it right now. Busy with other stuff. * can't
I don't really understand why, but while i can have the thing happen with my test script from python, i can't seem to reproduce with gst-launch alone. Crashes: gst-master gdb /usr/bin/python thread apply all bt run test.py I get the attached backtrace2.log Does not crash (trying to loop gst-launch alone) but freezes: gst-master gdb ./gstreamer/tools/.libs/gst-launch-1.0 thread apply all bt set pagination off break exit commands run end run filesrc location=/tmp/buf.raw blocksize=3110400 ! videoparse width=1920 height=1080 framerate=30 format=nv12 ! tee name=encoder ! queue max-size-buffers=1 ! vaapih264enc rate-control=2 bitrate=20000 keyframe-period=30 ! fakesink encoder. ! queue max-size-buffers=1 ! vaapih264enc rate-control=2 bitrate=20000 keyframe-period=30 ! fakesink encoder. ! queue max-size-buffers=1 ! vaapih264enc rate-control=2 bitrate=20000 keyframe-period=30 ! fakesink encoder. ! queue max-size-buffers=1 ! vaapih264enc rate-control=2 bitrate=20000 keyframe-period=30 ! fakesink encoder. ! queue max-size-buffers=1 ! vaapih264enc rate-control=2 bitrate=20000 keyframe-period=30 ! fakesink encoder. ! queue max-size-buffers=1 ! vaapih264enc rate-control=2 bitrate=20000 keyframe-period=30 ! fakesink
Created attachment 338722 [details] gst-launch backtrace
Not entirely sure it's the same thing, but my 2nd method did crash (see 'gst-launch backtrace)
Created attachment 338723 [details] gst-launch backtrace
(gdb) bt
+ Trace 236779
Created attachment 339254 [details] test program
Using systemd's coredumps makes it possible to debug the python execve-spawned gst-launch, and confirms it is the same issue (leak in ensure_profiles): $ coredumpctl list | tail -n 1 lun. 2016-11-07 16:31:29 CET 3708 1000 1000 6 * /home/fthiery/gst/master/gstreamer/tools/.libs/lt-gst-launch-1.0 $ coredumpctl gdb 3708 (gdb) bt
+ Trace 236839
Slightly different on GT3e (SKL) (gdb) bt
+ Trace 236840
Created attachment 339305 [details] [review] display: take display lock during ensuring profiles/formats Since vaapi plugins are sharing VaapiDisplay, should be careful to handle variables, which could be accessed by others. This patch fixes crash, which can happen when using multiple vaapi encoder/decoder in one pipeline. @Florent, could you test with this patch? I confirmed it's ok on my laptop, but need more test. @Victor, does this patch make sense? I'm wondering because of a bit performance degradation.
(In reply to Hyunjun Ko from comment #21) > @Victor, does this patch make sense? I'm wondering because of a bit > performance degradation. Actually it does not, since I cannot replicate the error in my HSW so far.
(In reply to Hyunjun Ko from comment #21) > This patch fixes crash, which can happen when using multiple vaapi > encoder/decoder in one pipeline. > > @Florent, could you test with this patch? I think it works (tested on 1.10 and master on SKL GT2) because i can finally run a lot of consecutive tests without the crash, i don't see any regression so far. Seems to also work on GT3e
Here are my results with and without this patch on SKL/GT2 (i3-6100U) of vaapih264enc 1080p encoding performance (NV12); to me there is no major performance regression. +-----------+---------+----------+------------+-----------+---------+ | tune | pattern | encoders | fps before | fps after | delta | +-----------+---------+----------+------------+-----------+---------+ | none | black | 1 | 107 | 107 | 0,00% | | low-power | black | 1 | 196 | 176 | -10,20% | | none | black | 2 | 133 | 138 | 3,76% | | low-power | black | 2 | 198 | 195 | -1,52% | | none | black | 3 | 141 | 141 | 0,00% | | low-power | black | 3 | 198 | 197 | -0,51% | | none | smpte | 1 | 101 | 98 | -2,97% | | low-power | smpte | 1 | 186 | 187 | 0,54% | | none | smpte | 2 | 128 | 129 | 0,78% | | low-power | smpte | 2 | 186 | 190 | 2,15% | | none | smpte | 3 | 131 | 133 | 1,53% | | low-power | smpte | 3 | 189 | 190 | 0,53% | | none | snow | 1 | 103 | 104 | 0,97% | | low-power | snow | 1 | 119 | 118 | -0,84% | | none | snow | 2 | 130 | 131 | 0,77% | | low-power | snow | 2 | 126 | 126 | 0,00% | | none | snow | 3 | 137 | 136 | -0,73% | | low-power | snow | 3 | 121 | 120 | -0,83% | +-----------+---------+----------+------------+-----------+---------+ Here are the results on SKL/GT3e (i5-6260U) +-----------+---------+----------+------------+-----------+--------+ | tune | pattern | encoders | fps before | fps after | delta | +-----------+---------+----------+------------+-----------+--------+ | none | black | 1 | 115 | 113 | -1,74% | | low-power | black | 1 | 269 | 266 | -1,12% | | none | black | 2 | 130 | 134 | 3,08% | | low-power | black | 2 | 230 | 222 | -3,48% | | none | black | 3 | 138 | 137 | -0,72% | | low-power | black | 3 | 220 | 236 | 7,27% | | none | smpte | 1 | 105 | 107 | 1,90% | | low-power | smpte | 1 | 235 | 237 | 0,85% | | none | smpte | 2 | 121 | 127 | 4,96% | | low-power | smpte | 2 | 235 | 239 | 1,70% | | none | smpte | 3 | 127 | 129 | 1,57% | | low-power | smpte | 3 | 203 | 209 | 2,96% | | none | snow | 1 | 109 | 110 | 0,92% | | low-power | snow | 1 | 110 | 108 | -1,82% | | none | snow | 2 | 128 | 129 | 0,78% | | low-power | snow | 2 | 112 | 112 | 0,00% | | none | snow | 3 | 131 | 132 | 0,76% | | low-power | snow | 3 | 110 | 110 | 0,00% | +-----------+---------+----------+------------+-----------+--------+ Note that this is a single run per test because of the leak. I think this patch does not harm performance (at least not on SKL)
Btw i reran the test on GT2 for the -10% line (line 2 of GT2), and it went back to normal (196 fps), so that was a fluctuation due to not running several runs.
Perhaps we should call ensure_profiles() as soon as possible, rather of a lazy evaluation, thus we could avoid the mutexes in that function.
(In reply to Víctor Manuel Jáquez Leal from comment #26) > Perhaps we should call ensure_profiles() as soon as possible, rather of a > lazy evaluation, thus we could avoid the mutexes in that function. What do you think about calling ensure_profiles/image_formats during creation of vaapi display?
(In reply to Hyunjun Ko from comment #27) > (In reply to Víctor Manuel Jáquez Leal from comment #26) > > Perhaps we should call ensure_profiles() as soon as possible, rather of a > > lazy evaluation, thus we could avoid the mutexes in that function. > > What do you think about calling ensure_profiles/image_formats during > creation of vaapi display? Noup. My idea is to keep that in the encoders, which are the ones that show that behavior. But if that is not possible, and we ought keep this mutex-based solution, I would change the patch renaming the function ensure_profile() to ensure_profile_unlock() and create a new function named ensure_profile() with a single all to the unlock version inside a LOCK/UNLOCK pair.
(In reply to Víctor Manuel Jáquez Leal from comment #28) > Noup. My idea is to keep that in the encoders, which are the ones that show > that behavior. > The function gst_vaapiencode_open is the earliest place, but not proper place logically, no api to call in this stage either. IMO, start vmethod is proper place, it's called one by one from base encoder even though there are multiple encoders.
Created attachment 340959 [details] [review] vaapiencode: call ensure_encoder at start Currently, specific encoder is created during set_format. This might lead to race condition when creating profiles with multiple encoders This patch moves ensure_encoder to start method to ensure avoding this race condition.
(In reply to Hyunjun Ko from comment #30) > Created attachment 340959 [details] [review] [review] > vaapiencode: call ensure_encoder at start > > Currently, specific encoder is created during set_format. > This might lead to race condition when creating profiles with > multiple encoders > > This patch moves ensure_encoder to start method to ensure > avoding this race condition. If this would be acceptable, I think need to implement stop vmethod also.
Comment on attachment 339305 [details] [review] display: take display lock during ensuring profiles/formats Let's figure out if we could fix this without need of adding a mutex
@Florent: could you run your tests with this new patch rather than the old one, please??? O:)
looks good to me (both) -- keep in mind this is a single run -- on average -0.51% regression for patch1, and +0.08% on patch2. Looks like patch 2 is slightly better indeed. +-----------+--------+--------------+--------------+--------------+ | Tune | Sample | fps (before) | fps (patch1) | fps (patch2) | +-----------+--------+--------------+--------------+--------------+ | none | black | 114 | 114 | 117 | | none | black | 141 | 138 | 140 | | none | black | 146 | 145 | 144 | | low-power | black | 187 | 182 | 200 | | low-power | black | 204 | 206 | 200 | | low-power | black | 203 | 203 | 202 | | none | smpte | 104 | 104 | 104 | | none | smpte | 131 | 131 | 131 | | none | smpte | 135 | 134 | 134 | | low-power | smpte | 196 | 194 | 193 | | low-power | smpte | 200 | 198 | 191 | | low-power | smpte | 194 | 194 | 195 | | none | snow | 107 | 109 | 109 | | none | snow | 137 | 136 | 138 | | none | snow | 140 | 140 | 139 | | low-power | snow | 128 | 124 | 128 | | low-power | snow | 129 | 129 | 130 | | low-power | snow | 124 | 124 | 124 | +-----------+--------+--------------+--------------+--------------+
PS: this was GT2 SKL
(In reply to Florent Thiéry from comment #34) > looks good to me (both) -- keep in mind this is a single run -- on average > -0.51% regression for patch1, and +0.08% on patch2. Looks like patch 2 is > slightly better indeed. Cool. Pushing then. Thanks.