GNOME Bugzilla – Bug 740149
rtspsrc, rtpjitterbuffer: cpu optimization
Last modified: 2018-11-03 14:56:14 UTC
Created attachment 290741 [details] [review]
basesrc: workaround to save CPU in udpsrc

This bug is to keep track of this discussion and the patch:
http://gstreamer-devel.966125.n4.nabble.com/rtspsrc-cpu-optimization-td4668972.html
So the question is: why does this help, and where are the CPU cycles actually spent, and why? 4-5% sounds way too much, unless it just leads to fewer packets being captured and processed in the end. It only sounds plausible if something is busy-looping somewhere, IMHO. It would be good if this could be narrowed down to something even smaller, just involving udpsrc for example. An strace might also be instructive. Unlikely, but perhaps bug #732439 is relevant, which avoids unnecessary poll/select on high-throughput sockets (i.e. where chances are high that a packet is already available to read).
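For reference, the idea in bug #732439 is roughly the following (this is only an illustrative sketch against the GSocket API, not the actual patch, and the function name is made up): try a non-blocking read first, and only fall back to waiting on the socket when the read would block, so a busy socket never pays for an extra poll() per packet.

#include <gio/gio.h>

/* Illustrative only: try a non-blocking receive first; poll only when
 * the socket has nothing queued. */
static gssize
receive_skipping_poll (GSocket *socket, gchar *buf, gsize size, GError **error)
{
  gssize len;

  g_socket_set_blocking (socket, FALSE);

  while (TRUE) {
    len = g_socket_receive (socket, buf, size, NULL, error);
    if (len >= 0)
      return len;               /* data was already queued, no poll needed */

    if (!g_error_matches (*error, G_IO_ERROR, G_IO_ERROR_WOULD_BLOCK))
      return -1;                /* real error */

    g_clear_error (error);
    /* nothing queued: block in poll() until the socket is readable */
    if (!g_socket_condition_wait (socket, G_IO_IN, NULL, error))
      return -1;
  }
}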
I think this is a general regression in 1.0. For example, if I compare a simple rtpsrc ! fakesink pipeline, I see 1.5% CPU usage on my laptop with 1.0 and 0.5% with 0.10.
Review of attachment 290741 [details] [review]:

::: libs/gst/base/gstbasesrc.c
@@ +2392,3 @@
+  /* FIXME: workaround to save some CPU in udpsrc
+   * sleep 1ms before reading sockets. */
+  usleep (1000);

No, clearly not an acceptable fix. Though, this may be a sign that accumulating more data on the socket reduces CPU usage (could be double parsing or other common bugs). This small CPU increase may also mean latency is actually better in 1.0+. Better to measure both, and see if it's worth calling this a regression.
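To illustrate the "accumulating more data" point: one common way on Linux to amortize per-packet overhead is to drain several datagrams with a single syscall via recvmmsg(). This is not taken from udpsrc; it is only a sketch of the general technique:

/* Illustrative sketch (not GStreamer code): recvmmsg() reads a batch of
 * UDP datagrams with one syscall instead of one recvfrom() per packet. */
#define _GNU_SOURCE
#include <sys/socket.h>
#include <sys/uio.h>
#include <string.h>

#define BATCH    32
#define PKT_SIZE 1500

static int
read_packet_batch (int fd, char bufs[BATCH][PKT_SIZE])
{
  struct mmsghdr msgs[BATCH];
  struct iovec iovecs[BATCH];
  int i;

  memset (msgs, 0, sizeof (msgs));
  for (i = 0; i < BATCH; i++) {
    iovecs[i].iov_base = bufs[i];
    iovecs[i].iov_len = PKT_SIZE;
    msgs[i].msg_hdr.msg_iov = &iovecs[i];
    msgs[i].msg_hdr.msg_iovlen = 1;
  }

  /* returns how many datagrams were actually received, or -1 */
  return recvmmsg (fd, msgs, BATCH, MSG_DONTWAIT, NULL);
}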
I agree that the fix is not acceptable. Regarding the latency: even at 30 fps a frame is sent every 33 ms, so a sleep of 1 ms is not noticeable.
At high bitrates a frame might be made up of hundreds if not thousands of packets (so multiply the sleep accordingly). In any case, I would recommend focusing on investigating the actual underlying issue.
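To put rough numbers on that (a back-of-the-envelope only, assuming ~1400-byte RTP payloads): a 10 Mbit/s stream at 30 fps carries about 10,000,000 / 8 / 30 ≈ 42 KB per frame, i.e. roughly 30 packets, and a 50 Mbit/s stream already around 150 packets per frame. With one read per packet, a 1 ms sleep per read would add tens of milliseconds per frame, more than the 33 ms frame interval.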
Thanks for your comments. The use of sleep is definitely not a solution; it was only there to help me understand how we could save some CPU in the udpsrc thread. I added the rtpstats module. Having a lot of cameras (8) led to dropped packets when using the sleep, even after increasing buffer sizes, so this is definitely not a solution. With the same cameras we used the libav module: it consumes around 5% per camera while it consumes around 12-15% with GStreamer, and it also works fine with 8 simultaneous cameras. It shows that there is room for improvement regarding CPU consumption in GStreamer. I have to admit that I have trouble investigating GStreamer CPU issues; profilers so far did not point to anything relevant. As the udpsrc thread is the major consumer, I should investigate it. Any help or advice would be welcome. We have to find a solution soon, so it looks like we will use libav if we don't see a significant improvement.
For profiling, I would suggest "perf top -g". It's live and intuitive, but it's a matter of preference. You can also do "perf record <command>", and then use "perf report perf.data" to analyze afterward.

You could also experiment with the rtspsrc properties. A few that may make a difference (there is an example command at the end of this comment):

"udp-buffer-size": plays on the amount of data the socket can accumulate. Not sure of the impact on CPU, but it will clearly help compensate for high CPU load.

"do-rtcp" / "do-rtsp-keep-alive": not doing RTCP may of course help reduce CPU (hopefully this isn't why the libav RTSP stack is faster).

"rtp-blocksize": the smaller the buffers, the higher the CPU. I think this is just a suggestion toward the server, so don't put too much hope there.

"buffer-mode": only makes sense if profiling points to the buffering code.

Try anything that makes sense based on what exact part is using the CPU. Of course you need debug symbols for everything involved (the kernel's vmlinux is highly recommended, since high CPU utilization can be due to hammering the socket API). Also, if you attach profiling data here, you are more likely to get help investigating this potential performance bottleneck; that's what we mean by NEEDINFO. If you prefer to use another stack, that is your choice, but please close the bug in that case.
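As an illustration only (the URL and values here are placeholders, pick whatever matches your setup):

gst-launch-1.0 rtspsrc location=rtsp://camera/stream udp-buffer-size=2097152 do-rtcp=false buffer-mode=none ! fakesink

Changing one property at a time and re-running the same perf measurement should show which of them, if any, actually moves the CPU numbers.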
Created attachment 291492 [details] perf report rtspsrc ! fakesink
Created attachment 291493 [details] top -H
Thanks for your advice and sorry for the late reply. I actually already posted perf stats on the mailing list, but they did not include kernel symbols. Please find the perf report output in attachment; I used it with the vmlinux debug symbols, but the kernel symbol 0x7ffff936 can not be resolved. This symbol is not listed in the kernel System.map file, and I am trying to figure out what it could be. I also attached the top output with pthread names. I already played with the rtspsrc properties without any CPU improvement. According to the code, the main differences with libav are:
- minimal jitterbuffer (no reordering...)
- no timestamp rewrites (it uses the RTP timestamps), leading to bigger gaps in the recorded video file.
- minimal RTCP implementation
As we manage to have it working with libav, it shows that the CPU usage is not due to hammering the socket API. My GStreamer pipeline is live-oriented, as it was designed to deliver video frames to a video sink without glitching. As I am using files, I could tolerate more latency in video frame delivery. Thanks for your help.
From perf, your bottleneck is "unknown":

27.93% gst-launch-1.0 [unknown] [k] 0x7ffff936

I think it may be in the kernel; have you installed the kernel debug info? Also, install debug info for GStreamer/GLib and libc. You can run "perf top -g" to get a call graph; that will give you the origin of the calls.
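Concretely, something like this should work (the URL and vmlinux path are placeholders):

perf record -g gst-launch-1.0 rtspsrc location=rtsp://camera/stream ! fakesink
perf report -k /path/to/vmlinux

With -g you get call graphs, so perf report can show where the unknown kernel samples are being called from.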
I had a quick look at this the other day and found that 1.x takes about 2-3 percentage points more CPU than the 0.10 version (for the exact same stream at the same time). Not everything else is equal, but it's a good first approximation. rtpjitterbuffer seemed to be the culprit at first glance.
(In reply to comment #11)
> From perf, your bottleneck is "unknown":
> 27.93% gst-launch-1.0 [unknown] [k] 0x7ffff936
>
> I think it may be in the kernel; have you installed the kernel debug info?
> Also, install debug info for GStreamer/GLib and libc. You can run "perf top -g"
> to get a call graph; that will give you the origin of the calls.

Yep. My vmlinux file is 62 MB, and running nm on it lists symbols. 0x7ffff936 is not part of my System.map file. I used perf report -k <path to vmlinux>.
(In reply to comment #12)
> I had a quick look at this the other day and found that 1.x takes about 2-3
> percentage points more CPU than the 0.10 version (for the exact same stream
> at the same time). Not everything else is equal, but it's a good first
> approximation. rtpjitterbuffer seemed to be the culprit at first glance.

I tested 0.10. I see an improvement on udpsrc (from 8% on 1.0 to 5% on 0.10), but the jitterbuffer consumes a bit more (from 3% on 1.0 to 5% on 0.10), so in total it is similar. As a test I tried to slim down the jitterbuffer on 0.10 by removing the gtask and only pushing buffers in the chain function, but then I saw that udpsrc increased from 5% to 9%! I can't figure out how the gtasks are scheduled; I need to read the code, but a quick explanation would be appreciated. I also played with the pipeline's sink element (filesink), using sync=true and max-bitrate, and I see some CPU improvement. Is it this element that drives the scheduling? As I am not doing live rendering such as with ximagesink, would it be good to look there?
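For reference, the streaming thread of a source like udpsrc is a GstTask (started via gst_pad_start_task()): its function is called in a loop on a dedicated thread until the task is paused or stopped. A minimal sketch using only the public GstTask API (the function names here are made up):

#include <gst/gst.h>

/* Sketch of how a GstTask behaves (not udpsrc's actual code): the task
 * function is called repeatedly on its own thread until the task is
 * paused or stopped. */
static void
loop_func (gpointer user_data)
{
  /* in udpsrc this is roughly: wait for the socket to become readable,
   * read one packet, wrap it in a GstBuffer and push it downstream */
}

static GstTask *
start_streaming_thread (GRecMutex *lock)
{
  GstTask *task;

  g_rec_mutex_init (lock);
  task = gst_task_new (loop_func, NULL, NULL);
  gst_task_set_lock (task, lock);
  gst_task_start (task);   /* spawns a thread that keeps calling loop_func() */

  return task;
}

Stopping is the reverse: gst_task_stop() or gst_task_pause(), then gst_task_join() before freeing the lock.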
-- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug via this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-plugins-good/issues/143.