GNOME Bugzilla – Bug 740149
rtspsrc, rtpjitterbuffer: cpu optimization
Last modified: 2018-11-03 14:56:14 UTC
Created attachment 290741 [details] [review]
basesrc: workaround to save CPU in udpsrc

This bug is to keep track of this discussion and the patch:
http://gstreamer-devel.966125.n4.nabble.com/rtspsrc-cpu-optimization-td4668972.html
So the question is: why does this help, and where are the CPU cycles actually spent, and why? 4-5% sounds way too much, unless it just leads to fewer packets being captured and processed in the end. It only sounds plausible if something is busy-looping somewhere, IMHO. It would be good if this could be narrowed down to something even smaller, just involving udpsrc for example. An strace might also be instructive. Unlikely, but perhaps bug #732439 is relevant, which avoids unnecessary poll/select on high-throughput sockets (i.e. where chances are high that a packet is already available to read).
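For reference, the idea in bug #732439 is roughly the following (this is only an illustrative sketch against the GSocket API, not the actual patch, and the function name is made up): try a non-blocking read first, and only fall back to waiting on the socket when the read would block, so a busy socket never pays for an extra poll() per packet.

#include <gio/gio.h>

/* Illustrative only: try a non-blocking receive first; poll only when
 * the socket has nothing queued. */
static gssize
receive_skipping_poll (GSocket *socket, gchar *buf, gsize size, GError **error)
{
  gssize len;

  g_socket_set_blocking (socket, FALSE);

  while (TRUE) {
    len = g_socket_receive (socket, buf, size, NULL, error);
    if (len >= 0)
      return len;               /* data was already queued, no poll needed */

    if (!g_error_matches (*error, G_IO_ERROR, G_IO_ERROR_WOULD_BLOCK))
      return -1;                /* real error */

    g_clear_error (error);
    /* nothing queued: block in poll() until the socket is readable */
    if (!g_socket_condition_wait (socket, G_IO_IN, NULL, error))
      return -1;
  }
}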
I think this is a general regression in 1.0. For example, if I compare a simple rtpsrc ! fakesink pipeline, I see 1.5% CPU usage on my laptop with 1.0 and 0.5% with 0.10.
Review of attachment 290741 [details] [review]:

::: libs/gst/base/gstbasesrc.c
@@ +2392,3 @@
+  /* FIXME: workaround to save some CPU in udpsrc
+   * sleep 1ms before reading sockets. */
+  usleep (1000);

No, clearly not an acceptable fix. Though, this may be a sign that accumulating more data on the socket reduces CPU usage (could be double parsing or other common bugs). This small CPU increase may also mean latency is actually better in 1.0+. Better to measure both, and see if it's worth calling this a regression.
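To illustrate the "accumulating more data" point: one common way on Linux to amortize per-packet overhead is to drain several datagrams with a single syscall via recvmmsg(). This is not taken from udpsrc; it is only a sketch of the general technique:

/* Illustrative sketch (not GStreamer code): recvmmsg() reads a batch of
 * UDP datagrams with one syscall instead of one recvfrom() per packet. */
#define _GNU_SOURCE
#include <sys/socket.h>
#include <sys/uio.h>
#include <string.h>

#define BATCH    32
#define PKT_SIZE 1500

static int
read_packet_batch (int fd, char bufs[BATCH][PKT_SIZE])
{
  struct mmsghdr msgs[BATCH];
  struct iovec iovecs[BATCH];
  int i;

  memset (msgs, 0, sizeof (msgs));
  for (i = 0; i < BATCH; i++) {
    iovecs[i].iov_base = bufs[i];
    iovecs[i].iov_len = PKT_SIZE;
    msgs[i].msg_hdr.msg_iov = &iovecs[i];
    msgs[i].msg_hdr.msg_iovlen = 1;
  }

  /* returns how many datagrams were actually received, or -1 */
  return recvmmsg (fd, msgs, BATCH, MSG_DONTWAIT, NULL);
}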
I agree that the fix is not acceptable. Regarding the latency: even at 30 fps a frame is sent every 33 ms, so a sleep of 1 ms is not noticeable.
At high bitrates a frame might be made up of hundreds if not thousands of packets (so multiply the sleep accordingly). In any case, I would recommend focusing on investigating the actual underlying issue.
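To put rough numbers on that (a back-of-the-envelope only, assuming ~1400-byte RTP payloads): a 10 Mbit/s stream at 30 fps carries about 10,000,000 / 8 / 30 ≈ 42 KB per frame, i.e. roughly 30 packets, and a 50 Mbit/s stream already around 150 packets per frame. With one read per packet, a 1 ms sleep per read would add tens of milliseconds per frame, more than the 33 ms frame interval.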
Thanks for your comments. The use of sleep is definitely not a solution; it was only there to help me understand how we could save some CPU in the udpsrc thread. I added the rtpstats module. Having a lot of cameras (8) led to dropped packets when using the sleep, even after increasing buffer sizes, so this is definitely not a solution. With the same cameras we used the libav module: it consumes around 5% per camera while it consumes around 12-15% with GStreamer, and it also works fine with 8 simultaneous cameras. It shows that there is room for improvement regarding CPU consumption in GStreamer. I have to admit that I have trouble investigating GStreamer CPU issues; profilers so far did not point to anything relevant. As the udpsrc thread is the major consumer, I should investigate it. Any help or advice would be welcome. We have to find a solution soon, so it looks like we will use libav if we don't see a significant improvement.
For profiling, I would suggest "perf top -g". It's live and intuitive, but it's a matter of preference. You can also do "perf record <command>", and then use "perf report perf.data" to analyze afterward.

You could also experiment with the rtspsrc properties. A few that may make a difference (there is an example command at the end of this comment):

"udp-buffer-size": plays on the amount of data the socket can accumulate. Not sure of the impact on CPU, but it will clearly help compensate for high CPU load.

"do-rtcp" / "do-rtsp-keep-alive": not doing RTCP may of course help reduce CPU (hopefully this isn't why the libav RTSP stack is faster).

"rtp-blocksize": the smaller the buffers, the higher the CPU. I think this is just a suggestion toward the server, so don't put too much hope there.

"buffer-mode": only makes sense if profiling points to the buffering code.

Try anything that makes sense based on what exact part is using the CPU. Of course you need debug symbols for everything involved (the kernel's vmlinux is highly recommended, since high CPU utilization can be due to hammering the socket API). Also, if you attach profiling data here, you are more likely to get help investigating this potential performance bottleneck; that's what we mean by NEEDINFO. If you prefer to use another stack, that is your choice, but please close the bug in that case.
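As an illustration only (the URL and values here are placeholders, pick whatever matches your setup):

gst-launch-1.0 rtspsrc location=rtsp://camera/stream udp-buffer-size=2097152 do-rtcp=false buffer-mode=none ! fakesink

Changing one property at a time and re-running the same perf measurement should show which of them, if any, actually moves the CPU numbers.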
Created attachment 291492 [details] perf report rtspsrc ! fakesink
Created attachment 291493 [details] top -H
Thanks for your advice and sorry for the late reply. I actually already posted perf stats on the mailing list, but they did not include kernel symbols. Please find the perf report output in attachment; I used it with the vmlinux debug symbols, but the kernel symbol 0x7ffff936 can not be resolved. This symbol is not listed in the kernel System.map file, and I am trying to figure out what it could be. I also attached the top output with pthread names. I already played with the rtspsrc properties without any CPU improvement. According to the code, the main differences with libav are:
- minimal jitterbuffer (no reordering...)
- no timestamp rewrites (it uses the RTP timestamps), leading to bigger gaps in the recorded video file.
- minimal RTCP implementation
As we manage to have it working with libav, it shows that the CPU usage is not due to hammering the socket API. My GStreamer pipeline is live-oriented, as it was designed to deliver video frames to a video sink without glitching. As I am using files, I could tolerate more latency in video frame delivery. Thanks for your help.
From perf, your bottleneck is "unknown":

27.93% gst-launch-1.0 [unknown] [k] 0x7ffff936

I think it may be in the kernel; have you installed the kernel debug info? Also, install debug info for GStreamer/GLib and libc. You can run "perf top -g" to get a call graph; that will give you the origin of the calls.
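Concretely, something like this should work (the URL and vmlinux path are placeholders):

perf record -g gst-launch-1.0 rtspsrc location=rtsp://camera/stream ! fakesink
perf report -k /path/to/vmlinux

With -g you get call graphs, so perf report can show where the unknown kernel samples are being called from.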
I had a quick look at this the other day and found that 1.x takes about 2-3 percentage points more CPU than the 0.10 version (for the exact same stream at the same time). Not everything else is equal, but it's a good first approximation. rtpjitterbuffer seemed to be the culprit at first glance.
(In reply to comment #11)
> From perf, your bottleneck is "unknown":
> 27.93% gst-launch-1.0 [unknown] [k] 0x7ffff936
>
> I think it may be in the kernel; have you installed the kernel debug info?
> Also, install debug info for GStreamer/GLib and libc. You can run "perf top -g"
> to get a call graph; that will give you the origin of the calls.

Yep. My vmlinux file is 62 MB, and running nm on it lists symbols. 0x7ffff936 is not part of my System.map file. I used perf report -k <path to vmlinux>.
(In reply to comment #12)
> I had a quick look at this the other day and found that 1.x takes about 2-3
> percentage points more CPU than the 0.10 version (for the exact same stream
> at the same time). Not everything else is equal, but it's a good first
> approximation. rtpjitterbuffer seemed to be the culprit at first glance.

I tested 0.10. I see an improvement on udpsrc (from 8% on 1.0 to 5% on 0.10), but the jitterbuffer consumes a bit more (from 3% on 1.0 to 5% on 0.10), so in total it is similar. As a test I tried to slim down the jitterbuffer on 0.10 by removing the gtask and only pushing buffers in the chain function, but then I saw that udpsrc increased from 5% to 9%! I can't figure out how the gtasks are scheduled; I need to read the code, but a quick explanation would be appreciated. I also played with the pipeline's sink element (filesink), using sync=true and max-bitrate, and I see some CPU improvement. Is it this element that drives the scheduling? As I am not doing live rendering such as with ximagesink, would it be good to look there?
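For reference, the streaming thread of a source like udpsrc is a GstTask (started via gst_pad_start_task()): its function is called in a loop on a dedicated thread until the task is paused or stopped. A minimal sketch using only the public GstTask API (the function names here are made up):

#include <gst/gst.h>

/* Sketch of how a GstTask behaves (not udpsrc's actual code): the task
 * function is called repeatedly on its own thread until the task is
 * paused or stopped. */
static void
loop_func (gpointer user_data)
{
  /* in udpsrc this is roughly: wait for the socket to become readable,
   * read one packet, wrap it in a GstBuffer and push it downstream */
}

static GstTask *
start_streaming_thread (GRecMutex *lock)
{
  GstTask *task;

  g_rec_mutex_init (lock);
  task = gst_task_new (loop_func, NULL, NULL);
  gst_task_set_lock (task, lock);
  gst_task_start (task);   /* spawns a thread that keeps calling loop_func() */

  return task;
}

Stopping is the reverse: gst_task_stop() or gst_task_pause(), then gst_task_join() before freeing the lock.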
-- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug via this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-plugins-good/issues/143.