GNOME Bugzilla – Bug 694405
souphttpsrc: improve performance / efficiency
Last modified: 2017-11-10 09:34:22 UTC
Hello everyone,

I am using souphttpsrc to download files from HTTP servers and then play them, but I found that its download speed is low compared to other HTTP tools, so I tested the download speed.

My device's hardware: ARM11 at 804 MHz, DDR at 498 MHz.

(1) souphttpsrc:
gst-launch souphttpsrc location=http://10.209.162.77/yt_aware.mp4 ! fakesink
The soup version is 2.4. The download speed is about 16Mb, and the CPU usage is almost 100%.

(2) curl:
curl -o /dev/null http://10.209.162.77/yt_war.mp4
The curl version is 7.21.7. The download speed is about 55Mb, and the CPU usage is likewise almost 100%.

I think the CPU cost differs too much: at the same CPU usage, souphttpsrc delivers far less throughput. I cannot play HD video on my device because soup costs too much CPU. This is not good news for GStreamer.

Could you suggest a solution? Is there any other HTTP source plugin in GStreamer, or a way to optimize souphttpsrc?

Thanks. Best regards.
I also tested with wget:
wget http://10.209.162.77/yt_war.mp4 -O /dev/null
CPU usage: 100%
Download speed: 54Mb
This matches the curl result.
You should not be using GStreamer 0.11.x for anything any more. It was a short-lived development branch with unstable API, before it was made stable and turned into 1.0.0. You should be using and testing GStreamer 1.x instead. I'm not sure why souphttpsrc would perform so badly, have you profiled it to see where it takes up cycles? (Maybe some unnecessary memcpy()s, or unaligned memcpy()s?) Many people are playing HD video over http on embedded devices just fine, so it's unlikely to be a general problem. You could write your own http source plugin based on libcurl and compare the performance (I know others who have done that, but don't know if the code is available anywhere). Setting to NEEDINFO for re-testing with 1.x and profiling results.
Hello Müller,

Did you mean that someone wrote a curl-based HTTP source, as in https://bugzilla.gnome.org/show_bug.cgi?id=558450 ?

I will move to 1.0.x soon. Playing a 10Mbps video is OK, but a 15Mbps video stalls because the CPU usage is too high. I think most people do not test at 15Mbps: on most video websites such as YouTube, the average video bitrate is 2Mbps ~ 3Mbps, so nobody has reported this.

Right now I have a memory optimization problem to deal with. When I finish it, I will move to 1.0.x and then give you the test results.

Thanks for your reply.
I know that there are some inefficiencies moving data between libsoup and gstreamer, specifically having to memcpy buffers. (Libsoup doesn't have the API to do this well.) I'd be happy to work on any issues here, either in libsoup or in gstreamer, or both. However, before doing that, I'd like to see good profiling data (i.e., from oprofile, with backtraces) showing where the CPU is being used, with both recent gstreamer and libsoup. Profiling on x86-64 is fine, doesn't need to be on the embedded device.
I thought the chunk allocator stuff was to avoid exactly that. Apparently there is suitable new API in libsoup now (also see bug #693911).

(For what it's worth, and going slightly off-topic, I have had some dealings with a MIPS-based system where we ran into exactly the same problem, only with a custom curl-based curlhttpsrc. The custom curlhttpsrc was consistently slower, i.e. delivered less throughput, than the curl command line utility, which affected high-bitrate playback, but it only became an issue at considerably higher bitrates. I am not sure if anyone ever fully got to the bottom of it, but I seem to remember that there were some issues with memory throughput affected by unaligned memory accesses. In that scenario there were some memcpys we couldn't avoid though, and queue2 was also in the mix, which also necessitated extra, possibly unaligned, memcpys.)
Note that the soup API is sufficient to prevent pushing memory-unaligned buffers into GStreamer, though on x86 it's not going to be as disastrous as on MIPS, where the kernel gets called on every unaligned access.
Hello everyone,

I tested with 1.0.5, and the result is the same. The default blocksize of souphttpsrc is 4096 bytes, and the minimum memory page size of my device is also 4096, so I don't think there is a memory alignment problem.

I added logging to print how much time each gst_soup_http_src_create call takes. With the pipeline running in a steady state, gst_soup_http_src_create was called 420 times in one second:

420*4096*8/1024/1024=13.125Mbps

So the bitrate is 13.125Mbps. Of those 420 calls, 355 took 0 ms and the remaining 65 took 10 ms each, so 10ms*65=650ms. I think this is where the CPU time goes.

I used gdb to trace it. This is the call stack.
=====================================
+ Trace 231570
=====================================
1. I found that g_main_loop_run is called in gst_push_src_create. I think this costs some time.
2. read_body_chunk calls priv->chunk_allocator (msg, io->read_length, priv->chunk_allocator_data); and chunk_allocator is implemented by gst_soup_http_src_chunk_allocator. So I don't think the chunk allocator is the bottleneck.
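
For anyone who wants to re-check the numbers in the comment above, the arithmetic can be reproduced with a short script. This is only a sketch: the constants are the figures reported in the comment, and the function names are made up for illustration (they are not part of GStreamer or libsoup).

```python
# Figures reported in the comment above.
BLOCKSIZE_BYTES = 4096    # default souphttpsrc blocksize
CALLS_PER_SECOND = 420    # gst_soup_http_src_create calls seen in one second
SLOW_CALLS = 65           # calls that took about 10 ms each
SLOW_CALL_MS = 10


def throughput_mbps(calls_per_second: int, blocksize_bytes: int) -> float:
    """Throughput in Mbit/s, with 1 Mbit = 1024 * 1024 bits as in the comment."""
    return calls_per_second * blocksize_bytes * 8 / 1024 / 1024


def slow_call_ms_per_second(slow_calls: int, slow_call_ms: int) -> int:
    """Total time per second spent inside the slow create() calls, in ms."""
    return slow_calls * slow_call_ms


if __name__ == "__main__":
    print(throughput_mbps(CALLS_PER_SECOND, BLOCKSIZE_BYTES))   # 13.125
    print(slow_call_ms_per_second(SLOW_CALLS, SLOW_CALL_MS))    # 650
```

With one buffer per create() call, throughput is capped by the call rate times the blocksize, so the call rate matters as much as the per-call cost.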
Thanks, that looks useful. Hopefully the new soup api additions allow us to do things in a better, more efficient way.
Hello all,

I tested this problem again. My GStreamer version is 1.0.5. I found that the "blocksize" argument is very important for this problem. I used this command to test:

gst-launch-1.0 souphttpsrc location=http://10.209.162.77/yt_aware.mp4 blocksize=4096 ! fakesink

blocksize    average bitrate (Mbps)    CPU
4096         8                         nearly 100%
8192         27                        nearly 100%
16384        36                        nearly 100%
32768        44                        nearly 100%
65536        47                        nearly 100%
131072       47                        nearly 100%

So I changed the default blocksize for souphttpsrc to 64K.

I think the time is spent in per-buffer looping. For 64 KB of data, if blocksize is 64 KB, a single loop iteration pushes it to the sink. If blocksize is 4 KB, it takes 16 iterations.

The souphttpsrc average bitrate (47M) is still less than curl's (54M). There may be some efficiency loss between GStreamer and libsoup, but I don't mind: 47M is enough.

So I think this issue can be closed. Thanks.
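
To make the looping argument above concrete, here is a small sketch that counts how many buffers (loop iterations) are needed to move the 64K example payload at each tested blocksize, next to the bitrates measured in the comment. The helper name `buffers_needed` is hypothetical; only the blocksize and bitrate figures come from the thread.

```python
import math

# blocksize (bytes) -> average bitrate in Mbps, as measured in the comment above
MEASURED_MBPS = {
    4096: 8,
    8192: 27,
    16384: 36,
    32768: 44,
    65536: 47,
    131072: 47,
}

PAYLOAD_BYTES = 64 * 1024  # the 64K example from the comment


def buffers_needed(payload_bytes: int, blocksize_bytes: int) -> int:
    """Number of create()/push iterations needed to move payload_bytes."""
    return math.ceil(payload_bytes / blocksize_bytes)


if __name__ == "__main__":
    for blocksize, mbps in MEASURED_MBPS.items():
        loops = buffers_needed(PAYLOAD_BYTES, blocksize)
        print(f"blocksize={blocksize:6d}  loops per 64K={loops:2d}  measured={mbps} Mbps")
```

The measured bitrate stops improving at 65536, which is also the point where a 64K payload fits in one buffer, so the per-buffer overhead no longer dominates there.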
Thanks for posting your findings. Let's keep this open for now though. I think we should figure out a way to make this work better automatically by default. There's a latency/throughput trade-off here I guess, but maybe we can do something clever like estimate the consumption bitrate and adjust the blocksize based on that or so.
Some thoughts:
* Have a maximum block size (instead of a fixed one)
* Have a deadline (or latency?) property
* Instead of getting a fixed amount, grab as much as possible within those two constraints and push out the resulting buffer.

With something like max-blocksize=64k and deadline=100ms:
* if it gets 64k in less than 100ms => push it
* if it gets less than 64k in 100ms => push it

Or maybe we just want to automatically adjust the block size based on observed bitrate?
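
The max-blocksize/deadline idea above could be sketched roughly as follows. This is a simulation, not souphttpsrc code: `batch_chunks`, its parameters, and the `(arrival_time, data)` event format are all assumptions made for illustration, and the deadline is only checked when the next chunk arrives, whereas a real element would need a timer.

```python
from typing import Iterable, Iterator, Tuple


def batch_chunks(
    events: Iterable[Tuple[float, bytes]],
    max_blocksize: int,
    deadline: float,
) -> Iterator[bytes]:
    """Group incoming chunks into buffers.

    events yields (arrival_time_in_seconds, data) pairs in arrival order.
    A buffer is pushed when it reaches max_blocksize bytes, or when the
    deadline has passed since the first byte of the pending buffer arrived.
    """
    pending = bytearray()
    window_start = 0.0
    for arrival, data in events:
        # Deadline expired for what we already hold: push the short buffer.
        if pending and arrival - window_start >= deadline:
            yield bytes(pending)
            pending.clear()
        if not pending:
            window_start = arrival
        pending.extend(data)
        # Enough data collected: push full-size buffers.
        while len(pending) >= max_blocksize:
            yield bytes(pending[:max_blocksize])
            del pending[:max_blocksize]
            window_start = arrival
    if pending:
        yield bytes(pending)  # flush the remainder at end of stream


if __name__ == "__main__":
    events = [(0.00, b"a" * 40), (0.01, b"b" * 40), (0.20, b"c" * 10)]
    for buf in batch_chunks(events, max_blocksize=64, deadline=0.1):
        print(len(buf))
```

In this example the first buffer is pushed because it is full, and the second because the deadline expired before more data arrived, which matches the two cases listed above.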
We have adaptive blocksize in souphttpsrc nowadays. Closing this bug. If this is still an issue with current gstreamer, please reopen.