Bug 694405 - souphttpsrc: improve performance / efficiency
Status: RESOLVED OBSOLETE
Product: GStreamer
Classification: Platform
Component: gst-plugins-good
Version: git master
Hardware: Other Linux
Importance: Normal enhancement
Target Milestone: git master
Assigned To: GStreamer Maintainers
QA Contact: GStreamer Maintainers
Reported: 2013-02-22 02:22 UTC by zhangyanping
Modified: 2017-11-10 09:34 UTC

Description zhangyanping 2013-02-22 02:22:41 UTC
Hello everyone,

I am using souphttpsrc to download files from HTTP servers and play them, but I found that the download speed is low compared to other HTTP tools.
---------------------
So I tested the download speed.

My device hardware capability:

Arm11 Freq  804 MHz
DDR   Freq  498 MHz
------------------------
(1)
gst-launch souphttpsrc location=http://10.209.162.77/yt_aware.mp4 ! fakesink

The soup version is 2.4.

The download speed is about 16 Mbps, and CPU usage is almost 100%.
------------------------
(2) Using curl
curl -o /dev/null http://10.209.162.77/yt_war.mp4

The curl version is 7.21.7.

The download speed is about 55 Mbps, and CPU usage is likewise almost 100%.

---------------------------------
I think the CPU cost per byte downloaded differs too much between the two.

I cannot play HD video on my device because soup costs too much CPU. This is not good news for GStreamer.

Could you suggest a solution? Is there any other HTTP source plugin in GStreamer? Or how can souphttpsrc be optimized?

Thanks. 

Best Regards.
Comment 1 zhangyanping 2013-02-22 02:55:34 UTC
I also tested with wget:

wget http://10.209.162.77/yt_war.mp4 -O /dev/null

CPU usage: 100%
Download speed: 54Mb

This is the same as the curl test.
Comment 2 Tim-Philipp Müller 2013-02-22 14:08:35 UTC
You should not be using GStreamer 0.11.x for anything any more. It was a short-lived development branch with unstable API, before it was made stable and turned into 1.0.0. You should be using and testing GStreamer 1.x instead.

I'm not sure why souphttpsrc would perform so badly, have you profiled it to see where it takes up cycles? (Maybe some unnecessary memcpy()s, or unaligned memcpy()s?)

Many people are playing HD video over http on embedded devices just fine, so it's unlikely to be a general problem.

You could write your own http source plugin based on libcurl and compare the performance (I know others who have done that, but don't know if the code is available anywhere).

Setting to NEEDINFO for re-testing with 1.x and profiling results.
Comment 3 zhangyanping 2013-02-25 02:04:01 UTC
Hello Müller,

     Did you mean this curl-based HTTP source that someone wrote?
     https://bugzilla.gnome.org/show_bug.cgi?id=558450

     I will move to 1.0.x soon. Playing a 10 Mbps video is OK, but a 15 Mbps video stalls because CPU usage is too high. I think most people do not test at 15 Mbps. On most video sites, such as YouTube, the average video bitrate is 2 Mbps to 3 Mbps, so nobody has reported this.

     I have a memory optimization problem to deal with first. When I finish it, I will move to 1.0.x and then give you the test results. Thanks for your reply.
Comment 4 David Schleef 2013-02-25 20:33:49 UTC
I know that there are some inefficiencies moving data between libsoup and gstreamer, specifically having to memcpy buffers.  (Libsoup doesn't have the API to do this well.)  I'd be happy to work on any issues here, either in libsoup or in gstreamer, or both.

However, before doing that, I'd like to see good profiling data (i.e., from oprofile, with backtraces) showing where the CPU is being used, with both recent gstreamer and libsoup.  Profiling on x86-64 is fine, doesn't need to be on the embedded device.
Comment 5 Tim-Philipp Müller 2013-02-25 20:52:06 UTC
I thought the chunk allocator stuff was to avoid exactly that. Apparently there is suitable new API in libsoup now (also see bug #693911).

(For what it's worth, and going slightly off-topic, I have had some dealings with a MIPS-based system where we ran into exactly the same problem, only with a custom curl-based curlhttpsrc. The custom curlhttpsrc was consistently slower, i.e. delivered less throughput, than the curl command line utility, which affected high-bitrate playback, but it only became an issue at considerably higher bitrates. I am not sure if anyone ever fully got to the bottom of it, but I seem to remember that there were some issues with memory throughput affected by unaligned memory accesses. In that scenario there were some memcpys we couldn't avoid though, and queue2 was also in the mix, which also necessitated extra, possibly unaligned, memcpys.)
Comment 6 Nicolas Dufresne (ndufresne) 2013-02-25 21:00:52 UTC
Note that the soup API is sufficient to avoid pushing unaligned memory buffers into GStreamer, though on x86 it's not going to be as disastrous as on MIPS, where the kernel gets called on every unaligned access.
Comment 7 zhangyanping 2013-02-26 09:48:23 UTC
Hello everyone,

I tested with 1.0.5, and the result is the same.

The default blocksize of souphttpsrc is 4096 bytes. The minimum memory page size on my device is also 4096 bytes, so I don't think there is any memory alignment problem.

I added logging to print how much time gst_soup_http_src_create takes.

With the pipeline running in a steady state, gst_soup_http_src_create was called 420 times in one second.

420*4096*8/1024/1024=13.125Mbps

So the bitrate is 13.125 Mbps.

Of those 420 calls, 355 took 0 ms and the remaining 65 took 10 ms each, so 10ms*65=650ms. I think this is where the CPU time goes.
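
The arithmetic above can be checked with a short Python sketch. The figures are the ones reported in this comment; the variable names are illustrative only, and Mbps is taken to mean 1024*1024 bits per second, as in the calculation above.

```python
# Figures reported in this comment (names are illustrative, not GStreamer API).
BLOCKSIZE_BYTES = 4096      # default souphttpsrc blocksize
CALLS_PER_SECOND = 420      # gst_soup_http_src_create calls observed in one second
SLOW_CALLS = 65             # calls that took about 10 ms each
SLOW_CALL_MS = 10

# Throughput implied by the call rate, using 1 Mbit = 1024 * 1024 bits.
bitrate_mbps = CALLS_PER_SECOND * BLOCKSIZE_BYTES * 8 / 1024 / 1024

# Time per second spent inside the slow calls.
slow_ms = SLOW_CALLS * SLOW_CALL_MS
slow_fraction = slow_ms / 1000

print(f"bitrate: {bitrate_mbps} Mbps")
print(f"time in slow calls: {slow_ms} ms ({slow_fraction:.0%} of each second)")
```

By these numbers, roughly two thirds of each second is spent in only 65 of the 420 calls.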
   
I used gdb to trace it. This is the call stack.

=====================================
  • #0 read_from_network
    at soup-socket.c line 1371
  • #1 soup_socket_read
    at soup-socket.c line 1456
  • #2 read_body_chunk
    at soup-message-io.c line 492
  • #3 io_read
    at soup-message-io.c line 989
  • #4 io_unpause_internal
    at soup-message-io.c line 1214
  • #5 g_idle_dispatch
    from /app/qtBrowser/lib/libglib-2.0.so.0
  • #6 g_main_context_dispatch
    from /app/qtBrowser/lib/libglib-2.0.so.0
  • #7 g_main_context_iterate
    from /app/qtBrowser/lib/libglib-2.0.so.0
  • #8 g_main_loop_run
    from /app/qtBrowser/lib/libglib-2.0.so.0
  • #9 gst_soup_http_src_create
    at gstsouphttpsrc.c line 1291
  • #10 gst_push_src_create
    at gstpushsrc.c line 132
  • #11 gst_base_src_get_range
    at gstbasesrc.c line 2400
  • #12 gst_base_src_loop
    at gstbasesrc.c line 2640
  • #13 gst_task_func
    at gsttask.c line 316
  • #14 default_func
    at gsttaskpool.c line 70
  • #15 g_thread_pool_thread_proxy
    from /app/qtBrowser/lib/libglib-2.0.so.0
  • #16 g_thread_proxy
    from /app/qtBrowser/lib/libglib-2.0.so.0
  • #17 start_thread
    from /lib/libpthread.so.0


=====================================
1.
I found that g_main_loop_run is called from gst_soup_http_src_create (which is reached via gst_push_src_create). I think this costs some time.

2. read_body_chunk calls priv->chunk_allocator (msg, io->read_length, priv->chunk_allocator_data);
   chunk_allocator is implemented by gst_soup_http_src_chunk_allocator.

So I don't think the chunk allocator is the bottleneck.
Comment 8 Tim-Philipp Müller 2013-03-13 10:37:15 UTC
Thanks, that looks useful. Hopefully the new soup api additions allow us to do things in a better, more efficient way.
Comment 9 zhangyanping 2013-06-09 02:11:43 UTC
Hello all,
    
I tested this problem again. My GStreamer version is 1.0.5.

I found that the "blocksize" argument is very important for this problem.

I used this command to test:

gst-launch-1.0 souphttpsrc location=http://10.209.162.77/yt_aware.mp4 blocksize=4096 ! fakesink

Results:
blocksize = 4096: average bitrate 8 Mbps, CPU nearly 100%
blocksize = 8192: average bitrate 27 Mbps, CPU nearly 100%
blocksize = 16384: average bitrate 36 Mbps, CPU nearly 100%
blocksize = 32768: average bitrate 44 Mbps, CPU nearly 100%
blocksize = 65536: average bitrate 47 Mbps, CPU nearly 100%
blocksize = 131072: average bitrate 47 Mbps, CPU nearly 100%

So I changed the default blocksize for souphttpsrc to 64 KB. I think the time is spent in per-buffer looping: for 64 KB of data, a 64 KB blocksize needs just one loop iteration to push it to the sink, while a 4 KB blocksize needs 16.
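
To see why the per-buffer overhead matters, the measurements above can be converted into buffers pushed per second. This is only a rough sketch: it assumes the "M" figures are Mbps with 1 M = 1024*1024 bits, as in comment 7, and the function names are made up for illustration.

```python
# Measurements from the list above: (blocksize in bytes, average bitrate in Mbps).
MEASURED = [
    (4096, 8),
    (8192, 27),
    (16384, 36),
    (32768, 44),
    (65536, 47),
    (131072, 47),
]


def buffers_per_second(blocksize, mbps):
    """Buffers the source must push each second to sustain the given bitrate."""
    return mbps * 1024 * 1024 / (blocksize * 8)


def loops_for(data_bytes, blocksize):
    """Loop iterations needed to push data_bytes in blocksize-sized buffers."""
    return -(-data_bytes // blocksize)  # ceiling division


for blocksize, mbps in MEASURED:
    print(f"blocksize={blocksize:>6}  {mbps:>2} Mbps  "
          f"{buffers_per_second(blocksize, mbps):6.0f} buffers/s  "
          f"{loops_for(64 * 1024, blocksize):>2} loops per 64 KiB")
```

With a 4096-byte blocksize the source pushes 256 buffers per second to deliver 8 Mbps, while with 65536 bytes only 94 buffers per second deliver 47 Mbps.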

The souphttpsrc average bitrate (47 Mbps) is still lower than curl's (54 Mbps). There may be some efficiency loss between GStreamer and libsoup, but that does not matter to me; 47 Mbps is enough.
So I think this issue can be closed.

Thanks.
Comment 10 Tim-Philipp Müller 2013-06-09 11:09:35 UTC
Thanks for posting your findings. Let's keep this open for now though. I think we should figure out a way to make this work better automatically by default. There's a latency/throughput trade-off here I guess, but maybe we can do something clever like estimate the consumption bitrate and adjust the blocksize based on that or so.
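
One way to picture the idea of deriving the blocksize from an estimated consumption bitrate is the Python sketch below. It is purely hypothetical and not what souphttpsrc does: the function name, the 50 ms latency target, and the 4 KiB / 128 KiB bounds are all assumptions chosen for illustration.

```python
def pick_blocksize(bitrate_bps, target_latency_s=0.05,
                   min_size=4096, max_size=131072):
    """Pick a power-of-two multiple of min_size that holds about
    target_latency_s worth of data at the estimated bitrate,
    clamped to [min_size, max_size].

    All parameters are hypothetical; none of them is a souphttpsrc property.
    """
    wanted = int(bitrate_bps / 8 * target_latency_s)  # bytes per latency window
    size = min_size
    while size < wanted and size < max_size:
        size *= 2
    return size


for bps in (100_000, 2_000_000, 15_000_000):
    print(f"{bps:>10} bit/s -> blocksize {pick_blocksize(bps)}")
```

A low-bitrate stream keeps small buffers and low latency, while a high-bitrate stream gets large buffers and fewer loop iterations.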
Comment 11 Edward Hervey 2014-01-30 10:26:18 UTC
Some thoughts:

* Have a maximum block size (instead of a fixed one)
* Have a deadline (or latency?) property
* Instead of getting a fixed amount, grab as much as possible within those two constraints and push out the resulting buffer.

With something like max-blocksize=64k and deadline=100ms:
* if it gets 64k in less than 100ms => push it
* if it gets less than 64k in 100ms => push it

Or maybe we just want to automatically adjust the block size based on the observed bitrate?
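
The max-blocksize plus deadline idea above could look roughly like the following Python sketch. It is an illustration under assumptions, not souphttpsrc code: `collect_block`, `read_some`, and `clock` are hypothetical names, and `read_some(n)` is assumed to return up to `n` bytes without blocking for long (a real source would need a timed read so the deadline is not overshot).

```python
import time
from typing import Callable


def collect_block(read_some: Callable[[int], bytes],
                  max_blocksize: int = 64 * 1024,
                  deadline_s: float = 0.1,
                  clock: Callable[[], float] = time.monotonic) -> bytes:
    """Gather data until max_blocksize bytes are collected or deadline_s
    has elapsed, whichever comes first, and return what was gathered."""
    start = clock()
    block = bytearray()
    while len(block) < max_blocksize:
        if clock() - start >= deadline_s:
            break  # deadline reached: push whatever has arrived so far
        chunk = read_some(max_blocksize - len(block))
        if not chunk:
            break  # end of stream
        block += chunk
    return bytes(block)


# Usage with a simulated slow source: each read yields at most 4096 bytes
# and advances a fake clock by 1/32 s.
fake_now = [0.0]


def slow_read(n: int) -> bytes:
    fake_now[0] += 0.03125
    return b"x" * min(n, 4096)


partial = collect_block(slow_read, clock=lambda: fake_now[0])
print(len(partial))  # less than 64 KiB: the deadline cut the block short
```

A fast source fills the whole 64 KiB well before the deadline, so throughput is not limited; a slow source still produces a buffer at least every deadline period, which bounds latency.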
Comment 12 Edward Hervey 2017-11-10 09:34:22 UTC
We have adaptive blocksize in souphttpsrc nowadays. Closing this bug. If this is still an issue with current gstreamer, please reopen.