After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 702380 - basertppayload: provide a GstBufferPool for output rtp buffers
basertppayload: provide a GstBufferPool for output rtp buffers
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-plugins-base
unspecified
Other Linux
: Normal enhancement
: 1.1.3
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2013-06-16 08:07 UTC by Edward Hervey
Modified: 2013-07-16 19:50 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
adapter: Add function to return buffer composed of multiple memories (6.75 KB, patch)
2013-07-15 20:30 UTC, Olivier Crête
committed Details | Review
rtpbaseaudiopayload: Avoid copying the data (1.75 KB, patch)
2013-07-15 20:31 UTC, Olivier Crête
committed Details | Review

Description Edward Hervey 2013-06-16 08:07:48 UTC
Right now most (all?) payloaders will create new rtp buffers for their output.

The size of those buffers is always limited by the MTU, we could therefore have a convenience GstBufferPool which those payloaders could use, providing pre-allocated buffers of that size, with the proper Meta set on them and therefore avoiding creating/freeing buffers all the time.

A quick glance at profiles shows that creating/freeing all those buffers (instead of re-using them) account for 25-50% of the time spent in payloaders :(
Comment 1 Olivier Crête 2013-06-16 16:41:17 UTC
Most payloaders should just allocate a new header and for the payload, use the GstMemory from the incoming buffer.
Comment 2 Wim Taymans 2013-06-17 08:42:25 UTC
(In reply to comment #1)
> Most payloaders should just allocate a new header and for the payload, use the
> GstMemory from the incoming buffer.

Some do, others don't. One of the patterns is to accumulate data on an adapter and then copy them into the target RTP packet, there's no good adapter api yet to get a block of memory and append it to the rtp packet.

I'm curious what the difference is now between allocating fresh buffers (+memory) and recycling them from a pool..
Comment 3 Wim Taymans 2013-06-17 09:12:14 UTC
Recycling buffers from a pool is about 4 times slower than just allocating and freeing them:

*** total 0:00:01.009697402 - average 0:00:00.000000504  - Done creating 2000000 buffers
*** total 0:00:04.009326025 - average 0:00:00.000002004  - Done creating 2000000 buffers
Comment 4 Wim Taymans 2013-06-17 09:14:47 UTC
(In reply to comment #3)
> Recycling buffers from a pool is about 4 times slower than just allocating and
> freeing them:
> 
> *** total 0:00:01.009697402 - average 0:00:00.000000504  - Done creating
> 2000000 buffers
> *** total 0:00:04.009326025 - average 0:00:00.000002004  - Done creating
> 2000000 buffers

See tests/benchmarks/gstpoolstress.c
Comment 5 Wim Taymans 2013-06-17 09:25:39 UTC
Ok, when making sure we never completely empty the pool (and need to read the socket) we can get somewhat faster (but it jitters, sometimes slower):

*** total 0:00:00.960529176 - average 0:00:00.000000480  - Done creating 2000000 buffers
*** total 0:00:00.671104842 - average 0:00:00.000000335  - Done creating 2000000 buffers
Comment 6 Edward Hervey 2013-06-17 09:48:51 UTC
(In reply to comment #5)
> Ok, when making sure we never completely empty the pool (and need to read the
> socket) we can get somewhat faster (but it jitters, sometimes slower):
> 
> *** total 0:00:00.960529176 - average 0:00:00.000000480  - Done creating
> 2000000 buffers
> *** total 0:00:00.671104842 - average 0:00:00.000000335  - Done creating
> 2000000 buffers

That looks more realistic than the first one (because you were mostly creating buffers all the time instead of re-using, so it obviously adds more overhead).

It does make it 45% faster, which seems on par with the overhead I was seeing when profiling rtpmp2tpay for example.
Comment 7 Wim Taymans 2013-06-17 09:55:22 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > Ok, when making sure we never completely empty the pool (and need to read the
> > socket) we can get somewhat faster (but it jitters, sometimes slower):
> > 
> > *** total 0:00:00.960529176 - average 0:00:00.000000480  - Done creating
> > 2000000 buffers
> > *** total 0:00:00.671104842 - average 0:00:00.000000335  - Done creating
> > 2000000 buffers
> 
> That looks more realistic than the first one (because you were mostly creating
> buffers all the time instead of re-using, so it obviously adds more overhead).

No, it was writing and reading from the socket to signal pool full/empty, only one buffer was allocated.

> 
> It does make it 45% faster, which seems on par with the overhead I was seeing
> when profiling rtpmp2tpay for example.

Here's another typical run, now I added (2nd line) simply doing gst_buffer_new() without memory:

*** total 0:00:01.250591428 - average 0:00:00.000000625  - Done creating 2000000 buffers
*** total 0:00:00.397951348 - average 0:00:00.000000198  - Done creating 2000000 buffers
*** total 0:00:01.017563303 - average 0:00:00.000000508  - Done creating 2000000 buffers
Comment 8 Wim Taymans 2013-06-17 10:07:55 UTC
In short, for optimizing allocations, the bufferpool is quite terrible unless you can avoid the never-empty case. We should optimize that case more, it's quite common. I think it's also all the atomic operations that slow it down, maybe it would be faster to use a lock there...

So if you really need it fast, the best option right now, I think, is by simply doing buffer_new() and making methods on adapter to move the memory on the buffer or so.
Comment 9 Sebastian Dröge (slomo) 2013-06-18 11:57:35 UTC
Just moving memories from the adapter to buffers would also prevent the need of copying the input memories into a newly allocated, larger memory... potentially speeding up things even more.
Comment 10 Olivier Crête 2013-07-15 20:30:25 UTC
Created attachment 249238 [details] [review]
adapter: Add function to return buffer composed of multiple  memories

One option here is to have the adapter return a GstBuffer with multiple GstMemory objects in it, see attached patch, I haven't tested the performance.
Comment 11 Olivier Crête 2013-07-15 20:31:04 UTC
Created attachment 249239 [details] [review]
rtpbaseaudiopayload: Avoid copying the data
Comment 12 Sebastian Dröge (slomo) 2013-07-16 06:58:03 UTC
Review of attachment 249238 [details] [review]:

Looks good to me

::: libs/gst/base/gstadapter.c
@@ +804,3 @@
  * This function is potentially more performant than gst_adapter_take()
  * since it can reuse the memory in pushed buffers by subbuffering
+ * or merging. This function will always return a contiguous buffer.

Maybe "always return a buffer with a single memory region"?
Comment 13 Sebastian Dröge (slomo) 2013-07-16 06:59:23 UTC
..and the same for rtpbasevideopayload? :)
Comment 14 Olivier Crête 2013-07-16 14:21:01 UTC
(In reply to comment #13)
> ..and the same for rtpbasevideopayload? :)

No such thing, but I do plan to go over every video payloader as I guess they're the biggest offenders.
Comment 15 Olivier Crête 2013-07-16 19:38:48 UTC
Core:
commit 592049159773d5377d1ce85f322c5751a4fac597
Author: Olivier Crête <olivier.crete@collabora.com>
Date:   Mon Jul 15 15:41:44 2013 -0400

    adapter: Add function to return buffer composed of multiple memories
    
    API: gst_adapter_take_fast()

Base:
commit c599c45b584188d48c7469c1743d55c2edc60538
Author: Olivier Crête <olivier.crete@collabora.com>
Date:   Mon Jul 15 16:13:11 2013 -0400

    rtpbaseaudiopayload: Avoid copying the data

Good:
commit 54c5a7f69019376c20e5304371209e4e1df37b86
Author: Olivier Crête <olivier.crete@collabora.com>
Date:   Tue Jul 16 15:37:49 2013 -0400

    rtp: Use gst_adapter_take_buffer_fast() where possible in RTP payloaders
Comment 16 Olivier Crête 2013-07-16 19:42:45 UTC
Core:
commit 592049159773d5377d1ce85f322c5751a4fac597
Author: Olivier Crête <olivier.crete@collabora.com>
Date:   Mon Jul 15 15:41:44 2013 -0400

    adapter: Add function to return buffer composed of multiple memories
    
    API: gst_adapter_take_fast()

Base:
commit c599c45b584188d48c7469c1743d55c2edc60538
Author: Olivier Crête <olivier.crete@collabora.com>
Date:   Mon Jul 15 16:13:11 2013 -0400

    rtpbaseaudiopayload: Avoid copying the data

Good:
commit 54c5a7f69019376c20e5304371209e4e1df37b86
Author: Olivier Crête <olivier.crete@collabora.com>
Date:   Tue Jul 16 15:37:49 2013 -0400

    rtp: Use gst_adapter_take_buffer_fast() where possible in RTP payloaders