GNOME Bugzilla – Bug 772841
udpsrc: high CPU usage at high packet rate
Last modified: 2018-11-03 15:12:54 UTC
A simple pipeline:

  gst-launch-1.0 udpsrc uri="udp://224.100.1.1:5004" ! fakesink

is causing 30% CPU usage on a Cortex A7 @ 528 MHz at 1000 packets/second. Such high CPU usage is unacceptable, especially when a C program doing the same thing - reading from a socket and discarding the data - needs just 3-4% CPU. This use case is crucial for handling low-latency audio.
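For comparison, the kind of minimal read-and-discard C program referred to above might look like this (an illustrative sketch; the group and port are taken from the pipeline, error handling omitted):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

int main (void)
{
  char buf[1500];
  struct sockaddr_in sa;
  struct ip_mreq mreq;
  int fd = socket (AF_INET, SOCK_DGRAM, 0);

  memset (&sa, 0, sizeof (sa));
  sa.sin_family = AF_INET;
  sa.sin_addr.s_addr = htonl (INADDR_ANY);
  sa.sin_port = htons (5004);
  bind (fd, (struct sockaddr *) &sa, sizeof (sa));

  /* join the multicast group used by the pipeline above */
  mreq.imr_multiaddr.s_addr = inet_addr ("224.100.1.1");
  mreq.imr_interface.s_addr = htonl (INADDR_ANY);
  setsockopt (fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof (mreq));

  for (;;)
    recv (fd, buf, sizeof (buf), 0);    /* read and discard */
}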
I've also tested the "retrieve-sender-address" patch and disabling the sender address saves about 2%.
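For reference, assuming the property name from that patch, the test pipeline with sender-address retrieval disabled would be:

  gst-launch-1.0 udpsrc uri="udp://224.100.1.1:5004" retrieve-sender-address=false ! fakesink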
I have been working on this and hope to land some improvements early in the 1.11 cycle (once 1.10 is out).
Created attachment 337560 [details]
Callgrind output

I've done some profiling with valgrind and it turned out that udpsrc allocates a buffer for every packet. In fact, there are about 8 malloc calls per g_socket_receive_message call.
(In reply to Tim-Philipp Müller from comment #2)
> I have been working on this and hope to land some improvements early in
> the 1.11 cycle (once 1.10 is out).

How much improvement do you expect? Would you mind sharing the patches here? I'm kind of stuck and can't wait, as I need to significantly reduce the CPU usage to get my use case working. So I will be happy to test and possibly optimize further. It would save time and avoid duplicating effort. Thanks!
Created attachment 339726 [details] [review]
Optimization to receive messages only in multicast

Here is a small optimization for non-multicast traffic. It does not get control messages from g_socket_receive_message() and therefore does not release them.
Created attachment 339727 [details] [review]
Buffer pool patch

Here is my attempt at the buffer pool. So far with a non-configurable buffer size of 1500 (for the usual MTU). However, it has the problem that the buffer is freed due to the resize to the actual packet size. Any idea how to avoid this issue and take full advantage of the buffer pool?
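For context, the interaction that frees the buffer looks roughly like this (an illustrative fragment, not the exact patch):

/* res = number of bytes g_socket_receive_message() actually read */
gst_buffer_resize (outbuf, 0, res);   /* shrink the 1500-byte buffer to res */
/* on release, the default pool compares the buffer against its configured
 * size, sees a mismatch and frees the buffer instead of recycling it */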
(In reply to Petr Kulhavy from comment #5)
> Created attachment 339726 [details] [review]
> Optimization to receive messages only in multicast
>
> Here is a small optimization for non-multicast traffic. It does not get
> control messages from g_socket_receive_message() and therefore does not
> release them.

See also https://bugzilla.gnome.org/show_bug.cgi?id=769994
Comment on attachment 339726 [details] [review]
Optimization to receive messages only in multicast

commit 1fc572d7408c339a922817a1f99579d6228533df
Author: Petr Kulhavy <brain@jikos.cz>
Date:   Sat Nov 12 23:34:23 2016 +0100

    udpsrc: receive control messages only in multicast

    Control messages are used only in multicast mode - to detect if the
    destination address is not ours and possibly drop the packet. However
    in non-multicast modes the messages are still allocated and freed
    even if not used. Therefore request control messages from
    g_socket_receive_message() only in multicast mode.

    https://bugzilla.gnome.org/show_bug.cgi?id=772841
(In reply to Petr Kulhavy from comment #6)
> Created attachment 339727 [details] [review]
> Buffer pool patch
>
> Here is my attempt at the buffer pool. So far with a non-configurable
> buffer size of 1500 (for the usual MTU). However, it has the problem that
> the buffer is freed due to the resize to the actual packet size. Any idea
> how to avoid this issue and take full advantage of the buffer pool?

The buffer pool should probably resize the buffer back (if it can) and check the memory maxsize instead of the size.
(In reply to Sebastian Dröge (slomo) from comment #9)
> (In reply to Petr Kulhavy from comment #6)
> > Created attachment 339727 [details] [review]
> > Buffer pool patch
> >
> > Here is my attempt at the buffer pool. So far with a non-configurable
> > buffer size of 1500 (for the usual MTU). However, it has the problem
> > that the buffer is freed due to the resize to the actual packet size.
> > Any idea how to avoid this issue and take full advantage of the buffer
> > pool?
>
> The buffer pool should probably resize the buffer back (if it can) and
> check the memory maxsize instead of the size.

That is what I have actually done in my local copy. Was there any particular reason for releasing it and not resizing it back? Can the maxsize change? How?
(In reply to Petr Kulhavy from comment #10)
> (In reply to Sebastian Dröge (slomo) from comment #9)
> > The buffer pool should probably resize the buffer back (if it can) and
> > check the memory maxsize instead of the size.
>
> That is what I have actually done in my local copy. Was there any
> particular reason for releasing it and not resizing it back?

I don't think so. You also need to check that it's still the same memory, btw, but maybe the UDP buffer pool should be able to be less strict here.

> Can the maxsize change? How?

Different memory
Created attachment 339793 [details] [review]
Remove redundant saddr unref

This is not causing any performance issue, but I found it during the optimizations. The unref of saddr before calling receive message seems to be redundant, because saddr is already unref-ed just before the goto retry.
Comment on attachment 339793 [details] [review]
Remove redundant saddr unref

It would probably make more sense to move it right after the "retry:" label and remove it from before the "goto retry" places.
Created attachment 339798 [details] [review]
Remove redundant saddr unref

Agreed, makes sense. Here it is, moved to just after the retry: label.
Comment on attachment 339798 [details] [review]
Remove redundant saddr unref

commit 89ad2de93ec6b5834e2111e58fc77868033f7ada
Author: Petr Kulhavy <brain@jikos.cz>
Date:   Mon Nov 14 12:13:14 2016 +0100

    udpsrc: remove redundant saddr unref

    The g_object_unref (saddr) before receiving a message seems to be
    redundant, as it is done just before jumping to retry.

    Though not directly related, part of
    https://bugzilla.gnome.org/show_bug.cgi?id=772841
Created attachment 339819 [details] [review]
Buffer pool patch

Here is the updated version of the buffer pool modifications, with a configurable buffer size (as a parameter).
Created attachment 339820 [details] [review]
GstBufferPool modification

Here is a related modification of the default_reset_buffer() function in GstBufferPool, to reset the buffer size if the memory hasn't changed. To be applied on gstreamer.
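The shape of that change, roughly (a sketch of the idea rather than the verbatim diff; GST_BUFFER_FLAG_TAG_MEMORY is the flag GStreamer sets whenever a buffer's memory was replaced, so its absence means the memory is intact):

/* inside gstbufferpool.c, so pool->priv is accessible */
static void
default_reset_buffer (GstBufferPool * pool, GstBuffer * buffer)
{
  GST_BUFFER_FLAGS (buffer) &= GST_BUFFER_FLAG_TAG_MEMORY;
  GST_BUFFER_PTS (buffer) = GST_CLOCK_TIME_NONE;
  GST_BUFFER_DTS (buffer) = GST_CLOCK_TIME_NONE;
  GST_BUFFER_DURATION (buffer) = GST_CLOCK_TIME_NONE;
  GST_BUFFER_OFFSET (buffer) = GST_BUFFER_OFFSET_NONE;
  GST_BUFFER_OFFSET_END (buffer) = GST_BUFFER_OFFSET_NONE;

  /* if the memory is intact, grow the buffer back to its full size so the
   * pool can reuse it even after a downstream resize */
  if (!GST_BUFFER_FLAG_IS_SET (buffer, GST_BUFFER_FLAG_TAG_MEMORY))
    gst_buffer_resize (buffer, 0, pool->priv->size);
}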
I'm also wondering if reducing the number of syscalls would bring any further performance boost. I mean try to combine the wait (poll) and receive, e.g. by setting SO_RCVTIMEO on the socket. What do you think?
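For illustration, the SO_RCVTIMEO idea would look roughly like this (a sketch, not a proposed patch; with this socket option a blocking recv() returns with EAGAIN/EWOULDBLOCK once the timeout expires, so the separate per-packet poll() could be dropped):

#include <sys/socket.h>
#include <sys/time.h>

/* make blocking reads on fd give up after 'millis' milliseconds */
static int
set_recv_timeout (int fd, int millis)
{
  struct timeval tv;

  tv.tv_sec = millis / 1000;
  tv.tv_usec = (millis % 1000) * 1000;
  return setsockopt (fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof (tv));
}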
> I'm also wondering if reducing the number of syscalls would bring any
> further performance boost. I mean try to combine the wait (poll) and
> receive, e.g. by setting SO_RCVTIMEO on the socket. What do you think?

We should use recvmmsg(). I have patches for that which I need to rebase; this is next on my list, give me a day or so please :)
(In reply to Tim-Philipp Müller from comment #19)
> We should use recvmmsg(). I have patches for that which I need to rebase;
> this is next on my list, give me a day or so please :)

I have been looking at recvmmsg(). But the timeout seems to be buggy:

man recvmmsg:
"BUGS: The timeout argument does not work as intended. The timeout is checked only after the receipt of each datagram, so that if up to vlen-1 datagrams are received before the timeout expires, but then no further datagrams are received, the call will block forever."

So I'm afraid it is useless :-(
(In reply to Petr Kulhavy from comment #20)
> I have been looking at recvmmsg(). But the timeout seems to be buggy:
>
> man recvmmsg:
> "BUGS: The timeout argument does not work as intended. The timeout is
> checked only after the receipt of each datagram, so that if up to vlen-1
> datagrams are received before the timeout expires, but then no further
> datagrams are received, the call will block forever."
>
> So I'm afraid it is useless :-(

Not totally true. When using it non-blocking, you can first poll, and then you call recvmmsg with N messages allocated. As it's non-blocking, it will give you back M messages (where M <= N). You don't need to use the timeout.
You can just optimise for the high-packet-rate case, in which case you don't poll first but just try to read packets, and if you don't get anything (or not enough) then you wait with poll.
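A sketch of that read-first strategy with recvmmsg() (illustrative only; assumes msgs points to a pre-initialized array of N_MSGS struct mmsghdr, and MSG_DONTWAIT keeps the reads non-blocking regardless of the socket mode):

#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

#define N_MSGS 32

/* try to drain up to N_MSGS datagrams in one syscall; only fall back to
 * poll() when the non-blocking read reports that no data is queued */
static int
drain_packets (int fd, struct mmsghdr *msgs)
{
  int n = recvmmsg (fd, msgs, N_MSGS, MSG_DONTWAIT, NULL);

  if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    poll (&pfd, 1, -1);   /* block until data arrives, then retry */
    n = recvmmsg (fd, msgs, N_MSGS, MSG_DONTWAIT, NULL);
  }
  return n;               /* number of datagrams received, or -1 */
}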
(In reply to Nicolas Dufresne (stormer) from comment #21)
> (In reply to Petr Kulhavy from comment #20)
> > I have been looking at recvmmsg(). But the timeout seems to be buggy:
> >
> > man recvmmsg:
> > "BUGS: The timeout argument does not work as intended. The timeout is
> > checked only after the receipt of each datagram, so that if up to vlen-1
> > datagrams are received before the timeout expires, but then no further
> > datagrams are received, the call will block forever."
> >
> > So I'm afraid it is useless :-(
>
> Not totally true. When using it non-blocking, you can first poll, and then
> you call recvmmsg with N messages allocated. As it's non-blocking, it will
> give you back M messages (where M <= N). You don't need to use the timeout.

That might make sense for use cases where latency is not critical. However, for low-latency streams you possibly want to receive individual packets (i.e. M == 1), so there recvmmsg won't gain anything.
Not sure I follow. The bug summary talks about a high packet rate. If that's the case, you will likely be able to retrieve multiple packets with a single system call, which improves throughput, latency and cpu usage. And in no case should you be worse off than with the current implementation.
Petr, basically we don't need the timeout, and not using the timeout is what will prevent adding latency. Tim, I totally agree with you that we should recv first.
(In reply to Nicolas Dufresne (stormer) from comment #25)
> Petr, basically we don't need the timeout, and not using the timeout is
> what will prevent adding latency. Tim, I totally agree with you that we
> should recv first.

Nicolas, I agree with that, calling recv first is the right approach.
(In reply to Tim-Philipp Müller from comment #24)
> Not sure I follow. The bug summary talks about a high packet rate. If
> that's the case, you will likely be able to retrieve multiple packets
> with a single system call, which improves throughput, latency and cpu
> usage. And in no case should you be worse off than with the current
> implementation.

Tim, the use case I'm talking about is for example AES67 with 1k packets/second, 1 ms each. To reduce the latency as much as possible, ideally you would receive every single packet as fast as possible and send it down the pipe for further processing individually. So things like interrupt coalescing or receiving multiple packets are not desired in the first place, since they increase the latency. The primary goal is then not to gather packets but to make the processing lightweight, so it can sustain the high packet rate.

Now that is the theory. The other thing is how much realtime Linux is able to provide, how much overhead the syscalls are going to create, and eventually how many packets per second the system is able to process with the given MIPS. So when it turns out the system is not responsive enough to handle that packet rate, things like recvmmsg would actually help to save the situation (where packets have accumulated). But ideally it should never come to that point.

Does it make more sense now?
Created attachment 339900 [details] [review]
Buffer pool patch

Here is a modified version, which reads first and polls only on EWOULDBLOCK.
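The control flow of that change, roughly (an assumed sketch, not the attached patch itself; it requires the socket to be in non-blocking mode):

#include <gio/gio.h>

/* receive immediately; only if the read would block, wait for G_IO_IN
 * and try again */
static gssize
receive_read_first (GSocket * sock, GInputVector * ivec, gint n_vec,
    GCancellable * cancellable, GError ** error)
{
  gssize res;
  gint flags = 0;
  GError *err = NULL;

retry:
  res = g_socket_receive_message (sock, NULL, ivec, n_vec,
      NULL, NULL, &flags, cancellable, &err);
  if (res < 0 && g_error_matches (err, G_IO_ERROR, G_IO_ERROR_WOULD_BLOCK)) {
    g_clear_error (&err);
    /* only now pay for the poll, then retry the read */
    if (!g_socket_condition_wait (sock, G_IO_IN, cancellable, error))
      return -1;
    goto retry;
  }
  if (err != NULL)
    g_propagate_error (error, err);
  return res;
}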
Created attachment 339902 [details] [review]
Optimize GstUdpSrc cache performance

Here is a small patch which optimizes GstUdpSrc for cache performance (schematic layout below):
- hot properties (used in the read path) are moved to the top
- fields are reordered to avoid holes where needed
- the unused property @ttl is removed
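Schematically, the reordering idea looks like this (illustrative only, not the complete structure; the hot field set is the one listed in the commit message further below, the cold fields and exact types are assumptions):

struct _GstUdpSrc
{
  GstPushSrc parent;

  /* hot: touched for every received packet, kept together at the top
   * so they share as few cache lines as possible */
  GSocket *used_socket;
  GInetSocketAddress *addr;
  GCancellable *cancellable;
  gint skip_first_bytes;
  gboolean retrieve_sender_address;
  guint64 timeout;        /* 64-bit member after two 32-bit ones: no hole */

  /* cold: configuration properties, only read at setup time */
  gchar *address;
  gint port;
  /* ... remaining cold properties ... */
};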
With all these changes I'm at 1-3% CPU for unicast (udpsrc ! fakesink @ 1kpps), with occasional temporary jumps to 12% (using top). Multicast moves between 2-9% and 17-22% due to the heavy allocations described in bug 769994. It is a significant improvement, but to sort it out completely we should get rid of the kernel message allocations as described in bug 769994 - because the typical use case with RTSP is multicast :-S
Created attachment 340677 [details] [review]
Process control messages with a callback

I have created a patch which uses my GSocket API improvement from bug 769994 and processes the control messages with a callback rather than getting them via the receive_message() interface. This reduces the CPU usage significantly, as the overhead of serialization and memory allocation/deallocation is bypassed. The patch works with both versions of the API.
I have recently encountered similar performance-related issues with udpsrc. In my case, the visible issue was that it couldn't handle high bitrates (50-60 Mbps) with the default MTU of 1500... I had to enable jumbo frames to get just a small improvement, and then realized that udpsrc is really inefficient at reading packets from the kernel buffer.

In order to improve the situation, I wrote a series of improvements:
https://git.collabora.com/cgit/user/gkiagia/gst-plugins-good.git/log/?h=udpsrc-highbitrate

These basically include (a sketch of the GSocket batching API follows below):
* Use recvmmsg(), through the recent GSocket API
* Use a separate reader thread to ensure we return execution to recvmmsg() as fast as possible
* Use a buffer pool to avoid a lot of new allocations

It's possible that some of it is overkill; however, it is **still** sometimes losing a very small percentage of packets with a 57 Mbps RTP stream that I have been testing with (on a link-local connection). Thankfully, it is too small to notice (on the order of tens of packets out of a few million - with retransmission enabled it's not noticeable at all).
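A sketch of the batching read through GSocket (assumes g_socket_receive_messages(), available since GLib 2.48; the function name receive_batch and buffer sizes are illustrative):

#include <gio/gio.h>
#include <string.h>

#define N_MSGS 32

/* receive up to N_MSGS datagrams with a single recvmmsg() underneath */
static gint
receive_batch (GSocket * sock, GCancellable * cancellable, GError ** error)
{
  static guint8 bufs[N_MSGS][1500];
  GInputVector vecs[N_MSGS];
  GInputMessage msgs[N_MSGS];
  gint i;

  for (i = 0; i < N_MSGS; i++) {
    vecs[i].buffer = bufs[i];
    vecs[i].size = sizeof (bufs[i]);
    /* address left NULL: the sender address is not retrieved, which is
     * cheaper (cf. the retrieve-sender-address discussion above) */
    memset (&msgs[i], 0, sizeof (msgs[i]));
    msgs[i].vectors = &vecs[i];
    msgs[i].num_vectors = 1;
  }

  /* on a non-blocking socket this returns however many datagrams are
   * already queued, without waiting for the full batch */
  return g_socket_receive_messages (sock, msgs, N_MSGS, G_SOCKET_MSG_NONE,
      cancellable, error);
}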
For the extra thread part, why couldn't you just add a queue after udpsrc? Having 2 threads per source all the time seems like significant overhead for the other use cases.
(In reply to Nicolas Dufresne (stormer) from comment #33)
> For the extra thread part, why couldn't you just add a queue after udpsrc?
> Having 2 threads per source all the time seems like significant overhead
> for the other use cases.

Perhaps it is overhead, yes. My initial idea was also to use a queue, but it failed my benchmark application, so I went for the separate thread... After implementing the separate reader thread my benchmark was still failing, though, but later I realized that it works on real ethernet (the benchmark runs on the lo interface). Still, I decided to keep the thread anyway because it feels safer... and since it's still losing some packets (!) I didn't want to risk it for this application.

You have a good point, though. I can do some real tests without the separate thread and see if it makes any real impact.
Created attachment 358691 [details]
benchmark application

This is the benchmark I used initially. It's not good enough, but it shows the difference between using recvmmsg() and not using it.
Created attachment 369368 [details] [review]
udpsrc: switch to using a buffer pool

This exposes a new property, mtu, which is used to determine the initial size of buffers from the buffer pool. If received data exceeds this, the element gracefully handles that in a manner similar to what we had previously: a large memory gets filled and reallocated at the next call to "fill". The default size is set to 1500, which should cover most use cases.

With contributions from Mathieu Duponchelle <mathieu@centricular.com>
I updated the buffer pool patch to make it upstreamable, with the following changes:

* At least on its own, the proposed patch incorrectly blocked on receive_message calls, as the udpsrc socket is blocking. We cannot make it non-blocking, at least not by default, as it might be used by other code such as udpsink when sending; it would be possible to follow two different code paths depending on the socket's refcount, but that's complicated and IMO unrelated to the bufferpool work.

* I changed the property name to MTU

* I removed the conditions for calling config_set_params; caps are not required to be non-NULL there, and this made the udpsrc tests fail

* I added support for larger-than-MTU messages (sketched below), by falling back to a larger memory, very similar to what we had before, except we no longer cache the allocator / allocation params

@Petr, I kept you as the author and added myself as a contributor in the commit message, I hope that's fine by you :)
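Schematically, that larger-than-MTU fallback works like this (a simplified sketch; identifiers such as extra_mem, ivec and MAX_IPV4_UDP_PACKET_SIZE appear in the patch under review below, but the exact code differs):

/* vector 0 maps the pooled MTU-sized buffer; vector 1 maps a spare
 * memory that is only attached when a datagram overflows the MTU */
ivec[0].buffer = info.data;
ivec[0].size = info.size;

if (udpsrc->extra_mem == NULL)
  udpsrc->extra_mem =
      gst_allocator_alloc (NULL, MAX_IPV4_UDP_PACKET_SIZE, NULL);
gst_memory_map (udpsrc->extra_mem, &extra_info, GST_MAP_WRITE);
ivec[1].buffer = extra_info.data;
ivec[1].size = extra_info.size;

/* ... g_socket_receive_message () fills both vectors, returning res ... */

gst_memory_unmap (udpsrc->extra_mem, &extra_info);
if (res > udpsrc->mtu) {
  /* the datagram spilled into the spare memory: hand it to the buffer
   * and allocate a fresh one next time (this is also what makes the pool
   * free the buffer on release, see the review below) */
  gst_buffer_append_memory (outbuf, udpsrc->extra_mem);
  udpsrc->extra_mem = NULL;
}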
Review of attachment 369368 [details] [review]:

Even though I'm sure the fallback is unlikely to be used, I think the implementation of that fallback should be better. As we request a pool from downstream, we should also deal better with buffers that have multiple memories. A unit test with downstream providing the pool would also be nice.

::: gst/udp/gstudpsrc.c
@@ +837,3 @@
   p_saddr = (udpsrc->retrieve_sender_address) ? &saddr : NULL;

+  if (!gst_buffer_map (outbuf, &info, GST_MAP_READWRITE))

That is not ideal: if downstream sets up buffers with multiple memories in them, this will merge them. You can use the ivec to prevent this.

@@ +843,3 @@
+  ivec[0].size = info.size;
+
+  /* Prepare memory in case the data size exceeds mtu */

s/mtu/buffer size/, because the size could have been set by the downstream allocation reply.

@@ +855,3 @@
+
+      udpsrc->extra_mem =
+          gst_allocator_alloc (allocator, MAX_IPV4_UDP_PACKET_SIZE, &params);

I'm not sure what the size limit for the vector is, but wouldn't it be better to add multiple buffers? This way, if you have a 1400 MTU and downstream allocates 700, you have two zero-copy buffers. All we need is to set up enough buffers in the vector so that MAX_IPV4_UDP_PACKET_SIZE is reached.

@@ +983,2 @@
+  if (res > udpsrc->mtu) {
+    gst_buffer_append_memory (outbuf, udpsrc->extra_mem);

The way our pool works, this will kill it: all memory will be freed when the buffers come back, because the "memory" got modified. Better to use a buffer list.
Seems like a bit of work; what about dropping downstream pool support? This bug seems to care about caching, not about doing fancy zero-copy.
(In reply to Nicolas Dufresne (stormer) from comment #38)
> @@ +983,2 @@
> +  if (res > udpsrc->mtu) {
> +    gst_buffer_append_memory (outbuf, udpsrc->extra_mem);
>
> The way our pool works, this will kill it: all memory will be freed when
> the buffers come back, because the "memory" got modified. Better to use a
> buffer list.

It can't be a buffer list, otherwise you break the packetization. Each buffer in a buffer list is separate.
(In reply to Mathieu Duponchelle from comment #37)
> * At least on its own, the proposed patch incorrectly blocked on
> receive_message calls, as the udpsrc socket is blocking. We cannot make it
> non-blocking, at least not by default, as it might be used by other code
> such as udpsink when sending; it would be possible to follow two different
> code paths depending on the socket's refcount, but that's complicated and
> IMO unrelated to the bufferpool work.

Maybe add a property for that? It seems like a valuable optimization (in a separate patch) that would prevent one syscall per packet if packets are arriving fast.

> * I added support for larger-than-MTU messages (sketched below), by
> falling back to a larger memory, very similar to what we had before,
> except we no longer cache the allocator / allocation params

As Nicolas mentioned already, this breaks the pool. We need to do something more clever here. Maybe only ever use a bufferpool if MTU is set, and don't set it by default. If MTU is set we always allocate that much and don't need to do any magic with different memories.
(In reply to Sebastian Dröge (slomo) from comment #41)
> As Nicolas mentioned already, this breaks the pool. We need to do
> something more clever here. Maybe only ever use a bufferpool if MTU is
> set, and don't set it by default. If MTU is set we always allocate that
> much and don't need to do any magic with different memories.

My thinking here was to optimize the default case while not breaking the more exotic ones, meaning that the buffer pool can be used by default without any user interaction. This doesn't really "break" or "kill" anything; it just means the larger-than-1500-MTU case still requires an allocation, which we can't really work around anyway?
My main concern is your use of the downstream pool; I would drop that part. You don't really need it, it's not aligned with the MTU, and it is also totally untested. The rest is a bit ugly; the fact that the fallback will cause the memory to be freed is definitely worth a comment in the code.
(In reply to Nicolas Dufresne (stormer) from comment #43)
> My main concern is your use of the downstream pool; I would drop that
> part. You don't really need it, it's not aligned with the MTU, and it is
> also totally untested.

FWIW I didn't change that from the originally proposed patch; I don't mind dropping it :)

> The rest is a bit ugly; the fact that the fallback will cause the memory
> to be freed is definitely worth a comment in the code.

Sure, though I think the ugliness is actually toned down with this patch compared to the currently existing code ;)
Created attachment 369386 [details] [review]
udpsrc: switch to using a buffer pool

This exposes a new property, mtu, which is used to determine the initial size of buffers from the buffer pool. If received data exceeds this, the element gracefully handles that in a manner similar to what we had previously: a large memory gets filled and reallocated at the next call to "fill". The default size is set to 1500, which should cover most use cases.

With contributions from Mathieu Duponchelle <mathieu@centricular.com>
Created attachment 369387 [details] [review]
udpsrc: switch to using a buffer pool

This exposes a new property, mtu, which is used to determine the initial size of buffers from the buffer pool. If received data exceeds this, the element gracefully handles that in a manner similar to what we had previously: a large memory gets filled and reallocated at the next call to "fill". The default size is set to 1500, which should cover most use cases.

With contributions from Mathieu Duponchelle <mathieu@centricular.com>
Review of attachment 339820 [details] [review]:

I think this one we can take.
Comment on attachment 339900 [details] [review]
Buffer pool patch

Mathieu reworked this one.
Review of attachment 339902 [details] [review]:

Does that have a measurable impact? Also, removing the unused ttl should be a separate patch.
(In reply to Nicolas Dufresne (stormer) from comment #49)
> Review of attachment 339902 [details] [review]:
>
> Does that have a measurable impact? Also, removing the unused ttl should
> be a separate patch.

Thanks. I did observe an improvement with George's test application: the throughput is improved and the percentage of lost packets measurably goes down; I'm unsure why, to be honest. Regarding CPU usage, I didn't notice an improvement, but that's with a simplistic test case. I expect improvements should be more noticeable in scenarios with more complex allocation patterns; I'll let you know :)
(In reply to Mathieu Duponchelle from comment #50)
> Thanks. I did observe an improvement with George's test application: the
> throughput is improved and the percentage of lost packets measurably goes
> down; I'm unsure why, to be honest.

A wild guess: we are jumping between locations in the object and end up with cache misses; as the loop gets slower, we get preempted and the socket's internal buffer overruns. It's well known that we need to read the socket data ASAP to avoid overruns, though you could in theory hide all this by using a larger socket buffer. If it does improve things with such a trivial patch, I'd say let's merge it. On second thought, the ttl removal is fine as part of this patch, and it's mentioned in the commit message.
Review of attachment 340677 [details] [review]:

::: gst/udp/gstudpsrc.c
@@ +889,3 @@
+#if GLIB_CHECK_VERSION(2,52,0)
+  /* use callback from receive_message() */
+  g_socket_set_cmsg_handler (udpsrc->used_socket, gst_udpsrc_check_dst_addr,

This never made it into 2.52; I believe it requires a tad more work upstream. Have you pinged Philip Withnall about this?
Review of attachment 369387 [details] [review]:

No more comments, I think it's good to merge in the next dev window.
Comment on attachment 339820 [details] [review]
GstBufferPool modification

commit 7d75d3ef4181de6658fe7d4dbbe633082257315d (HEAD -> master)
Author: Petr Kulhavy <brain@jikos.cz>
Date:   Mon Nov 14 15:35:50 2016 +0100

    gstbuffer: reset buffer to its original size if intact

    Enhance default_reset_buffer() to resize the buffer to its full size
    if the memory hasn't changed. This allows reusing the buffer even if
    the offset has changed or the size has shrunk, rather than freeing
    the buffer.

    Change related to:
    https://bugzilla.gnome.org/show_bug.cgi?id=772841
commit 589019d8f5072d4470ea2ed76cfffa75f45e9426
Author: Petr Kulhavy <brain@jikos.cz>
Date:   Tue Nov 15 09:39:31 2016 +0100

    udpsrc: optimize GstUdpSrc object for cache performance

    Optimize GstUdpSrc for cache performance. Move the hot properties,
    which are used by the read function, to the top: @used_socket, @addr,
    @cancellable, @skip_first_bytes, @timeout, @retrieve_sender_address.
    Remove the unused property @ttl. Where needed, reorder so that holes
    are avoided (the 64-bit @timeout).

    https://bugzilla.gnome.org/show_bug.cgi?id=772841

Attachment 369387 [details] pushed as 738eb0d - udpsrc: switch to using a buffer pool
So only the callback-based control-message handling is missing, but that's blocked upstream.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-plugins-good/issues/309.