GNOME Bugzilla – Bug 664778
souphttpsrc: chunk allocator: memory waste
Last modified: 2016-06-02 10:50:53 UTC
Hi there, I observed memory waste related to souphttpsrc's chunk allocator. Please don't misunderstand: this is not a memory leak, just very heavy memory usage.

Let's assume the following configuration in the playbin2 case: basesrc block-size = 24 KB, queue2 max-size-bytes = 24 MB. The reason for the somewhat large block-size is to improve socket performance: a large socket read size reduces syscalls and fetches a lot of data at once when the socket buffer holds enough packets. So 24 KB is a reasonable value.

The memory waste occurs in the following circumstance: 1. Network bandwidth is low. 2. So the socket buffer holds fewer bytes than the block size (24 KB). To keep the scenario simple, let's assume the socket buffer always has 12 KB available. In this scenario, the buffer allocated by gst_pad_alloc_buffer() is always 24 KB, but it actually contains only 12 KB, so half of each buffer is wasted.

The worst case: playbin2 links souphttpsrc and queue2 whenever the URI uses the HTTP scheme, and queue2 calculates its byte limit (max-size-bytes) using GST_BUFFER_SIZE(), not the really allocated size. The buffer size (GST_BUFFER_SIZE) is modified by the got-chunk callback to the size actually read, so in our scenario buf->length = 12 KB while the real allocated size is 24 KB. If queue2 is full, the actually allocated memory is 48 MB, of which 24 MB is wasted.

As far as I know, gst_pad_alloc_buffer() was introduced to reduce memory allocation (zero-copy?), but it causes memory waste, and that is a critical factor on embedded systems with little physical RAM. Honestly, I'm quite confused about what the right way is, so I'd like to hear what you think about this case. One more question: how much performance improvement did you get after applying the zero-copy scheme (the chunk allocator)? BR, Davy
That reminds me. I once wrote a patch about this, but I think it got lost in a hard drive crash. IIRC we cannot get larger amounts of data from libsoup, but what we can do is be smarter with our allocations in souphttpsrc. It's basically as you say, but there is an even worse case: when the network is really slow, you can also get 0 bytes often (even for the default 4 KB buffers). In that case, the buffer is thrown away and immediately reallocated... The solution is to keep the buffer around if it's not full yet and keep filling it up before pushing. That reduces the overhead of re-allocations for zero reads and pushes fewer, larger buffers into the pipeline.
...and possibly increases latency. But that shouldn't be too big a problem, and it's better than the current behaviour in any case, especially for the default 4k block-size value.
Yes, but somebody who tweaks buffer sizes is well aware of the implications :)
Not really sure why we do a gst_pad_alloc_buffer() here, seems a bit pointless.
alloc_buffer is probably not so useful, but the fixes I mentioned still pay off performance-wise. Allocating the buffer directly with malloc would give even better performance, when you think about it.
Thanks for all the kind feedback. In my opinion, we should consider the trade-off between socket performance and malloc performance.

Regarding socket performance versus block size, I ran some internal tests comparing socket throughput (bps) while increasing the block size from the default (4 KB). On a LAN, throughput improved significantly in proportion to the block size; e.g. 24 KB performed quite well compared with 4 KB, with a gap of almost 2.5x. Another benefit of a larger block size is that it reduces the loop count of gst_base_src_loop(). The only problem is the memory waste.

So my final opinion is: I don't want to give up either socket performance or memory utilization. On the other hand, I can obviously put up with the additional memcpy() overhead and the latency of filling the buffer completely; that only happens when network bandwidth is very low, since a socket read is just a buffer copy from kernel to user space.
I’d like to know more detail regarding the 0-byte receive in got_chunk_cb. Could you give me brief answers to the questions below? Q1. Why do 0 bytes flow into souphttpsrc? As far as I know, libsoup reads the socket when it becomes readable, using some polling method (select, poll, epoll and so on). Q2. Could you explain the reallocation scenario in more detail, including the related code? And what exactly is the overhead? Q3. Can you guess at any side effects if I roll back the chunk_allocator patch?
(In reply to comment #7)

> Q1. Why do 0 bytes flow into souphttpsrc? As far as I know, libsoup reads the
> socket when it becomes readable, using some polling method (select, poll,
> epoll and so on).

Yes, but you don't know how many bytes you can read when the socket is signalled as readable. In order to have zero-copy between libsoup and souphttpsrc, this is what is done:

- libsoup asks the source to allocate a buffer to write into
- libsoup asks the OS to fill the buffer with data from the socket
- the buffer is filled partially or fully, in non-blocking fashion
- if the buffer is filled fully (easily doable for 4 KB), libsoup asks for another buffer (since there *could* be more data to read right now)
- libsoup asks the OS again to fill the next buffer, but now it is very possible that 0 bytes are read
- libsoup tells us to deallocate the buffer without it ever being used

That patch I had solved this by recycling partially filled (including empty) buffers. I need to look around a bit in old backups; maybe it's sitting somewhere. Otherwise I can rewrite it, I guess (after the next releases though, since a freeze is coming).

> Q2. Could you explain the reallocation scenario in more detail, including the
> related code? And what exactly is the overhead?

A malloc() followed by a free() without using the buffer in between is useless overhead. Producing smaller GStreamer buffers than asked for/needed is overhead because you need more calls to process the same amount of data.

> Q3. Can you guess at any side effects if I roll back the chunk_allocator patch?

Not sure what you mean, actually.
How about deploying the FIONREAD ioctl? Through this syscall we can find out how much unread data remains in the TCP socket; please refer to the kernel code in net/ipv4/tcp.c, tcp_ioctl(). As a result, we can eliminate the "zero read" case which causes the malloc()/free() overhead. Regarding question #3, what I really meant was: what will happen if I delete the chunk_allocator? I'm just worried about unexpected malfunctions in the latest revision.
(In reply to comment #9)

> How about deploying the FIONREAD ioctl? Through this syscall we can find out
> how much unread data remains in the TCP socket. As a result, we can
> eliminate the "zero read" case which causes the malloc()/free() overhead.

Yes, but this has to be used at the libsoup level.

> Regarding question #3, what I really meant was: what will happen if I delete
> the chunk_allocator? I'm just worried about unexpected malfunctions.

As far as I can see, there is no commit you can just revert (the allocator stuff was added a very long time ago). So you have to modify the element yourself, which is of course possible. As Tim pointed out, the pad_alloc path won't provide a performance gain, so you should be able to remove it as a quick fix.
Alright, I verified that everything works fine after reverting the chunk_allocator in my local patch. Thanks a lot. Regarding the FIONREAD case, I'm not sure it has to be implemented at the libsoup level, since libsoup also supports synchronous sessions (blocking sockets). So I think souphttpsrc is the better place to apply FIONREAD if we want to keep the current chunk_allocator.
*** Bug 675010 has been marked as a duplicate of this bug. ***
Hi Davy, I also have the same problem with memory waste. I am a newcomer to souphttpsrc, so could you please send me the patch for this problem, if convenient?
Hi everyone, I urgently need to solve this problem. Could somebody tell me where I can find the related patch? Thanks!
Created attachment 213480 [details] [review] Patch to improve souphttpsrc allocation method I dug around and actually found my old forgotten patch, now rebased on the latest 0.10 branch. Maybe someone wants to try it; I want to commit it once I'm fully sure there are no regressions. Given the nature of TCP, I don't think anybody will run into an issue from the increased latency here. For example, streaming from an MP3 radio station gives me 1400-byte buffers without the patch, and 4096-byte buffers (the blocksize property) with it.
Got it! I will try it next Monday. Thanks for your kind help.
Created attachment 213644 [details] log info-without patch
Created attachment 213645 [details] loginfo-with patch
Hi René Stadler, I verified the code with your patch and got a noticeable improvement. But another problem occurs: I tested some files before adding the patch and they all played well; after adding the patch, two files can no longer be played. I debugged for a day and found nothing. I created two log files by running with GST_DEBUG=*basesrc*:4 while playing the same problematic file: one with the code without the patch, and another with the patch. Could you please check them? Sorry to bother you.
Please make a log where it fails with GST_DEBUG=*src*:5, that should give more information.
Created attachment 213716 [details] loginfo-with patch-src5 Hi, I created another log with GST_DEBUG=*src*:5; please check the attachment. Thanks a lot.
Yamin: That's really weird. Could you try if it works with the patch at much lower blocksize? Like blocksize=4096, which would be close to running without patch. Maybe I'm missing something obvious, but I think that the problem lies elsewhere and is triggered by the larger buffers.
Hi René Stadler, I have tried a 4K blocksize and got the same result. Checking the debug information, I see the following: when playing the problematic stream with the gstreamer package without the patch, the events arrive in the order 'flush start', 'flush stop', 'newsegment', and the stream plays well. But when playing this stream with the patched package, a different event order arrives: 'flush start', 'flush stop', 'eos'. So I think the 'eos' event replaces the 'newsegment' event and causes the stream to not play. What do you think about this?
This bug has an assigned developer but has not received activity in almost a year. Is the assigned person still working on this?
Dear Yamin, I am also a newcomer to souphttpsrc, and I want to try checking this patch. From the log messages, I guess it's DLNA playback. Can you share the problematic files?
Hi René Stadler, I am facing an issue similar to the one mentioned in bug #675010, on 1.5.2. Can you please provide an updated solution? I tried to write a similar modification for 1.5.2 but it's not working. Thanks in advance.
I think the better and simpler way is to allocate a small buffer of the received size if it differs from what was allocated.
Created attachment 316577 [details] [review] patch to fix this issue The issue is resolved with the attached patch. For further memory savings, see also the queue2 patch from https://bugzilla.gnome.org/show_bug.cgi?id=758891
Please review and verify the attached patch.
I think this bug is probably obsolete after the refactoring of souphttpsrc in bug #693911 around the new soup API. I think Thiago was still experimenting with dynamic buffer size adjustments for a good/automatic latency/throughput trade-off, but that should probably be handled elsewhere. I'm not in favour of copying into a smaller buffer, that can be quite costly as well - it's a trade-off I guess. I say let's close this bug and open a new one for the new souphttpsrc code if it's still a problem.