GNOME Bugzilla – Bug 664778
souphttpsrc: chunk allocator: memory waste
Last modified: 2016-06-02 10:50:53 UTC
Hi there, I observed memory waste related to souphttpsrc's chunk allocator. Please don't misunderstand: this is not a memory leak, just very heavy memory usage.

Let's assume the following configuration in the playbin2 case: basesrc block-size = 24 KB, queue2 max-size-bytes = 24 MB. The reason for the somewhat large block-size is to improve socket performance: a large socket read size reduces syscalls and fetches a lot of data at once when the socket buffer holds enough packets. So 24 KB is a reasonable value.

The memory waste occurs in the following circumstance: 1. Network bandwidth is low. 2. So the socket buffer holds fewer bytes than the block size (24 KB). To keep the scenario simple, let's assume the socket buffer always has 12 KB available. In this scenario, the buffer allocated by gst_pad_alloc_buffer() is always 24 KB, but it actually contains only 12 KB, so half of each buffer is wasted.

The worst case: playbin2 links souphttpsrc and queue2 whenever the URI uses the HTTP scheme, and queue2 calculates its byte limit (max-size-bytes) using GST_BUFFER_SIZE(), not the really allocated size. The buffer size (GST_BUFFER_SIZE) is modified by the got-chunk callback to the size actually read, so in our scenario buf->length = 12 KB while the real allocated size is 24 KB. If queue2 is full, the actually allocated memory is 48 MB, of which 24 MB is wasted.

As far as I know, gst_pad_alloc_buffer() was introduced to reduce memory allocation (zero-copy?), but it causes memory waste, and that is a critical factor on embedded systems with little physical RAM. Honestly, I'm quite confused about what the right way is, so I'd like to hear what you think about this case. One more question: how much performance improvement did you get after applying the zero-copy scheme (the chunk allocator)? BR, Davy
That reminds me. I once wrote a patch about this, but I think it got lost in a hard drive crash. IIRC we cannot get larger amounts of data from libsoup, but what we can do is be smarter with our allocations in souphttpsrc. It's basically as you say, but there is an even worse case: when the network is really slow, you can also get 0 bytes often (even for the default 4 KB buffers). In that case, the buffer is thrown away and immediately reallocated... The solution is to keep the buffer around if it's not full yet and keep filling it up before pushing. That reduces the overhead of re-allocations for zero reads and pushes fewer, larger buffers into the pipeline.
...and possibly increases latency. But that shouldn't be too big a problem, and it's better than the current behaviour in any case, especially for the default 4k block-size value.
Yes, but somebody who tweaks buffer sizes is well aware of the implications :)
Not really sure why we do a gst_pad_alloc_buffer() here, seems a bit pointless.
alloc_buffer is probably not so useful, but the fixes I mentioned still pay off performance-wise. Allocating the buffer directly with malloc would give even better performance, when you think about it.
Thanks for all the kind feedback. In my opinion, we should consider the trade-off between socket performance and malloc performance.

Regarding socket performance versus block size, I ran some internal tests comparing socket throughput (bps) while increasing the block size from the default (4 KB). On a LAN, throughput improved significantly in proportion to the block size; e.g. 24 KB performed quite well compared with 4 KB, with a gap of almost 2.5x. Another benefit of a larger block size is that it reduces the loop count of gst_base_src_loop(). The only problem is the memory waste.

So my final opinion is: I don't want to give up either socket performance or memory utilization. On the other hand, I can obviously put up with the additional memcpy() overhead and the latency of filling the buffer completely; that only happens when network bandwidth is very low, since a socket read is just a buffer copy from kernel to user space.
I’d like to know more detail regarding the 0-byte receive in got_chunk_cb. Could you give me brief answers to the questions below? Q1. Why do 0 bytes flow into souphttpsrc? As far as I know, libsoup reads the socket when it becomes readable, using some polling method (select, poll, epoll and so on). Q2. Could you explain the reallocation scenario in more detail, including the related code? And what exactly is the overhead? Q3. Can you guess at any side effects if I roll back the chunk_allocator patch?
(In reply to comment #7)

> Q1. Why do 0 bytes flow into souphttpsrc? As far as I know, libsoup reads the
> socket when it becomes readable, using some polling method (select, poll,
> epoll and so on).

Yes, but you don't know how many bytes you can read when the socket is signalled as readable. In order to have zero-copy between libsoup and souphttpsrc, this is what is done:

- libsoup asks the source to allocate a buffer to write into
- libsoup asks the OS to fill the buffer with data from the socket
- the buffer is filled partially or fully, in non-blocking fashion
- if the buffer is filled fully (easily doable for 4 KB), libsoup asks for another buffer (since there *could* be more data to read right now)
- libsoup asks the OS again to fill the next buffer, but now it is very possible that 0 bytes are read
- libsoup tells us to deallocate the buffer without it ever being used

That patch I had solved this by recycling partially filled (including empty) buffers. I need to look around a bit in old backups; maybe it's sitting somewhere. Otherwise I can rewrite it, I guess (after the next releases though, since a freeze is coming).

> Q2. Could you explain the reallocation scenario in more detail, including the
> related code? And what exactly is the overhead?

A malloc() followed by a free() without using the buffer in between is useless overhead. Producing smaller GStreamer buffers than asked for/needed is overhead because you need more calls to process the same amount of data.

> Q3. Can you guess at any side effects if I roll back the chunk_allocator patch?

Not sure what you mean, actually.
How about deploying the FIONREAD ioctl? Through this syscall we can find out how much unread data remains in the TCP socket; please refer to the kernel code in net/ipv4/tcp.c, tcp_ioctl(). As a result, we can eliminate the "zero read" case which causes the malloc()/free() overhead. Regarding question #3, what I really meant was: what will happen if I delete the chunk_allocator? I'm just worried about unexpected malfunctions in the latest revision.
(In reply to comment #9)

> How about deploying the FIONREAD ioctl? Through this syscall we can find out
> how much unread data remains in the TCP socket. As a result, we can
> eliminate the "zero read" case which causes the malloc()/free() overhead.

Yes, but this has to be used at the libsoup level.

> Regarding question #3, what I really meant was: what will happen if I delete
> the chunk_allocator? I'm just worried about unexpected malfunctions.

As far as I can see, there is no commit you can just revert (the allocator stuff was added a very long time ago). So you have to modify the element yourself, which is of course possible. As Tim pointed out, the pad_alloc path won't provide a performance gain, so you should be able to remove it as a quick fix.
Alright, I verified that everything works fine after reverting the chunk_allocator in my local patch. Thanks a lot. Regarding the FIONREAD case, I'm not sure it has to be implemented at the libsoup level, since libsoup also supports synchronous sessions (blocking sockets). So I think souphttpsrc is the better place to apply FIONREAD if we want to keep the current chunk_allocator.
*** Bug 675010 has been marked as a duplicate of this bug. ***
Hi Davy, I also have the same problem with memory waste. I am a newcomer to souphttpsrc, so could you please send me the patch for this problem, if convenient?
Hi everyone, I urgently need to solve this problem. Could somebody tell me where I can find the related patch? Thanks!
Created attachment 213480 [details] [review] Patch to improve souphttpsrc allocation method I dug around and actually found my old forgotten patch, now rebased on the latest 0.10 branch. Maybe someone wants to try it; I want to commit it once I'm fully sure there are no regressions. Given the nature of TCP, I don't think anybody will run into an issue from the increased latency here. For example, streaming from an MP3 radio station gives me 1400-byte buffers without the patch, and 4096-byte buffers (the blocksize property) with it.
Got it! I will try it next Monday. Thanks for your kind help.
Created attachment 213644 [details] log info-without patch
Created attachment 213645 [details] loginfo-with patch
Hi René Stadler, I verified the code with your patch and got a noticeable improvement. But another problem occurs: I tested some files before adding the patch and they all played well; after adding the patch, two files can no longer be played. I debugged for a day and found nothing. I created two log files by running with GST_DEBUG=*basesrc*:4 while playing the same problematic file: one with the code without the patch, and another with the patch. Could you please check them? Sorry to bother you.
Please make a log where it fails with GST_DEBUG=*src*:5, that should give more information.
Created attachment 213716 [details] loginfo-with patch-src5 Hi, I created another log with GST_DEBUG=*src*:5; please check the attachment. Thanks a lot.
Yamin: That's really weird. Could you try if it works with the patch at much lower blocksize? Like blocksize=4096, which would be close to running without patch. Maybe I'm missing something obvious, but I think that the problem lies elsewhere and is triggered by the larger buffers.
Hi René Stadler, I have tried a 4K blocksize and got the same result. Checking the debug information, I see the following: when playing the problematic stream with the gstreamer package without the patch, the events arrive in the order 'flush start', 'flush stop', 'newsegment', and the stream plays well. But when playing this stream with the patched package, a different event order arrives: 'flush start', 'flush stop', 'eos'. So I think the 'eos' event replaces the 'newsegment' event and causes the stream to not play. What do you think about this?
This bug has an assigned developer but has not received activity in almost a year. Is the assigned person still working on this?
Dear Yamin, I am also a newcomer to souphttpsrc, and I want to try checking this patch. From the log messages, I guess it's DLNA playback. Can you share the problematic files?
Hi René Stadler, I am facing an issue similar to the one mentioned in bug #675010, on 1.5.2. Can you please provide an updated solution? I tried to write a similar modification for 1.5.2 but it's not working. Thanks in advance.
I think the better and simpler way is to allocate a small buffer of the received size if it differs from what was allocated.
Created attachment 316577 [details] [review] patch to fix this issue The issue is resolved with the attached patch. For further memory savings, see also the queue2 patch from https://bugzilla.gnome.org/show_bug.cgi?id=758891
Please review and verify the attached patch.
I think this bug is probably obsolete after the refactoring of souphttpsrc in bug #693911 around the new soup API. I think Thiago was still experimenting with dynamic buffer size adjustments for a good/automatic latency/throughput trade-off, but that should probably be handled elsewhere. I'm not in favour of copying into a smaller buffer, that can be quite costly as well - it's a trade-off I guess. I say let's close this bug and open a new one for the new souphttpsrc code if it's still a problem.