Bug 666143 - SoupCache: add blockfile storage for small resources
Status: RESOLVED OBSOLETE
Product: libsoup
Classification: Core
Component: Misc
Version: unspecified
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: libsoup-maint@gnome.bugs
QA Contact: libsoup-maint@gnome.bugs
Depends on: 720184
Blocks:
Reported: 2011-12-14 09:20 UTC by Sergio Villar
Modified: 2018-09-21 16:10 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Basic implementation for block-file storage (10.90 KB, patch), 2013-01-08 15:40 UTC, Sergio Villar
Store block_index and start_block in cache index (4.12 KB, patch), 2013-01-08 15:40 UTC, Sergio Villar
Replaced GOutputStream by GIOStream to write data (7.72 KB, patch), 2013-01-08 15:41 UTC, Sergio Villar
Store small resources in block files (8.38 KB, patch), 2013-01-08 15:42 UTC, Sergio Villar
Added a define to refer to files not stored in block files (4.82 KB, patch), 2013-01-08 15:43 UTC, Sergio Villar
Basic implementation for block-file storage (13.82 KB, patch), 2013-02-27 16:42 UTC, Sergio Villar
SoupCache working with block file storage (15.99 KB, patch), 2013-02-27 16:46 UTC, Sergio Villar

Description Sergio Villar 2011-12-14 09:20:50 UTC
Both Mozilla and Chromium use block files to store small resources together. This approach has some advantages:

1- Less disk fragmentation (wasted space). Each small file occupies at least one whole disk block (the block size is commonly 4K), so most of that block is wasted. For example, my Epiphany cache has around 6000 files, and ~1300 of them are smaller than 1K. In that case I'm losing more than 3K * 1300 = 3900K.

2- Speed. Reading the contents of small resources from a single mmap'ed file is much faster than opening many small files. I wrote a small program that reads the contents of ~1300 small files and then reads the same contents from a single block file. These are the results (I cleared the buffer cache before each test):

* SSD:
Reading 1369 files took: 66109 microseconds
Reading a large file took: 27557 microseconds (2.4x faster)

* Conventional laptop hard disk:
Reading 1369 files took: 235907 microseconds
Reading a large file took: 103494 microseconds (2.28x faster)
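The space argument in point 1 can be checked with a small helper. This is a minimal sketch, assuming the file system allocates whole blocks (4096 bytes here) and does no tail packing; the function names are hypothetical and not part of libsoup.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes a file of `size` bytes really occupies when the file system
 * rounds every file up to whole blocks of `block_size` bytes
 * (assumes no tail packing or inline storage of tiny files). */
static size_t allocated_bytes(size_t size, size_t block_size)
{
    assert(block_size > 0);
    return ((size + block_size - 1) / block_size) * block_size;
}

/* Slack: allocated but unused bytes in the last block of the file. */
static size_t wasted_bytes(size_t size, size_t block_size)
{
    return allocated_bytes(size, block_size) - size;
}

/* Total slack when every resource is stored in its own file. */
static size_t total_waste(const size_t *sizes, size_t n, size_t block_size)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += wasted_bytes(sizes[i], block_size);
    return total;
}
```

With 4096-byte blocks, any file smaller than 1024 bytes wastes more than 3072 bytes, which is where the "more than 3K * 1300" estimate comes from.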
Comment 1 Sergio Villar 2011-12-21 18:38:02 UTC
I have a patch ready for this bug. I will wait for the new streaming changes in libsoup to land before uploading it, though.
Comment 2 Dan Winship 2012-04-18 15:06:03 UTC
(In reply to comment #1)
> I have a patch ready for this bug. Will wait for the new streaming changes in
> libsoup to upload it though.

The new streaming changes have landed.
Comment 3 Dan Winship 2012-12-12 11:45:39 UTC
Sergio, do you still have the patch for this lying around somewhere? Even if it doesn't compile any more, it would be good to attach here for reference...
Comment 4 Sergio Villar 2012-12-12 11:53:00 UTC
(In reply to comment #3)
> Sergio, do you still have the patch for this lying around somewhere? Even if it
> doesn't compile any more, it would be good to attach here for reference...

Yes, I have a local branch. I haven't rebased it in a long time, but I can attach the patches here...
Comment 5 Sergio Villar 2013-01-08 15:40:02 UTC
Created attachment 232973
Basic implementation for block-file storage
Comment 6 Sergio Villar 2013-01-08 15:40:47 UTC
Created attachment 232974
Store block_index and start_block in cache index
Comment 7 Sergio Villar 2013-01-08 15:41:26 UTC
Created attachment 232975
Replaced GOutputStream by GIOStream to write data
Comment 8 Sergio Villar 2013-01-08 15:42:04 UTC
Created attachment 232976
Store small resources in block files
Comment 9 Sergio Villar 2013-01-08 15:43:02 UTC
Created attachment 232977
Added a define to refer to files not stored in block files
Comment 10 Sergio Villar 2013-01-08 15:52:17 UTC
So these are the changes I have (I think) on top of bug 665884 to provide block file storage for small resources. I don't think they apply anymore, as I haven't rebased them in a long time. Also, I was waiting for the libsoup "streamization" before continuing to work on this, so rebasing them is not enough; they will require some extra work.

The basic idea is the following: we have 3 special files called block files, each meant to store resources of a particular size. Instead of creating a separate file for each resource, we allocate space in one of those files to store the resource together with its headers (because I developed it on top of the changes for bug 665884). Resources that are too big are still stored in separate files. Depending on the size of the resource, we allocate some "blocks" in one of these files to store the data. Each block file contains a bitmap that is used to manage its free blocks.
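The bitmap-based block allocation described above can be sketched as follows. This is an illustrative sketch, not the code from the attached patches: the header layout, the capacity `BLOCKS_PER_FILE`, and the first-fit search for contiguous free blocks are all assumptions, and the bitmap is kept in memory rather than persisted to the file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCKS_PER_FILE 1024  /* hypothetical capacity of one block file */

/* Hypothetical in-memory copy of a block file header. */
typedef struct {
    size_t  block_size;                   /* bytes per block in this file */
    uint8_t bitmap[BLOCKS_PER_FILE / 8];  /* one bit per block, 1 = used */
} BlockFileHeader;

static int block_is_used(const BlockFileHeader *h, size_t i)
{
    return (h->bitmap[i / 8] >> (i % 8)) & 1;
}

static void block_set(BlockFileHeader *h, size_t i, int used)
{
    if (used)
        h->bitmap[i / 8] |= (uint8_t)(1u << (i % 8));
    else
        h->bitmap[i / 8] &= (uint8_t)~(1u << (i % 8));
}

/* Number of blocks needed to hold `size` bytes (at least one). */
static size_t blocks_needed(const BlockFileHeader *h, size_t size)
{
    assert(h->block_size > 0);
    return size == 0 ? 1 : (size + h->block_size - 1) / h->block_size;
}

/* First-fit search for `count` contiguous free blocks. Marks them as used
 * and returns the start block, or -1 if the file has no room left. */
static long block_file_alloc(BlockFileHeader *h, size_t count)
{
    assert(count > 0 && count <= BLOCKS_PER_FILE);
    size_t run = 0;
    for (size_t i = 0; i < BLOCKS_PER_FILE; i++) {
        run = block_is_used(h, i) ? 0 : run + 1;
        if (run == count) {
            size_t start = i + 1 - count;
            for (size_t j = start; j <= i; j++)
                block_set(h, j, 1);
            return (long)start;
        }
    }
    return -1;
}

/* Release `count` blocks starting at `start`. */
static void block_file_free(BlockFileHeader *h, size_t start, size_t count)
{
    assert(start + count <= BLOCKS_PER_FILE);
    for (size_t j = start; j < start + count; j++)
        block_set(h, j, 0);
}
```

First fit keeps the search simple; a real implementation would also need to write the updated bitmap back to the block file header so that allocations survive a restart.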

The cache index now needs to store the block_index (the block file where the resource is stored, or -1 for a separate file) and the start_block (the position inside the block file).
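A possible shape for such an index entry, together with the choice of block file by resource size, is sketched below. The comment above only confirms the `block_index`/`start_block` fields and the -1 convention; the constant name `NOT_IN_BLOCK_FILE`, the three block sizes, the per-entry block limit, and the header size used for the byte offset are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name for "stored in a separate file" (the -1 convention). */
#define NOT_IN_BLOCK_FILE (-1)

/* Hypothetical block sizes of the three block files, smallest first. */
static const size_t block_sizes[3] = { 256, 1024, 4096 };
/* Hypothetical limit: bigger resources go to separate files. */
#define MAX_BLOCKS_PER_ENTRY 4

typedef struct {
    int  block_index;  /* which block file, or NOT_IN_BLOCK_FILE */
    long start_block;  /* first block inside that file */
} CacheIndexEntry;

/* Smallest block file that can hold `size` bytes within
 * MAX_BLOCKS_PER_ENTRY blocks, or NOT_IN_BLOCK_FILE if too big. */
static int choose_block_file(size_t size)
{
    for (int i = 0; i < 3; i++)
        if (size <= block_sizes[i] * MAX_BLOCKS_PER_ENTRY)
            return i;
    return NOT_IN_BLOCK_FILE;
}

static int entry_in_block_file(const CacheIndexEntry *e)
{
    return e->block_index != NOT_IN_BLOCK_FILE;
}

/* Byte offset of the entry's data, assuming the block area starts right
 * after a header of `header_size` bytes (e.g. one holding the bitmap). */
static long entry_byte_offset(const CacheIndexEntry *e, long header_size)
{
    assert(entry_in_block_file(e));
    return header_size + e->start_block * (long)block_sizes[e->block_index];
}
```

For example, an entry with `block_index` 1 and `start_block` 3 in a file with a 128-byte header would start at byte 128 + 3 * 1024 = 3200.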

I hope someone can use this to move it forward; I will also try to find some time to keep working on it.
Comment 11 Sergio Villar 2013-02-27 16:42:14 UTC
Created attachment 237534
Basic implementation for block-file storage
Comment 12 Sergio Villar 2013-02-27 16:46:06 UTC
Created attachment 237536
SoupCache working with block file storage

So I have finally found some time to work on this. The first thing I did was to decouple this patch from the "do not store headers on index" changes, because those need more work.

Now block file storage should work out of the box with the SoupCache. Apart from the basic functions to allocate and deallocate blocks, the changes to the SoupCache are minimal: we keep storing and reading resources using streams, so we just need to change the base stream used by the cache stream.
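The point that only the base stream changes can be illustrated by reading and writing a resource at an offset inside a shared block file instead of at the start of its own file. This sketch deliberately uses plain stdio `FILE *` as a stand-in for the GIO streams libsoup actually uses, and the function names are hypothetical; it also assumes the resource length is recorded elsewhere (for instance in the index).

```c
#include <assert.h>
#include <stdio.h>

/* Write `len` bytes of a resource at byte `offset` of an open block file.
 * Returns 0 on success, -1 on failure. */
static int block_file_write(FILE *f, long offset, const void *data, size_t len)
{
    assert(f != NULL);
    if (fseek(f, offset, SEEK_SET) != 0)
        return -1;
    return fwrite(data, 1, len, f) == len ? 0 : -1;
}

/* Read `len` bytes of a resource back from byte `offset` of the block file.
 * The fseek() also satisfies C's rule that a positioning call must separate
 * a write from a following read on the same stream. */
static int block_file_read(FILE *f, long offset, void *buf, size_t len)
{
    assert(f != NULL);
    if (fseek(f, offset, SEEK_SET) != 0)
        return -1;
    return fread(buf, 1, len, f) == len ? 0 : -1;
}
```

Everything above these two calls (header parsing, body handling) can stay the same whether the bytes come from a separate file at offset 0 or from a block file at the entry's offset.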
Comment 13 Dan Winship 2015-02-10 11:58:42 UTC
[mass-moving all "UNCONFIRMED" libsoup bugs to "NEW" after disabling the "UNCONFIRMED" status for this product now that bugzilla.gnome.org allows that. bugspam-libsoup-20150210]
Comment 14 GNOME Infrastructure Team 2018-09-21 16:10:00 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/libsoup/issues/41.