GNOME Bugzilla – Bug 666143
SoupCache: add blockfile storage for small resources
Last modified: 2018-09-21 16:10:00 UTC
Both Mozilla and Chromium use block files to store small resources together. This approach has some advantages:
1- Less disk fragmentation. Each small file occupies a whole block on disk (block size is commonly 4k), wasting most of that space. For example, in my epy cache I have around 6000 files and ~1300 are smaller than 1k. In that case I'm losing more than 3K * 1300 = 3900K.
2- Speed. Reading the contents of small resources from a single mmap'ed file is way faster than opening a bunch of small files. I implemented a small program that reads the contents of ~1300 small files and then the same contents from a single block file. These are the results (note that I cleared the buffer cache before each test):
* SSD:
  Reading 1369 files lasted: 66109 microseconds
  Reading a large file lasted: 27557 microseconds (2.4x faster)
* "normal" laptop hard disk:
  Reading 1369 files lasted: 235907 microseconds
  Reading a large file lasted: 103494 microseconds (2.28x faster)
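The wasted-space estimate in point 1 can be reproduced with a few lines of C. This is only a sketch of the arithmetic: the function names slack_bytes and total_slack are made up for illustration, and it assumes the filesystem rounds every non-empty file up to a whole number of blocks.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes left unused in the last filesystem block of one file.
 * Assumption: a non-empty file always occupies whole blocks. */
static size_t slack_bytes(size_t file_size, size_t fs_block_size)
{
    assert(fs_block_size > 0);
    size_t rem = file_size % fs_block_size;
    return rem == 0 ? 0 : fs_block_size - rem;
}

/* Sum of the slack over n files with the given sizes. */
static size_t total_slack(const size_t *sizes, size_t n, size_t fs_block_size)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += slack_bytes(sizes[i], fs_block_size);
    return total;
}
```

With a 4k block, a file of just under 1k wastes a little over 3K, which is where the "more than 3K * 1300" figure comes from.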
I have a patch ready for this bug. I will wait for the new streaming changes in libsoup before uploading it, though.
(In reply to comment #1)
> I have a patch ready for this bug. Will wait for the new streaming changes in
> libsoup to upload it though.
The new streaming changes have landed.
Sergio, do you still have the patch for this lying around somewhere? Even if it doesn't compile any more, it would be good to attach here for reference...
(In reply to comment #3)
> Sergio, do you still have the patch for this lying around somewhere? Even if it
> doesn't compile any more, it would be good to attach here for reference...
Yes, I have a local branch. I haven't rebased it in a long time, but I might attach the patches here...
Created attachment 232973: Basic implementation for block-file storage
Created attachment 232974: Store block_index and start_block in cache index
Created attachment 232975: Replaced GOutputStream by GIOStream to write data
Created attachment 232976: Store small resources in block files
Created attachment 232977: Added a define to refer to files not stored in block files
So these are the changes I have (I think) on top of bug 665884 in order to have block-file storage for small resources. I don't think they apply anymore, as it has been a long time since I last rebased them. Also, I was waiting for the libsoup "streamization" before continuing to work on this, so rebasing them will not be enough; some extra work will be required.
The basic idea is the following: we have 3 special files called block files. Each one is meant to store resources of a particular size. Instead of creating a different file for each resource, we allocate space in one of those files to store the resource together with its headers (the headers are included because I developed this on top of the changes for bug 665884). If a resource is too big, we keep storing it in a separate file. Depending on the size of the resource, we allocate some "blocks" in one of these files to store the data. The allocation uses a bitmap stored in each block file to keep track of its free blocks.
The cache index now needs to store the block_index (the file where the resource is stored, or -1 for an independent file) and the start_block (the position inside the block file).
I hope someone can use this to move it forward. I will also try to find some time to keep working on it.
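To make the allocation scheme above more concrete, here is a minimal in-memory sketch of a bitmap allocator. It is not taken from the attached patches. Only block_index, start_block, the three block files and the -1 marker come from the description; every other name, the first-fit strategy, the 1024-block capacity and the 4-block limit per resource are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define N_BLOCK_FILES     3     /* from the description above */
#define BLOCKS_PER_FILE   1024  /* assumption: capacity of each block file */
#define MAX_BLOCKS        4     /* assumption: most blocks one resource may span */
#define NOT_IN_BLOCK_FILE (-1)  /* block_index of a resource kept in its own file */

typedef struct {
    uint32_t block_size;                   /* bytes per block in this file */
    uint8_t  bitmap[BLOCKS_PER_FILE / 8];  /* one bit per block, 1 = in use */
} BlockFile;

/* The two fields the cache index has to remember for each resource. */
typedef struct {
    int block_index;  /* which block file, or NOT_IN_BLOCK_FILE */
    int start_block;  /* first block inside that file */
} BlockLocation;

static uint32_t blocks_needed(const BlockFile *bf, uint32_t size)
{
    return size / bf->block_size + (size % bf->block_size != 0);
}

static int block_used(const BlockFile *bf, uint32_t i)
{
    return (bf->bitmap[i / 8] >> (i % 8)) & 1;
}

static void set_blocks(BlockFile *bf, uint32_t start, uint32_t count, int used)
{
    for (uint32_t i = start; i < start + count; i++) {
        if (used)
            bf->bitmap[i / 8] |= (uint8_t)(1u << (i % 8));
        else
            bf->bitmap[i / 8] &= (uint8_t)~(1u << (i % 8));
    }
}

/* First-fit search for `count` contiguous free blocks.
 * Marks them as used and returns the first block, or -1 if none fit. */
static int alloc_blocks(BlockFile *bf, uint32_t count)
{
    uint32_t run = 0;
    if (count == 0 || count > BLOCKS_PER_FILE)
        return -1;
    for (uint32_t i = 0; i < BLOCKS_PER_FILE; i++) {
        run = block_used(bf, i) ? 0 : run + 1;
        if (run == count) {
            uint32_t start = i + 1 - count;
            set_blocks(bf, start, count, 1);
            return (int)start;
        }
    }
    return -1;
}

static void free_blocks(BlockFile *bf, uint32_t start, uint32_t count)
{
    assert(start + count <= BLOCKS_PER_FILE);
    set_blocks(bf, start, count, 0);
}

/* Try each block file in order and use the first one where the resource
 * fits in at most MAX_BLOCKS blocks; otherwise report "separate file". */
static BlockLocation locate_resource(BlockFile files[N_BLOCK_FILES], uint32_t size)
{
    BlockLocation loc = { NOT_IN_BLOCK_FILE, -1 };
    for (int f = 0; f < N_BLOCK_FILES; f++) {
        uint32_t n = blocks_needed(&files[f], size);
        if (n == 0 || n > MAX_BLOCKS)
            continue;
        int start = alloc_blocks(&files[f], n);
        if (start >= 0) {
            loc.block_index = f;
            loc.start_block = start;
            return loc;
        }
    }
    return loc;
}
```

With three files initialised to, say, 256, 1024 and 4096 byte blocks (again, made-up sizes), a 300 byte resource would take two blocks of the first file, while a 100000 byte resource would fall through to NOT_IN_BLOCK_FILE and be stored as an independent file. A real implementation would keep the bitmap inside the block file itself, as described above, rather than in memory only.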
Created attachment 237534: Basic implementation for block-file storage
Created attachment 237536: SoupCache working with block file storage
So I have finally found some time to work on this. The first thing I did was to decouple this patch from the "do not store headers on index" one, because that one needs more work. Block file storage should now work out of the box with SoupCache. Apart from the basic functions to allocate/deallocate blocks, the changes to SoupCache are minimal: we keep storing and reading resources using streams, so we just need to change the base stream used by the cache stream.
[mass-moving all "UNCONFIRMED" libsoup bugs to "NEW" after disabling the "UNCONFIRMED" status for this product now that bugzilla.gnome.org allows that. bugspam-libsoup-20150210]
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/libsoup/issues/41.