GNOME Bugzilla – Bug 363695
EMemPool, EStrv and EPoolv are obsolete
Last modified: 2021-05-19 11:04:04 UTC
libedataserver provides two data-structures for efficient string storage: EStrv and EPoolv. In the interest of removing old cruft, I believe these should be dropped or at least deprecated. EStrv ----- EStrv provides memory-efficient storage of strings similar to GLib's GStringChunk [1], except the strings can be "unpacked" into a string vector where they can modified freely, and then re-packed into a string "chunk". The API seems rather clumsy, and it's limited to 254 strings per instance. EStrv is no longer used by Evolution, Evolution-Data-Server, or Evolution-Exchange. EPoolv ------ An EPoolv is a vector of strings that live in a global string pool. The idea is to reduce memory consumption by sharing constant strings that are used frequently (such as email headers). This is similar to GLib's GQuark [2] except that EPoolv "quarks" (or indexes) are only unique to an EPoolv instance, whereas GQuarks are globally unique. There is a single case of EPoolv usage in Evolution's message-list.c. It looks like it's trying to maintain a unique copy of normalized "from", "to", and "subject" strings for each CamelMessageInfo. I think this could be replaced with a simple g_string_chunk_insert_const() [3], which maintains a unique copy of the string and also stores it efficiently. [1] http://developer.gnome.org/doc/API/2.0/glib/glib-String-Chunks.html [2] http://developer.gnome.org/doc/API/2.0/glib/glib-Quarks.html [3] http://developer.gnome.org/doc/API/2.0/glib/glib-String-Chunks.html#g-string-chunk-insert-const
Created attachment 75093 [details] [review] Proposed patch for evolution This patch replaces the EPoolv use case in message-list.c with a GStringChunk. There's a few performance and memory usage trade-offs at work here: - Each MessageList now has its own GStringChunk instead of a global one shared by all. So strings are not shared as much as they could be, but they do get freed along with their MessageList rather than lingering in memory until program termination. - The get_normalised_string() function always normalises the string now, instead of trying to fetch a cached copy first. But the logic is much shorter and simpler. I haven't taken any serious measurements comparing this patch against the EPoolv approach, but I would expect it to be close to a wash.
Created attachment 75094 [details] [review] Proposed patch for evolution-data-server This patch just removes EStrv and EPoolv from libedataserver. We can defer this patch if the data structures are to be deprecated but not yet removed.
the main point of that code was to reduce processing time needed to sort the message-list, I'm afraid that your new implementation will balloon the processing time a huge amount. There's no use for a string pool if you're gonna recalculate the collated strings for each comparison.
Created attachment 75141 [details] [review] Revised patch for evolution Jeff is right, I midjudged the intent of the code and especially the expense of calling g_utf8_collate_key(). Here's a revised patch that brings back the normalised string cache, but as a simple string-to-collation-key mapping. This will make the cache grow larger during runtime and precludes us from doing garbage collection on it, but the comparison should be faster than the EPoolv version. Both key and value strings are stored in the local string chunk, which lives and dies with the MessageList instance. This seems like an acceptable trade-off to me. Any thoughts Jeff?
this patch looks good to me (I haven't tested it, but the logic looks right to me this time) I'll have to leave it up to the Evo team to accept tho
I'm adding EMemPool to my hit list here, since it looks like all current uses of it could easily be replaced by GStringChunk.
Created attachment 75406 [details] [review] Revised patch for evolution This patch contains all previous changes and also replaces an EMemPool with a GStringChunk.
Created attachment 75412 [details] [review] Revised patch for evolution-data-server This patch contains all previous changes and also replaces all EMemPool instances with GStringChunk and removes EMemPool from libedataserver.
After updating tinymail's camel-lite with #75412
+ Trace 78326
Thread NaN (LWP 26323)
Continuing. (tinymail:26174): GLib-CRITICAL **: g_string_chunk_free: assertion `chunk != NULL' failed Breakpoint 1, IA__g_log (log_domain=0xb75981d7 "GLib", log_level=-1218870825, format=0xb75981d7 "GLib") at gmessages.c:516 516 in gmessages.c (gdb)
It also looks like this (probably caused by whathever is causing the the warning too) introduces a memory-leak of ~200kb per 2,000 headers received (by imap-rescan, for example)
Created attachment 75421 [details] [review] Revised patch for evolution-data-server Here's a revised patch that fixes the GLib-CRITICAL warning, and hopefully the memory leak that Philip observed. Philip, can you confirm this?
ps. I haven't yet tested this version. I will try to do this tomorrow.
The warning is gone, but I'm still experiencing the memory leak. Memory consumption when downloading new messages is also something like 200% higher.
See Jeff's analysis of the pitfalls of GStringChunk [1] and subsequent discussion on gtk-devel-list starting at [2]. The lesson I'm taking away is that the chunk size needs to be significantly larger than the average size of the strings being stored. My patch tends to allocate string chunks of only 1024 bytes, a number I didn't give much consideration to. This is probably causing large amounts of memory to be wasted (as Jeff pointed out), which may account for the growth in memory consumption that Philip is seeing. Perhaps the chunk size should be more in the neighborhood of 16K or larger. I don't see anything else in the patch that would be causing a memory leak. [1] http://mail.gnome.org/archives/evolution-hackers/2006-November/msg00003.html [2] http://mail.gnome.org/archives/gtk-devel-list/2006-November/msg00015.html
Created attachment 87336 [details] [review] Revised patch for evolution A flaw in my Evolution patch was the cause of [1] and [2] for Fedora users. This revision corrects the flaw found by Peter Jones (see [2] for details). [1] http://bugzilla.gnome.org/show_bug.cgi?id=373713 [2] http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=238497
The latest patch that you have sent in Comment #15 doesn't change Attachment #75421 [details], right? Because I tested 75421 with an unmodified Tinymail, which Attachment #87336 [details] of course wont change, yet I experienced this 'more memory usage'. So I'm queuing this patch for Tinymail's camel-lite (as it needs more investigation on why this memory increase took place) before I apply it. More memory consumption (either on average or caused by a leak) is not interesting for me :-(. Sorry: not surprisingly is my priority of memory consumption higher than the one for reducing old cruft. Bookmarked, though.
(In reply to comment #16) > The latest patch that you have sent in Comment #15 doesn't change Attachment > #75421 [edit], right? Correct. The Evolution patch only modifies mail/message-list.c by replacing EMemPool with GStringChunk. The chunk size needs to be tuned though. 1K chunks are too wasteful, as Jeff pointed out (see comment #14). > So I'm queuing this patch for Tinymail's camel-lite (as it needs more > investigation on why this memory increase took place) before I apply it. More > memory consumption (either on average or caused by a leak) is not interesting > for me :-(. Sorry: not surprisingly is my priority of memory consumption higher > than the one for reducing old cruft. Here again, I'm thinking the string chunk size just needs to be tuned in the Evolution-Data-Server patch. In some spots in that patch we're allocating chunks of only 256 bytes, so it's no wonder the memory consumption doubled! My guess is a chunk size of 16K or 32K will at least get the numbers moving in the right direction.
So I hereby invite you to experiment with Tinymail's camel-lite. On this page you can get some information on how I measured its memory consumption: http://tinymail.org/trac/tinymail/wiki/MemoryStats Although the consumption will differ quite a lot from Evolution's Camel (mostly due to the mmap patch being applied), you will have a much more fine grained environment to perform regression testing on memory consumption, with Tinymail and its camel-lite implementation/ adaptations of the Camel API.
Awesome, thanks! I've been meaning to ask you how you generated those measurements for camel-lite. I'll try to reproduce what you observed in comment #13 and take it from there.
Bookmarked this bug for synchronizing it with Tinymail's camel-lite
Bumping version to a stable release.
@ Philip is there any update ?
No updates from my side
Patches doesn't apply to actual sources these day. I guess a little rethink on the bug itself, and definitely an update for patches is necessary.
I certainly wouldn't advocate the use of GStringChunk these days, but it would still be nice to see if anything is still using them and deprecate them if not. (Or move them to Camel if appropriate. I've been trying to kill Camel's libedataserver dependency for years now.)
Looks like EStrv is the only one left these days.
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/Community/GettingInTouch/BugReportingGuidelines and create a new ticket at https://gitlab.gnome.org/GNOME/evolution-data-server/-/issues/ Thank you for your understanding and your help.