Bug 358077 - save memory by stripping out common text in multipart message-ids
Status: RESOLVED FIXED
Product: Pan
Classification: Other
Component: general
Version: pre-1.0 betas
Hardware: Other Linux
Importance: Normal normal
Target Milestone: 1.0
Assigned To: Charles Kerr
QA Contact: Pan QA Team
Depends on:
Blocks:
 
 
Reported: 2006-09-28 02:20 UTC by Charles Kerr
Modified: 2006-09-30 01:43 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
0.114 patch (14.21 KB, patch)
2006-09-28 02:21 UTC, Charles Kerr
0.114 patch (14.60 KB, patch)
2006-09-28 18:44 UTC, Charles Kerr
0.114 patch (18.67 KB, patch)
2006-09-28 23:14 UTC, Charles Kerr

Description Charles Kerr 2006-09-28 02:20:20 UTC
Message-Ids in multipart articles are usually nearly identical, like this:

   <JIudnQRwg-iopJbYnZ2dnUVZ_v-dnZ2d@giganews.com>
   <JIudnQdwg-ihpJbYnZ2dnUVZ_v-dnZ2d@giganews.com>
   <JIudnQZwg-jepJbYnZ2dnUVZ_v-dnZ2d@giganews.com>
   <JIudnQFwg-jXpJbYnZ2dnUVZ_v-dnZ2d@giganews.com>
   <JIudnQBwg-jMpJbYnZ2dnUVZ_v-dnZ2d@giganews.com>
   <JIudnQNwg-jFpJbYnZ2dnUVZ_v-dnZ2d@giganews.com>

In large newsgroups, _many_ megs can be saved by stripping out common text.

There are lots of ways to do this, but the implementation in the
following attachment uses this scheme:

We assign Article::Part's Message-Id by passing in its real Message-Id and
a reference key (which currently is always the owner Article's message_id).
The identical chars at the beginning (b) and end (e) of the two are counted.
b and e have an upper bound of UCHAR_MAX (255).
Article::Part::folded_message_id's first byte holds 'b'.
The unique middle characters follow, then the last byte holds 'e'.

As a special case, when the Part's Message-Id is equal to the key,
part.folded_message_id is set to "=".
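
The scheme above can be sketched roughly as follows. This is only an illustration, not the code in the attached patch: the function names fold_message_id and unfold_message_id are hypothetical, std::string is used so that a count byte of 0 can be stored safely, and the rule that the suffix count never overlaps the prefix is an assumption added here so that the fold can always be reversed.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>

// Hypothetical helper: fold `mid` against `key` as
// [prefix count b][unique middle chars][suffix count e].
std::string fold_message_id(const std::string& mid, const std::string& key)
{
  // Special case from the description: identical to the key.
  if (mid == key)
    return "=";

  const std::size_t shortest = std::min(mid.size(), key.size());

  // b: identical chars at the beginning, capped at UCHAR_MAX.
  std::size_t b = 0;
  const std::size_t b_max = std::min<std::size_t>(shortest, UCHAR_MAX);
  while (b < b_max && mid[b] == key[b])
    ++b;

  // e: identical chars at the end, capped at UCHAR_MAX.
  // Assumption: e may not reach back into the prefix already counted.
  std::size_t e = 0;
  const std::size_t e_max = std::min<std::size_t>(shortest - b, UCHAR_MAX);
  while (e < e_max && mid[mid.size() - 1 - e] == key[key.size() - 1 - e])
    ++e;

  std::string folded;
  folded.reserve(mid.size() - b - e + 2);
  folded += static_cast<char>(static_cast<unsigned char>(b));
  folded.append(mid, b, mid.size() - b - e);
  folded += static_cast<char>(static_cast<unsigned char>(e));
  return folded;
}

// Hypothetical inverse: rebuild the real Message-Id from the folded form.
std::string unfold_message_id(const std::string& folded, const std::string& key)
{
  if (folded == "=")
    return key;

  // A well-formed folded id always holds at least the two count bytes.
  assert(folded.size() >= 2);
  const std::size_t b = static_cast<unsigned char>(folded.front());
  const std::size_t e = static_cast<unsigned char>(folded.back());

  std::string mid;
  mid.append(key, 0, b);                     // shared prefix from the key
  mid.append(folded, 1, folded.size() - 2);  // unique middle
  mid.append(key, key.size() - e, e);        // shared suffix from the key
  return mid;
}
```

With the first two sample ids above (the first one acting as the key), the second id shares 7 leading and 34 trailing characters with the key, so in this sketch its 47 characters fold down to 8 bytes: one count byte, the six middle characters "dwg-ih", and one more count byte. Since a folded id is always at least two bytes long, the one-byte "=" marker cannot be confused with it.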
Comment 1 Charles Kerr 2006-09-28 02:21:10 UTC
Created attachment 73528
0.114 patch

First draft.
Comment 2 Charles Kerr 2006-09-28 02:40:23 UTC
From a 30 day sampling of a.b.drwho:

0.114:         109 meg
0.114 + patch:  91 meg

Given the large memory win, I'd like to get this into 1.0
if the patch proves to be stable enough.
Comment 3 Charles Kerr 2006-09-28 18:44:56 UTC
Created attachment 73577
0.114 patch
Comment 4 Charles Kerr 2006-09-28 18:45:58 UTC
Comment on attachment 73577
0.114 patch

Second draft.
Comment 5 Charles Kerr 2006-09-28 23:14:03 UTC
Created attachment 73599
0.114 patch

Third draft.

* save more memory (cost of a.b.drwho goes from 130M to 101M)
  by having Part use char* instead of std::strings

* faster Part loading from disk.

* avoid unnecessary string cloning during xover's load_part.

This draft looks good in valgrind & sysprof.
Comment 6 Charles Kerr 2006-09-28 23:24:04 UTC
BTW, that's 130M in the second draft, not 130M in 0.114.
We've now cut the footprint by over half in large groups.

Here's top looking at 0.114 vs 0.114 + third draft. This
was taken after starting up each and loading a 30 day
snapshot of a.b.dvd:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
10331 charles   15   0  917m 910m 9.8m S    0 25.8   0:21.99 pan-old
10319 charles   16   0  400m 392m 9992 S    0 11.1   0:18.80 pan-new