Bug 347425 – memory usage reduction

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 347425 - memory usage reduction


Summary:	memory usage reduction


Status:	RESOLVED FIXED

Product:	rhythmbox
Classification:	Other
Component:	general
Version:	HEAD
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	RhythmBox Maintainers
QA Contact:	RhythmBox Maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2006-07-13 16:33 UTC by William Jon McCann
Modified:	2008-10-01 21:04 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
valgrind start up with plugins enabled (164.58 KB, application/postscript) 2006-07-13 16:36 UTC, William Jon McCann		Details
valgrind start up with all plugins disabled (166.79 KB, application/postscript) 2006-07-13 16:37 UTC, William Jon McCann		Details
use RBRefString for LOCATION property (35.46 KB, patch) 2006-07-24 10:06 UTC, James "Doc" Livingston	none	Details \| Review
resync with CVS (38.11 KB, patch) 2006-07-25 16:00 UTC, William Jon McCann	committed	Details \| Review
experiment with refwords (102.86 KB, patch) 2006-07-26 16:25 UTC, William Jon McCann	none	Details \| Review
Use refstrings for DAAP playlists too (3.47 KB, patch) 2006-07-28 08:37 UTC, James "Doc" Livingston	committed	Details \| Review
reduce property model memory usage (4.56 KB, patch) 2006-09-03 22:21 UTC, Jonathan Matthew	committed	Details \| Review
remove reverse_map (21.60 KB, patch) 2007-02-18 06:31 UTC, James "Doc" Livingston	none	Details \| Review
updated 'remove reverse map' (21.51 KB, patch) 2007-03-21 07:45 UTC, James "Doc" Livingston	none	Details \| Review
Reduce definitely lost leakage without plugins (1.97 KB, patch) 2007-11-25 05:40 UTC, John Daiker	committed	Details \| Review

Description William Jon McCann 2006-07-13 16:33:35 UTC

Taking a look at memory usage...

Comment 1 William Jon McCann 2006-07-13 16:36:34 UTC

Created attachment 68877 [details]
valgrind start up with plugins enabled

Profile of RB startup and then exit from menu.  Most plugins are enabled.

Comment 2 William Jon McCann 2006-07-13 16:37:55 UTC

Created attachment 68878 [details]
valgrind start up with all plugins disabled

Same but with all plugins disabled.

Comment 3 James "Doc" Livingston 2006-07-24 10:06:05 UTC

Created attachment 69465 [details] [review]
use RBRefString for LOCATION property

One thing that might help is using RBRefStrings for the entry URIs. It does mean that there will be an additional integer and two pointers for every entry in the DB, however this is offset by a few things:

During startup the db g_strdup()s the URIs for several things, which a g_free()d a short time later. Using RBRefString stop the allocation/deallocation which reduces heap-fragmentation. Although the amount of memory "in use" is slightly higher, the actual size of the heap seems (for me) to be slightly smaller due to a less fragmented heap.

RBStaticPlaylistSource holds a g_strdup()d copy of the URI of each entry in it. Which means we have an extra URI for each time the entry is in a static playlist. Using RBRefString would allow us to just ref it, instead of copying. For me this saves ~500k of space, and potentially more for someone who heavily uses static playlists.


This patch makes RhythmDBEntry use RBRefStrings for the location and playback-error properties, refs the instead of strdup()ing thoughout the rhtyhmdb and static playlist code.

RBRefString also has an advantage when used in hash tables/maps, it only requires a strcmp when created, not whenever it is looked up.

Comment 4 Jonathan Matthew 2006-07-24 12:08:12 UTC

Looks like a good idea to me.

Comment 5 William Jon McCann 2006-07-25 16:00:31 UTC

Created attachment 69591 [details] [review]
resync with CVS

Resync with CVS after the leak fix changes.

This looks good to me as a first step.

The next step I think is to break up the URI into parts and store those refstrings separately.  Most of the URIs will have the exact same prefix if they are in the "Library".  We should be able to store that once.  Then, most of the time, each track in an album has the same album name in the URI too.  Thought that is a smaller effect.

I think it would be hard to add something like this internally to RefEntry since the segmentation of the string would be difficult to do generally.  On the other hand it is pretty easy for URIs since the separator is well known and there are no translation issues.

We might be able to change the location field to be a GPtrArray of RefStrings and change the db->priv->entries hash back to a string hash of uris.

Comment 6 William Jon McCann 2006-07-26 16:25:56 UTC

Created attachment 69675 [details] [review]
experiment with refwords

On the other hand.  One of the potential problems with using consts is that it requires that the strings (all strings) stay loaded at all times.  Here's an experiment that changes refstrings to optionally be segmented into "words".  The idea would be to only create the strings on demand.  To use this most effectively we'd have to modify all the other data models to store the refstring directly instead of a copy of the generated string.

I wonder if this scales better this way.  With the library I have on my development system (~1000 songs) the memory usage with this patch is about the same as HEAD maybe a little higher - probably due to fragmentation or some leaks introduced by un-consting.  Anyway food for thought.

Comment 7 James "Doc" Livingston 2006-07-27 02:12:17 UTC

I've committed my patch to cvs.

I'd though about something like that before, but in a naive implementation the overhead is probably too great. A RefString will need a pointer to each of the RefWords, an int for the number of RefWords (or a pointer to be NULL terminated) and the overhead of malloc()ing the GPtrArray block. On a 32 bit system that adds up to almost as much as we save by not duplicating data, and on a 64 bit system (where pointers are twice as big) it's usually much worse.

I had an idea last night, on a potentially smarted way to do this for URIs. We would make the location be split into exactly two RefStrings, where the first is always a valid prefix listed by the entry-type or NULL.

When adding a new entry we see if the URI starts with any of the valid prefixes, and split it there if it is, or set the prefix to NULL and the suffix to the entire URI.

For the SONG entry-type, the prefixes would be the user-set library location(s), as that's where most people keep their music. For removable media, it would be the mount point.

This would probably save a fair bit of memory because most entries would have a valid prefix. There wouldn't be that much overhead because we can just have two RBRefString pointers in RhythmDBEntry, and we don't have to malloc() a GPtrArray for it.

In any case, I think getting rid of the const char* use would be helpful, and make everything use either char* or RBRefString*. I guess a first pass would be to get rid of rhythmdb_entry_get_string and replace it with either _dup_string or _get_refstring.

Comment 8 Nguyen Thai Ngoc Duy 2006-07-27 07:03:45 UTC

There is a problem with this commit:

backtrace of rhythmbox --g-fatal-warnings
RhythmDB-WARNING **: attempting to create entry that already exists: \xa4\u0001

gdb> bt full

+ Trace 69680

#0 __kernel_vsyscall
#1 waitpid
from /lib/libc.so.6
#2 libgnomeui_module_info_get
from /usr/lib/libgnomeui-2.so.0
#3 <signal handler called>
#4 __kernel_vsyscall
#5 raise
from /lib/libc.so.6
#6 abort
from /lib/libc.so.6
#7 g_logv
from /usr/lib/libglib-2.0.so.0
#8 g_log
from /usr/lib/libglib-2.0.so.0
#9 rhythmdb_entry_new
at rhythmdb.c line 1088
#10 rhythmdb_idle_poll_events
at rhythmdb.c line 1666
#11 g_main_context_is_owner
from /usr/lib/libglib-2.0.so.0
#12 g_main_context_dispatch
from /usr/lib/libglib-2.0.so.0
#13 g_main_context_acquire
from /usr/lib/libglib-2.0.so.0
#14 g_main_loop_run
from /usr/lib/libglib-2.0.so.0
#15 IA__gtk_main
at gtkmain.c line 1000
#16 main
at main.c line 375

Comment 9 Nguyen Thai Ngoc Duy 2006-07-27 07:17:07 UTC

Without --g-fatal-warnings (and within valgrind, I can't run it because glibc detected, invalid free)

(rhythmbox:23735): RhythmDB-CRITICAL **: rb_refstring_unref: assertion `val->refcount > 0' failed

(rhythmbox:23735): RhythmDB-CRITICAL **: rb_refstring_unref: assertion `val->refcount > 0' failed
RhythmDB-ERROR **: file rhythmdb.c: line 1978 (queue_stat_uri): assertion failed: (uri && *uri)
aborting...

Comment 10 Nguyen Thai Ngoc Duy 2006-07-27 08:14:49 UTC

Hmm.. I somehow got entries with empty locations :(

Comment 11 James "Doc" Livingston 2006-07-27 10:48:07 UTC

I've committed a fix to cvs that might (hopefully) fix the crash. I missed a g_free->rb_refstring_unref change.

Comment 12 James "Doc" Livingston 2006-07-28 08:37:37 UTC

Created attachment 69802 [details] [review]
Use refstrings for DAAP playlists too

Similar to the change for static playlists, this uses RBRefString for daap playlists URIs.

Comment 13 James "Doc" Livingston 2006-08-02 12:38:31 UTC

I've committed the daap playlist patch.

Comment 14 James "Doc" Livingston 2006-08-07 07:28:45 UTC

I've taken some measurements of RB's heap usage with different things used:

* Baseline (no plugins, empty library): 5Mb
About half of this seems to be Pango/Freetype related.

* 5000 song library (no plugins, no playlists): 8Mb
About 1Mb of refstrings (most of which are URIs), 750k of RhythmDBEntry structures, and assorted other stuff (more Pango/FT, maps for the query models etc).

* 5000 song library, 5 large auto playlists (no plugins): 10Mb
Most of the extra space is taken up for the query models' forward (GSequence) and reverse (GHashTable) maps, and a lesser amount by the RhythmDBPropertyModel's entry-present table.

* 5000 song library, 1 static playlist containing all 5k tracks (no plugins): 9.5Mb:
As above, most of the extra space was taken up by the query model's per-entry data, but RBPlaylistSource's entry-map added data too.

* 5000 song library (All C plugins, no playlists): ~8Mb
Turning on all the C plugins makes basically no difference

* 5000 song library (Sample Python plugins, no playlists): 11Mb
Loading Python and PyGTK is fairly expensive.

* 5000 song library (All Python plugins, no playlists): ~11Mb
Turning on extra Python plugins increase heap size significantly.

Results:

* Activating python plugins support is expensive; but once you have one loaded, extra python plugins don't make much of a difference.

From what I've heard PyGTK loads *everything* when you use any of it, which uses a fair bit of memory, and the PyGTK people are currently working on this. I'm not sure how much that will reduce it though.

* The data structures RhythmDBQueryModel uses take up a fair bit of space, however they are used because they reduce a large number of operations' running times (e.g. logN instead of N, NlogN instead of N^2, etc).

It would be *great* if we could find some structure that used less memory without making the asymoptotic running time worse. But it would also be good if we looked at how they are used, to see whether the tradeoff is actually worth it - increasing from logN to N may not make much of a difference (but it might).

* How useful is RBPlaylistSource's entry map? could we replace the "is in map" operation by "rhythmdb_query_model_entry_to_iter (rhythmdb_lookup_by_location())" ?

* Does RhythmDBPropertyModel really need a hash table of all the entries it is counting? The information is necessary because otherwise it doesn't know what it has counted, but there might be a smarter way of storing it.

Comment 15 Jonathan Matthew 2006-08-07 22:12:49 UTC

RBPlaylistSource's entry map is used to handle entries that don't yet exist - if the entry location is in the map when the entry is created, the entry gets added to the playlist.  We could use a separate map for pending entries instead, or just a list.  Using rhythmdb_query_model_entry_to_iter would require some reworking of the playlist add/remove functions, but that's probably a good thing.

Measurements with the DAAP server enabled, both with and without playlists, would also be interesting.

Comment 16 Jonathan Matthew 2006-09-03 22:21:52 UTC

Created attachment 72160 [details] [review]
reduce property model memory usage

This changes the property model entries hash table to only store hidden entries.  I've tested it with hidden<->visible changes and property value changes, and it seems to work correctly, but more testing would be good.

This reduces heap usage by about 2M for me (somewhere around 15%).

Comment 17 James "Doc" Livingston 2006-09-16 15:30:49 UTC

Looks okay to me, and reduces my heap usage a reasonable amount too.

Comment 18 Jonathan Matthew 2006-09-17 06:36:04 UTC

OK, committed.

Comment 19 Soren Sandmann Pedersen 2007-02-16 21:49:22 UTC

Would it be possible to get rid of the reverse_map's by simply storing an EggSequenceIter in RhythmDBEntry?

(FWIW; I don't know the rhythmbox code at all)

Comment 20 James "Doc" Livingston 2007-02-18 03:31:40 UTC

That wouldn't work, as an entry (i.e. a track) can be in multiple query models.

If the query model is sorted (which all except the ones for static playlists are), we could use egg_sequence_search to do the reverse lookup. The important question is whether the log(n) time version the current O(1) time will make any noticable difference.

Comment 21 Soren Sandmann Pedersen 2007-02-18 04:28:58 UTC

Maybe each entry could store a GPtrArray of all the EggSequenceIters it has and then use egg_sequence_iter_get_sequence() to find out which of the models the iter corresponds to? Assuming the number of models is small, a linear search in such an array should not be too bad. This would also be O(log n) since egg_sequence_iter_get_sequence() is O(log n).

I don't know if that O(log n) would make a noticable difference vs. current O(1), but the treap based GSequence in current glib trunk is quite a bit faster than the EggSequence in rhythmbox, in particular for things like get_sequence() which could be rather slow with the splay tree.

Comment 22 James "Doc" Livingston 2007-02-18 06:31:58 UTC

Created attachment 82783 [details] [review]
remove reverse_map

(In reply to comment #21)
> Maybe each entry could store a GPtrArray of all the EggSequenceIters it has and
> then use egg_sequence_iter_get_sequence() to find out which of the models the
> iter corresponds to? Assuming the number of models is small, a linear search in
> such an array should not be too bad. This would also be O(log n) since
> egg_sequence_iter_get_sequence() is O(log n).

That could work, although query-models sit at the layer above the core database and entries - so we'd be pushing upper layer stuff down.


> I don't know if that O(log n) would make a noticable difference vs. current
> O(1), but the treap based GSequence in current glib trunk is quite a bit faster
> than the EggSequence in rhythmbox, in particular for things like get_sequence()
> which could be rather slow with the splay tree.

The attached patch replaces the hash-table use with a sequence search, for query models which are sorted.

For me (~6000 songs, ~10 large auto playlists, optimisation disabled) this make Rhythmbox use around 5% more CPU-time on startup, for a 400-500 Kb reduction in memory use. With an optimised build


The extra CPU time doesn't really cause a noticable delay, since most of it is processing my auto playlists, which I'm not looking at when Rhythmbox starts. Query model operations apart from the initial "build everything" at startup are pretty short, and the speed difference is lost in the noise.

I'll have to look into glib's GSequence to see whether the treap makes a difference.

Comment 23 James "Doc" Livingston 2007-03-21 07:45:51 UTC

Created attachment 85020 [details] [review]
updated 'remove reverse map'

Using GSequence, this seems to make my startup about 0.1 seconds slower and reduces the memory usage by a few hundred kb.

Comment 24 John Daiker 2007-11-25 05:40:03 UTC

Created attachment 99604 [details] [review]
Reduce definitely lost leakage without plugins

This patch reduces "definitely lost" memory from "92,109 bytes in 344 blocks" to "81,966 bytes in 169 blocks".  This is with a small 41 song library, and also a no plugins enabled.  I used the following valgrind configurations:
G_SLICE=always-malloc G_DEBUG=gc-friendly valgrind --tool=memcheck --leak-check=full --leak-resolution=high --show-reachable=yes /usr/local/bin/rhythmbox

Comment 25 James "Doc" Livingston 2007-11-25 07:27:48 UTC

Committed to svn, with a couple of others I had found locally.

Comment 26 Jonathan Matthew 2007-11-26 11:09:48 UTC

I've reverted this change:

===================================================================
--- shell/rb-tray-icon.c	(revision 5455)
+++ shell/rb-tray-icon.c	(working copy)
@@ -218,6 +218,7 @@
 	g_object_ref (icon->priv->playing_image);
 	g_object_ref (icon->priv->not_playing_image);
 
+	gtk_container_add (GTK_CONTAINER (icon->priv->ebox), icon->priv->playing_image);
 	gtk_container_add (GTK_CONTAINER (icon->priv->ebox), icon->priv->not_playing_image);
 
 	gtk_container_add (GTK_CONTAINER (icon), icon->priv->ebox);

because it causes gtk warnings on startup.  John, if this change was intentional, could you explain why it's necessary?

Comment 27 William Jon McCann 2008-10-01 21:04:59 UTC

This bug has gotten a bit general and hasn't seen action in a while.  Going to mark it fixed for now.  Specific issues can be addressed in new bugs.