Bug 555937 - The database should be regularly ANALYZE'd
Status: RESOLVED FIXED
Product: banshee
Classification: Other
Component: general
Version: git master
Hardware/OS: Other Linux
Importance: High normal
Target Milestone: 1.x
Assigned To: Banshee Maintainers
QA Contact: Banshee Maintainers
Duplicates: 573071 (view as bug list)
Depends on:
Blocks:
Reported: 2008-10-11 18:21 UTC by Bertrand Lorentz
Modified: 2009-03-26 20:35 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
- Patch to ANALYZE the DB at startup (4.22 KB, patch), 2008-12-27 02:24 UTC, Aaron Bockover (status: none)
- banshee --debug-sql following ANALYZE (70.72 KB, text/plain), 2009-03-06 03:38 UTC, Dave B
- Analyze the DB if needed (4.18 KB, patch), 2009-03-07 16:13 UTC, Bertrand Lorentz (status: committed)

Description Bertrand Lorentz 2008-10-11 18:21:50 UTC
The ANALYZE statement might improve performance with sqlite indexes.
It is run in BansheeDbFormatMigrator.cs when a migration adds an index, but it is not run with a fresh database.

I've seen one example of a banshee.db where after ANALYZE, a query went from 300ms execution time to 22ms.
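For readers unfamiliar with the mechanism: ANALYZE writes per-index statistics into the sqlite_stat1 table, which the query planner consults when choosing indexes. A minimal sketch using Python's stdlib sqlite3 bindings (the table and index names are illustrative, not Banshee's real schema):

```python
import sqlite3

# In-memory stand-in for banshee.db: a small indexed table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CoreTracks (TrackID INTEGER PRIMARY KEY, ArtistID INTEGER)")
conn.execute("CREATE INDEX CoreTracksArtistIndex ON CoreTracks (ArtistID)")
conn.executemany("INSERT INTO CoreTracks (ArtistID) VALUES (?)",
                 [(i % 10,) for i in range(500)])

# ANALYZE gathers index statistics into sqlite_stat1; the first integer in
# the "stat" column is the row count at the time ANALYZE ran.
conn.execute("ANALYZE")
for tbl, idx, stat in conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1"):
    print(tbl, idx, stat)  # e.g. CoreTracks CoreTracksArtistIndex "500 50"
```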
Comment 1 Gabriel Burt 2008-10-14 14:58:17 UTC
Do you mean just after migrating from 0.13.2, or just after a fresh db w/o any migration, or should we do this after every import of more than N (200?) songs?  You planning on working on this?
Comment 2 Bertrand Lorentz 2008-10-14 17:45:48 UTC
I meant a fresh db w/o any migration, but, on second thoughts, it's probably useless to ANALYZE an empty db.
So, as you suggested, we should do an ANALYZE when more than N (to be determined) tracks were imported.
Updating summary to reflect that.

I have no immediate plans to work on this, so feel free to have a go at it! ;)
Comment 3 Aaron Bockover 2008-12-26 21:00:12 UTC
That is awesome. We should do this ASAP.
Comment 4 Aaron Bockover 2008-12-27 02:24:49 UTC
Created attachment 125373 [details] [review]
Patch to ANALYZE the DB at startup

This patch ANALYZEs the database if the number of tracks that have been modified since the last ANALYZE is greater than some configurable threshold.

The default threshold is 100. I have no idea what an appropriate one would be. It adds two entries to CoreConfiguration:

 - Database.AnalyzeThreshold (number of tracks that require updating before triggering ANALYZE)
 - Database.LastAnalyzed (timestamp of the last ANALYZE, used to build the count)

Please test! It is useful to back up your database first, not because I think this patch will eat your database, but it would be good to have something to compare.
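A rough Python sketch of the logic this patch implements (the patch itself is C#; the table layout, the DateUpdatedStamp column, and the helper name here are assumptions for illustration):

```python
import sqlite3
import time

ANALYZE_THRESHOLD = 100  # stand-in for the patch's Database.AnalyzeThreshold default

def maybe_analyze(conn, last_analyzed):
    # Count tracks touched since the last ANALYZE; re-run ANALYZE once the
    # count passes the threshold, and return the new "last analyzed" stamp.
    (changed,) = conn.execute(
        "SELECT COUNT(*) FROM CoreTracks WHERE DateUpdatedStamp > ?",
        (last_analyzed,)).fetchone()
    if changed > ANALYZE_THRESHOLD:
        conn.execute("ANALYZE")
        return time.time()  # would be stored as Database.LastAnalyzed
    return last_analyzed

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CoreTracks (TrackID INTEGER PRIMARY KEY, DateUpdatedStamp INTEGER)")
conn.executemany("INSERT INTO CoreTracks (DateUpdatedStamp) VALUES (?)", [(1,)] * 150)
stamp = maybe_analyze(conn, 0)  # 150 tracks changed since stamp 0, so ANALYZE runs
```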
Comment 5 Aaron Bockover 2008-12-27 02:31:57 UTC
aaron@giggidy:~/svn/banshee$ (for ((i=0;i<100;i++)); do (time $SQLITE ~/.config/banshee-1/banshee.db 'select * from CoreTracks' 1>/dev/null) 2>&1 | awk '/^real/{sub(/^real[0-9\t]+m/,""); sub(/s$/,""); print}'; done;) | awk '{sum = sum + $0; } END { print sum / NR }'
0.14137

aaron@giggidy:~/svn/banshee$ (for ((i=0;i<100;i++)); do (time $SQLITE ~/.config/banshee-1/banshee.db.orig 'select * from CoreTracks' 1>/dev/null) 2>&1 | awk '/^real/{sub(/^real[0-9\t]+m/,""); sub(/s$/,""); print}'; done;) | awk '{sum = sum + $0; } END { print sum / NR }'
0.14247

I don't get much difference:

 OLD: 0.14247
 NEW: 0.14137

Maybe there's more difference with INSERT or UPDATE, but I really can't be bothered to test there right now :)

Bertrand: can you provide a test case where ANALYZE really helps?
Comment 6 Sandy Armstrong 2008-12-27 02:35:20 UTC
Very inconclusive for me; I did two runs each:

OLD: 0.14374, 0.14518
NEW: 0.14998, 0.14221
Comment 7 Bertrand Lorentz 2008-12-27 10:54:02 UTC
From what I understand, ANALYZE helps sqlite make better index choices. So the query has to use indexes for it to make any difference.

I'm using the following query, modified from --debug-sql:

SELECT * FROM CoreTracks,CoreAlbums,CoreArtists WHERE CoreArtists.ArtistID = CoreTracks.ArtistID AND CoreAlbums.AlbumID = CoreTracks.AlbumID AND CoreTracks.PrimarySourceID = 1;

The data from "analyze" is stored in the sqlite_stat1 table. From my tests, it seems it's worse to have bad data than none at all. So a scenario would be:
1. Create an empty DB
2. Import a few tracks
3. analyze the DB (banshee does it in some schema migrations)
4. Import lots of tracks

I have a test DB (from a user, so I can send it by e-mail if needed), with 862 tracks in CoreTracks:
- banshee_carp3_orig.db : the original file from the user, sqlite_stat1 says CoreTracks has 146 lines
- banshee_carp3_analyzed.db : the same db, after ANALYZE

(for ((i=0;i<100;i++)); do (time sqlite3 banshee_carp3_orig.db < query.sql 1>/dev/null) 2>&1 | awk '/^real/{sub(/^real[0-9\t]+m/,""); sub(/s$/,""); print}'; done;) | awk '{sum = sum + $0; } END { print sum / NR }'
0.18002
(for ((i=0;i<100;i++)); do (time sqlite3 banshee_carp3_analyzed.db < query.sql 1>/dev/null) 2>&1 | awk '/^real/{sub(/^real[0-9\t]+m/,""); sub(/s$/,""); print}'; done;) | awk '{sum = sum + $0; } END { print sum / NR }'
0.03384
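The stale-stats scenario above can be reproduced directly: SQLite never refreshes sqlite_stat1 on its own, so the recorded count drifts away from the real row count as tracks are imported. A small Python/sqlite3 sketch mirroring the 146-vs-862 numbers (the schema is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CoreTracks (TrackID INTEGER PRIMARY KEY, ArtistID INTEGER)")
conn.execute("CREATE INDEX CoreTracksArtistIndex ON CoreTracks (ArtistID)")

# Steps 2-3: import a few tracks, then ANALYZE (as some schema migrations do).
conn.executemany("INSERT INTO CoreTracks (ArtistID) VALUES (?)", [(1,)] * 146)
conn.execute("ANALYZE")

# Step 4: import lots of tracks; sqlite_stat1 is NOT updated automatically.
conn.executemany("INSERT INTO CoreTracks (ArtistID) VALUES (?)", [(1,)] * 716)

(stat,) = conn.execute("SELECT stat FROM sqlite_stat1 WHERE tbl='CoreTracks'").fetchone()
(actual,) = conn.execute("SELECT COUNT(*) FROM CoreTracks").fetchone()
print(stat.split()[0], actual)  # 146 862: the planner still thinks 146 rows
```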

Comment 8 Aaron Bockover 2009-01-07 19:11:00 UTC
That looks pretty significant to me. Maybe this is a good idea to add? I am also contemplating doing something similar with VACUUM.
Comment 9 Bertrand Lorentz 2009-01-07 20:06:38 UTC
I do think the patch should be applied. At the very least, we'll have one less thing to ask users who are complaining about banshee being slow.

Btw, the patch works fine for me.

A quick test about VACUUM:
With another DB from a user (I'm starting a collection), the original file was 12MB, and 8.5MB after VACUUM. I used the same query as before.

(for ((i=0;i<100;i++)); do (time sqlite3 banshee_dn_analyzed.db  < query.sql 1>/dev/null) 2>&1 | awk '/^real/{sub(/^real[0-9\t]+m/,""); sub(/s$/,"");print}'; done;) | awk '{sum = sum + $0; } END { print sum / NR }'

0.01611

(for ((i=0;i<100;i++)); do (time sqlite3 banshee_dn_analyzed_vacuumed.db < query.sql 1>/dev/null) 2>&1 | awk '/^real/{sub(/^real[0-9\t]+m/,""); sub(/s$/,"");print}'; done;) | awk '{sum = sum + $0; } END { print sum / NR }'

0.01497

So an improvement of less than 10% in this case.
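For reference, the size effect of VACUUM is easy to reproduce: it rewrites the database file and reclaims free pages, so the file shrinks after deletions. A Python/sqlite3 sketch (schema and sizes illustrative):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "banshee.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE CoreTracks (TrackID INTEGER PRIMARY KEY, Uri TEXT)")
conn.executemany("INSERT INTO CoreTracks (Uri) VALUES (?)", [("x" * 200,)] * 5000)

# Delete half the rows: the pages they occupied stay in the file as free pages.
conn.execute("DELETE FROM CoreTracks WHERE TrackID % 2 = 0")
conn.commit()

before = os.path.getsize(path)
conn.execute("VACUUM")  # rebuilds the file, dropping the free pages
after = os.path.getsize(path)
print(before, after)  # the file is smaller after VACUUM
```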
Comment 10 Gabriel Burt 2009-01-14 15:45:41 UTC
This patch is pretty complicated.  I mentioned to Aaron that we could just run ANALYZE every 10th startup or something.

Here are some numbers for general consideration:

size of db | count(CoreTracks) | ANALYZE | 2nd ANALYZE | VACUUM  | size after
7.6M       |   806             |  0.257s | 0.144s      |  3.096s | 4.5M
31M        | 22582             |  0.946s | 0.197s      |  6.121s | 29M
2.6M       |  1212             |  0.179s | 0.129s      |  0.905s | 2.6M
3.2M       |     7             |  0.092s | 0.105s      |  0.653s | 1.8M
6.5M       |  9193             |  0.228s | 0.108s      |  1.333s | 6.3M
68M        | 31391             |  6.239s | 0.265s      | 11.185s | 65M
110M       | 58099             | 17.320s | 0.445s      | 19.169s | 108M
Comment 11 Bertrand Lorentz 2009-03-05 06:49:01 UTC
*** Bug 573071 has been marked as a duplicate of this bug. ***
Comment 12 Bertrand Lorentz 2009-03-05 19:59:33 UTC
This idea just hit me while reading Gabriel's comment on bug #573071:

ANALYZE creates a table named sqlite_stat1 and populates it with some stats. It has 3 columns:
- tbl: table name
- idx: index name
- stat: a space-separated list of integers. The first number seems to be the number of records in the table when ANALYZE was last run.

We could use that to figure out if we should run ANALYZE again because the number of records has changed significantly since the last time it was run.
I might have a go at adapting Aaron's patch to do that, if the weather stays bad this weekend ;)
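A Python sketch of the check proposed here and in the following comment (the function name and the 2x divergence ratio are assumptions for illustration; Banshee's actual patch is C#):

```python
import sqlite3

def needs_analyze(conn, table, ratio=2.0):
    # Compare the row count recorded by the last ANALYZE (first integer in
    # sqlite_stat1.stat) with the current count.
    try:
        row = conn.execute(
            "SELECT stat FROM sqlite_stat1 WHERE tbl = ?", (table,)).fetchone()
    except sqlite3.OperationalError:
        return True  # sqlite_stat1 doesn't exist: the DB was never ANALYZEd
    if row is None:
        return True  # this table was never ANALYZEd
    analyzed = int(row[0].split()[0])
    (current,) = conn.execute("SELECT COUNT(*) FROM %s" % table).fetchone()
    return current > analyzed * ratio or analyzed > current * ratio

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CoreTracks (TrackID INTEGER PRIMARY KEY, ArtistID INTEGER)")
conn.execute("CREATE INDEX CoreTracksArtistIndex ON CoreTracks (ArtistID)")
conn.executemany("INSERT INTO CoreTracks (ArtistID) VALUES (?)", [(1,)] * 100)
print(needs_analyze(conn, "CoreTracks"))  # True: no sqlite_stat1 yet
conn.execute("ANALYZE")
print(needs_analyze(conn, "CoreTracks"))  # False: stats are fresh
conn.executemany("INSERT INTO CoreTracks (ArtistID) VALUES (?)", [(1,)] * 400)
print(needs_analyze(conn, "CoreTracks"))  # True: 500 rows vs. recorded 100
```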
Comment 13 Gabriel Burt 2009-03-05 21:10:40 UTC
Nice, yeah, we can also check that the table even exists (doesn't when you first create a db, afaict).  Assuming we've ANALYZE'd everything before, we can actually run analyze for a particular table as needed when its counts differ.
Comment 14 Dave B 2009-03-06 03:38:00 UTC
Created attachment 130176 [details]
banshee --debug-sql following ANALYZE

Bertrand / Gabriel,

Attached is the log from banshee --debug-sql after I ran "VACUUM; ANALYZE" on banshee.db in sqlite3.
Comment 15 Bertrand Lorentz 2009-03-07 16:13:32 UTC
Created attachment 130236 [details] [review]
Analyze the DB if needed

This patch modifies Aaron's patch to implement the idea in comment #12 and the suggestion in comment #13.
I've explicitly listed the tables that have indexes, instead of looking them up in sqlite_master, for the following reasons:
- there might be useless indexes left in the DB, from 0.X for example.
- I wanted the tables listed with the most relevant ones first, to decide as quickly as possible if ANALYZE is needed.
The ANALYZE is always done for the whole DB; I don't think doing it for each table separately would bring much.

On my system, this adds about 5ms to a 2s startup time when ANALYZE is not run.
Comment 16 Bertrand Lorentz 2009-03-26 20:35:16 UTC
Committed, with a few style changes suggested by Gabriel.