Bug 714134 - periodic database cleanup
Status: RESOLVED FIXED
Product: geary
Classification: Other
Component: engine
Version: master
Hardware: Other All
Importance: High normal
Target Milestone: 0.10.0
Assigned To: Geary Maintainers
QA Contact: Geary Maintainers
Depends on:
Blocks:
 
 
Reported: 2011-10-08 01:43 UTC by Jim Nelson
Modified: 2014-12-19 01:04 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Charles Lindsay 2013-11-21 20:25:30 UTC


---- Reported by jim@yorba.org 2011-10-07 18:43:00 -0700 ----

Original Redmine bug id: 4236
Original URL: http://redmine.yorba.org/issues/4236
Searchable id: yorba-bug-4236
Original author: Jim Nelson
Original description:

We should have some mechanism to perform periodic clean-up of the database. It
should be non-intrusive to the user.

The primary reason this is needed is that Gmail allows messages to exist in
multiple folders, meaning that if we simply delete a message when it's no
longer present in one folder, we'll have to re-download it later if the user
browses the other one(s).

Another reason is to work around #4235, although I'm hoping if that's fixed
this won't be a problem at all.

Related issues:
related to geary - 4235: Empty rows in MessageTable (Fixed)
related to geary - Feature #6073: Allow user to specify oldest mail stored in
database (Open)
related to geary - Feature #6372: Allow user to specify deletion of old mail (Duplicate)
related to geary - 6460: "Database locked" errors (Fixed)
duplicated by geary - 6184: Vacuum database periodically (Duplicate)



---- Additional Comments From geary-maint@gnome.bugs 2013-07-11 15:17:00 -0700 ----

### History

####

#1

Updated by Adam Dingle over 1 year ago

  * **Tracker** changed from _Bug_ to _Feature_
  * **Subject** changed from _Database cleanup_ to _periodic database cleanup_
  * **Target version** deleted (<strike>_0.1_</strike>)

####

#2

Updated by Jim Nelson 11 months ago

  * **Category** set to _engine_
  * **Target version** set to _0.3.0_

(From #6184):

Geary should vacuum the database periodically so performance doesn't degrade:
http://sqlite.org/lang_vacuum.html

Auto-vacuum is tempting, but as the above link indicates, even if we used it,
it wouldn't solve all performance problems. It may be we need to do a
combination of using auto-vacuum and then periodically using vacuum.
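
The effect described above can be seen with a small experiment. The following is a minimal sketch using Python's standard `sqlite3` module, not Geary's code; the `msg` table and the `vacuum_demo` function are made up for illustration. It shows that with auto-vacuum disabled, deleting rows leaves the file size unchanged until `VACUUM` runs.

```python
import os
import sqlite3
import tempfile


def vacuum_demo(rows=2000):
    """Return the database file size after filling, deleting, and vacuuming."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    # isolation_level=None selects autocommit mode; VACUUM cannot run
    # inside an open transaction.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("PRAGMA auto_vacuum = NONE")  # must be set before tables exist
    conn.execute("CREATE TABLE msg (id INTEGER PRIMARY KEY, body BLOB)")

    conn.execute("BEGIN")
    conn.executemany(
        "INSERT INTO msg (body) VALUES (?)",
        ((b"x" * 1024,) for _ in range(rows)),
    )
    conn.execute("COMMIT")
    filled = os.path.getsize(path)

    # Deleting rows moves their pages to the freelist; the file keeps its size.
    conn.execute("DELETE FROM msg")
    deleted = os.path.getsize(path)

    # VACUUM rebuilds the database file without the free pages.
    conn.execute("VACUUM")
    vacuumed = os.path.getsize(path)

    conn.close()
    return filled, deleted, vacuumed
```

Calling `vacuum_demo()` returns three byte counts; the third should be much smaller than the first two.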

####

#3

Updated by Jim Nelson 9 months ago

  * **Target version** changed from _0.3.0_ to _0.4.0_

####

#4

Updated by Jim Nelson 9 months ago

  * **Assignee** set to _Jim Nelson_
  * **Target version** changed from _0.4.0_ to _0.3.0_

####

#5

Updated by Jim Nelson 9 months ago

  * **Target version** changed from _0.3.0_ to _0.4.0_

####

#6

Updated by Jim Nelson 7 months ago

Also worth reading: http://jeff.ecchi.ca/blog/2011/12/24/investigating-lifereas-startup-performance/

####

#7

Updated by Jim Nelson 6 months ago

  * **Assignee** deleted (<strike>_Jim Nelson_</strike>)

####

#8

Updated by Jim Nelson 4 months ago

  * **Target version** changed from _0.4.0_ to _0.5.0_



--- Bug imported by chaz@yorba.org 2013-11-21 20:25 UTC  ---

This bug was previously known as _bug_ 4236 at http://redmine.yorba.org/show_bug.cgi?id=4236

Unknown milestone "unknown in product geary. 
   Setting to default milestone for this product, "---".
Setting qa contact to the default for this product.
   This bug either had no qa contact or an invalid one.
Resolution set on an open status.
   Dropping resolution 

Comment 1 Jim Nelson 2014-03-20 18:49:43 UTC
This is something that would be great to get in 0.8.  As it stands, deleted mail is simply piling up in Geary's database, including discarded drafts and trashed or spam messages auto-discarded by the server.
Comment 2 Jim Nelson 2014-12-19 01:04:28 UTC
Geary now will periodically delete unlinked messages from the database and their on-disk attachment files (if any).

By unlinked messages, I mean email that is not associated with any folder on the server.  Geary can't immediately delete an email when it's no longer in a folder because it may be stored in another folder (e.g. All Mail).  What's more, Geary never knows at any point in time if the local database is fully synchronized with the server.  So, garbage collection only occurs on unlinked email that's more than 30 days old.
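
As a rough illustration of the selection rule just described, here is a minimal sketch in Python with `sqlite3`. The schema (`message`, `message_location`, `received_at`), the function names, and the choice to measure age from a stored timestamp are all assumptions made for this example; they are not taken from Geary's engine.

```python
import sqlite3
import time

GC_MIN_AGE_DAYS = 30  # age threshold quoted in the comment above


def create_schema(conn):
    """Hypothetical two-table layout: messages plus per-folder links."""
    conn.executescript(
        """
        CREATE TABLE message (id INTEGER PRIMARY KEY, received_at REAL);
        CREATE TABLE message_location (
            message_id INTEGER REFERENCES message(id),
            folder_id INTEGER
        );
        """
    )


def find_reapable(conn, now=None):
    """Return ids of messages with no folder link that are past the age cutoff."""
    now = time.time() if now is None else now
    cutoff = now - GC_MIN_AGE_DAYS * 86400
    rows = conn.execute(
        """
        SELECT m.id
        FROM message AS m
        WHERE m.received_at < ?
          AND NOT EXISTS (
              SELECT 1 FROM message_location AS l WHERE l.message_id = m.id
          )
        ORDER BY m.id
        """,
        (cutoff,),
    )
    return [row[0] for row in rows]
```

A message that is old but still linked to any folder, or unlinked but recent, is left alone; only rows meeting both conditions are returned.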

Garbage collection happens in the background while Geary is executing.  It only runs once every 10 days.

In addition, Geary will vacuum the local database if it detects enough email has been deleted.  Because this operation locks the database, we want to do this very infrequently and only if we have some indication that the benefits outweigh the costs.  As configured now, a vacuum will occur if Geary garbage collects more than 10,000 messages.  Even then, a vacuum can only occur once every 30 days.
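
The trigger values above can be summarized as two small predicates. This is a sketch under stated assumptions, not the committed implementation: the constant and function names are hypothetical, and it assumes the 10,000-message count accumulates across GC passes since the last vacuum.

```python
from datetime import datetime, timedelta

# Trigger values quoted in the comment above; the names are hypothetical.
GC_INTERVAL = timedelta(days=10)
VACUUM_INTERVAL = timedelta(days=30)
VACUUM_THRESHOLD = 10_000


def gc_due(last_gc, now):
    """GC runs if it has never run or the last pass was at least 10 days ago."""
    return last_gc is None or now - last_gc >= GC_INTERVAL


def vacuum_due(reaped_since_vacuum, last_vacuum, now):
    """Vacuum needs more than 10,000 reaped messages and a 30-day gap."""
    if reaped_since_vacuum <= VACUUM_THRESHOLD:
        return False
    return last_vacuum is None or now - last_vacuum >= VACUUM_INTERVAL
```

Checking the message count first keeps the common case cheap: most passes reap far fewer than 10,000 messages, so the vacuum question is answered without looking at dates.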

For sample metrics, my Geary database for my Yorba email account was holding 99,716 email messages.  It's configured to hold email 2 years back, but because I've been using the database almost continuously since Geary's early days, it's really holding more like 3+ years of email.

Start
-----
database: 2.7G
attachments: 1.2G
total: 3.9G

Garbage collection reaped 49,726 unlinked emails and 3,300 unlinked attachment files.  After a complete GC pass:

Post-GC (no vacuum)
-------------------
database: 2.7G
attachments: 611M
total: 3.3G

And after running a vacuum pass:

Post-GC w/ vacuum
-----------------
database: 1.4G
attachments: 611M
total: 2.0G

In aggregate, these two steps cut Geary's disk usage in half.

This is all from email that Geary no longer sees on the server.  This does *not* delete old email still present remotely; that's bug #714101.  (Even if it does remove still-live email, Geary's engine will merely re-download the email when it detects it on the server, but we'd like to prevent that if possible.)

With a vacuum being triggered by 10,000 deleted emails, this means long-term users with a heavy email load will have a vacuum on their second run of Geary after this commit.  (The first run will be for garbage collection.)  After that, even on an active mailbox, I'm guessing the user won't see another vacuum for 6 - 12 months -- but only if they're collecting 25 - 50 unlinked emails per day on average.
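
The estimate above follows from dividing the threshold by the daily rate. A minimal sketch of that arithmetic, with a hypothetical function name and ignoring the 10-day GC cadence and the 30-day minimum between vacuums:

```python
VACUUM_THRESHOLD = 10_000  # messages reaped before a vacuum is considered


def days_to_threshold(unlinked_per_day):
    """Approximate days of steady accumulation needed to reach the threshold."""
    return VACUUM_THRESHOLD / unlinked_per_day
```

At 50 unlinked emails per day this gives 200 days, and at 25 per day 400 days, which is roughly the range guessed above.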

The various trigger values can be adjusted, but I think for now they're a happy medium.

Pushed to master, commit 23511d