Bug 680172 - Tracker can end up eating >3.8 GiB of disk for meta.db
Status: RESOLVED FIXED
Product: tracker
Classification: Core
Component: General
Version: unspecified
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: tracker-general
Jamie McCracken
Depends on:
Blocks:
 
 
Reported: 2012-07-18 13:57 UTC by D.S. (Spider) Ljungmark
Modified: 2012-08-10 13:58 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Proposed patch (3.68 KB, patch)
2012-07-23 22:27 UTC, Jürg Billeter
committed

Description D.S. (Spider) Ljungmark 2012-07-18 13:57:20 UTC
For some unknown reason, .config/user-dirs.dirs pointed every XDG user directory at $HOME/.

When org.freedesktop.Tracker.Miner.Files index-recursive-directories
is ['&DESKTOP', '&DOCUMENTS', '&DOWNLOAD', '&MUSIC', '&PICTURES', '&VIDEOS']
and index-single-directories is ['$HOME'], this causes Tracker to index massive amounts of data, including jhbuild checkouts and other projects.


In turn, this leads to a situation where ~/.cache/tracker exceeds 3.8 GiB.

There should be some kind of safeguard against this situation so that Tracker does not consume this much disk space.
Comment 1 Jürg Billeter 2012-07-18 14:06:57 UTC
It may make sense to simply skip the XDG directories that are set to $HOME.
Comment 2 Martyn Russell 2012-07-18 14:27:20 UTC
Yes, I agree. I must admit, we do depend (currently) on the XDG locations being sane. I guess some distros don't have directories for each type of file.
Comment 3 D.S. (Spider) Ljungmark 2012-07-18 14:46:01 UTC
It is even more likely that this will bite old-time users who upgrade from a distribution that didn't have these files, or users who drank the Kool-Aid back in the day and used $HOME as ~/Desktop (Spatial Nautilus, GNOME 2.2x days?).
Comment 4 Martyn Russell 2012-07-18 15:56:07 UTC
Yea :)

It's actually a hard problem to solve nicely. You see, if we don't index $HOME recursively, then we miss files (in your situation above).

We normally don't index $HOME recursively but then we do for the other defined folders.

Also, we should be filtering duplicates, so if all those locations are $HOME we only do it once anyway. So the solution might be to just make sure we DON'T recursively index any XDG locations set to $HOME.
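
A minimal sketch of the idea discussed above, not the attached patch: drop every recursive-index candidate that equals $HOME and collapse duplicates. The names same_path and filter_recursive_dirs are hypothetical, and the comparison is purely textual (trailing slashes are ignored, symlinks are not resolved).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Length of path without trailing slashes (a lone "/" keeps length 1). */
static size_t trimmed_len(const char *path)
{
    size_t len = strlen(path);
    while (len > 1 && path[len - 1] == '/')
        len--;
    return len;
}

/* Textual comparison that ignores trailing slashes. Hypothetical helper. */
static bool same_path(const char *a, const char *b)
{
    size_t la = trimmed_len(a), lb = trimmed_len(b);
    return la == lb && strncmp(a, b, la) == 0;
}

/*
 * Copies into out the entries of dirs that are neither equal to home nor
 * duplicates of an earlier kept entry. out must have room for n pointers.
 * Returns the number of entries kept. Hypothetical helper.
 */
static size_t filter_recursive_dirs(const char *home,
                                    const char *const *dirs, size_t n,
                                    const char **out)
{
    size_t kept = 0;

    assert(home != NULL && dirs != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        bool dup = false;

        /* Unset entries and entries pointing at $HOME are skipped. */
        if (dirs[i] == NULL || same_path(dirs[i], home))
            continue;
        for (size_t j = 0; j < kept && !dup; j++)
            dup = same_path(dirs[i], out[j]);
        if (!dup)
            out[kept++] = dirs[i];
    }
    return kept;
}
```

With this filter, a user-dirs.dirs that maps every entry to $HOME yields an empty recursive list, leaving only the non-recursive index of $HOME itself.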
Comment 5 Jürg Billeter 2012-07-23 22:27:04 UTC
Created attachment 219537
Proposed patch

The attached patch should fix this bug. Testing and review appreciated.
Comment 6 Martyn Russell 2012-07-24 08:19:33 UTC
Comment on attachment 219537
Proposed patch

Looks good to me.

The usual test for this is to save/load the tracker-preferences with different targets and make sure that the config file isn't messed up and the list of target locations is sane in the miner-fs debugging output. But AFAICS, this patch looks quite good!
Comment 7 Jürg Billeter 2012-07-24 09:23:17 UTC
commit 06f7ac0928e75a1a63f115f9d8cfd7e3096d1c60
Author: Jürg Billeter <j@bitron.ch>
Date:   Tue Jul 24 00:20:37 2012 +0200

    tracker-miner-fs: Ignore XDG directories set to $HOME
    
    This prevents accidental recursive indexing of $HOME.
Comment 8 D.S. (Spider) Ljungmark 2012-07-24 22:56:54 UTC
Question:
 What happens if ~/Documents is a symlink to ~ and the Documents directory is set to ~/Documents?
Comment 9 D.S. (Spider) Ljungmark 2012-07-24 22:59:15 UTC
The correct question is probably: how does g_get_user_special_dir () resolve symlinks?

You might want to go a step further and make sure that you use g_file_read_link on the entry, rather than blindly comparing strings.
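
A rough illustration of comparing resolved locations instead of raw strings, using POSIX realpath() rather than the GLib/GIO calls mentioned above. The function name resolves_to_same_dir is hypothetical, and the fallback to a plain string comparison for paths that cannot be resolved is an assumption of this sketch, not Tracker's behaviour.

```c
#define _XOPEN_SOURCE 700 /* for realpath() with a NULL buffer */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Returns true if both paths resolve to the same canonical location.
 * If either path cannot be resolved (for example, it does not exist),
 * falls back to comparing the strings as given.
 */
static bool resolves_to_same_dir(const char *a, const char *b)
{
    char *ra, *rb;
    bool same;

    assert(a != NULL && b != NULL);
    ra = realpath(a, NULL); /* malloc'ed canonical path, or NULL */
    rb = realpath(b, NULL);
    if (ra != NULL && rb != NULL)
        same = strcmp(ra, rb) == 0;
    else
        same = strcmp(a, b) == 0;
    free(ra); /* free(NULL) is a no-op */
    free(rb);
    return same;
}
```

With ~/Documents symlinked to ~, resolves_to_same_dir() on the Documents path and $HOME would report a match that a plain strcmp() misses.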
Comment 10 Martyn Russell 2012-08-10 13:58:21 UTC
(In reply to comment #9)
> The correct question is probably:  how does g_get_user_special_dir () resolve
> symlinks.

It doesn't. The function returns a string based on a value in a file and that's it. It doesn't have to exist at all.
 
> You might want to go a step further and make sure that you use g_file_read_link
> on the entry, rather than blindly comparing strings.

We have been doing this since 2009-12-11 12:44:05 (and even before) in commit 74286929:

        g_file_enumerate_children_async (ed->dir_file,
                                         attrs,
                                         G_FILE_QUERY_INFO_NOFOLLOW_SYMLINKS,
                                         G_PRIORITY_LOW,
                                         ed->cancellable,
                                         file_enumerate_children_cb,
                                         ed);