Bug 659025 - Disable crawler by default
Status: RESOLVED WONTFIX
Product: tracker
Classification: Core
Component: General
Version: unspecified
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: tracker-general
QA Contact: Jamie McCracken
Depends on: 613256
Blocks:
 
 
Reported: 2011-09-14 09:31 UTC by Bastien Nocera
Modified: 2011-12-15 11:43 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Bastien Nocera 2011-09-14 09:31:02 UTC
+++ This bug was initially created as a clone of Bug #613256 +++

Or at least make it opt-in, otherwise applications cannot drive the indexing, and will rely on the tracker indexer getting to their data "at some point", and with no easy way to tell the indexer to index specific locations, or ignore some others.

---


My original request was only addressed by adding an option in a text file. The crawler is still enabled by default, and seems far too aggressive on desktop installations, causing people to resent tracker greatly.
Comment 1 Carlos Garnacho 2011-09-14 10:03:28 UTC
Worth noting, Tracker 0.12 uses GSettings for this stuff
Comment 2 Martyn Russell 2011-09-14 10:05:30 UTC
(In reply to comment #0)
> +++ This bug was initially created as a clone of Bug #613256 +++
> 
> Or at least make it opt-in, otherwise applications cannot drive the indexing,

This is something I expect distributions to specify. I will admit, recently I
have been indexing EVERYTHING I have available which includes a number of
external HDs and > 130k files and, fine, it is noticeable when logging in. We
have some ideas about how to improve that, for example:

 - Disabling indexing of removable media by default
 - Don't add monitors at all for removable media (or basically anything not in
the config)

This would improve the start up speed definitely, but at some point we would
need another mechanism to check the DB is in sync, perhaps by checking when the
user is away from the machine (like at lunch). This is a feature request we've
had for some time.

> and will rely on the tracker indexer getting to their data "at some point",

Applications can use the APIs available over D-Bus if they want to get their
file indexed as a matter of priority, we've tried and tested this with Nokia
devices so we know it works.

> with no easy way to tell the indexer to index specific locations, or ignore some others.

That's quite untrue. The tracker-preferences already has a way to add/remove
locations to index and to ignore directories/files with glob pattern
recognition too. It's been this way for some time.
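
As a rough illustration of what glob-based ignore rules do, here is a minimal Python sketch. It is not Tracker's implementation, and the pattern lists are hypothetical values chosen for the example, not Tracker's defaults.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical ignore lists for illustration only; not Tracker's defaults.
IGNORED_DIRECTORIES = ["build", ".git", "node_modules"]
IGNORED_FILES = ["*.o", "*~", "*.tmp"]


def is_ignored(path: str) -> bool:
    """Return True if a directory component or the file name matches an ignore glob."""
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    *directories, filename = parts
    # A match on any parent directory excludes everything below it.
    for directory in directories:
        if any(fnmatchcase(directory, pattern) for pattern in IGNORED_DIRECTORIES):
            return True
    # Otherwise decide on the file name alone.
    return any(fnmatchcase(filename, pattern) for pattern in IGNORED_FILES)


if __name__ == "__main__":
    for candidate in ("/home/user/src/project/build/main.c",
                      "/home/user/Music/song.ogg"):
        print(candidate, is_ignored(candidate))
```

Matching each directory component separately means a single pattern such as `build` excludes a whole source tree without listing its files.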

> Seeing as my original request was just provided with an option in a text file.
> The crawler is still enabled by default, and seems far too aggressive on
> desktop installations, causing people to resent tracker greatly.

Do you have any empirical evidence of the aggressive nature here? We've not had
people generally complain about this for some time.

What sort of dataset are you using?

I think "aggressive" really depends on your set up, I am seeing something
semi-aggressive here too but I have a LOT of content and the most aggressive
thing is setting up inotify monitors for removable media (which I have been
thinking is quite useless as a broad and general rule).

We've made a number of improvements over the past months like not doing any
mtime checks on start up if we shut down cleanly. This makes a HUGE difference
but with a lot of directories (on removable media for example) the crawling
stage can take a bit longer than necessary and isn't that important for me at
least.
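
The clean-shutdown shortcut can be pictured with the following Python sketch. It is a simplification under stated assumptions, not the miner's real code: `stale_files`, `indexed_mtimes` and `current_mtime` are hypothetical names, and the index is modelled as a plain dictionary.

```python
from typing import Callable, Dict, List


def stale_files(indexed_mtimes: Dict[str, float],
                current_mtime: Callable[[str], float],
                clean_shutdown: bool) -> List[str]:
    """Return the paths whose on-disk mtime differs from the indexed one."""
    if clean_shutdown:
        # Assumption: the index was current when the miner exited cleanly,
        # so the per-file mtime comparison is skipped entirely.
        return []
    # After an unclean shutdown, compare every indexed file against the disk.
    return sorted(path for path, mtime in indexed_mtimes.items()
                  if current_mtime(path) != mtime)


if __name__ == "__main__":
    indexed = {"/a": 1.0, "/b": 2.0}
    on_disk = {"/a": 1.0, "/b": 3.0}
    print(stale_files(indexed, on_disk.__getitem__, clean_shutdown=False))
```

The saving comes from avoiding one stat-like lookup per indexed file on every start.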

I think clearly we have different views on the role of Tracker, you seem to
believe that it should be application driven and I believe it should be both
application driven and detected in real time (with emphasis on the latter).
Comment 3 Bastien Nocera 2011-09-14 10:23:24 UTC
(In reply to comment #2)
> (In reply to comment #0)
> > +++ This bug was initially created as a clone of Bug #613256 +++
> > 
> > Or at least make it opt-in, otherwise applications cannot drive the indexing,
> 
> This is something I expect distributions to specify. I will admit, recently I
> have been indexing EVERYTHING I have available which includes a number of
> external HDs and > 130k files and fine it is noticeable when logging in. We
> have some ideas about how to improve that, for example:
> 
>  - Disabling indexing of removable media by default
>  - Don't add monitors at all for removable media (or basically anything not in
> the config)

I think the defaults should be changed upstream, and a bucket list of things to be done to make the experience more interesting filed as a tracker bug. This doesn't change the fact that the default should be to _not_ index right now.

> This would improve the start up speed definitely, but at some point we would
> need another mechanism to check the DB is in sync, perhaps by checking when the
> user is away from the machine (like at lunch). This is a feature request we've
> had for some time.
> 
> > and will rely on the tracker indexer getting to their data "at some point",
> 
> Applications can use the APIs available over D-Bus if they want to get their
> file indexed as a matter of priority, we've tried and tested this with Nokia
> devices so we know it works.
> 
> > with no easy way to tell the indexer to index specific locations, or ignore some others.
> 
> That's quite untrue. The tracker-preferences already has a way to add/remove
> locations to index and to ignore directories/files with glob pattern
> recognition too. It's been this way for some time.

Did you realise that you were reading a cloned bug? This was in the original bug.

> > Seeing as my original request was just provided with an option in a text file.
> > The crawler is still enabled by default, and seems far too aggressive on
> > desktop installations, causing people to resent tracker greatly.
> 
> Do you have any empirical evidence of the aggressive nature here? We've not had
> people generally complain about this for some time.
> 
> What sort of dataset are you using?

Hackers, with tons of data.

Tracker got dragged into the default installation through grilo-plugins (used by Totem) and gnome-documents. Fedora 16 (beta) users have been complaining that it:
- hogs the disk I/O and the CPU
- starts indexing external hard disks

> I think "aggressive" really depends on your set up, I am seeing something
> semi-aggressive here too but I have a LOT of content and the most aggressive
> thing is setting up inotify monitors for removable media (which I have been
> thinking is quite useless as a broad and general rule).

The problem is that you've already got the data indexed. Wipe your tracker database, and start from your existing home directory. I don't expect it to be a pleasant experience.

> We've made a number of improvements over the past months like not doing any
> mtime checks on start up if we shut down cleanly. This makes a HUGE difference
> but with a lot of directories (on removable media for example) the crawling
> stage can take a bit longer than necessary and isn't that important for me at
> least.
> 
> I think clearly we have different views on the role of Tracker, you seem to
> believe that it should be application driven and I believe it should be both
> application driven and detected in real time (with emphasis on the latter).

The problem being that tracker causes performance problems right now, and will likely do for any GNOME 3.2 user where the distribution follows the upstream defaults.
Comment 4 Bastien Nocera 2011-09-14 10:23:40 UTC
(In reply to comment #1)
> Worth noting, Tracker 0.12 uses GSettings for this stuff

Goodie.
Comment 5 Jürg Billeter 2011-09-14 10:57:51 UTC
As far as I can tell, disabling crawling by default breaks applications using grilo and gnome-documents (for local files). This would definitely avoid performance problems, but I fail to see how this would still allow browsing your local media files and documents.

If you're saying that it just doesn't work well enough yet to be enabled by default, the distro can choose to remove the default directories from the indexed locations or not install tracker by default. For distros that ship tracker with the intention that everything works out of the box, the tracker upstream defaults make sense.

Personally, I'd like to see us moving away from storing data in traditional directory structures and instead, using a higher level API to access (read and write) media, documents, and other data usually stored in files. That is, I'd like to access media and documents using an API that is closer to libfolks or e-d-s than direct filesystem access.

This would completely avoid the need for crawling on startup and recursive directory monitoring at the cost of changes in lots of applications - although FUSE could certainly help by providing a compatibility layer.
Comment 6 Martyn Russell 2011-09-14 12:53:07 UTC
(In reply to comment #3)
> (In reply to comment #2)
> > (In reply to comment #0)
> > > +++ This bug was initially created as a clone of Bug #613256 +++
> > > 
> > > Or at least make it opt-in, otherwise applications cannot drive the indexing,
> > 
> > This is something I expect distributions to specify. I will admit, recently I
> > have been indexing EVERYTHING I have available which includes a number of
> > external HDs and > 130k files and fine it is noticeable when logging in. We
> > have some ideas about how to improve that, for example:
> > 
> >  - Disabling indexing of removable media by default
> >  - Don't add monitors at all for removable media (or basically anything not in
> > the config)
> 
> I think the defaults should be changed upstream, and a bucket list of things to
> be done to make the experience more interesting filed as a tracker bug.

Please do that then. Baseless claims about performance don't go anywhere.

> This doesn't change the fact that the default should be to _not_ index right now.

I think Jürg's reply speaks for itself on this matter.

What happens when application A wants to use data which isn't in Tracker, do we tell them - sorry but developers who have a lot of crap most users don't have on their machine didn't want an aggressive start for their desktop so you have to wait for that data to be available?

That's arse-about-face logic IMO. Instead we should avoid indexing development code/environments/trees/etc.
 
> > This would improve the start up speed definitely, but at some point we would
> > need another mechanism to check the DB is in sync, perhaps by checking when the
> > user is away from the machine (like at lunch). This is a feature request we've
> > had for some time.
> > 
> > > and will rely on the tracker indexer getting to their data "at some point",
> > 
> > Applications can use the APIs available over D-Bus if they want to get their
> > file indexed as a matter of priority, we've tried and tested this with Nokia
> > devices so we know it works.
> > 
> > > with no easy way to tell the indexer to index specific locations, or ignore some others.
> > 
> > That's quite untrue. The tracker-preferences already has a way to add/remove
> > locations to index and to ignore directories/files with glob pattern
> > recognition too. It's been this way for some time.
> 
> Did you realise that you were reading a cloned bug? This was in the original
> bug.

Yes, why did you do that? Why not just create a bug stating what you wanted to have happen. The previous bug was about there being no choice, this is about changing the default choice. I don't understand why you cloned a bug for that.
 
> > > Seeing as my original request was just provided with an option in a text file.
> > > The crawler is still enabled by default, and seems far too aggressive on
> > > desktop installations, causing people to resent tracker greatly.
> > 
> > Do you have any empirical evidence of the aggressive nature here? We've not had
> > people generally complain about this for some time.
> > 
> > What sort of dataset are you using?
> 
> Hackers, with tons of data.

Let's see it then. So far I haven't seen any statistics about how much data these people have. You can do this easily with tracker-stats and some output from the lines printed by the miner-fs (e.g. how long it takes to index or crawl on first start up). Those would be good to understand where we can improve things.
 
> Tracker got dragged into the default installation through grilo-plugins (used
> by Totem) and gnome-documents. Fedora 16 (beta) users have been complaining
> that it:
> - hogs the disk I/O and the CPU

Perhaps on initial start up, after that it shouldn't be anywhere near as bad. There is also a throttling facility in the config, have you tried that to avoid the disk / CPU use?

> - starts indexing external hard disks

Yes see comment #1.
 
> > I think "aggressive" really depends on your set up, I am seeing something
> > semi-aggressive here too but I have a LOT of content and the most aggressive
> > thing is setting up inotify monitors for removable media (which I have been
> > thinking is quite useless as a broad and general rule).
> 
> The problem is that you've already got the data indexed. Wipe your tracker
> database, and start from your existing home directory. I don't expect it to be
> a pleasant experience.

Please don't judge my development environment without actually knowing what you're talking about. Given I have to release 4 different branches at the moment (0.10, Harmattan, 0.12 and at some point master or 0.13), I have to build, distcheck (which requires make install) and switch between those every day and test bugs in each case, you can hardly say I don't know how the experience fares.

If you think I am using a mediocre set up you're wrong:

$ tracker-stats | grep -e nfo:FileDataObject -e nfo:Image -e nmm:MusicPiece -e nfo:Folder -e nfo:Document
  nfo:Document = 65432
  nfo:FileDataObject = 127421
  nfo:Folder = 9295
  nfo:Image = 25669
  nmm:MusicPiece = 22238

127k files is hardly nothing.
9.2k folders to set up monitors on is not just a 2 second operation either.
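
For anyone who wants to share comparable numbers, a small Python sketch like the one below could turn this kind of output into counts. It assumes only the `name = count` line format shown above; `parse_tracker_stats` is a hypothetical helper, not part of Tracker.

```python
from typing import Dict


def parse_tracker_stats(output: str) -> Dict[str, int]:
    """Parse lines of the form 'class = count' into a dictionary."""
    counts: Dict[str, int] = {}
    for line in output.splitlines():
        name, separator, value = line.partition("=")
        if not separator:
            continue  # Not a 'name = value' line; skip it.
        try:
            counts[name.strip()] = int(value.strip())
        except ValueError:
            continue  # Value was not an integer; skip it.
    return counts


# Sample copied from the output quoted above.
SAMPLE = """\
  nfo:Document = 65432
  nfo:FileDataObject = 127421
  nfo:Folder = 9295
  nfo:Image = 25669
  nmm:MusicPiece = 22238
"""

if __name__ == "__main__":
    stats = parse_tracker_stats(SAMPLE)
    print("files:", stats["nfo:FileDataObject"], "folders:", stats["nfo:Folder"])
```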

> > We've made a number of improvements over the past months like not doing any
> > mtime checks on start up if we shut down cleanly. This makes a HUGE difference
> > but with a lot of directories (on removable media for example) the crawling
> > stage can take a bit longer than necessary and isn't that important for me at
> > least.
> > 
> > I think clearly we have different views on the role of Tracker, you seem to
> > believe that it should be application driven and I believe it should be both
> > application driven and detected in real time (with emphasis on the latter).
> 
> The problem being that tracker causes performance problems right now, and will
> likely do for any GNOME 3.2 user where the distribution follows the upstream
> defaults.

So why don't you help us fix those issues instead of just recommending we disable it? If your car doesn't work properly do you just start walking?
Comment 7 Martyn Russell 2011-09-14 12:55:52 UTC
(In reply to comment #6)
> (In reply to comment #3)
> > Tracker got dragged into the default installation through grilo-plugins (used
> > by Totem) and gnome-documents. Fedora 16 (beta) users have been complaining
> > that it:
> > - hogs the disk I/O and the CPU
> 
> Perhaps on initial start up, after that it shouldn't be anywhere near as bad.

Just to clarify, I meant first time index.
Comment 8 Martyn Russell 2011-09-15 16:18:01 UTC
Bastien,

I spoke to Emmanuele Bassi earlier today about Tracker indexing /home recursively. He suspected it was because gvfs exports /home possibly as a removable media mount point. This would also explain why config to index only some directories (those without code) was useless (tracker will override subdirectories if included in higher level directories).
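
To illustrate why a narrower configuration has no effect once a parent directory is indexed recursively, here is a minimal Python sketch. It is not Tracker's code; `effective_roots` is a hypothetical helper and the paths are examples.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def effective_roots(recursive_dirs: Iterable[str]) -> List[str]:
    """Drop directories that are already covered by a recursively indexed ancestor."""
    # Visit shallow paths first so ancestors are seen before their descendants.
    paths = sorted({PurePosixPath(d) for d in recursive_dirs},
                   key=lambda p: (len(p.parts), str(p)))
    roots: List[PurePosixPath] = []
    for path in paths:
        if not any(root in path.parents for root in roots):
            roots.append(path)
    return [str(root) for root in roots]


if __name__ == "__main__":
    # Only the hand-picked directories are configured: both are kept.
    print(effective_roots(["/home/user/Documents", "/home/user/Music"]))
    # /home also gets added (e.g. seen as a removable mount): it subsumes both.
    print(effective_roots(["/home/user/Documents", "/home/user/Music", "/home"]))
```

In the second case everything under /home is crawled, including the source trees the narrower configuration was meant to leave out.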

We have pushed a patch today to disable indexing of removable media by default.
You can also use tracker-control -rcs (this will remove your config and use a new default, reindex and start the miners).

You can also use tracker-preferences to disable removable devices being indexed.

Let me know if this is related to what you were concerned with earlier this week.
Comment 9 Aleksander Morgado 2011-12-15 11:43:43 UTC
Will wontfix this, as there are multiple reasons why we shouldn't disable it by default; and the original issue from the reporter seems to have been triggered by /home wrongly being detected as removable.