Bug 169222 - Remove files from index when a FSQ root is removed
Status: RESOLVED WONTFIX
Product: beagle
Classification: Other
Component: General
Version: unspecified
Hardware: Other Linux
Importance: Normal normal
Target Milestone: Milestone 2
Assigned To: Beagle Bugs
QA Contact: Beagle Bugs
Whiteboard: gnome[unmaintained]
Depends on:
Blocks:
Reported: 2005-03-04 18:56 UTC by Wouter Bolsterlee (uws)
Modified: 2018-07-03 09:52 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Root dropping (5.93 KB, patch)
2005-07-11 22:59 UTC, Daniel Drake
needs-work
Exclude pattern dropping (4.18 KB, patch)
2005-07-17 23:42 UTC, Daniel Drake
committed
HitIsValid on entire index (5.02 KB, patch)
2005-07-31 17:21 UTC, Daniel Drake
none
Recrawl all directories on exclude pattern removal (4.13 KB, patch)
2005-08-11 17:32 UTC, Daniel Drake
none
Root dropping fix (2.13 KB, patch)
2005-08-11 19:50 UTC, Daniel Drake
none
Forget about newly excluded paths (1.66 KB, patch)
2005-08-11 22:53 UTC, Daniel Drake
none
Expire inotify watches when we drop a root or add an exclude path (1.03 KB, patch)
2005-08-12 16:00 UTC, Daniel Drake
none
New watch dropping stuff (1.93 KB, patch)
2005-08-30 00:14 UTC, Daniel Drake
committed

Description Wouter Bolsterlee (uws) 2005-03-04 18:56:57 UTC
Beagle should filter out hits that are in .noindex directories. This situation
arises when you decide to add a .noindex file to a directory after it was
indexed by beagle (because it has too many irrelevant hits, like a directory
somewhere lying around in your downloads tree containing dictionary files).

Currently, Beagle doesn't remove those files from the index.

Jon Trowbridge about this issue in #dashboard:

19:32:21 < uws> Will beagle delete stuff from the index after adding a .noindex file to a directory after that directory was indexed?
19:45:03 < trow> uws: No, it won't.  We really should filter out hits on .noindexed files at query-time... that would cause us to queue up a delete for that index item.
19:45:20 < uws> trow: "should"
19:45:26 < uws> trow: That means it's not implemented yet :(
19:45:35 < trow> uws: Exactly.  Could you file a bug about that?
19:45:45 < trow> uws: It wouldn't really be that hard to do.
19:45:54 < trow> uws: I'll just forget about it if it isn't in bugzilla.
19:46:06 < uws> trow: But it involves querying the filesystem for each hit...
19:47:10 < trow> We already have to query file filesystem for every (filesystem) hit, because we don't want to show hits for files that no longer exist.  And we cache a lot of the .noindex information in memory, so I think we can do the check pretty efficiently.

Thanks.
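
A minimal Python sketch of the query-time check discussed in the log above. Beagle itself is written in C#, and none of these names come from its code: dir_is_noindexed, filter_hits and pending_deletes are hypothetical. The sketch assumes a .noindex marker applies to the directory that contains it and to everything below it. The lru_cache stands in for the in-memory .noindex information trow mentions, and it has to be invalidated whenever a marker is added or removed.

```python
import os
import tempfile
from functools import lru_cache

# Hypothetical sketch, not Beagle code: every name here is illustrative.

@lru_cache(maxsize=4096)
def dir_is_noindexed(directory: str) -> bool:
    """True if `directory` or any of its ancestors contains a .noindex file."""
    if os.path.exists(os.path.join(directory, ".noindex")):
        return True
    parent = os.path.dirname(directory)
    if parent == directory:  # reached the top, nothing left to check
        return False
    return dir_is_noindexed(parent)

def filter_hits(hit_paths, pending_deletes):
    """Return the hits that are still valid; queue the others for deletion."""
    valid = []
    for path in hit_paths:
        # The existence check is needed anyway, so the .noindex check rides along.
        if not os.path.exists(path) or dir_is_noindexed(os.path.dirname(path)):
            pending_deletes.append(path)
        else:
            valid.append(path)
    return valid
```

A caller would hand pending_deletes to whatever removes documents from the index, and call dir_is_noindexed.cache_clear() (or something finer-grained) when a .noindex file appears or disappears.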
Comment 1 Daniel Drake 2005-03-05 01:29:14 UTC
I've had a go at this. The fix is relatively simple but it highlights a problem
with our caching. Will investigate more sometime soon!
Comment 2 Daniel Drake 2005-03-08 23:30:26 UTC
Just to update this bug with progress: Fredrik and I are working on producing
some sort of config file for configuring beagle, and dropping the .noindex /
.neverindex functionality altogether to reduce complexity.
Comment 3 Jon Trowbridge 2005-03-09 19:22:46 UTC
Once you get the .noindex stuff revamped, feel free to tackle this if you have
any extra time.
Comment 4 Daniel Drake 2005-06-02 18:24:55 UTC
There are a number of situations that FSQ needs to handle appropriately. Here's
a list of things I've come up with so far.

We need to be able to remove roots on-the-fly.

If a root is removed we need to forget all the indexes on that root
(Or do we just modify HitIsValid to check that a hit is within a root? I think
the former...)

If we ignore a pattern, we need to forget all the indexes on affected files
(Or, again, do we do this through HitIsValid? Not sure what is best here..)

If we unignore a pattern, we need to recrawl stuff. How much? Everything?

If a root is added, we need sanity checks, like:
1. does it have EAs (extended attributes)?
2. is it inside another root?
3. is another root inside it?
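
The two nesting checks in that list could look roughly like this in Python. This is only a sketch: check_new_root and its return values are made up for illustration and are not part of Beagle. The extended-attribute check is left out because it depends on the filesystem. Paths are compared component by component, so /home/user2 is not mistaken for something inside /home/user.

```python
from pathlib import PurePosixPath

# Hypothetical helper, not Beagle code.
def check_new_root(new_root: str, existing_roots: list[str]) -> str:
    """Classify a candidate root against the roots already configured."""
    new = PurePosixPath(new_root)
    for root in existing_roots:
        old = PurePosixPath(root)
        if new == old:
            return "duplicate"
        if old in new.parents:      # candidate lies inside an existing root
            return "inside-existing"
        if new in old.parents:      # an existing root lies inside the candidate
            return "contains-existing"
    return "ok"
```

What to do with each result (reject, merge, or replace the inner root) is a policy decision that the sketch deliberately does not make.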
Comment 5 Daniel Drake 2005-07-06 20:47:49 UTC
When removing a root, or adding an ignore pattern/path, we should immediately
drop all indexes on matching hits. (This may be easiest to achieve by firing a
special query, and letting HitIsValid do the hard work)

When removing an ignore pattern, we should mark the entire tree as dirty.

When removing an ignore path, we should mark the affected path as dirty.
Comment 6 Daniel Drake 2005-07-11 22:59:51 UTC
Created attachment 48990
Root dropping

When we drop a root, immediately remove it from the indexes and remove the
file attributes.
Comment 7 Daniel Drake 2005-07-17 23:35:20 UTC
With regard to the root dropping patch, we decided that flooding the scheduler
is a bad idea; we should create some sort of periodic optimization routine instead:

<trow> yeah, just some regularly-scheduled index maintenance
<dsd> so basically... go over every file in the index, check HitIsValid, remove
if not valid
<trow> Yeah, something like that.

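The maintenance pass described in that exchange, sketched in Python. Everything here is an assumption for illustration: the index is modelled as a plain dict keyed by path, hit_is_valid is a simplified stand-in for Beagle's HitIsValid (it only checks roots and name patterns, not file existence), and validate_contents is a generator so that a scheduler could run one small batch at a time instead of being flooded.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical stand-in for HitIsValid: inside some root and not excluded by name.
def hit_is_valid(path, roots, exclude_patterns):
    p = PurePosixPath(path)
    in_root = any(PurePosixPath(r) in p.parents for r in roots)
    excluded = any(fnmatchcase(p.name, pat) for pat in exclude_patterns)
    return in_root and not excluded

def validate_contents(index, roots, exclude_patterns, batch_size=100):
    """Check the index in batches; each step yields how many entries it removed."""
    paths = list(index)  # snapshot, because entries are deleted while iterating
    for start in range(0, len(paths), batch_size):
        removed = 0
        for path in paths[start:start + batch_size]:
            if path in index and not hit_is_valid(path, roots, exclude_patterns):
                del index[path]
                removed += 1
        yield removed
```

Calling next() on the generator whenever the daemon is idle would spread the work out; summing it runs the whole pass at once.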
Comment 8 Daniel Drake 2005-07-17 23:42:12 UTC
Created attachment 49336
Exclude pattern dropping

- When we drop an exclude pattern, we need to recrawl the entire FS tree to
  pick up those files that we previously ignored
- DirectoryPrivate.SetAllToUnknown_Unlocked needs to consider the situation
  where there are no children
- When examining directory children, if ScanOne_Unlocked finds that we already
  know about the child, it should check the state of the child to see if it
  needs a scan anyway.
- FSM.SetAllToUnknown should fire off a scan request so that everything gets
  rescanned/recrawled

The end result of this is that FSM.SetAllToUnknown now does the right thing,
rather than not doing much at all - this means that inotify queue overflows
will now be handled correctly.
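
To illustrate the behaviour described in the list above, here is a toy Python model of a directory tree with per-directory state. It is not the patch and does not use Beagle's classes: DirNode, set_all_to_unknown, scan and list_dir are invented names, and the filesystem is faked through a callback. The points it tries to show are that marking works on a node with no children, and that a child already known to the model is still rescanned when its state says so.

```python
from dataclasses import dataclass, field

# Toy model only; names and states are assumptions, not Beagle's.
@dataclass
class DirNode:
    name: str
    state: str = "clean"                      # "clean" or "unknown"
    children: dict = field(default_factory=dict)

def set_all_to_unknown(node):
    """Mark a whole subtree as needing a rescan (works for childless nodes too)."""
    node.state = "unknown"
    for child in node.children.values():
        set_all_to_unknown(child)

def scan(node, list_dir, parent_path=""):
    """Rescan every directory whose state is unknown; return the paths scanned."""
    full = node.name if not parent_path else f"{parent_path}/{node.name}"
    scanned = []
    if node.state == "unknown":
        scanned.append(full)
        for name in list_dir(full):
            # New children start as unknown; known children keep their own state.
            node.children.setdefault(name, DirNode(name, state="unknown"))
        node.state = "clean"
    for child in node.children.values():
        scanned += scan(child, list_dir, full)
    return scanned
```

After an inotify queue overflow, the equivalent of set_all_to_unknown(root) followed by a scan request would make sure nothing that changed in the meantime is missed.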
Comment 9 Daniel Drake 2005-07-31 17:21:08 UTC
Created attachment 50021
HitIsValid on entire index

ValidateContents should be invoked periodically when beagled isn't busy. This
is untested. I'm not sure how to decide when ValidateContents should be
invoked.

Something like this combats root dropping, exclude path adding, and exclude
pattern adding.

(against branch)
Comment 10 Daniel Drake 2005-08-11 17:32:53 UTC
Created attachment 50581
Recrawl all directories on exclude pattern removal
Comment 11 Daniel Drake 2005-08-11 19:50:42 UTC
Created attachment 50586
Root dropping fix

Fix the DirectoryModel.FullName exception that appeared when you queried for
hits on a removed root.
Comment 12 Daniel Drake 2005-08-11 22:53:11 UTC
Created attachment 50598
Forget about newly excluded paths
Comment 13 Daniel Drake 2005-08-12 16:00:07 UTC
Created attachment 50622
Expire inotify watches when we drop a root or add an exclude path
Comment 14 Daniel Drake 2005-08-30 00:14:49 UTC
Created attachment 51529
New watch dropping stuff

Revamp of earlier work which only takes effect on exclude/remove (not rename)
Comment 15 Joe Shaw 2005-11-04 20:53:42 UTC
What's the state of this bug?
Comment 16 Daniel Drake 2005-11-05 01:58:27 UTC
Need to restart my efforts now that FSQ has become less of a moving target.
Here's my list of cases we need to account for. Some are handled already, some
are not.

1. Add exclude pattern
Recursively drop matching internal directory references.
Wait for ValidateContents to remove from index.

2. Remove exclude pattern
Mark entire fs as dirty, recrawl. 

3. Add exclude path
Recursively drop internal directory references.
Wait for ValidateContents to remove from index.

4. Remove exclude path
Insert back into internal structure and mark for crawling.

5. Add root
Add to internal structure and crawl.

6. Delete root
Recursively drop internal directory structure.
Wait for ValidateContents to remove from index.

7. Should not allow addition of root-inside-root

8. Should handle new root which is parent of existing
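
A rough Python sketch of how several of these cases map onto in-memory state. The CrawlState class and all of its members are hypothetical, not Beagle code. It covers cases 2 to 7 only: case 1 (add exclude pattern) and case 8 (new parent root) are not modelled, and removal from the index itself is left to the ValidateContents pass, as the list says.

```python
from pathlib import PurePosixPath

def _within(path, ancestor):
    """True if path equals ancestor or lies below it."""
    p, a = PurePosixPath(path), PurePosixPath(ancestor)
    return p == a or a in p.parents

# Hypothetical model of the crawler's internal structure.
class CrawlState:
    def __init__(self):
        self.roots = set()
        self.known_dirs = set()   # internal directory references
        self.dirty = set()        # directories waiting to be crawled

    def add_root(self, root):                 # case 5, guarded by case 7
        if any(_within(root, r) for r in self.roots):
            raise ValueError(f"{root} is inside an existing root")
        self.roots.add(root)
        self.known_dirs.add(root)
        self.dirty.add(root)

    def _drop_subtree(self, path):
        gone = {d for d in self.known_dirs if _within(d, path)}
        self.known_dirs -= gone
        self.dirty -= gone

    def remove_root(self, root):              # case 6
        self.roots.discard(root)
        self._drop_subtree(root)

    def add_exclude_path(self, path):         # case 3
        self._drop_subtree(path)

    def remove_exclude_path(self, path):      # case 4
        self.known_dirs.add(path)
        self.dirty.add(path)

    def remove_exclude_pattern(self):         # case 2: everything is dirty again
        self.dirty |= self.known_dirs
```

Keeping the index untouched in these methods matches the "wait for ValidateContents" wording: the structure is updated right away, while stale hits disappear on the next maintenance pass.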

Comment 17 Kevin Kubasik 2006-09-22 03:25:29 UTC
Anyone have any idea for status on any of this? looks pretty old...
Comment 18 Joe Shaw 2006-11-29 04:57:57 UTC
Reopening this bug; not sure why it was closed, but I just ran into it today.  When roots are deleted, the files underneath the root are not removed from the index.
Comment 19 Joe Shaw 2007-02-07 18:40:28 UTC
*** Bug 405317 has been marked as a duplicate of this bug. ***
Comment 20 Joe Shaw 2007-02-07 18:42:11 UTC
Comment #16 is a pretty accurate summary of the state of things. I'm changing the summary of the bug to "Remove files from index when a FSQ root is removed".
Comment 21 stephan.hegel 2007-02-10 04:53:53 UTC
Marking 405317 as duplicate in #19 leaves out the issues in beagle-search with the wrong calculation of the total matches. If an item is excluded it disappears almost instantly from beagle-search but even after implementing #16/3 the update of the total number of matches depends on the time to drop the directory references. 
Comment 22 Wouter Bolsterlee (uws) 2008-09-17 21:13:50 UTC
This bug is quite old and perhaps obsolete. If so, please close this, maintainers.
Comment 23 Joe Shaw 2008-09-17 22:58:45 UTC
This is still an issue, unfortunately.
Comment 24 André Klapper 2018-07-03 09:52:46 UTC
Beagle is not under active development anymore and had its last code changes in early 2011. Its codebase has been archived (see bug 796735):
https://gitlab.gnome.org/Archive/beagle/commits/master

"tracker" is an available alternative.

Closing this report as WONTFIX as part of Bugzilla Housekeeping to reflect
reality. Please feel free to reopen this ticket (or rather transfer the project
to GNOME Gitlab, as GNOME Bugzilla is deprecated) if anyone takes the
responsibility for active development again.