GNOME Bugzilla – Bug 334277
Static Index rebuilding daemon
Last modified: 2018-07-03 09:52:13 UTC
Maintaining a static index manually is a bit boring, so I put together this little app which runs beagle-build-index whenever there is any activity in the target directories. An activity timeout lets you specify how long to wait after the last bit of activity before running the indexer.

Some points:
1 - Would it be better to build this into beagle-build-index itself?
2 - It's currently just an app; I don't know how you make it run as a daemon with C#.
3 - Shouldn't this functionality be part of the main beagle daemon itself?
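The core idea of the attached C# tool (detect activity in a watched directory, wait for a quiet period, then reindex) can be sketched as a portable polling check in shell. The `find -newer` trick, the temp paths, and the beagle-build-index invocation in the comment are illustrative assumptions, not taken from the attachment:

```shell
#!/bin/sh
# Sketch: detect activity in a watched directory by comparing file
# mtimes against a timestamp file -- the core of a "rebuild after a
# quiet period" loop. Paths and the indexer call are examples only.
WATCH_DIR=$(mktemp -d)       # stand-in for the exported directory, e.g. /foo
STAMP=$(mktemp)              # records when we last checked

sleep 1                      # ensure new files get a strictly newer mtime
touch "$WATCH_DIR/somefile"  # simulated activity

# Anything newer than the stamp means the directory changed since last check.
if [ -n "$(find "$WATCH_DIR" -type f -newer "$STAMP")" ]; then
    echo "activity detected"
    # A real daemon would now wait for the activity timeout to expire,
    # then run something like:
    #   beagle-build-index --target /foo/index --recursive /foo
fi
touch "$STAMP"               # reset for the next poll
```

A real implementation would wrap this in a loop with a sleep interval; the C# attachment presumably uses inotify instead of polling.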
Created attachment 61113 [details] beagle-static-index-daemon.cs
Maybe I am not reading it correctly. How is this different from adding the directory(ies) as roots to beagle's file system backend? Also, why do you want to call it a static index when you are monitoring it actively and updating it regularly?
(In reply to comment #2)
> Maybe I am not reading it correctly. How is this different from adding the
> directory(ies) as roots to beagle's file system backend? Also, why do you want
> to call it a static index when you are monitoring it actively and updating
> regularly?

Well, this is because beagle can't do networked searches. I have a server with a directory called /foo that is NFS-exported to my other machines as /foo. Instead of having them all index it over NFS (slow), I just have the server create a static index in /foo/index which all the client machines use. Unfortunately the contents of /foo are not that static, and I want the index to be as up to date as possible; this seems to be a better solution than scheduling a cron job every x minutes.
Can you try the following:

On the server, start beagled as usual:
$ beagled ...

On the client machines (where the exported path is the same as that on the server):
$ beagled --disable-scheduler

That would allow the full daemon to run on the server, doing updates as usual using the Files backend, while on the client machines beagled would run only in "query mode", i.e. it would perform no indexing/crawling. Theoretically, lucene is machine-safe, i.e. the same index can be used across different machines, but there might be additional complications due to something in beagle. I don't have the resources to try it myself, but if this works, it might be very useful for people who keep their bulk data on some shared server. Could you give it a try and post what happens (e.g. if the roof crashes down ;) )?
On the server I exported the following folders as read-only:

/home/alex/.beagle/Indexes
/home/alex/.beagle/TextCache

I then mounted these in the same place on the client. Unfortunately beagle on the client says:

Warn: Likely sqlite database version mismatch trying to read from /home/alex/.beagle/TextCache/TextCache.db. Purging.

even though I have the same version of sqlite on each machine... and obviously it can't purge the file because it's read-only. It complains about this a few times before settling down, and I can query the beagle daemon, but it returns no results because it obviously isn't using the textcache. Any ideas?
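For reference, the setup described here — the server exporting the beagle data read-only and the client mounting it at the identical absolute paths — would look roughly like the following. The NFS export options and hostnames are assumptions for illustration, not tested configuration:

```shell
# On the server, in /etc/exports (read-only export of the beagle data):
#   /home/alex/.beagle/Indexes    client(ro,sync)
#   /home/alex/.beagle/TextCache  client(ro,sync)
# then re-export with: exportfs -ra

# On the client, mount at the *same* absolute paths as on the server:
#   mount -t nfs server:/home/alex/.beagle/Indexes   /home/alex/.beagle/Indexes
#   mount -t nfs server:/home/alex/.beagle/TextCache /home/alex/.beagle/TextCache
```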
OK, so it's halfway working. What do you mean by "I can query the beagle daemon but it returns no results because it obviously isn't using the textcache"... does it crash, or does it not return any snippets? (Try with beagle-query.) It might also want the config (or might otherwise shout); for now, export all of .beagle/ to be safe. The sqlite error is worrying. Are you absolutely sure that you have the same sqlite versions on both machines? Which version... 2 or 3?
From the code, it looks like the server is using sqlite 2 and the client is using sqlite 3. Can you confirm that both are the same version? (If you have the command-line sqlite/sqlite3 programs you can use those too... or, I think, the first line of a sqlite db file says which version it is.)
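One quick way to do the "first line of a sqlite db file" check: an SQLite 3 database begins with the magic string `SQLite format 3`, while an SQLite 2 file begins with `** This file contains an SQLite 2.1 database **`. A small self-contained demonstration (using python3's bundled sqlite3 module to create a throwaway database, which is an assumption for the sake of the example — checking the real TextCache.db just needs the `head` line):

```shell
#!/bin/sh
# Create a throwaway SQLite 3 database and inspect its header bytes.
DB=$(mktemp)
python3 -c "import sqlite3; c = sqlite3.connect('$DB'); c.execute('CREATE TABLE t (x)'); c.commit(); c.close()"

# The first 15 bytes of an SQLite 3 file are its magic string.
head -c 15 "$DB"
echo
```

Running `head -c 15 /home/alex/.beagle/TextCache/TextCache.db` on both machines would show immediately whether the file formats match.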
The sqlite FAQ claims that sqlite might not work correctly when used over NFS; there are some issues with locking (http://www.sqlite.org/faq.html#q7). Does anyone know the status of the Linux NFS implementation with respect to locking?
(In reply to comment #0)
> Maintaining a static index manually is a bit boring, so I put together this
> little app which runs beagle-build-index whenever there is any activity in the
> target directories. An activity timeout lets you specify how long to wait
> after the last bit of activity before running the indexer.
>
> Some points:
> 1 - Would it be better to build this into beagle-build-index itself?
> 2 - It's currently just an app; I don't know how you make it run as a daemon
> with C#.
> 3 - Shouldn't this functionality be part of the main beagle daemon itself?

I recommend this be included in the distribution as something like beagle-monitor-index. Searching on an NFS server is suddenly the new craze in town (possibly due to the sudden exposure as part of FC5). Beagle provides a way of building a static index on the client with periodic scanning and copying the index back and forth; Alex's tool will help build the static index on the server. Sounds useful.

To make it more useful, I have a suggestion:
- Creating a beagle-build-index process every time is expensive; maybe add an option "--threshold num" to fire the process only after every num changes?

Also, beagle-build-index doesn't currently handle deletions of files (it's tricky and expensive), so a warning message might be printed to inform the user about this. This doesn't cause any usability problem since, IIRC, beagle won't return query results if the file doesn't exist, even if the lucene query returned deleted files.
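The suggested "--threshold num" behaviour — accumulate changes and only fire the indexer once num of them have piled up — could be sketched like this. The threshold value, temp paths, and the beagle-build-index flags in the comment are illustrative assumptions:

```shell
#!/bin/sh
# Sketch: count changes since the last index run and only rebuild once
# THRESHOLD changes have accumulated, instead of spawning an expensive
# beagle-build-index process on every single change.
THRESHOLD=3
WATCH_DIR=$(mktemp -d)   # stand-in for the watched directory
STAMP=$(mktemp)          # its mtime marks the last index run

sleep 1                                             # ensure strictly newer mtimes
touch "$WATCH_DIR/a" "$WATCH_DIR/b" "$WATCH_DIR/c"  # simulated activity

# Count files changed since the last run.
CHANGES=$(find "$WATCH_DIR" -type f -newer "$STAMP" | wc -l | tr -d ' ')
if [ "$CHANGES" -ge "$THRESHOLD" ]; then
    echo "$CHANGES changes, rebuilding"
    # e.g.: beagle-build-index --target /foo/index --recursive /foo
    touch "$STAMP"       # reset the change counter
fi
```

Counting events rather than running the indexer on each one also composes naturally with the activity-timeout idea from comment #0: fire when either the quiet period expires or the threshold is hit.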
Beagle is not under active development anymore and had its last code changes in early 2011. Its codebase has been archived (see bug 796735): https://gitlab.gnome.org/Archive/beagle/commits/master

"tracker" is an available alternative.

Closing this report as WONTFIX as part of Bugzilla Housekeeping to reflect reality. Please feel free to reopen this ticket (or rather transfer the project to GNOME Gitlab, as GNOME Bugzilla is deprecated) if anyone takes the responsibility for active development again.