GNOME Bugzilla – Bug 334277
Static Index rebuilding daemon
Last modified: 2018-07-03 09:52:13 UTC
Maintaining a static index manually is a bit boring, so I put together this little app which runs beagle-build-index whenever there is any activity in the target directories. An activity timeout lets you specify how long to wait after the last bit of activity before running the indexer.

Some points:
1 - Would it be better to build this into beagle-build-index itself?
2 - It's currently just an app; I don't know how you make it run as a daemon with C#.
3 - Shouldn't this functionality be part of the main beagle daemon itself?
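The core idea of the attached C# tool (detect activity in a watched directory, wait for a quiet period, then reindex) can be sketched as a portable polling check in shell. The `find -newer` trick, the temp paths, and the beagle-build-index invocation in the comment are illustrative assumptions, not taken from the attachment:

```shell
#!/bin/sh
# Sketch: detect activity in a watched directory by comparing file
# mtimes against a timestamp file -- the core of a "rebuild after a
# quiet period" loop. Paths and the indexer call are examples only.
WATCH_DIR=$(mktemp -d)       # stand-in for the exported directory, e.g. /foo
STAMP=$(mktemp)              # records when we last checked

sleep 1                      # ensure new files get a strictly newer mtime
touch "$WATCH_DIR/somefile"  # simulated activity

# Anything newer than the stamp means the directory changed since last check.
if [ -n "$(find "$WATCH_DIR" -type f -newer "$STAMP")" ]; then
    echo "activity detected"
    # A real daemon would now wait for the activity timeout to expire,
    # then run something like:
    #   beagle-build-index --target /foo/index --recursive /foo
fi
touch "$STAMP"               # reset for the next poll
```

A real implementation would wrap this in a loop with a sleep interval; the C# attachment presumably uses inotify instead of polling.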
Created attachment 61113 [details] beagle-static-index-daemon.cs
Maybe I am not reading it correctly. How is this different from adding the directory(ies) as roots to beagle's file system backend? Also, why do you want to call it a static index when you are monitoring it actively and updating it regularly?
(In reply to comment #2)
> Maybe I am not reading it correctly. How is this different from adding the
> directory(ies) as roots to beagle's file system backend? Also, why do you want
> to call it a static index when you are monitoring it actively and updating
> regularly?

Well, this is because beagle can't do networked searches. I have a server with a directory called /foo that is NFS-exported to my other machines as /foo. Instead of having them all index it over NFS (slow), I just have the server create a static index in /foo/index which all the client machines use. Unfortunately the contents of /foo are not that static, and I want the index to be as up to date as possible; this seems to be a better solution than scheduling a cron job every x minutes.
Can you try the following:

On the server, start beagled as usual:
$ beagled ...

On the client machines (where the exported path is the same as that on the server):
$ beagled --disable-scheduler

That would allow the full daemon to run on the server, doing updates as usual using the Files backend, while on the client machines beagled would run only in "query mode", i.e. it would perform no indexing/crawling. Theoretically, lucene is machine-safe, i.e. the same index can be used across different machines, but there might be additional complications due to something in beagle. I don't have the resources to try it myself, but if this works, it might be very useful for people who keep their bulk data on some shared server. Could you give it a try and post what happens (e.g. if the roof crashes down ;) )?
On the server I exported the following folders as read-only:

/home/alex/.beagle/Indexes
/home/alex/.beagle/TextCache

I then mounted these in the same place on the client. Unfortunately beagle on the client says:

Warn: Likely sqlite database version mismatch trying to read from /home/alex/.beagle/TextCache/TextCache.db. Purging.

even though I have the same version of sqlite on each machine... and obviously it can't purge the file because it's read-only. It complains about this a few times before settling down, and I can query the beagle daemon, but it returns no results because it obviously isn't using the textcache. Any ideas?
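For reference, the setup described here — the server exporting the beagle data read-only and the client mounting it at the identical absolute paths — would look roughly like the following. The NFS export options and hostnames are assumptions for illustration, not tested configuration:

```shell
# On the server, in /etc/exports (read-only export of the beagle data):
#   /home/alex/.beagle/Indexes    client(ro,sync)
#   /home/alex/.beagle/TextCache  client(ro,sync)
# then re-export with: exportfs -ra

# On the client, mount at the *same* absolute paths as on the server:
#   mount -t nfs server:/home/alex/.beagle/Indexes   /home/alex/.beagle/Indexes
#   mount -t nfs server:/home/alex/.beagle/TextCache /home/alex/.beagle/TextCache
```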
OK, so it's halfway working. What do you mean by "I can query the beagle daemon but it returns no results because it obviously isn't using the textcache"... does it crash, or does it not return any snippets? (Try with beagle-query.) It might also want the config (or might otherwise shout); for now, export all of .beagle/ to be safe. The sqlite error is worrying. Are you absolutely sure that you have the same sqlite versions on both machines? Which version... 2 or 3?
From the code, it looks like the server is using sqlite 2 and the client is using sqlite 3. Can you confirm that both are the same version? (If you have the command-line sqlite/sqlite3 programs you can use those too... or, I think, the first line of a sqlite db file says which version it is.)
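One quick way to do the "first line of a sqlite db file" check: an SQLite 3 database begins with the magic string `SQLite format 3`, while an SQLite 2 file begins with `** This file contains an SQLite 2.1 database **`. A small self-contained demonstration (using python3's bundled sqlite3 module to create a throwaway database, which is an assumption for the sake of the example — checking the real TextCache.db just needs the `head` line):

```shell
#!/bin/sh
# Create a throwaway SQLite 3 database and inspect its header bytes.
DB=$(mktemp)
python3 -c "import sqlite3; c = sqlite3.connect('$DB'); c.execute('CREATE TABLE t (x)'); c.commit(); c.close()"

# The first 15 bytes of an SQLite 3 file are its magic string.
head -c 15 "$DB"
echo
```

Running `head -c 15 /home/alex/.beagle/TextCache/TextCache.db` on both machines would show immediately whether the file formats match.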
The sqlite FAQ claims that sqlite might not work correctly when used over NFS; there are some issues with locking (http://www.sqlite.org/faq.html#q7). Does anyone know the status of the Linux NFS implementation with respect to locking?
(In reply to comment #0)
> Maintaining a static index manually is a bit boring, so I put together this
> little app which runs beagle-build-index whenever there is any activity in the
> target directories. An activity timeout lets you specify how long to wait
> after the last bit of activity before running the indexer.
>
> Some points:
> 1 - Would it be better to build this into beagle-build-index itself?
> 2 - It's currently just an app; I don't know how you make it run as a daemon
> with C#.
> 3 - Shouldn't this functionality be part of the main beagle daemon itself?

I recommend this be included in the distribution as something like beagle-monitor-index. Searching on an NFS server is suddenly the new craze in town (possibly due to the sudden exposure as part of FC5). Beagle provides a way of building a static index on the client with periodic scanning and copying the index back and forth; Alex's tool will help build the static index on the server. Sounds useful.

To make it more useful, I have a suggestion:
- Creating a beagle-build-index process every time is expensive; maybe add an option "--threshold num" to fire the process only after every num changes?

Also, beagle-build-index doesn't currently handle deletions of files (it's tricky and expensive), so a warning message might be printed to inform the user about this. This doesn't cause any usability problem since, IIRC, beagle won't return query results if the file doesn't exist, even if the lucene query returned deleted files.
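The suggested "--threshold num" behaviour — accumulate changes and only fire the indexer once num of them have piled up — could be sketched like this. The threshold value, temp paths, and the beagle-build-index flags in the comment are illustrative assumptions:

```shell
#!/bin/sh
# Sketch: count changes since the last index run and only rebuild once
# THRESHOLD changes have accumulated, instead of spawning an expensive
# beagle-build-index process on every single change.
THRESHOLD=3
WATCH_DIR=$(mktemp -d)   # stand-in for the watched directory
STAMP=$(mktemp)          # its mtime marks the last index run

sleep 1                                             # ensure strictly newer mtimes
touch "$WATCH_DIR/a" "$WATCH_DIR/b" "$WATCH_DIR/c"  # simulated activity

# Count files changed since the last run.
CHANGES=$(find "$WATCH_DIR" -type f -newer "$STAMP" | wc -l | tr -d ' ')
if [ "$CHANGES" -ge "$THRESHOLD" ]; then
    echo "$CHANGES changes, rebuilding"
    # e.g.: beagle-build-index --target /foo/index --recursive /foo
    touch "$STAMP"       # reset the change counter
fi
```

Counting events rather than running the indexer on each one also composes naturally with the activity-timeout idea from comment #0: fire when either the quiet period expires or the threshold is hit.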
Beagle is not under active development anymore and had its last code changes in early 2011. Its codebase has been archived (see bug 796735): https://gitlab.gnome.org/Archive/beagle/commits/master

"tracker" is an available alternative.

Closing this report as WONTFIX as part of Bugzilla Housekeeping to reflect reality. Please feel free to reopen this ticket (or rather transfer the project to GNOME Gitlab, as GNOME Bugzilla is deprecated) if anyone takes the responsibility for active development again.