Bug 361320 – Beagle should index labyrinth notes

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 361320 - Beagle should index labyrinth notes


Summary:	Beagle should index labyrinth notes


Status:	RESOLVED FIXED

Product:	beagle
Classification:	Other
Component:	General
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal enhancement
Target Milestone:	---
Assigned To:	Kevin Kubasik
QA Contact:	Beagle Bugs

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2006-10-11 02:30 UTC by Kevin Kubasik
Modified:	2006-10-14 01:06 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Add labyrinth support to beagle (9.41 KB, patch) 2006-10-11 02:30 UTC, Kevin Kubasik	none	Details \| Review
Updated Labyrinth Patch (9.98 KB, patch) 2006-10-13 10:28 UTC, Kevin Kubasik	needs-work	Details \| Review
Update, Indexable Generator and Closing the Streams (11.46 KB, patch) 2006-10-13 19:45 UTC, Kevin Kubasik	needs-work	Details \| Review
More Updates (9.67 KB, patch) 2006-10-13 22:07 UTC, Kevin Kubasik	needs-work	Details \| Review
More Updates (9.33 KB, patch) 2006-10-13 23:19 UTC, Kevin Kubasik	rejected	Details \| Review

Description Kevin Kubasik 2006-10-11 02:30:23 UTC

Labyrinth is a mindmapping/notetaking tool similar to tomboy.

Comment 1 Kevin Kubasik 2006-10-11 02:30:57 UTC

Created attachment 74452 [details] [review]
Add labyrinth support to beagle

Let me know what you think.

Comment 2 Kevin Kubasik 2006-10-11 02:31:51 UTC

Bug against labyrinth which allows us to open notes based on a uri scheme.
http://code.google.com/p/labyrinth/issues/detail?id=3&can=2&q=

Comment 3 Joe Shaw 2006-10-11 20:19:00 UTC

Generally looks okay, although I think that maybe doing this as a backend and a filter would make more sense.

Comment 4 Kevin Kubasik 2006-10-12 01:50:33 UTC

I thought about that, and doing something like what the tomboy backend does (ie. making a 'Note' object that we then pull data from in the filter) but I'm having a hard time seeing the reasoning behind this redundancy, and it has come back to bite us in the ass with the IM filters, so I wasn't really sure. If you want, I could retractor that stuff out pretty easily, but since there really aren't any properties included in the notes (just the basics) theres not much to extract.

However, once we get all that sexy new metadata stuff in beagle, labyrinth 'links' notes and can create some pretty awesome/dynamic relationships, so that might be worth looking at, but at the moment, we really don't have the metadata system in place for such relationships, although its very possible I could hack a way to do that, I'd rather wait to see what comes of it.

Comment 5 Joe Shaw 2006-10-12 20:29:53 UTC

The reason I brought it up is because of code like this:

+				if (doc.NodeType == XmlNodeType.Text) {
+					sb.Append (doc.ReadString());
+					continue;

We essentially buffer all the contents in memory and then send them over to the index helper process, which is wasteful of memory and can be a big problem if the text is large.  By moving this into the filter we (a) move the entire process into the index helper process and (b) have access to the underlying streams and can feed the text straight into the indexer without buffering it all ahead of time.

If the amount of text is small, then this probably doesn't matter.

(BTW, I think the problem with the IM filters is more because the code is crappy rather than any split between the backend and the filter.  The mail split is an example of one that works pretty well.)

Comment 6 Kevin Kubasik 2006-10-13 08:24:52 UTC

Ok, sounds good, I can make the split happen, theres probably not a major performance hit that we can find in 99.9% of the use cases with something like this, (since almost no one is going to have pages and pages of text in a simple one-pane note taking application) but as we have all learned, there will always be those 3 users who push our app to the limit and beyond. 

Besides anything that keeps mem usage down is good in my book. My only concern originally was that I was making some unneeded code complexity/mem use, but I guess thats really the other way around. I'll attach a patch later tonight.

Comment 7 Kevin Kubasik 2006-10-13 10:28:04 UTC

Created attachment 74626 [details] [review]
Updated Labyrinth Patch 

Ok, this one has a separate Filter and Backend, let me know what you think.

Comment 8 Joe Shaw 2006-10-13 17:37:59 UTC

Another thing I didn't notice the first time through:

+			foreach (FileInfo file in dir.GetFiles ()) {
+				if (file.Extension == ".map") {
+					IndexNote (file, Scheduler.Priority.Delayed);
+					++count;
+				}
+			}

We should stop using this construct and start using DirectoryWalker and IndexableGenerators because this doesn't scale.  Once you have more than 50 or so files to index this starts to become a performance bottleneck.  See the Gaim log backend for an example of this (it previously worked this way and was recently switched to being an indexable generator).

Mime type should probably be "x-beagle/x-labrynth-note".

The filter leaks a file descriptor if it encounters an exception.  I suggest moving the doc.Close() out into a finally block.

I'd like there to be naming consistency between the backend and the filter; I suggest renaming the filter to FilterLabrynth (or FilterLabrynthNote if there are other types of Labrynth data to index).

Comment 9 Kevin Kubasik 2006-10-13 19:45:00 UTC

Created attachment 74651 [details] [review]
Update, Indexable Generator and Closing the Streams

Ok, I think I took care of everything, let me know what you think!

Comment 10 Joe Shaw 2006-10-13 21:27:31 UTC

The "FilterLabNote.cs" comment at the top of the file needs fixing. :)

You don't want to call Finished() in the finally block, because that block will *always* be executed regardless of whether or not you hit an exception, which means that you would call both Error() and Finished() in the error case.  Move that back into the try block.

The crawling stuff doesn't make sense in the backend.  The reason why it's all there in the Gaim backend is to build up a list of directories and attach inotify watches to them.  You don't need any of that for these notes since they're all in one directory.  If you want to keep crawling around in the non-inotify case, that's fine, but this code isn't useful for that.  The old code probably made more sense there.

IndexNoteNow() doesn't make sense as the method name, since all it does is basically convert a note to an indexable; it doesn't index it at all.  I suggest changing it to NoteToIndexable().

The HitIsValid() method probably isn't necessary.

Comment 11 Kevin Kubasik 2006-10-13 22:07:16 UTC

Created attachment 74662 [details] [review]
More Updates

Think I got all that. ;)

Comment 12 Joe Shaw 2006-10-13 23:08:51 UTC

You need to move Finished() into the try block; that's the only way to guarantee it's run only in the non-error case.  And now that I look at it, the finally block isn't necessary.  You can just move doc.Close() outside the try/catch blocks entirely.  (ie, where Finished() is now)

+			foreach (FileInfo file in DirectoryWalker.GetFileInfos (lab_dir))
+				if (file.Name.EndsWith("map"))
+					IndexNote (file, Scheduler.Priority.Delayed);

You want to remove this block completely; the entire point of the indexable generator is so that you don't walk the list of files creating indexables and scheduler tasks for every file up front. :)  You can just move the inotify watch code back into StartWorker() then too.

You don't need IsUpToDate(), as it's defined on the parent LuceneFileQueryable class.

IndexNote() isn't used outside of the inotify function, you might want to just fold the code in.

Comment 13 Kevin Kubasik 2006-10-13 23:19:44 UTC

Created attachment 74664 [details] [review]
More Updates

Just a updating away! ;) Lemme know how it looks, I'm thinking once we get this in working order, I'm gonna go back and refractor the Tomboy to act similarly.

Comment 14 Joe Shaw 2006-10-14 00:28:55 UTC

Finished() still needs to be moved into the try block.

In the queryable, polling_interval_in_seconds isn't used.  You probably want to multiply it by 1000 and pass it into GLib.Timeout.Add().

Once those are fixed, it's fine to go in.

Comment 15 Kevin Kubasik 2006-10-14 01:06:31 UTC

Ok, fixed that and committed, let me know if theres anything I missed, but at least its working here.