GNOME Bugzilla – Bug 319412
Archive filter
Last modified: 2006-11-24 21:49:05 UTC
Beagle should be able to filter "files inside archives".
Created attachment 53730: Archive filter + FilterFactory fix suggested by Jon. This is the archive filter plus a fix, suggested by Jon, that enables transient files to be filtered and indexed using their path. NOTE: Transient files should set their "indexable.Timestamp" to that of their parent to keep things consistent. In the archive filter, the temporary files that are created actually carry the parent's timestamp.
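To illustrate the timestamp rule, here is a minimal Python sketch. It extracts each member of a zip archive to a temporary file and stamps that file with the parent archive's modification time. Beagle itself is C# and uses SharpZipLib, so this is only an analogue. The helper name `extract_members_with_parent_timestamp` is hypothetical and does not appear in the patch.

```python
import os
import tempfile
import zipfile


def extract_members_with_parent_timestamp(archive_path, out_dir):
    """Extract every regular file of a zip archive into its own temp file.

    Hypothetical helper: each temp file gets the parent archive's mtime, so a
    child never looks newer than the archive it came from.
    Returns a list of (member name inside the archive, temp file path).
    """
    parent_mtime = os.stat(archive_path).st_mtime
    extracted = []
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries carry no content to index
            fd, tmp_path = tempfile.mkstemp(dir=out_dir)
            with os.fdopen(fd, "wb") as out, zf.open(info) as src:
                out.write(src.read())
            # The child inherits the parent's timestamp, not the extraction time.
            os.utime(tmp_path, (parent_mtime, parent_mtime))
            extracted.append((info.filename, tmp_path))
    return extracted
```

Keeping the member name next to the temp path matters: the temp path is only where the bytes live, not what the child should be called.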
You may also want to look at the fix in bug 315056, which is a must when operating on the reader/stream of the indexable.
OK, here is the correct link: http://bugzilla.gnome.org/show_bug.cgi?id=315056
So I tested out the patch, and it looks like the files are being decompressed fine, but they're not being indexed at all.
Hmm, they work with beagle-extract-content, but not in the daemon. I'll try to track it down further.
OK, found it. The FIXME in the FSQ isn't addressed; look at PreChildAddHook() in FileSystemQueryable.cs. Maybe trow can comment on this. Also, judging from beagle-extract-content, the fixme:name property of the child files uses the temp file name instead of the real file name. The temp file name isn't very useful data to index. :) Lastly, it'd be nice to convert the camelCase names to the under_scores style that is now our coding convention.
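The name problem can be shown with a small Python sketch. It derives a child's name from the member's path inside the archive and keeps the temp file path only as a storage location. The `ChildInfo` type, the `child_info` helper, and the `parent#member` URI form are all hypothetical. They are not Beagle's actual URI scheme or property layout; the value of `name` is what the comment suggests fixme:name should hold.

```python
import posixpath
from dataclasses import dataclass


@dataclass
class ChildInfo:
    uri: str        # hypothetical "parent#member" form, not Beagle's real scheme
    name: str       # the archive member's real file name, suitable for indexing
    temp_path: str  # where the content was extracted; never used as the name


def child_info(parent_uri, member_name, temp_path):
    """Build child metadata from the archive member name (hypothetical helper)."""
    return ChildInfo(
        uri=f"{parent_uri}#{member_name}",
        # Archive member names use "/" separators, so posixpath is appropriate.
        name=posixpath.basename(member_name.rstrip("/")),
        temp_path=temp_path,
    )
```

For example, a member `docs/readme.txt` extracted to a random temp file still gets the name `readme.txt`.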
Joe: Thanks for your comments. I am going to post a new archive filter that keeps Archive.cs, since FilterOpenOffice and FilterK(word/spread/*) will use the archive filter instead of using the SharpZipLib APIs directly. In the meantime, you may want to look at the patch Daniel posted at http://bugzilla.gnome.org/show_bug.cgi?id=315056, which enables child indexables in the FSQ.
Created attachment 53941: Archive filter + FilterFactory fix suggested by Jon. Hmm... I tried overriding the ZipFile.GetEntry () approach using GetNextEntry () for tar as well, and it didn't work as expected. So I think it's better to remove Archive.cs from the Util/ directory. I have also corrected FilterArchive.cs according to Joe's comments. Here is the updated filter.
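The difficulty here is the gap between looking an entry up by name (ZipFile.GetEntry) and walking entries one after another (GetNextEntry). One common way to avoid it is to always iterate entries in archive order, for zip and tar alike. The Python sketch below does that. It is an illustration of the general approach, not what the attached C# filter does, and `iter_archive_members` is a hypothetical name.

```python
import tarfile
import zipfile


def iter_archive_members(path):
    """Yield (member_name, data) for each regular file, in archive order.

    Hypothetical helper: zip and tar are both walked entry by entry, so no
    code path depends on random-access lookup by name.
    """
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            for info in zf.infolist():
                if not info.is_dir():
                    yield info.filename, zf.read(info)
    elif tarfile.is_tarfile(path):
        # "r:*" lets tarfile detect gzip/bzip2/xz compression transparently.
        with tarfile.open(path, "r:*") as tf:
            for member in tf:
                if member.isfile():
                    fileobj = tf.extractfile(member)
                    yield member.name, fileobj.read()
    else:
        raise ValueError(f"not a zip or tar archive: {path}")
```

Checking for zip first matters, because the tar probe is more permissive about what it accepts.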
You have FilterMusic.cs listed twice in the Makefile.am. Other than that, looks fine to me. We need the changes to the FSQ before this can be checked in, though.
Definitely something we should work on.
(reopening; verified means that the fix has been verified)
Verified means the fix has been verified. "New" is most likely the status you meant (see http://bugzilla.gnome.org/page.cgi?id=bug-status.html#status). And yes, this bug is in the pipeline. It's mostly done (or rather, was mostly done some months back) and is awaiting the FSQ changes, which are rather non-trivial. I'd suggest reading the existing comments before changing a bug's status or adding comments to it; that will give you a clear idea of its current status. And thanks for periodically scanning the bugs; it definitely reduces the number of bugs, keeps everybody on their toes, and makes everyone aware of what needs to be fixed.
Created attachment 67643: FilterArchive.cs Revival. OK, I'm not entirely sure why, but a large amount of ChildIndexable code seems to have seeped into the FSQ over time. I don't know whether it was supposed to be there or not, but here is a somewhat updated and, I think, mildly working patch. There is nothing on the UI side, but it does work fine. Let me know what you think.
Almost forgot: I didn't diff the Makefile.am in Util, so keep that in mind before building. You need to add FilterArchive.cs to it.
There are some issues which remain to be resolved with your patch (which seems to have been derived from mine in bug 315056), such as handling of children inside children and the definition of the lucene fields. Jon also raised some more important issues which I have forgotten, something related to handling of children when the parent gets modified or goes away..?
(In reply to comment #15)
> There are some issues which remain to be resolved with your patch (which seems
> to have been derived from mine in bug 315056),

Yeah, just an update to keep it applying cleanly, so we might be able to encourage people to hack on it.

> such as handling of children
> inside children and the definition of the lucene fields. Jon also raised some
> more important issues which I have forgotten, something related to handling of
> children when the parent gets modified or goes away..?

Yeah, we don't seem to do anything really intelligent when this happens. In a perfect world, we could just query for all the children, then iterate through them and update their properties (or, even better, just change the properties in a metadata store). At the moment, though, we lack the metadata support to make this really feasible (I think). My guess is that once the new metadata system is in place, we will have a lot more options for dealing with this. I hope we can make it work then, as we seem to be reaching the point where beagle should be mature enough to handle archived files.
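The "query for all the children, then iterate through them" idea can be made concrete with a toy in-memory index. In the Python sketch below, each document records its parent, and recursion handles children inside children. The `ChildIndex` class and its methods are hypothetical. They stand in for whatever the FSQ and Lucene index would actually provide and are not Beagle APIs.

```python
class ChildIndex:
    """Toy in-memory index where each document records its parent URI.

    Hypothetical: it only illustrates finding children of a parent and
    updating or removing them, including nested children.
    """

    def __init__(self):
        self.docs = {}  # uri -> dict of properties, including "parent"

    def add(self, uri, parent=None, **props):
        self.docs[uri] = dict(props, parent=parent)

    def children_of(self, parent_uri):
        # Direct children only; returns a list snapshot, safe to mutate after.
        return [u for u, p in self.docs.items() if p["parent"] == parent_uri]

    def remove_with_children(self, uri):
        # Parent went away: drop every descendant, then the parent itself.
        for child in self.children_of(uri):
            self.remove_with_children(child)
        self.docs.pop(uri, None)

    def propagate(self, parent_uri, **props):
        # Parent was modified: push e.g. its new timestamp to all descendants.
        for child in self.children_of(parent_uri):
            self.docs[child].update(props)
            self.propagate(child, **props)
```

For instance, after `propagate("file:///a.zip", timestamp=2)`, a file inside a tar inside `a.zip` also carries timestamp 2. Calling `remove_with_children("file:///a.zip")` then clears the whole subtree. A real index would replace the linear scan in `children_of` with a query on a stored parent field.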
*** Bug 364706 has been marked as a duplicate of this bug. ***
I checked in slightly modified versions of these patches, plus some more changes, to CVS. Marking this closed. Unfortunately, the right thing to do depends somewhat on use cases, so please open new Bugzilla entries for future enhancement requests and bugs.