GNOME Bugzilla – Bug 315056
Child indexables on the filesystem
Last modified: 2006-11-24 21:54:27 UTC
Please describe the problem: The attached patch contains a beagle filter for archive files. Archives are opened and the individual entries are treated as child indexables. When child indexables are indexed, they already have a preset BinaryStream or TextReader. GetTextReader () and GetBinaryStream () should therefore first check whether a TextReader or binary stream already exists. Only when neither exists should they try to create one from the ContentUri.
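The lookup order described above can be sketched as follows. Beagle itself is written in C#. This is a minimal Python sketch, and the names (`Indexable`, `get_text_reader`, `get_binary_stream`, `content_uri`) are hypothetical stand-ins for the C# members, not Beagle's actual API. It assumes that only `file:` content URIs are opened. It also assumes that when a child was given one kind of preset stream, the other getter returns None rather than falling back to the content URI.

```python
from urllib.parse import urlparse
from urllib.request import url2pathname


class Indexable:
    """Hypothetical stand-in for Beagle's C# Indexable; names are illustrative only."""

    def __init__(self, content_uri=None, text_reader=None, binary_stream=None):
        self.content_uri = content_uri
        self._text_reader = text_reader      # preset by a filter for child indexables
        self._binary_stream = binary_stream  # preset by a filter for child indexables

    def _path_from_uri(self):
        # Assumption: only file: URIs are supported in this sketch.
        parsed = urlparse(self.content_uri or "")
        if parsed.scheme != "file":
            raise ValueError("unsupported content URI: %r" % self.content_uri)
        return url2pathname(parsed.path)

    def get_text_reader(self):
        if self._text_reader is not None:
            return self._text_reader          # child with a preset reader
        if self._binary_stream is not None:
            return None                       # child offers only a binary stream
        # Neither is preset: open the content URI as text.
        self._text_reader = open(self._path_from_uri(), encoding="utf-8")
        return self._text_reader

    def get_binary_stream(self):
        if self._binary_stream is not None:
            return self._binary_stream        # child with a preset stream
        if self._text_reader is not None:
            return None                       # child offers only text
        # Neither is preset: open the content URI as bytes.
        self._binary_stream = open(self._path_from_uri(), "rb")
        return self._binary_stream
```

A top-level file is created with just `content_uri`, while an archive filter would construct each entry with `text_reader=` or `binary_stream=` already set, so the getters never touch the filesystem for children.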
Created attachment 51672: Proposed fix and FilterArchive. The patch contains a proposed fix for the problem in the bug description, plus the FilterArchive that triggered the bug. ;-)
Created attachment 53729: Indexable fix. Splitting the patch into 1) the archive filter and 2) the Indexable fix. Attaching the Indexable fix here and opening a new bug for FilterArchive, where the archive filter will be attached.
Created attachment 53756: Enable child indexables on the filesystem. This enables child indexables on the filesystem. I'm not too sure about the changes to RemapUri (adding "parent:" to property keys). Say we had a maildir file with an attachment inside a zip file. The email attachment would then have a URI such as archive.zip#email#0. The Lucene document could have parent:beagle:ExactFilename set to either "archive.zip" or "email", depending on whether we want this to be the real "physical" parent or the actual (immediate) parent. My hack here assumes it would be archive.zip, but I haven't tested any doubly nested files yet. Alternatively, we could always ensure that beagle:ExactFilename points to the physical parent file?
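The two candidate values for the parent's filename can be made concrete with a small sketch. It assumes child URIs are the physical path followed by `#`-separated entry names, as in archive.zip#email#0. The helper names (`physical_parent`, `immediate_parent`, `parent_exact_filename`) are hypothetical; this is not the RemapUri code from the patch.

```python
from typing import Optional


def physical_parent(uri: str) -> Optional[str]:
    """URI of the file on disk that ultimately contains this child, or None for a top-level file."""
    if "#" not in uri:
        return None
    return uri.split("#", 1)[0]


def immediate_parent(uri: str) -> Optional[str]:
    """URI of the entry that directly contains this child, or None for a top-level file."""
    head, sep, _ = uri.rpartition("#")
    return head if sep else None


def parent_exact_filename(uri: str, physical: bool) -> Optional[str]:
    """Value a parent:beagle:ExactFilename property would get under either policy."""
    parent = physical_parent(uri) if physical else immediate_parent(uri)
    if parent is None:
        return None
    last_segment = parent.rsplit("#", 1)[-1]   # innermost containing entry
    return last_segment.rsplit("/", 1)[-1]     # strip any directory part


print(parent_exact_filename("archive.zip#email#0", physical=True))   # archive.zip
print(parent_exact_filename("archive.zip#email#0", physical=False))  # email
```

The two policies only diverge once nesting is two levels deep or more, which is why the doubly nested case is the one worth testing.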
Jon, can you fill us in on the current state of things?
The patch is forthcoming, but my schedule got extremely convoluted. Among other things, my start date at Google got moved up a week. (I'm sitting in new employee orientation even as I type...)
These patches have archival value, but they no longer apply or compile. Check out the FilterArchive bug for a patch that includes these and more.
Initial implementation in CVS, based on patches here and in the archive-filter bug. It handles nested child indexables, marks indexables done only after all their children are done, and remaps child URIs so that clients get the correct URI, fullpath#child1#child2#nested.document. Requesting testing. Marking RESOLVED.
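One way to get both "done only after all children are done" and the nested fullpath#child1#child2 URIs is to keep an outstanding-work counter per indexable. Each child adds one to its parent's count, and finishing releases one count up the chain. This is a minimal Python sketch with a hypothetical `PendingIndexable` class, not the code committed to CVS. It assumes every child is registered before its parent's own `finish()` is called. That holds if children are discovered while the parent is being filtered.

```python
from typing import Callable, Optional


class PendingIndexable:
    """Hypothetical bookkeeping node for an indexable and its nested children."""

    def __init__(self, name: str, parent: Optional["PendingIndexable"] = None,
                 on_done: Optional[Callable[[str], None]] = None):
        self.name = name
        self.parent = parent
        # Children inherit the parent's completion callback unless given their own.
        self.on_done = on_done if on_done is not None else (parent.on_done if parent else None)
        self.outstanding = 1          # this indexable's own content
        if parent is not None:
            parent.outstanding += 1   # parent must also wait for this child

    @property
    def uri(self) -> str:
        # Remapped URI: physical path, then '#'-joined child names.
        if self.parent is None:
            return self.name
        return self.parent.uri + "#" + self.name

    def finish(self) -> None:
        """Call once this indexable's own content has been indexed."""
        self._release()

    def _release(self) -> None:
        self.outstanding -= 1
        if self.outstanding == 0:
            if self.on_done is not None:
                self.on_done(self.uri)
            if self.parent is not None:
                self.parent._release()   # one fewer pending child for the parent
```

With archive.zip containing email containing attachment 0, finishing archive.zip and email first reports nothing. Finishing the attachment then reports all three, innermost first.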