GNOME Bugzilla – Bug 169822
[PATCH] Indexer should give up on files that it fails to index after trying "n" times
Last modified: 2005-11-04 20:42:29 UTC
The daemon keeps trying to index a .doc file continuously. (This is probably the last file that isn't indexed on my system.) We should rather give up if it can't be indexed.

DEBUG: Flushing...
DEBUG: - file:///home/lukas/Desktop/The most interesting person.doc
DEBUG: + file:///home/lukas/Desktop/The most interesting person.doc
ERROR: Could not open OLE Meta data stream from /home/lukas/Desktop/The most interesting person.doc
DEBUG: Helper Size: VmRSS=32,6 MB, size=3,24, 56,1%
DEBUG: Helper Size: VmRSS=33,0 MB, size=3,28, 57,0%
DEBUG: Helper Size: VmRSS=32,6 MB, size=3,24, 56,1%
DEBUG: CloseIfQueued on FileSystemIndex
DEBUG: Flush Complete!
DEBUG: CloseIfQueued on FileSystemIndex
DEBUG: CloseIfQueued on FileSystemIndex
DEBUG: Close on FileSystemIndex
DEBUG: worker removed: name=FileSystemIndex
DEBUG: worker added: name=FileSystemIndex refcount=1
DEBUG: Launching flush thread!
DEBUG: Flushing...
DEBUG: - file:///home/lukas/Desktop/The most interesting person.doc
DEBUG: + file:///home/lukas/Desktop/The most interesting person.doc
ERROR: Could not open OLE Meta data stream from /home/lukas/Desktop/The most interesting person.doc
DEBUG: Helper Size: VmRSS=33,0 MB, size=3,28, 57,0%
DEBUG: CloseIfQueued on FileSystemIndex
DEBUG: Flush Complete!

This loops to infinity :-)
Retitling using the other description. Lukas, it would be helpful if you could provide us with the .doc file.
According to Veerapuram Varadhan in Bug 169758: "Fixed in CVS. I have tested the fix against the Bruce-eckel's TICPP ebook version (600 pages approx.) and it takes between 93 and 110 seconds as compared to 3400+ seconds. Jon: Can you verify this fix against your special ;-) doc?"
*** This bug has been marked as a duplicate of 169758 ***
OK Lukas, you may want to test the fix.
This is not a duplicate of <a href=http://bugzilla.gnome.org/show_bug.cgi?id=169822>169822 </a> bug. This bug is about making the indexer "ignore" or "skip" the document in question so that it is no longer "scheduled" for indexing, maybe forever.
Ah, sorry, I missed the quotation marks in the markup. Anyway, reopening the bug.
The file it's trying to open is actually broken; I can't open it in OO or Abi. But the point I'm trying to make is that we should give up on a file if it can't be indexed. Varadhan has got the right idea.
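The "give up after n tries" behaviour requested here could be sketched roughly as follows. This is only an illustration in Python, not Beagle's actual scheduler: the names `IndexQueue`, `schedule`, `run_once`, and the default limit of 3 are all hypothetical, since the thread only speaks of "n" attempts.

```python
from collections import deque

MAX_ATTEMPTS = 3  # hypothetical value; the bug report only says "n"


class IndexQueue:
    """Toy indexing queue that stops rescheduling files that keep failing."""

    def __init__(self, index_fn, max_attempts=MAX_ATTEMPTS):
        self.index_fn = index_fn          # callable that raises on failure
        self.max_attempts = max_attempts
        self.pending = deque()            # paths waiting to be indexed
        self.failures = {}                # path -> failed attempts so far
        self.skipped = set()              # paths we have given up on

    def schedule(self, path):
        """Queue a path unless it is already queued or was given up on."""
        if path in self.skipped or path in self.pending:
            return False
        self.pending.append(path)
        return True

    def run_once(self):
        """Index one pending path; return True/False, or None if idle."""
        if not self.pending:
            return None
        path = self.pending.popleft()
        try:
            self.index_fn(path)
        except Exception:
            count = self.failures.get(path, 0) + 1
            self.failures[path] = count
            if count >= self.max_attempts:
                self.skipped.add(path)     # give up: never reschedule
            else:
                self.pending.append(path)  # retry later
            return False
        self.failures.pop(path, None)      # success clears the counter
        return True
```

With a filter that always fails on the corrupt file, the queue drains after `max_attempts` tries and later `schedule()` calls for that path are refused, so the flush loop shown in the log would terminate.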
Lukas: can you attach that doc to the bug, so that I can verify it at my end? :) Also, I suspect it could be a duplicate of bug 169691 (http://bugzilla.gnome.org/show_bug.cgi?id=169691).
Created attachment 38505: Corrupt file
The filter handles this beautifully, but we just keep looping on it after everything gets indexed.
I don't think this is a duplicate of bug 169691.
I'm not able to reproduce this. Do extended attributes get set on the file? (You can check with 'getfattr -d /path/to/file') Can you find it with beagle-query? (Try searching by the name)
OK, I know what this is: it is a race caused by the fact that wv1 opens .doc files read-write, so an inotify event is generated when the file is closed. Varadhan: I know you looked at this some before. Any chance we can ship a patched wv1 to avoid these sorts of problems?
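The race described above can be modelled with a small simulation. This is a toy model in Python, not Beagle or wv1 code: `close_event` and `simulate` are hypothetical helpers, and the model assumes the watcher treats every close-after-write event as a reason to re-index. The two constants are the inotify mask values for `IN_CLOSE_WRITE` (a file opened for writing was closed) and `IN_CLOSE_NOWRITE` (a file not opened for writing was closed).

```python
# inotify event masks as defined in <sys/inotify.h>
IN_CLOSE_WRITE = 0x00000008    # file opened for writing was closed
IN_CLOSE_NOWRITE = 0x00000010  # file opened read-only was closed


def close_event(opened_writable):
    """Event the kernel reports when the filter closes the file."""
    return IN_CLOSE_WRITE if opened_writable else IN_CLOSE_NOWRITE


def simulate(filter_opens_writable, max_rounds=5):
    """Count indexing passes until the loop settles or max_rounds is hit.

    Assumption: the watcher re-schedules a file whenever it sees
    IN_CLOSE_WRITE, even if the content did not actually change.
    """
    rounds = 0
    needs_index = True  # the initial crawl schedules the file once
    while needs_index and rounds < max_rounds:
        rounds += 1
        event = close_event(filter_opens_writable)
        needs_index = (event == IN_CLOSE_WRITE)
    return rounds


print(simulate(filter_opens_writable=True))   # hits the max_rounds cap
print(simulate(filter_opens_writable=False))  # settles after one pass
```

Under these assumptions a read-write open never settles and only stops at the artificial `max_rounds` cap, while a read-only open is indexed once, which is why opening the file read-only in wv1 breaks the loop.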
Jon: I already have a "patched-wv" with me. However, is it possible to distribute that along with Beagle? I mean, are there any licensing issues?
I've talked to Dom, and there is not a license problem.
OK. I am attaching two patches here, one for wv1 and the other for Beagle. Can you review the wv1 patch to see whether it's okay to do it that way?
Created attachment 46893: patched-wv1 to allow files to be opened in readonly mode
Created attachment 46894: bug fix patch uses patched-wv1 to open files in readonly mode.
Has this been accepted upstream?
A new wv1 release came out which fixes this; we now support it in CVS, and we've been linking to the wv1 patch for some time. Closing this as FIXED.