GNOME Bugzilla – Bug 393766
RTF filter does not work on lots of documents
Last modified: 2007-01-10 01:30:49 UTC
Please describe the problem:
The RTF filter fails on many RTF files. It is also quite slow and definitely needs a rework.
Steps to reproduce:
I will attach a couple of RTF files containing text that should be extracted but isn't.
Actual results:
Expected results:
Does this happen every time?
Other information:
Mono internally has an RTF parser. An RTF filter based on it is currently included in a branch: http://svn.gnome.org/svn/beagle/branches/beagle-cutting-edge-branch/Filters/ The modified filter is very fast and extracts text correctly, but it has no code yet to extract metadata. The code can be merged to trunk once the metadata extraction is written.
Created attachment 79597: a simple RTF file that doesn't work as expected with the current filter
Created attachment 79598: this one has some metadata; it doesn't work either
Do we just need testing before your updated one goes into the trunk?
Nope. I need to add code to fetch the author, title, and other metadata. It took me some time to understand the code, but I think I have it mostly figured out. The RTF parser is based on the work of Paul DuBois - most of the RTF-related code on the web is based on his legendary parser. I am confident that the parser will do a better job of extraction than the current filter. I will run a few tests after I implement the above and then merge it to the trunk.
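For illustration only, here is a rough sketch of what pulling the author and title out of an RTF file's `\info` group can look like. It is written in Python rather than the filter's own language, it is not taken from the Beagle or Mono sources, and the names `extract_info` and `INFO_FIELD` are hypothetical. It assumes plain-text field values and ignores escaped braces and `\'xx` character escapes, which a real parser would have to handle.

```python
import re

# Hypothetical helper for illustration; not the Beagle or Mono implementation.
# Matches simple groups such as {\title Some Title} or {\author Some Name}.
INFO_FIELD = re.compile(r"\{\\(title|author|subject|keywords)(?![a-z])\s?([^{}]*)\}")


def extract_info(rtf):
    """Return a dict of simple metadata fields found in the RTF info group."""
    start = rtf.find(r"{\info")
    if start == -1:
        return {}
    # Walk forward to the brace that closes the info group.
    # Assumption: no escaped braces (\{ or \}) appear inside the group.
    depth = 0
    end = len(rtf)
    for i in range(start, len(rtf)):
        if rtf[i] == "{":
            depth += 1
        elif rtf[i] == "}":
            depth -= 1
            if depth == 0:
                end = i + 1
                break
    info = rtf[start:end]
    return {m.group(1): m.group(2).strip() for m in INFO_FIELD.finditer(info)}


if __name__ == "__main__":
    sample = r"{\rtf1\ansi{\info{\title Test Doc}{\author Jane Doe}}\pard Hello\par}"
    print(extract_info(sample))
```

Restricting the search to the `\info` group keeps body text that happens to contain similar control words from being picked up as metadata.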
Great. It would be nice if you could add these RTF files to the test suite: https://forgesvn1.novell.com/svn/beagle/trunk/beagle-test/filters/RTF/files/ And test the (sadly, one) file that's currently in there when you feel it's ready.
Novell Forge requires authentication :-(. If it's only one file, could you attach it here? And vice versa, copy these files there. Thanks in advance.
Created attachment 79763: gzipped test doc
Bleh, that's annoying. :( (I had to gzip it; otherwise Bugzilla wouldn't let me attach it.)
I didn't realize this would be closed so soon :-). The new RTF filter is in trunk (r3243, r3244).