Bug 393766 - RTF filter does not work on lots of documents
Status: RESOLVED FIXED
Product: beagle
Classification: Other
Component: General
Version: 0.2.14
Hardware: Other All
Importance: Normal major
Target Milestone: ---
Assigned To: Beagle Bugs
QA Contact: Beagle Bugs
Depends on:
Blocks:
 
 
Reported: 2007-01-07 02:22 UTC by Debajyoti Bera
Modified: 2007-01-10 01:30 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
a simple RTF file which doesn't work as expected with the current filter (1.87 KB, application/rtf)
2007-01-07 02:24 UTC, Debajyoti Bera
this one has some metadata - doesn't work either (16.35 KB, application/rtf)
2007-01-07 02:24 UTC, Debajyoti Bera
gzipped test doc (82.53 KB, application/octet-stream)
2007-01-08 16:54 UTC, Joe Shaw

Description Debajyoti Bera 2007-01-07 02:22:39 UTC
Please describe the problem:
RTF filter does not work on lots of RTF files. Also, it is quite slow. It definitely needs a rework.

Steps to reproduce:
I will attach a couple of RTF files that contain text which should be extracted but isn't.

Actual results:


Expected results:


Does this happen every time?


Other information:
Mono internally includes an RTF parser. An RTF filter based on it is currently included in a branch:
http://svn.gnome.org/svn/beagle/branches/beagle-cutting-edge-branch/Filters/

The modified filter is very fast and correct in text extraction but there is no code to extract metadata information. The code can be merged to trunk once the metadata extraction is written.
Comment 1 Debajyoti Bera 2007-01-07 02:24:11 UTC
Created attachment 79597
a simple RTF file which doesn't work as expected with the current filter
Comment 2 Debajyoti Bera 2007-01-07 02:24:54 UTC
Created attachment 79598
this one has some metadata - doesn't work either
Comment 3 Joe Shaw 2007-01-08 16:23:15 UTC
Do we just need testing before your updated one goes into the trunk?
Comment 4 Debajyoti Bera 2007-01-08 16:30:26 UTC
Nope. I need to add code to fetch the author, title, and other metadata. It took me some time to understand the code, but I think I have it mostly figured out.

The RTF parser is based on the work of Paul DuBois; most of the RTF-related code on the web is based on his legendary parser. I am confident that the parser will do a better job of extraction than the current filter. I will run a few tests after I implement the above and then merge it to the trunk.
Comment 5 Joe Shaw 2007-01-08 16:41:57 UTC
Great.  It would be nice if you could add these RTF files to the test suite:

https://forgesvn1.novell.com/svn/beagle/trunk/beagle-test/filters/RTF/files/

And test the (sadly, one) file that's currently in there when you feel it's ready.
Comment 6 Debajyoti Bera 2007-01-08 16:51:43 UTC
Novell Forge requires authentication :-(.
If it's only one file, could you attach it here? And vice versa, copy these files there.
Thanks in advance.
Comment 7 Joe Shaw 2007-01-08 16:54:07 UTC
Created attachment 79763
gzipped test doc

Bleh, that's annoying. :(

(I had to gzip it; otherwise bugzilla wouldn't let me attach it)
Comment 8 Debajyoti Bera 2007-01-10 01:30:49 UTC
I didn't realize this would be closed so soon :-).
New RTF Filter is in trunk, r3243, r3244.