Bug 380950 - Scribus Filter
Status: RESOLVED FIXED
Product: beagle
Classification: Other
Component: General
Version: 0.2.13
Hardware: Other All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Beagle Bugs
Depends on:
Blocks:
Reported: 2006-11-30 16:40 UTC by Alexander Macdonald
Modified: 2006-12-06 20:32 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
the filter code (3.81 KB, text/x-csharp)
2006-11-30 16:42 UTC, Alexander Macdonald
patch to get the filter compiled (1.17 KB, patch)
2006-11-30 16:43 UTC, Alexander Macdonald
accepted-commit_now
the filter code (3.69 KB, text/x-csharp)
2006-12-02 12:21 UTC, Alexander Macdonald
scribus file created with 1.3.4 cvs (45.82 KB, application/x-scribus)
2006-12-06 12:15 UTC, Alexander Macdonald
scribus file created with version 1.2 (36.52 KB, application/x-scribus)
2006-12-06 12:16 UTC, Alexander Macdonald
Scribus Filter (4.03 KB, text/x-csharp)
2006-12-06 20:17 UTC, Alexander Macdonald

Description Alexander Macdonald 2006-11-30 16:40:49 UTC
This filter adds support for "application/x-scribus" files created in Scribus (http://www.scribus.net/) version 1.3.4 onwards. It indexes the text, the metadata and the page count of the document.

It will break on Scribus files created with a version earlier than 1.3.4 because they aren't valid XML documents. I'm not sure how to check this in a robust manner, though. Is there a nice way in C# to check whether the string "1.3.3" is "less" than the string "1.3.4cvs", for example, or whether "0.9" is less than "1.3.5.5"?

Other than that, it should be fine.
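
One possible way to approach the version comparison asked about above is sketched here. This is an illustration only, not the code from the attached filter: the class name ScribusVersion and the helper LeadingNumber are hypothetical. It compares the dot-separated components numerically and ignores any non-digit suffix such as "cvs". (System.Version alone would not be enough, since it rejects a string like "1.3.4cvs".)

```csharp
using System;

static class ScribusVersion
{
    // Parses the leading digits of one component: "4cvs" -> 4, "" -> 0.
    static int LeadingNumber(string part)
    {
        int value = 0;
        foreach (char c in part)
        {
            if (c < '0' || c > '9')
                break;
            value = value * 10 + (c - '0');
        }
        return value;
    }

    // Returns a negative number, zero, or a positive number, like String.Compare.
    // Missing components count as 0, so "1.3" equals "1.3.0".
    public static int Compare(string a, string b)
    {
        string[] pa = a.Split('.');
        string[] pb = b.Split('.');
        int n = Math.Max(pa.Length, pb.Length);
        for (int i = 0; i < n; i++)
        {
            int x = i < pa.Length ? LeadingNumber(pa[i]) : 0;
            int y = i < pb.Length ? LeadingNumber(pb[i]) : 0;
            if (x != y)
                return x < y ? -1 : 1;
        }
        return 0;
    }

    static void Main()
    {
        Console.WriteLine(Compare("1.3.3", "1.3.4cvs") < 0);   // True
        Console.WriteLine(Compare("0.9", "1.3.5.5") < 0);      // True
        Console.WriteLine(Compare("1.3.4cvs", "1.3.4") == 0);  // True: the suffix is ignored
    }
}
```

Ignoring the suffix means "1.3.4cvs" and "1.3.4" compare as equal, which is probably acceptable when the only question is whether a file predates 1.3.4.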
Comment 1 Alexander Macdonald 2006-11-30 16:42:19 UTC
Created attachment 77429 [details]
the filter code
Comment 2 Alexander Macdonald 2006-11-30 16:43:25 UTC
Created attachment 77430 [details] [review]
patch to get the filter compiled
Comment 3 Alexander Macdonald 2006-12-02 12:21:25 UTC
Created attachment 77533 [details]
the filter code

This version is now compatible with all released versions of scribus, including the new version in cvs. Please commit!!
Comment 4 Debajyoti Bera 2006-12-02 18:37:00 UTC
As you mentioned on the mailing list, could you replace the XmlTextReader creations with the new-style XmlReader.Create()? It would be nice to have clean code without any deprecated methods :).

For future reference, could you attach a few scribus files - pre-1.4.3, current and "the new version in cvs".
Thanks.
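
For readers unfamiliar with the change being requested, here is a minimal sketch of the XmlReader.Create() style, assuming .NET 2.0 or later. It is not taken from the filter: the element names DOCUMENT and PAGE and the inline sample string are made up for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

static class ReaderExample
{
    static void Main()
    {
        // Hypothetical sample input; a real filter would read from the file stream.
        string xml = "<DOCUMENT TITLE=\"beagle test file\"><PAGE/><PAGE/></DOCUMENT>";

        // Settings replace the properties that used to be set on XmlTextReader.
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreComments = true;
        settings.IgnoreWhitespace = true;

        int pages = 0;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "PAGE")
                    pages++;
            }
        }
        Console.WriteLine(pages);  // 2
    }
}
```

The old style would be "new XmlTextReader(...)" in place of the XmlReader.Create() call; the read loop itself stays the same.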
Comment 5 Alexander Macdonald 2006-12-06 12:14:18 UTC
(In reply to comment #4)
> As you mentioned in the mailing list, could you replace the XmlTextReader
> creations using the new style of XmlReader.Create() ? It would be nice to have
> a clean code without any deprecated methods :).
> 
> For future reference, could you attach a few scribus files - pre-1.4.3, current
> and "the new version in cvs".
> Thanks.
> 

Since Beagle's build system seems to force compilation under .NET 1.0, I can't really test the filter using the new method. I think it would be better to get the filter in now and then patch all the uses of XmlTextReader after the switch to .NET 2.0.

grepping for "new XmlTextReader" shows the following places where the deprecated method is still used:

beagled/AkregatorQueryable/AkregatorQueryable.cs:262:
beagled/KonqBookmarkQueryable/KonqBookmarkQueryable.cs:275:
beagled/BlamQueryable/BlamQueryable.cs:183:
beagled/LifereaQueryable/LifereaQueryable.cs:233:
Filters/FilterSvg.cs:61:
Filters/FilterXslt.cs:52:
Filters/FilterSvg.cs.diff:33:
Filters/HtmlAgilityPack/HtmlWeb.cs:783:
Filters/FilterDocbook.cs:75:
Filters/FilterSpreadsheet.cs:95:
Filters/FilterOpenOffice.cs:522:
Filters/FilterOpenOffice.cs:530:
Filters/FilterOpenOffice.cs:554:
Filters/FilterOpenOffice.cs:555:
Filters/FilterLabyrinth.cs:50:
Filters/FilterAbiword.cs:327:
Filters/FilterSvg.cs.new:53:
Filters/FilterRPM.cs:84:
Util/ImLog.cs:329:
Util/SemWeb/XmlParser.cs:41:
Util/SemWeb/XmlParser.cs:44:
Util/Note.cs:94:

I'll certainly attach some Scribus test files, though.
Comment 6 Alexander Macdonald 2006-12-06 12:15:14 UTC
Created attachment 77804 [details]
scribus file created with 1.3.4 cvs
Comment 7 Alexander Macdonald 2006-12-06 12:16:05 UTC
Created attachment 77805 [details]
scribus file created with version 1.2
Comment 8 Joe Shaw 2006-12-06 19:29:56 UTC
When I ran beagle-extract-content on the old one, I ran into some issues.

First, when running on the new file, it's detected as a Scribus file and processed correctly.  But when I run it on the old file, it prefers the plain text filter for some reason:

[joe@posthaste ~/cvs/beagle/beagled]$ ./beagle-extract-content ~/scribus-new.sla
*** Running uninstalled ExtractContent.exe ***
Filename: file:///home/joe/scribus-new.sla
Debug: Loaded 53 filters from /home/joe/cvs/beagle/Filters/Filters.dll
Filter: Beagle.Filters.FilterScribus
MimeType:

[joe@posthaste ~/cvs/beagle/beagled]$ ./beagle-extract-content ~/scribus-old.sla
*** Running uninstalled ExtractContent.exe ***
Filename: file:///home/joe/scribus-old.sla
Debug: Loaded 53 filters from /home/joe/cvs/beagle/Filters/Filters.dll
Filter: Beagle.Filters.FilterText
MimeType: text/plain

Any idea why this is?

Secondly, when I force the MIME type on the old one, the content is a little broken:

[joe@posthaste ~/cvs/beagle/beagled]$ ./beagle-extract-content --mimetype=application/x-scribus ~/scribus-old.sla
*** Running uninstalled ExtractContent.exe ***
Filename: file:///home/joe/scribus-old.sla
Debug: Loaded 53 filters from /home/joe/cvs/beagle/Filters/Filters.dll
Filter: Beagle.Filters.FilterScribus
MimeType: application/x-scribus

Properties:
  Timestamp = 2006-12-06 19:24:02 -05:00
  dc:author = Alex Mac
  dc:description = a scribus document for testing
  dc:keywords = test
  dc:title = beagle test file
  fixme:pagecount = 2
  fixme:scribus-version = 1.2.2

Content:
H ere is a test file created in an old version of scribus w ow, its cool isn't it

Note the spaces, "H ere", "w ow", etc.  Those don't happen on the new file, and will probably break searching for those words.
Comment 9 Alexander Macdonald 2006-12-06 20:16:00 UTC
(In reply to comment #8)
> When I run beagle-extract-content on the old one, I ran into some issues.
> 
> First, when running on the new file, it's detected as a Scribus file and
> processed correctly.  But when I run it on the old file, it prefers the plain
> text filter for some reason:
...
> Any idea why this is?

I have no idea how programs like Beagle calculate MIME types. Maybe you need to install Scribus for the right MIME type info to be stored in the MIME type database? It works fine on my system (Ubuntu Edgy with Scribus 1.2.x and 1.3.4.x installed).

Running "file" on them, I see the following:

file ~/scribus-*
/home/alex/scribus-new.sla:  XML 1.0 document text
/home/alex/scribus-old.sla:  ASCII text, with very long lines

The old format is detected as text because it is not strict XML and lacks the <?xml ... ?> declaration at the start of the file, but that doesn't seem to stop my Beagle from picking it up as application/x-scribus.
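
The distinction described above (a leading <?xml declaration or not) could be checked along these lines. This is a rough sketch, not how Beagle or "file" actually detect types: the helper name StartsWithXmlDeclaration and the ROOT sample documents are hypothetical.

```csharp
using System;
using System.IO;

static class XmlDeclarationCheck
{
    // Returns true if the text begins with "<?xml", skipping any leading whitespace.
    // (Strict XML allows no whitespace before the declaration; this check is lenient.)
    public static bool StartsWithXmlDeclaration(TextReader reader)
    {
        int c;
        while ((c = reader.Peek()) != -1 && char.IsWhiteSpace((char)c))
            reader.Read();

        char[] buf = new char[5];
        int n = reader.ReadBlock(buf, 0, buf.Length);
        return n == 5 && new string(buf) == "<?xml";
    }

    static void Main()
    {
        // Placeholder documents standing in for the new and old file formats.
        string withDecl = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><ROOT/>";
        string withoutDecl = "<ROOT><DOCUMENT/></ROOT>";

        Console.WriteLine(StartsWithXmlDeclaration(new StringReader(withDecl)));     // True
        Console.WriteLine(StartsWithXmlDeclaration(new StringReader(withoutDecl)));  // False
    }
}
```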

> Secondly, when I force the MIME type on the old one, the content is a little
> broken:

Yeah, I just noticed that. I've attached a new version that sorts out the problem. For some reason, in the old file format, making a letter upper case caused a new ITEXT element to be created. I was coding on the assumption that a new ITEXT element indicated a new paragraph.

The fix means that for old files words may get run together, but I'm guessing that's better than having them separated. I can't really see a way of fixing it perfectly; the old file format is just a bit wonky, unfortunately.
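
The trade-off described above can be illustrated with a small sketch. It is not the attached filter: it assumes each ITEXT element carries its text in an attribute named CH, and the PAGEOBJECT wrapper, the ExtractText helper and the sample string are made up for the example. Joining the runs with a space reproduces the "H ere" output from comment 8; joining with nothing merges them.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class ITextJoin
{
    // Concatenates the text of every ITEXT element, placing `separator` between runs.
    public static string ExtractText(string xml, string separator)
    {
        StringBuilder sb = new StringBuilder();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "ITEXT")
                {
                    string run = reader.GetAttribute("CH");  // assumed attribute name
                    if (run == null)
                        continue;
                    if (sb.Length > 0)
                        sb.Append(separator);
                    sb.Append(run);
                }
            }
        }
        return sb.ToString();
    }

    static void Main()
    {
        // Hypothetical fragment: the capital letter sits in its own ITEXT run.
        string xml = "<PAGEOBJECT><ITEXT CH=\"H\"/><ITEXT CH=\"ere is a test\"/></PAGEOBJECT>";

        Console.WriteLine(ExtractText(xml, " "));  // H ere is a test
        Console.WriteLine(ExtractText(xml, ""));   // Here is a test
    }
}
```

With an empty separator, two runs that really were separate words would be merged, which is the "words may get run together" cost mentioned above.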

I'm not really a heavy scribus user so I may ask on the scribus mailing list if people can offer up some real world files for testing. Other than that the only way is to get it in cvs and let people play with it.
Comment 10 Alexander Macdonald 2006-12-06 20:17:57 UTC
Created attachment 77857 [details]
Scribus Filter
Comment 11 Joe Shaw 2006-12-06 20:32:09 UTC
You're right, I don't have it installed.  I don't think it's a big issue, and the filter works for me on the old files too.  Committing this now, thanks!