GNOME Bugzilla – Bug 380950
Scribus Filter
Last modified: 2006-12-06 20:32:09 UTC
This filter adds support for "application/x-scribus" files created in Scribus (http://www.scribus.net/) version 1.3.4 onwards. It indexes the text, the metadata and the page count of the document. It will break on Scribus files created with a version older than 1.3.4 because they aren't valid XML documents. I'm not sure exactly how to check for this in a robust manner, though. Is there a nice way in C# to check whether the version string "1.3.3" is "less" than "1.3.4cvs", for example, or that "0.9" is less than "1.3.5.5"? Other than that, it should be fine.
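On the version question: `System.Version` is the obvious candidate, but its string constructor expects purely numeric components, so a string like "1.3.4cvs" would be rejected with a FormatException. Below is a minimal hand-rolled sketch. It assumes that comparing only the leading digits of each dot-separated component is good enough, and that suffixes such as "cvs" can be ignored. The `ScribusVersion` class is hypothetical and is not part of Beagle or the submitted filter.

```csharp
using System;

// Hypothetical helper (not Beagle code): compares dotted version strings
// such as "1.3.3", "1.3.4cvs" or "1.3.5.5" component by component.
public static class ScribusVersion
{
    // Returns the value of the leading ASCII digits of a component,
    // e.g. "4cvs" -> 4. A component with no leading digits counts as 0.
    static int LeadingNumber(string part)
    {
        int value = 0;
        int i = 0;
        while (i < part.Length && part[i] >= '0' && part[i] <= '9')
        {
            value = value * 10 + (part[i] - '0');
            i++;
        }
        return value;
    }

    // Returns -1 if a < b, 0 if they compare equal, 1 if a > b.
    // Missing trailing components count as 0, so "1.3" equals "1.3.0".
    public static int Compare(string a, string b)
    {
        string[] pa = a.Split('.');
        string[] pb = b.Split('.');
        int n = Math.Max(pa.Length, pb.Length);
        for (int i = 0; i < n; i++)
        {
            int x = i < pa.Length ? LeadingNumber(pa[i]) : 0;
            int y = i < pb.Length ? LeadingNumber(pb[i]) : 0;
            if (x != y)
                return x < y ? -1 : 1;
        }
        return 0;
    }
}
```

With this helper, a check such as `ScribusVersion.Compare(version, "1.3.4") < 0` would flag the older, non-XML format. Treating "1.3.4cvs" as equal to "1.3.4" is a deliberate simplification. A stricter scheme could rank pre-release suffixes below the plain release.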
Created attachment 77429
the filter code
Created attachment 77430
patch to get the filter compiled
Created attachment 77533
the filter code

This version is now compatible with all released versions of Scribus, as well as the new version in CVS. Please commit!!
As you mentioned on the mailing list, could you replace the XmlTextReader instantiations with the new-style XmlReader.Create()? It would be nice to have clean code without any deprecated methods :). For future reference, could you attach a few Scribus files: pre-1.4.3, current, and "the new version in cvs"? Thanks.
(In reply to comment #4)
> As you mentioned in the mailing list, could you replace the XmlTextReader
> creations using the new style of XmlReader.Create() ? It would be nice to have
> a clean code without any deprecated methods :).
>
> For future reference, could you attach a few scribus files - pre-1.4.3, current
> and "the new version in cvs".
> Thanks.
>

Since Beagle's build system seems to force compilation under .NET 1.0, I can't really test the filter using the new method. I think it would be better to get the filter in now and then patch all the uses of XmlTextReader after the switch to .NET 2.0. Grepping for "new XmlTextReader" shows the following places where the deprecated method is still used:

beagled/AkregatorQueryable/AkregatorQueryable.cs:262:
beagled/KonqBookmarkQueryable/KonqBookmarkQueryable.cs:275:
beagled/BlamQueryable/BlamQueryable.cs:183:
beagled/LifereaQueryable/LifereaQueryable.cs:233:
Filters/FilterSvg.cs:61:
Filters/FilterXslt.cs:52:
Filters/FilterSvg.cs.diff:33:
Filters/HtmlAgilityPack/HtmlWeb.cs:783:
Filters/FilterDocbook.cs:75:
Filters/FilterSpreadsheet.cs:95:
Filters/FilterOpenOffice.cs:522:
Filters/FilterOpenOffice.cs:530:
Filters/FilterOpenOffice.cs:554:
Filters/FilterOpenOffice.cs:555:
Filters/FilterLabyrinth.cs:50:
Filters/FilterAbiword.cs:327:
Filters/FilterSvg.cs.new:53:
Filters/FilterRPM.cs:84:
Util/ImLog.cs:329:
Util/SemWeb/XmlParser.cs:41:
Util/SemWeb/XmlParser.cs:44:
Util/Note.cs:94:

I'll certainly attach some Scribus test files, though.
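For reference, once the build moves to .NET 2.0, a replacement for `new XmlTextReader(...)` would look roughly like the following sketch. The chosen reader settings and the `ReaderFactory` / `CountElements` names are illustrative assumptions, not Beagle code. The element name passed to `CountElements` is left up to the caller.

```csharp
using System.IO;
using System.Xml;

// Hypothetical sketch of the .NET 2.0 style: XmlReader.Create plus
// XmlReaderSettings instead of constructing an XmlTextReader directly.
public static class ReaderFactory
{
    public static XmlReader Open(TextReader input)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreWhitespace = true; // skip insignificant whitespace nodes
        settings.IgnoreComments = true;   // comments carry no indexable text
        return XmlReader.Create(input, settings);
    }

    // Counts start tags (including empty elements) with the given local name.
    public static int CountElements(TextReader input, string name)
    {
        int count = 0;
        using (XmlReader reader = Open(input))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == name)
                    count++;
            }
        }
        return count;
    }
}
```

`XmlReader.Create` also has overloads that take a `Stream` or a file path, so most call sites in the grep list above could probably switch over without much restructuring.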
Created attachment 77804
scribus file created with 1.3.4 cvs
Created attachment 77805
scribus file created with version 1.2
I ran into some issues when running beagle-extract-content on the old one.

First, when running on the new file, it's detected as a Scribus file and processed correctly. But when I run it on the old file, it prefers the plain text filter for some reason:

[joe@posthaste ~/cvs/beagle/beagled]$ ./beagle-extract-content ~/scribus-new.sla
*** Running uninstalled ExtractContent.exe ***
Filename: file:///home/joe/scribus-new.sla
Debug: Loaded 53 filters from /home/joe/cvs/beagle/Filters/Filters.dll
Filter: Beagle.Filters.FilterScribus
MimeType:

[joe@posthaste ~/cvs/beagle/beagled]$ ./beagle-extract-content ~/scribus-old.sla
*** Running uninstalled ExtractContent.exe ***
Filename: file:///home/joe/scribus-old.sla
Debug: Loaded 53 filters from /home/joe/cvs/beagle/Filters/Filters.dll
Filter: Beagle.Filters.FilterText
MimeType: text/plain

Any idea why this is?

Secondly, when I force the MIME type on the old one, the content is a little broken:

[joe@posthaste ~/cvs/beagle/beagled]$ ./beagle-extract-content --mimetype=application/x-scribus ~/scribus-old.sla
*** Running uninstalled ExtractContent.exe ***
Filename: file:///home/joe/scribus-old.sla
Debug: Loaded 53 filters from /home/joe/cvs/beagle/Filters/Filters.dll
Filter: Beagle.Filters.FilterScribus
MimeType: application/x-scribus
Properties:
Timestamp = 2006-12-06 19:24:02 -05:00
dc:author = Alex Mac
dc:description = a scribus document for testing
dc:keywords = test
dc:title = beagle test file
fixme:pagecount = 2
fixme:scribus-version = 1.2.2
Content:
H ere is a test file created in an old version of scribus w ow, its cool isn't it

Note the spaces: "H ere", "w ow", etc. Those don't happen on the new file, and they will probably break searching for those words.
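The reply that follows attributes the misdetection to the old format lacking the `<?xml ... ?>` declaration that the newer files start with. If a filter ever needed to tell the two formats apart on its own, one cheap heuristic would be to sniff the first few characters. The sketch below is hypothetical: the `ScribusFormatSniffer` name is made up, and this is not how Beagle's MIME detection works. It also consumes characters from the reader, so a caller would need a fresh reader afterwards.

```csharp
using System.IO;

// Hypothetical heuristic (not Beagle code): newer Scribus files begin with
// an XML declaration, older ones do not.
public static class ScribusFormatSniffer
{
    public static bool LooksLikeXml(TextReader reader)
    {
        // Read up to 6 chars: an optional byte-order mark plus "<?xml".
        char[] buf = new char[6];
        int total = 0;
        while (total < buf.Length)
        {
            int n = reader.Read(buf, total, buf.Length - total);
            if (n == 0)
                break; // end of input
            total += n;
        }

        // Skip a BOM if the reader did not already strip it.
        int start = (total > 0 && buf[0] == '\uFEFF') ? 1 : 0;
        if (total - start < 5)
            return false;

        string prefix = new string(buf, start, total - start);
        return string.CompareOrdinal(prefix, 0, "<?xml", 0, 5) == 0;
    }
}
```

A version check on the document's own version attribute would be more robust. This sniff only answers "is there an XML declaration at the very start?"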
(In reply to comment #8)
> When I run beagle-extract-content on the old one, I ran into some issues.
>
> First, when running on the new file, it's detected as a Scribus file and
> processed correctly. But when I run it on the old file, it prefers the plain
> text filter for some reason:
...
> Any idea why this is?

I have no idea how programs like Beagle determine MIME types. Maybe you need to install Scribus so that the right MIME type info is stored in the MIME database? It works fine on my system (Ubuntu Edgy with Scribus 1.2.x and 1.3.4.x installed). Running "file" on them, I see the following:

file ~/scribus-*
/home/alex/scribus-new.sla: XML 1.0 document text
/home/alex/scribus-old.sla: ASCII text, with very long lines

The old format is detected as text because it is not strict XML and lacks the <?xml ...?> declaration at the start of the file, but that doesn't seem to stop my Beagle from picking it up as application/x-scribus.

....

> Secondly, when I force the MIME type on the old one, the content is a little
> broken:

Yeah, I just noticed that. I've attached a new version that sorts out the problem. For some reason, in the old file format, making a letter upper case caused a new ITEXT element to be created. I was coding on the assumption that a new ITEXT element indicated a new paragraph. The fix means that for old files words may get run together, but I'm guessing that's better than having them separated. I can't really see a way of fixing it perfectly; the old file format is just a bit wonky, unfortunately. I'm not really a heavy Scribus user, so I may ask on the Scribus mailing list whether people can offer up some real-world files for testing. Other than that, the only way is to get it into CVS and let people play with it.
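The fix described above amounts to treating an ITEXT boundary as a formatting boundary rather than a paragraph boundary. Below is a minimal sketch of that joining rule. It assumes the text runs have already been extracted from the file and grouped per text frame. Both the `ItextJoiner` name and the per-frame grouping are hypothetical and are not taken from the attached filter. Runs within a frame are concatenated with no separator, and frames are separated by a newline.

```csharp
using System.Text;

// Hypothetical sketch (not the attached filter): in the pre-1.3.4 format a
// new ITEXT run can start mid-word (e.g. at a capital letter), so runs are
// joined without a separator; only frame boundaries get a line break.
public static class ItextJoiner
{
    // runsPerFrame[i] holds the text of consecutive ITEXT runs in frame i.
    public static string Join(string[][] runsPerFrame)
    {
        StringBuilder output = new StringBuilder();
        foreach (string[] frame in runsPerFrame)
        {
            if (output.Length > 0)
                output.Append('\n'); // frame boundary: keep words apart
            foreach (string run in frame)
                output.Append(run);  // run boundary: no separator added
        }
        return output.ToString();
    }
}
```

This turns "H" + "ere" back into "Here". The cost is the one noted in the thread: if a real paragraph break coincides with a run boundary inside a single frame, the last word of one paragraph gets run together with the first word of the next.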
Created attachment 77857
Scribus Filter
You're right, I don't have it installed. I don't think it's a big issue, and the filter works for me on the old files too. Committing this now, thanks!