GNOME Bugzilla – Bug 309916
Akregator backend
Last modified: 2005-08-19 16:47:07 UTC
Akregator is a KDE RSS reader. I wrote a Beagle backend for Akregator and thought it might be useful to other KDE users.
Created attachment 48882 [details] [review] Akregator backend

Put this in beagled/AkregatorQueryable/AkregatorQueryable.cs and update the makefiles. This is a first attempt, and it works for me.
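For readers who haven't seen a Beagle backend before, the general shape follows the other feed backends such as Liferea's: a class deriving from LuceneQueryable, tagged with a QueryableFlavor attribute so that beagled picks it up. Below is a rough sketch of that pattern; the attribute parameters, namespace, and method bodies are assumptions modelled on the Liferea backend, not the attached code.

    using System;
    using Beagle.Daemon;
    using Beagle.Util;

    namespace Beagle.Daemon.AkregatorQueryable {

        // Rough sketch of the usual Beagle backend shape; details are
        // modelled on similar backends, not on the attached file.
        [QueryableFlavor (Name="Akregator", Domain=QueryDomain.Local, RequireInotify=false)]
        public class AkregatorQueryable : LuceneQueryable {

            public AkregatorQueryable () : base ("AkregatorIndex")
            {
                // Locate Akregator's feed archive directory here.
            }

            public override void Start ()
            {
                base.Start ();
                // Crawl the existing feed files, watch them with
                // inotify, and index each feed item as an Indexable.
            }
        }
    }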
Overall this looks very good. I have only a couple of minor comments:

* Instead of p/invoking into libgmime, you should probably use the managed GMime bindings. I.e., add "using GMime;" and then use "GMime.Utils.HeaderDecodeDate (item.PubDate, out offset);" (a short sketch follows below this list).

* The newline thing in the StreamReader seems suspect. It could be a bug elsewhere in the code. Can you find out if the data is failing at the filtering stage, or somewhere else?

* I know the FIXME is copied from the Liferea feed, but does it apply? How long does Akregator store feeds for? If the files are never going to be that big, let's drop the comment.
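To illustrate the first point, the managed call being suggested looks roughly like this. This is a minimal sketch: DecodePubDate is a hypothetical helper name, and the argument is assumed to hold an RFC 822 date string as in the attached backend.

    using System;
    using GMime;

    // Decode an RFC 822 date via the managed GMime bindings instead of
    // p/invoking into libgmime directly.
    static DateTime DecodePubDate (string pubDate)
    {
        int offset; // timezone offset, filled in by GMime
        return GMime.Utils.HeaderDecodeDate (pubDate, out offset);
    }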
Thanks for the comments. I am sorry, but which FIXME are you talking about? The de-serializer one? BTW, I didn't understand that comment; isn't a serializer some kind of stream processor?

One more question, regarding the backends which write their data in one large XML file. Whenever the application adds or changes some item in the XML file, it seems the whole file is re-read and re-parsed (due to inotify), which is clearly inefficient. Is the IIndexableGenerator meant to help with this?
This FIXME:

// De-serialization classes
// FIXME: Change to standard stream parsing for performance?

The serializer is a stream parser, but it has to create objects for everything that it comes across, which might be a bit expensive. Plus, in reality we're not doing anything smart about the streaming: we're just creating potentially hundreds of objects at once. That caused a real performance problem for us with the Blam backend in the past.

As to your other question, I'm not sure there's a way to fix the problem. You get a notification about the whole file, and there's no way to tell specifically what changed without reparsing it. We could definitely be smarter about what is reindexed (and if you want to code that up, please feel free), but reparsing is unavoidable. The IIndexableGenerator won't help here... what you need is a cache of the previous state, so you can just compare the two.
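As a concrete illustration of the cache-and-compare idea, here is a minimal sketch; all names are hypothetical and none of this is Beagle API.

    using System.Collections.Generic;

    // Remember which item GUIDs each feed file contained on the last
    // pass, so that after a reparse only the new items are reindexed.
    public class FeedItemCache {
        private Dictionary<string, HashSet<string>> seen =
            new Dictionary<string, HashSet<string>> ();

        // Returns the GUIDs in 'current' that were not seen last time.
        public List<string> Diff (string feedFile, IEnumerable<string> current)
        {
            HashSet<string> previous;
            if (! seen.TryGetValue (feedFile, out previous))
                previous = new HashSet<string> ();

            List<string> added = new List<string> ();
            HashSet<string> now = new HashSet<string> ();
            foreach (string guid in current) {
                now.Add (guid);
                if (! previous.Contains (guid))
                    added.Add (guid);
            }
            seen [feedFile] = now;
            return added;
        }
    }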
Created attachment 49499 [details] [review] patch for a possible bug in FilterHtml

There seems to be a possible bug in FilterHtml. I should say that I could no longer reproduce the newline problem I suggested a fix for earlier (I had upgraded Mono and Beagle in the meantime). Have a look at the patch. In brief, if an HTML fragment that does not start with any element is given for filtering, the filter fails. I am not sure if this is the correct way to fix it (in the sense of the expected behaviour of an HTML parser).
Created attachment 49500 [details] [review] Updated backend source file

Updated AkregatorQueryable (after Joe's suggestions). Also dropped the newline fix (see my earlier comment).
I still need to implement deletion of feed items. With the current behaviour (which should be similar to Liferea's), when the XML file is rewritten after some deletions, the deleted items are not removed from the index.
For reference, the FilterHtml bug is related to bug #171469 and the fix *can be* similar to it (FilterHtml.cs rev:1.16). To give examples, this is an Akregator feed:

today I am trying to <i>submit</i> a <b>patch</b> <br/> for <em>Akregator</em>

And this is a Liferea feed (I am sure Blam is something similar):

<div ...>today I am trying to <i>submit</i> a <b>patch</b> <br/> for <em>Akregator</em> </div>

Both of these are legitimate pieces of HTML, but neither is an HTML document in itself. So there is a design issue here: ideally, FilterHtml should parse complete and valid HTML documents. If backends want to parse HTML snippets, they can always add harmless dummy <head> and <body> tags around them to make them legitimate HTML. Any comments? Otherwise, we will end up making many allowances in the filters, which might cause other bugs.
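For example, a backend could wrap its snippets like this before handing them to the filter (a hypothetical helper, not part of any attached patch):

    // Turn an HTML fragment into a complete document so that FilterHtml
    // always sees valid top-level structure.
    static string WrapSnippet (string fragment)
    {
        return "<html><head></head><body>" + fragment + "</body></html>";
    }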
Thanks for your work! This is very useful for Beagle and Akregator users. Just so you know, we changed the archive backend from the old storage in RSS format to a plugin-based storage backend, with a Metakit (http://www.equi4.com/metakit.html) implementation as the default (the advantages are vastly improved startup times and lower memory consumption). So your plugin won't work with kdepim 3.5. It seems there are no C# bindings for Metakit (which is a C++ library), so I don't know how much work it would be to access Metakit files from C#.

Frank Osterfeld
Akregator maintainer
Created attachment 49601 [details] [review] minor improvements

Since this backend won't be compatible with kdepim-3.5, it's time to move on and work on better stuff :-) Submitting the latest improvements that I worked on. This version disregards feeds which were already deleted before indexing began (I didn't have time to implement deletion of items from the index) and uses a local fix for the FilterHtml problem. With the original backend, words from the description of a feed won't be searchable, so this patch is needed for the backend to work.
Created attachment 50156 [details] [review] akregator patch against latest version

dsd: apply the FilterHtml patch and this Akregator patch; they are against the latest versions.
Created attachment 50198 [details] [review] combined patch for akregatorqueryable.cs and filterhtml.cs

There was a syntax error in the last patch :P, and I also added the FilterHtml change needed in this patch, so this is all that is needed.
Backend in CVS.