GNOME Bugzilla – Bug 309916
Akregator backend
Last modified: 2005-08-19 16:47:07 UTC
Akregator is a KDE RSS reader. I wrote a Beagle backend for Akregator and thought it might be useful to other KDE users.
Created attachment 48882 [details] [review] Akregator backend

Put this in beagled/AkregatorQueryable/AkregatorQueryable.cs and update the makefiles. This is a first attempt, and it works for me.
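For readers who haven't seen a Beagle backend before, the general shape follows the other feed backends such as Liferea's: a class deriving from LuceneQueryable, tagged with a QueryableFlavor attribute so that beagled picks it up. Below is a rough sketch of that pattern; the attribute parameters, namespace, and method bodies are assumptions modelled on the Liferea backend, not the attached code.

    using System;
    using Beagle.Daemon;
    using Beagle.Util;

    namespace Beagle.Daemon.AkregatorQueryable {

        // Rough sketch of the usual Beagle backend shape; details are
        // modelled on similar backends, not on the attached file.
        [QueryableFlavor (Name="Akregator", Domain=QueryDomain.Local, RequireInotify=false)]
        public class AkregatorQueryable : LuceneQueryable {

            public AkregatorQueryable () : base ("AkregatorIndex")
            {
                // Locate Akregator's feed archive directory here.
            }

            public override void Start ()
            {
                base.Start ();
                // Crawl the existing feed files, watch them with
                // inotify, and index each feed item as an Indexable.
            }
        }
    }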
Overall this looks very good. I have only a couple of minor comments:

* Instead of p/invoking into libgmime, you should probably use the managed GMime bindings. I.e., add "using GMime;" and then use "GMime.Utils.HeaderDecodeDate (item.PubDate, out offset);" (a short sketch follows below this list).

* The newline thing in the StreamReader seems suspect. It could be a bug elsewhere in the code. Can you find out if the data is failing at the filtering stage, or somewhere else?

* I know the FIXME is copied from the Liferea feed, but does it apply? How long does Akregator store feeds for? If the files are never going to be that big, let's drop the comment.
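To illustrate the first point, the managed call being suggested looks roughly like this. This is a minimal sketch: DecodePubDate is a hypothetical helper name, and the argument is assumed to hold an RFC 822 date string as in the attached backend.

    using System;
    using GMime;

    // Decode an RFC 822 date via the managed GMime bindings instead of
    // p/invoking into libgmime directly.
    static DateTime DecodePubDate (string pubDate)
    {
        int offset; // timezone offset, filled in by GMime
        return GMime.Utils.HeaderDecodeDate (pubDate, out offset);
    }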
Thanks for the comments. I am sorry, but which FIXME are you talking about? The de-serializer one? BTW, I didn't understand that comment; isn't a serializer some kind of stream processor?

One more question, regarding the backends which write their data in one large XML file. Whenever the application adds or changes some item in the XML file, it seems the whole file is re-read and re-parsed (due to inotify), which is clearly inefficient. Is the IIndexableGenerator meant to help with this?
This FIXME:

// De-serialization classes
// FIXME: Change to standard stream parsing for performance?

The serializer is a stream parser, but it has to create objects for everything that it comes across, which might be a bit expensive. Plus, in reality we're not doing anything smart about the streaming: we're just creating potentially hundreds of objects at once. That caused a real performance problem for us with the Blam backend in the past.

As to your other question, I'm not sure there's a way to fix the problem. You get a notification about the whole file, and there's no way to tell specifically what changed without reparsing it. We could definitely be smarter about what is reindexed (and if you want to code that up, please feel free), but reparsing is unavoidable. The IIndexableGenerator won't help here... what you need is a cache of the previous state, so you can just compare the two.
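As a concrete illustration of the cache-and-compare idea, here is a minimal sketch; all names are hypothetical and none of this is Beagle API.

    using System.Collections.Generic;

    // Remember which item GUIDs each feed file contained on the last
    // pass, so that after a reparse only the new items are reindexed.
    public class FeedItemCache {
        private Dictionary<string, HashSet<string>> seen =
            new Dictionary<string, HashSet<string>> ();

        // Returns the GUIDs in 'current' that were not seen last time.
        public List<string> Diff (string feedFile, IEnumerable<string> current)
        {
            HashSet<string> previous;
            if (! seen.TryGetValue (feedFile, out previous))
                previous = new HashSet<string> ();

            List<string> added = new List<string> ();
            HashSet<string> now = new HashSet<string> ();
            foreach (string guid in current) {
                now.Add (guid);
                if (! previous.Contains (guid))
                    added.Add (guid);
            }
            seen [feedFile] = now;
            return added;
        }
    }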
Created attachment 49499 [details] [review] patch for a possible bug in FilterHtml

There seems to be a possible bug in FilterHtml. I should say that I could no longer reproduce the newline problem I suggested a fix for earlier (I had upgraded Mono and Beagle in the meantime). Have a look at the patch. In brief, if an HTML fragment that does not start with any element is given for filtering, the filter fails. I am not sure if this is the correct way to fix it (in the sense of the expected behaviour of an HTML parser).
Created attachment 49500 [details] [review] Updated backend source file

Updated AkregatorQueryable (after Joe's suggestions). Also dropped the newline fix (see my earlier comment).
I still need to implement deletion of feed items. With the current behaviour (which should be similar to Liferea's), when the XML file is rewritten after some deletions, the deleted items are not removed from the index.
For reference, the FilterHtml bug is related to bug #171469 and the fix *can be* similar to it (FilterHtml.cs rev:1.16). To give examples, this is an Akregator feed:

today I am trying to <i>submit</i> a <b>patch</b> <br/> for <em>Akregator</em>

And this is a Liferea feed (I am sure Blam is something similar):

<div ...>today I am trying to <i>submit</i> a <b>patch</b> <br/> for <em>Akregator</em> </div>

Both of these are legitimate pieces of HTML, but neither is an HTML document in itself. So there is a design issue here: ideally, FilterHtml should parse complete and valid HTML documents. If backends want to parse HTML snippets, they can always add harmless dummy <head> and <body> tags around them to make them legitimate HTML. Any comments? Otherwise, we will end up making many allowances in the filters, which might cause other bugs.
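For example, a backend could wrap its snippets like this before handing them to the filter (a hypothetical helper, not part of any attached patch):

    // Turn an HTML fragment into a complete document so that FilterHtml
    // always sees valid top-level structure.
    static string WrapSnippet (string fragment)
    {
        return "<html><head></head><body>" + fragment + "</body></html>";
    }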
Thanks for your work! This is very useful for Beagle and Akregator users. Just so you know, we changed the archive backend from the old storage in RSS format to a plugin-based storage backend, with a Metakit (http://www.equi4.com/metakit.html) implementation as the default (the advantages are vastly improved startup times and lower memory consumption). So your plugin won't work with kdepim 3.5. It seems there are no C# bindings for Metakit (which is a C++ library), so I don't know how much work it would be to access Metakit files from C#.

Frank Osterfeld
Akregator maintainer
Created attachment 49601 [details] [review] minor improvements

Since this backend won't be compatible with kdepim-3.5, it's time to move on and work on better stuff :-) Submitting the latest improvements that I worked on. This version disregards feeds which were already deleted before indexing began (I didn't have time to implement deletion of items from the index) and uses a local fix for the FilterHtml problem. With the original backend, words from the description of a feed won't be searchable, so this patch is needed for the backend to work.
Created attachment 50156 [details] [review] akregator patch against latest version

dsd: apply the FilterHtml patch and this Akregator patch; they are against the latest versions.
Created attachment 50198 [details] [review] combined patch for akregatorqueryable.cs and filterhtml.cs

There was a syntax error in the last patch :P, and I also added the FilterHtml change needed in this patch, so this is all that is needed.
Backend in CVS.