GNOME Bugzilla – Bug 307612
Beagle uses large amounts of memory when indexing Blam
Last modified: 2005-11-04 18:22:54 UTC
Please describe the problem:
When indexing Blam blog entries, beagled slowly takes all the memory it can get. Running for 5 minutes takes about 500 MB of memory. It does not get killed or restarted, as its memory usage monitor apparently does not work.

Steps to reproduce:
1. Subscribe to some blogs in Blam
2. Start beagled
3. Wait some minutes

Actual results:
mono-beagled takes > 500 MB of memory, and rising.

Expected results:
I'd expect Beagle not to consume that much memory, or at least to shut itself down and restart when using more than, say, 50 MB.

Does this happen every time?
Yes.

Other information:
Can you run beagled with "--debug --allow-backend blam" and attach the output of the logs? If you can narrow down what file/blog entry is causing it, it'll be much easier to fix.
Created attachment 47791 [details] IndexHelper Logfile
Created attachment 47792 [details] Beagle Logfile

Memory usage rises mainly during the "INFO: Scanning Weblogs / INFO: Found 714 items in 40 weblogs in ,10s" loop, e.g. at 11.45.29.95 (these 15 seconds took about 200 MB of memory).
inotify support, or no?
also, when you are running beagle, are you running/using Blam? All those rescans imply that the ~/.gnome2/blam/collection.xml file is being constantly changed.
Ok, this is the situation: the Blam backend is a total bunch of crap. Since Blam uses a single file for all feeds, an event is triggered every time any feed is updated, and we reindex all of the feeds every time anything happens. We should probably either keep track of when we've indexed certain feeds, or move to an IndexingService-based solution inside Blam.
It would also be very good to move to an IndexableGenerator-based backend.
I just checked in code which switches it to an IndexableGenerator, which will keep it from beating on the scheduler quite so much, but I'm not sure it'll lower the memory usage. We probably need to be smarter about what we index. Nico, how many channels and items do you have in your ~/.gnome2/blam/collection.xml file? Can you try the code in CVS to see if it helps things at all?
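For anyone following along, here is roughly what an IndexableGenerator-based backend looks like. This is a minimal sketch, assuming the IIndexableGenerator interface as it appears in CVS (HasNextIndexable / GetNextIndexable / StatusName); the FeedItem class, the feed: URI scheme, and the property names are illustrative, not the actual checked-in code:

  using System;
  using System.Collections;
  using Beagle;          // Indexable, Property (assumed namespaces)
  using Beagle.Daemon;   // IIndexableGenerator

  // Illustrative stand-in for a parsed feed item.
  class FeedItem {
      public string Id;
      public string Title;
      public DateTime PubDate;
  }

  class BlamFeedGenerator : IIndexableGenerator {

      private IEnumerator items;

      public BlamFeedGenerator (ICollection parsed_items)
      {
          items = parsed_items.GetEnumerator ();
      }

      // The scheduler pulls items one at a time, so only one
      // Indexable needs to be alive between index flushes.
      public bool HasNextIndexable ()
      {
          return items.MoveNext ();
      }

      public Indexable GetNextIndexable ()
      {
          FeedItem item = (FeedItem) items.Current;
          Indexable indexable = new Indexable (new Uri ("feed:///" + item.Id));
          indexable.MimeType = "text/html";
          indexable.Timestamp = item.PubDate;
          indexable.AddProperty (Property.New ("dc:title", item.Title));
          return indexable;
      }

      public string StatusName {
          get { return "Blam feed items"; }
      }
  }

The point of the switch is that the scheduler pulls one Indexable at a time instead of getting a separate task queued for every item up front, which is what was beating on it before.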
Beagle's method of accessing the collection.xml file isn't great; it can use a huge amount of memory if your collection.xml file is huge, but it's basically exactly the same as what Blam itself does. Does Blam use up a tremendous amount of memory as well?
I can confirm this bug; I have the same problems. My collection.xml is 1.3M, and this is Blam's memory usage:

PID   USER   PR NI VIRT RES SHR S %CPU %MEM   TIME+ COMMAND
10203 martin 16  0 117m 40m 22m S  0.0  4.0 0:11.36 mono

which is rather sane. I have about 30 feeds in Blam. I also notice that Beagle's memory consumption does not get that high immediately (probably not before indexing the feeds?), but once it is up at 500M it stays there, making the system more or less unusable.
This is Beagle's memory usage while indexing:

PID   USER   PR NI VIRT  RES SHR S %CPU %MEM   TIME+ COMMAND
11688 martin 15  0 563m 486m 10m S  0.0 48.0 0:42.58 mono

I noticed that beagle-info --status shows roughly 1000 single items, one for each entry in each feed. Is that intended?
The 1000 single items should be fixed in CVS as of about a week ago, but the fix is not in the 0.0.11.1 release. It would be good if you could compare the memory usage between the release and CVS.
What is the current status of this problem? Did anybody try the comparison Joe suggested? (I have small XML files, so I am not getting useful results.) On a different note, I wrote a stream-based reader/parser for Akregator (which has an XML file format similar to Blam's). Technically, with the new reader beagled should take less memory (which doesn't show up on my system, since I have so few feeds). If anybody can post a Blam collection.xml file here (please post a small one), I can port BlamQueryable to use the new stream-based parser. If nothing else, this will fix two "FIXME"s in the blam and liferea backends :-). Finally, attaching the new AkregatorQueryable files (the bulk of the work happens in the indexable generator); a sketch of the stream-parsing approach follows the attachments.
Created attachment 52198 [details] [review] + beagled/AkregatorQueryable/AkregatorQueryable.cs
Created attachment 52199 [details] [review] + beagled/AkregatorQueryable/FeedIndexableGenerator.cs
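To make the stream-vs-DOM difference concrete: XmlDocument.Load() builds the entire collection.xml as an in-memory object tree, while a pull parser only holds the current node. Here is a minimal sketch using System.Xml.XmlTextReader; the "Item" element name matches what grep finds in Blam's collection.xml, but the Id/Title attribute names are guesses, not verified against Blam's format:

  using System;
  using System.Xml;

  class CollectionStreamParser {
      static void Main (string[] args)
      {
          // Pull parser: reads the file node by node instead of
          // materializing the whole document like XmlDocument.Load().
          XmlTextReader reader = new XmlTextReader (args [0]);
          int items = 0;

          while (reader.Read ()) {
              if (reader.NodeType != XmlNodeType.Element)
                  continue;

              if (reader.Name == "Item") {
                  items++;
                  // Attributes of the current element are available
                  // without keeping the rest of the file in memory.
                  Console.WriteLine ("{0}: {1}",
                      reader.GetAttribute ("Id"),
                      reader.GetAttribute ("Title"));
              }
          }

          reader.Close ();
          Console.WriteLine ("{0} items total", items);
      }
  }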
Created attachment 52295 [details] [review] Modified LifereaQueryable.cs

Found some Liferea feeds on my computer, so here is the modified LifereaQueryable with stream parsing. Technically, this one shouldn't create tons of objects while indexing.
Created attachment 52867 [details] blam 1.8 feeds (~251)

It's a Blam 1.8 XML file (from ~/.gnome2/blam); it has around 251 items in it. (dBera requested it.)
Edit: Beagle is now reporting it as having 503 items.
sham, the attached file has exactly 254 items in it (check using grep "<Item" /path/to/collection.xml). If beagled is reporting anything more or less, it's time to file another bug :)
Created attachment 52919 [details] [review] patch for BlamQueryable.cs to use stream parser

Using stream parsing with the collection.xml from sham didn't give me much benefit (only about a 400k reduction in memory usage).
The stream-parsing patch is in CVS. Though stream parsing is now used, beagled nevertheless parses the whole file whenever anything changes. The behaviour could be somewhat improved by using a cache file (something similar to the evo-mail backend) to keep track of which items have not changed since the last parse.
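A hypothetical sketch of that cache-file idea: persist a checksum per item id from the previous parse, then skip items whose checksum is unchanged instead of re-submitting them to the indexer. None of these names come from the Beagle tree; this is just to illustrate the approach the evo-mail comparison is pointing at:

  using System;
  using System.Collections;
  using System.IO;

  // Hypothetical item cache: maps item id -> checksum of its serialized
  // form, persisted as tab-separated lines. On the next parse, items
  // whose checksum is unchanged can be skipped.
  class ItemCache {

      private Hashtable checksums = new Hashtable ();
      private string path;

      public ItemCache (string path)
      {
          this.path = path;
          if (!File.Exists (path))
              return;
          using (StreamReader sr = new StreamReader (path)) {
              string line;
              while ((line = sr.ReadLine ()) != null) {
                  string[] parts = line.Split ('\t');
                  if (parts.Length == 2)
                      checksums [parts [0]] = parts [1];
              }
          }
      }

      // True if the item is new or changed since the last run;
      // also records the new checksum for the next Save().
      public bool IsDirty (string id, string checksum)
      {
          string old = (string) checksums [id];
          checksums [id] = checksum;
          return old != checksum;
      }

      public void Save ()
      {
          using (StreamWriter sw = new StreamWriter (path)) {
              foreach (DictionaryEntry e in checksums)
                  sw.WriteLine ("{0}\t{1}", e.Key, e.Value);
          }
      }
  }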
I'm going to close this as FIXED.