GNOME Bugzilla – Bug 321568
HEAD fails to compile with CHM support
Last modified: 2006-06-15 21:30:25 UTC
Version details: HEAD Distribution/Version: Fedora Core Rawhide Get this when trying to compile CVS HEAD with CHM support. ./FilterChm.cs(57,21): error CS0103: The name `WalkChildNodesForText' does not exist in the context of `Beagle.Filters.FilterChm' ./FilterChm.cs(160,5): error CS0103: The name `WalkNodes' does not exist in the context of `Beagle.Filters.FilterChm' I'm running chmlib-0.37.4-1 Compiled fine when I removed chmlib.
Oops. FilterChm needs to be changed due to the recent changes in FilterHtml. Didnt knew that there was a customer of FilterHtml :P.
Created attachment 65742 [details] [review] FilterChm.cs ChmFile.cs update
ooops :P I've updated the FilterChm and ChmFile classes (see above the patch). before this update FilterChm only indexed the toc file and the default page. Now it filters all the html files inside an chm file pretty fast thanks to the improvements made by D Bera to the HtmlDocument in the HtmlAgilityPack. I also update the glue to libchm which was obsolete due to changes in the libchm API.
diff -Naur beagle/Filters/HtmlAgilityPack/HtmlDocument.cs beagle-mike/Filters/HtmlAgilityPack/HtmlDocument.cs --- beagle/Filters/HtmlAgilityPack/HtmlDocument.cs 2006-04-26 20:03:07.000000000 -0500 +++ beagle-mike/Filters/HtmlAgilityPack/HtmlDocument.cs 2006-05-17 19:55:24.000000000 -0500 @@ -1928,7 +1928,7 @@ _lastparentnode.AppendChild(_currentnode); } - ReadDocumentEncoding(_currentnode); + //ReadDocumentEncoding(_currentnode); Why don't we want to call the ReadDocumentEncoding ()?
jeje I forgot to mention that! Well, It's because it raises a NullPointer Exception. I don't know why, but it's related to the modifications made by D Bera to HtmlDocument. I've tested the filter with this line commented out and it work properly. The FilterHtml don't use this interfece to parse the Html, use another overloaded function, so It wont affect the Html Filter. As a note, when I tried to detect the encoding by hand it also raised the same exception, so i commented out that line. I think HtmlDocument will use the default ASCCI encoding. I think the only one who can clarfify this to us is D Bera.
From what I see on the CHM files I've tried, leaving the ReadDocumentEncoding() line causes an EncodingFoundException to be thrown. In other places in the HtmlAgilityPack code, that exception is caught and then the file is reparsed with the detected encoding. In looking through the code, I'd suggest that the function passed into ChmFile.ParseContents() not take a TextReader, but instead take a string, since you are just converting a string into a StringReader anyway. That way, you can properly catch the EncodingFoundException and re-call doc.Load() with the given encoding. (Because you can't pass in an encoding to doc.Load(TextReader), bleh) Once that's fixed, I think it's probably fine to go in.
Sorry for the long silence. I was out of town. The last time I checked, I was unable to use FilterChm due to some library problems. But if the exception is due to EncodingFoundException, then it should be fixed by the commit I made just now. That should also fix the occassional EncodingFoundExceptions I saw in log files but didnt know where it was coming from. Now you can either use your Load(TextReader) code or use the suggestion by Joe.
Well, Joe, what should I do? Should modify the source or it's fine the way it is?
If my commits removed the crashes/exceptions (which should be the case but I was not able to test), then I'd suggest using the current code. All the logic for detecting encoding etc. is (supposed to be) handled correctly by Load(TextReader). The logic for detecting encoding separately and then calling Load() with the encoding is used in Html filter for different reasons; it shouldnt be needed for Chm files.
This looks good to check in sans the HtmlDocument.cs change.
Checked this in without the HtmlDocument.cs change. Thanks!