GNOME Bugzilla – Bug 94479
despeckle could use more efficient algorithm
Last modified: 2005-03-11 13:51:03 UTC
Despeckling uses a median filter, but the median is currently found by sorting all of the data values and then picking the middle one. It would probably be far faster to use a quicksort modified to select the kth largest element, which is O(N) rather than O(N log N). The technique is described in Knuth, Sorting and Searching, exercise 5.2.2-31 (of the new edition): do a quicksort, but after partitioning the elements around a pivot element a, if the final position of a is < k, focus attention on the right half only; if it is > k, focus attention on the left half; if it is = k, we're done. The only disadvantage is that this will not give exactly the correct answer if there is an even number of points in the region; if there are many points, this should not be an issue, and if there are few (say <= 6), a simple sort could be used instead.
Changes at the request of Dave Neary on the developer mailing list. I am changing many of the bugzilla reports that have not specified a target milestone to Future milestone. Hope that is acceptable.
Before we decide to implement this (probably pretty trivial) it would be nice to know if this is really a bottleneck in the despeckle plug-in.
Is there a particular advantage in using the median instead of the mean of the sample here?
The median is insensitive to outliers, which are typical when despeckling. Imagine that we're getting rid of black dust on a pure white image, and we come across a 3x3 block like this:

  255 255 255
    0 255 255
    0 255 255

The mean is 7*255 / 9 ≈ 198, which gives a dirty grey. To see what I mean, use the convolution filter with .11, .11, .11, etc. for a 3x3 kernel. That means you're spreading the dust rather than getting rid of it. The median of the kernel, though, is pristine white: 255. Basically, with input data like this, a median filter has an effect similar to an erode filter and shrinks the dust.
Such a situation would be handled by adjusting the white and (in your case) black level accordingly. This filters out the "extreme" noise (black dust points etc.); indeed, this often suffices.
Sorry about the ambiguity ("...would be handled..."); what I meant was: the "extreme" values (below the black level or above the white level) are simply ignored when computing the median, even in the current version of the plug-in (with the default levels, the dust is not spread in the example you give; everything remains unchanged). It therefore seems reasonable to me to use the mean of the remaining pixels instead of the median.
I don't think mean and median are interchangeable here. The median is the basic reason for this filter's existence, and "median filter" is a synonym for the despeckle operation. See bug #72862.
With the changes that have been applied recently (see bug #72862), I think we can close this report as FIXED. Unless someone can show that the sort algorithm could be done significantly faster. In that case, please reopen. But make sure you have read and understood http://ndevilla.free.fr/median/median.pdf, a document that describes the currently used algorithm and compares it with other approaches.