Efficiently calculating boundary-adapted neighbourhood average

Question

I have an image with values ranging from 0 to 1. What I'd like to do is simple averaging.
More specifically, for a cell at the border of the image I'd like to average only the pixels in the part of the neighbourhood/kernel that lies within the extent of the image. In effect this boils down to adapting the denominator of the 'mean formula', i.e. the number of pixels the sum is divided by.

I managed to do this as shown below with scipy.ndimage.generic_filter, but this is far from time-efficient.

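import numpy
import scipy.ndimage

def fnc(buffer, count):
    # number of cells inside the image (padded cells hold 2.0)
    n = float(numpy.sum(buffer < 2.0))
    # remove the contribution of the (count - n) padded cells
    total = numpy.sum(buffer) - ((count - n) * 2.0)
    return total / n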

avg = scipy.ndimage.generic_filter(image, fnc, footprint = kernel, \
                                   mode = 'constant', cval = 2.0,   \
                                   extra_keywords = {'count': countkernel})

Details

kernel = square array (circle represented by ones)
Padding uses 2's rather than zeroes, since with zeroes I could not distinguish the padded area from genuine zeroes in the actual raster
countkernel = number of ones in the kernel
n = number of cells that lie within image by excluding the cells of the padded area identified by values of 2
Correct the sum by subtracting (number of padded cells * 2.0) from the original neighbourhood total sum
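
The correction described in the list above can be checked by hand on a single neighbourhood. The following is only a sketch: the 3x3 buffer and its values are made up for illustration (a corner pixel where five cells fall in the padded area), and the full square is used as the kernel.

```python
import numpy as np

# Hypothetical flattened 3x3 neighbourhood at the top-left corner of an image:
# five cells lie in the padded area (value 2.0), four are real pixels.
buffer = np.array([2.0, 2.0, 2.0,
                   2.0, 0.2, 0.4,
                   2.0, 0.6, 0.8])
count = buffer.size                     # number of ones in the kernel (9 here)

n = int(np.sum(buffer < 2.0))           # cells inside the image
padded = count - n                      # cells in the padded area
total = np.sum(buffer) - padded * 2.0   # remove the padding contribution
avg = total / n                         # mean of the four real pixels
print(n, padded, avg)
```

The result should agree, up to floating-point rounding, with averaging the four real values directly.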

Update(s)

1) Padding with NaNs increases the calculation time by about 30%:

    def fnc(buffer):
        return (numpy.nansum(buffer) / numpy.sum([~numpy.isnan(buffer)]))

    avg = scipy.ndimage.generic_filter(image, fnc, footprint = kernel, \
                                       mode = 'constant', cval = numpy.nan)

2) Applying the solution proposed by Yves Daoust (accepted answer) clearly reduces the processing time to a minimum:

    def fnc(buffer):
        return numpy.sum(buffer)

    sumbigimage = scipy.ndimage.generic_filter(image, fnc, \
                                               footprint = kernel, \
                                               mode = 'constant', \
                                               cval = 0.0)
    summask     = scipy.ndimage.generic_filter(mask, fnc, \
                                               footprint = kernel, \
                                               mode = 'constant', \
                                               cval = 0.0)
    avg = sumbigimage / summask

3) Building on Yves' tip to use an additional binary image, which in fact is applying a mask, I stumbled upon the principle of masked arrays. As such only one array has to be processed because a masked array 'blends' the image and mask arrays together.
A small detail about the mask array: instead of filling the inner part (extent of original image) with 1's and filling the outer part (border) with 0's as done in the previous update, you must do vice versa. A 1 in a masked array means 'invalid', a 0 means 'valid'.
This code is even 50% faster than the code supplied in update 2):

    maskedimg = numpy.ma.masked_array(imgarray, mask = maskarray)

    def fnc(buffer):
        return numpy.mean(buffer)

    avg = scipy.ndimage.generic_filter(maskedimg, fnc, footprint = kernel, \
                                       mode = 'constant', cval = 0.0)

--> I must correct myself here!
I must have made a mistake during validation: after several calculation runs it turned out that scipy.ndimage.<filters> cannot handle masked arrays, in the sense that the mask is not taken into account during the filter operation.
Some other people mentioned this too, like here and here.
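
A small sketch of this behaviour follows. It assumes, in line with the observation above, that the mask is dropped when the masked array is converted to a plain ndarray; the 2x2 image, the mask and the all-ones footprint are made up for illustration. It also shows one possible workaround: replace masked cells with NaN and use a NaN-aware reducer.

```python
import numpy as np
from scipy import ndimage

image = np.array([[0.0, 1.0],
                  [1.0, 1.0]])
# In numpy.ma, True marks an invalid cell.
masked = np.ma.masked_array(image, mask=[[True, False],
                                         [False, False]])

# Converting to a plain ndarray keeps the data but drops the mask.
plain = np.asarray(masked)

footprint = np.ones((3, 3))  # hypothetical kernel
from_masked = ndimage.generic_filter(masked, np.mean, footprint=footprint,
                                     mode='constant', cval=0.0)
from_raw = ndimage.generic_filter(image, np.mean, footprint=footprint,
                                  mode='constant', cval=0.0)
# If the mask is ignored, both results are identical.
print(np.allclose(from_masked, from_raw))

# Possible workaround: turn masked cells into NaN and skip NaNs in the mean.
filled = masked.filled(np.nan)
avg = ndimage.generic_filter(filled, np.nanmean, footprint=footprint,
                             mode='constant', cval=np.nan)
```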

The power of an image...

grey: extent of image to be processed
white: padded area (in my case filled with 2.0's)
red shades: extent of kernel
- dark red: effective neighbourhood
- light red: part of neighbourhood to be ignored

How can this rather pragmatic piece of code be changed to improve the performance of the calculation?

Many thanks in advance!

Answer 1:

Unsure if this will help, as I am not proficient in scipy: use an auxiliary image of 1's in the gray area and 0's in the white area (0's too in the source image). Then apply the filter to both images with a simple sum.

There is some hope of a speedup if scipy provides a specialized version of the filter with a built-in function for summing.

This done, you will need to divide both images pixel by pixel.
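
A minimal sketch of this idea, assuming scipy.ndimage.correlate is an acceptable stand-in for the "specialized summing filter" hoped for above: correlating with a 0/1 kernel yields neighbourhood sums in compiled code, without calling a Python function per pixel. The helper name boundary_adapted_mean is hypothetical, and the kernel is assumed to include its centre cell so the count is never zero.

```python
import numpy as np
from scipy import ndimage

def boundary_adapted_mean(image, kernel):
    """Mean over the part of the kernel footprint that lies inside the image."""
    image = np.asarray(image, dtype=float)
    weights = np.asarray(kernel, dtype=float)   # 0/1 footprint used as weights
    # Neighbourhood sums; cells outside the image contribute 0.
    value_sum = ndimage.correlate(image, weights, mode='constant', cval=0.0)
    # The same filter on an image of ones counts the cells inside the image.
    cell_count = ndimage.correlate(np.ones_like(image), weights,
                                   mode='constant', cval=0.0)
    # Pixel-by-pixel division gives the boundary-adapted mean.
    return value_sum / cell_count

# Usage with a made-up random image and a small cross-shaped kernel.
rng = np.random.default_rng(0)
image = rng.random((6, 7))
kernel = np.array([[0, 1, 0],
                   [1, 1, 1],
                   [0, 1, 0]])
avg = boundary_adapted_mean(image, kernel)
```

On a small image the result can be compared against generic_filter with numpy.nanmean and NaN padding, which should produce the same values, only more slowly.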

Answer 2:

I'm not sure how efficient this is, but I'm using a simpler formulation with nan's that handles both borders and masks.

No mask case:

avg = scipy.ndimage.generic_filter(image, np.nanmean, mode='constant', cval=np.nan, footprint=kernel)

Mask case:

masked_image = np.where(mask, image, np.nan)
avg = scipy.ndimage.generic_filter(masked_image, np.nanmean, mode='constant', cval=np.nan, footprint=kernel)

You can use any of the numpy nan functions in the same way.
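
As an illustration, other NaN-aware reducers such as numpy.nanmax and numpy.nanmedian can be plugged in exactly like numpy.nanmean. This is only a sketch: the 3x3 image, the all-ones footprint and the helper name nan_filter are made up for the example.

```python
import numpy as np
from scipy import ndimage

image = np.array([[0.1, 0.2, 0.3],
                  [0.4, 0.5, 0.6],
                  [0.7, 0.8, 0.9]])
kernel = np.ones((3, 3))  # hypothetical footprint

def nan_filter(img, reducer):
    # Pad with NaN so that the NaN-aware reducer ignores out-of-image cells.
    return ndimage.generic_filter(img, reducer, footprint=kernel,
                                  mode='constant', cval=np.nan)

local_max = nan_filter(image, np.nanmax)        # boundary-adapted maximum
local_median = nan_filter(image, np.nanmedian)  # boundary-adapted median
```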

Source: https://stackoverflow.com/questions/10683596/efficiently-calculating-boundary-adapted-neighbourhood-average

Tags

python

performance

image-processing

filter

boundary