Different 2D convolution results between keras and scipy

栀梦 2021-01-22 21:04

I found some results difficult to understand when trying to debug my neural network. I tried to do some computations offline using scipy (1.3.0), and I am not havin

2 answers
  •  夕颜
    夕颜 (original poster)
    2021-01-22 21:22

    I don't know for certain without reading the source code for these two libraries, but there is more than one straightforward way to write a convolution algorithm, and evidently these two libraries implement it in different ways.

    One way is to "paint" the kernel onto the output, for each pixel of the image:

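    from itertools import product
    
    def convolve_paint(img, ker):
        # img and ker are 2D lists of numbers (rows of equal length)
        img_w, img_h = len(img[0]), len(img)
        ker_w, ker_h = len(ker[0]), len(ker)
        # "full" output: every position where the kernel overlaps the image
        out_w, out_h = img_w + ker_w - 1, img_h + ker_h - 1
        out = [[0]*out_w for _ in range(out_h)]
        for x,y in product(range(img_w), range(img_h)):
            # stamp a copy of the kernel, scaled by this pixel, onto the output
            for dx,dy in product(range(ker_w), range(ker_h)):
                out[y+dy][x+dx] += img[y][x] * ker[dy][dx]
        return out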
    

    Another way is to "sum" the contributing amounts at each pixel in the output:

    def convolve_sum(img, ker):
        # relies on the `product` import from the previous snippet
        img_w, img_h = len(img[0]), len(img)
        ker_w, ker_h = len(ker[0]), len(ker)
        out_w, out_h = img_w + ker_w - 1, img_h + ker_h - 1
        out = [[0]*out_w for _ in range(out_h)]
        for x,y in product(range(out_w), range(out_h)):
            # gather every image pixel that contributes to out[y][x]
            for dx,dy in product(range(ker_w), range(ker_h)):
                # offsets outside the image count as zero, so skip them
                if 0 <= y-dy < img_h and 0 <= x-dx < img_w:
                    out[y][x] += img[y-dy][x-dx] * ker[dy][dx]
        return out
    

    These two functions produce the same output. However, notice that the second one has y-dy and x-dx instead of y+dy and x+dx. If the second algorithm is written with + instead of -, as might seem natural, then, apart from a shift in where the output lines up, the results will be as if the kernel is rotated by 180 degrees (an operation usually called cross-correlation rather than convolution), which is what you've observed.
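
    To make the rotation concrete, here is a minimal sketch that runs the "sum" version both ways and checks that the + variant matches a true convolution with the kernel rotated by 180 degrees. The names convolve_full and rot180 and the flip flag are mine, for illustration only, and the + variant is shifted by the kernel size minus one so that both outputs have the same full size. As far as I know, scipy.signal.convolve2d flips the kernel while Keras's Conv2D does not, which would be consistent with this, but I have not checked that against the versions in the question.

```python
from itertools import product

def convolve_full(img, ker, flip=True):
    """Full-size sliding-window sum over 2D lists (hypothetical helper).

    flip=True  reads img[y-dy][x-dx]: a true convolution.
    flip=False reads img[y+dy][x+dx], shifted so the output is still
               full size: a cross-correlation.
    """
    img_h, img_w = len(img), len(img[0])
    ker_h, ker_w = len(ker), len(ker[0])
    out_h, out_w = img_h + ker_h - 1, img_w + ker_w - 1
    out = [[0] * out_w for _ in range(out_h)]
    for y, x in product(range(out_h), range(out_w)):
        for dy, dx in product(range(ker_h), range(ker_w)):
            if flip:
                sy, sx = y - dy, x - dx
            else:
                sy, sx = y + dy - (ker_h - 1), x + dx - (ker_w - 1)
            # positions outside the image count as zero
            if 0 <= sy < img_h and 0 <= sx < img_w:
                out[y][x] += img[sy][sx] * ker[dy][dx]
    return out

def rot180(ker):
    """Rotate a 2D list by 180 degrees (reverse rows and columns)."""
    return [row[::-1] for row in ker[::-1]]

img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ker = [[1, 2], [3, 4]]  # deliberately asymmetric

conv = convolve_full(img, ker, flip=True)
corr = convolve_full(img, ker, flip=False)

# The + variant equals a true convolution with the rotated kernel.
print(corr == convolve_full(img, rot180(ker), flip=True))
# With an asymmetric kernel the two variants disagree.
print(conv == corr)
```

    With a kernel that is symmetric under a 180 degree rotation the two variants would agree, which is why the difference can go unnoticed on simple test kernels.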

    It's unlikely that either library uses such a simple algorithm to do convolution. For larger images and kernels it's more efficient to use a Fourier transform, applying the convolution theorem. But the difference between the two libraries is likely to be caused by something similar to this.
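
    The convolution theorem mentioned above can be sketched in pure Python as follows. This is only an illustration under my own assumptions: it uses a naive O(n^2) DFT rather than a real FFT, and the helper names (dft, dft2, pad, convolve_fourier) are mine, not anything taken from scipy or keras. A real implementation would call an FFT library instead.

```python
import cmath

def dft(seq, inverse=False):
    """Naive O(n^2) discrete Fourier transform of a 1D sequence."""
    n = len(seq)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(seq[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                    for t in range(n))
        out.append(total / n if inverse else total)
    return out

def dft2(grid, inverse=False):
    """2D DFT: transform every row, then every column."""
    rows = [dft(row, inverse) for row in grid]
    cols = [dft(list(col), inverse) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def pad(grid, h, w):
    """Zero-pad a 2D list to h rows by w columns, top-left aligned."""
    return [[grid[y][x] if y < len(grid) and x < len(grid[0]) else 0
             for x in range(w)] for y in range(h)]

def convolve_fourier(img, ker):
    """Full 2D convolution via the convolution theorem."""
    out_h = len(img) + len(ker) - 1
    out_w = len(img[0]) + len(ker[0]) - 1
    # Padding to the full output size avoids wrap-around (circular) effects.
    f_img = dft2(pad(img, out_h, out_w))
    f_ker = dft2(pad(ker, out_h, out_w))
    # Convolution in space is pointwise multiplication in frequency.
    f_out = [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(f_img, f_ker)]
    # Inputs are real, so the imaginary parts are only rounding noise.
    return [[round(v.real, 6) for v in row]
            for row in dft2(f_out, inverse=True)]

img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ker = [[1, 2], [3, 4]]
result = convolve_fourier(img, ker)
for row in result:
    print(row)
```

    Up to floating-point rounding, this should agree with convolve_paint and convolve_sum above. Note that multiplying the transforms gives a true convolution (the y-dy form), so a library that wants the other convention has to flip the kernel or conjugate one of the transforms.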
