How is a convolution calculated on an image with three (RGB) channels?

独厮守ぢ 2020-12-09 09:31

Say we have a single channel image (5x5)

A = [ 1 2 3 4 5
      6 7 8 9 2
      1 4 5 6 3
      4 5 6 7 4
      3 4 5 6 2 ]

And a filter K (

4 Answers
  • 2020-12-09 09:49

    For an RGB-like input, the filter is actually 2*2*3: one 2*2 kernel per colour channel. Each kernel is applied to its own channel, giving three filter responses. These three responses are summed, followed by the bias and the activation; the result is one pixel in the output map.
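
    A minimal NumPy sketch of the idea above, using a hypothetical 2x2x3 patch and an all-ones filter (values chosen only for illustration):

    ```python
    import numpy as np

    # Hypothetical 2x2 patch of an RGB image: shape (height, width, channels)
    patch = np.array([[[1, 2, 3], [4, 5, 6]],
                      [[7, 8, 9], [2, 3, 4]]], dtype=np.float32)

    # One 2x2x3 filter: a 2x2 kernel per colour channel (all ones for simplicity)
    filt = np.ones((2, 2, 3), dtype=np.float32)
    bias = 0.5

    # Per-channel responses: elementwise product, summed over height and width
    per_channel = (patch * filt).sum(axis=(0, 1))  # three numbers, one per channel

    # The three channel responses add up to one value; the bias is then added
    pixel = per_channel.sum() + bias
    print(pixel)  # one pixel of the output map
    ```

    The per-channel sums here are 14, 18 and 22, so the output pixel is 54 + 0.5 = 54.5.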

  • 2020-12-09 09:52

    It works just the same as with a single-channel image, except that you get three matrices (one per channel) instead of one. This is a lecture note about CNN fundamentals, which I think might be helpful for you.

  • 2020-12-09 09:59

    Let's say we have a 3-channel (RGB) image given by some matrix A

    
        A = [[[198 218 227]
              [196 216 225]
              [196 214 224]
              ...
              ...
              [185 201 217]
              [176 192 208]
              [162 178 194]]
    
    

    and a blur kernel

    
        K = [[0.1111, 0.1111, 0.1111],
             [0.1111, 0.1111, 0.1111],
             [0.1111, 0.1111, 0.1111]]
    
        # each entry is 0.1111 ~= 1/9, so K is a 3x3 averaging (blur) kernel
    
    

    The convolution can be represented as shown in the image below

    As you can see in the image, each channel is convolved with the kernel individually, and the results are then combined to form one output pixel.
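
    A small NumPy sketch of that diagram: each channel of a hypothetical 3x3 patch is convolved with the blur kernel K, then the three channel responses are combined (the constant patch value is chosen only to make the arithmetic easy to check):

    ```python
    import numpy as np

    # Hypothetical 3x3 RGB patch, every value 9 so each channel sums to 81
    patch = np.full((3, 3, 3), 9.0)

    # The 3x3 blur kernel from above: every entry 1/9
    K = np.full((3, 3), 1.0 / 9.0)

    # Convolve each channel with K individually (one output value per channel)...
    channel_responses = [(patch[:, :, c] * K).sum() for c in range(3)]

    # ...then combine the three responses into a single output pixel
    pixel = sum(channel_responses)
    print(channel_responses, pixel)
    ```

    Each channel response is 9 * 9 * (1/9) = 9, so the combined pixel is 27.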

  • 2020-12-09 09:59

    If you're trying to implement a Conv2d on an RGB image, this PyTorch implementation should help.

    Grab an image and make it a NumPy ndarray of uint8 (note that imshow needs uint8 values between 0-255, whilst floats should be between 0-1):

    import numpy as np
    import requests
    import torch
    import torch.nn as nn
    import matplotlib.pyplot as plt
    from io import BytesIO
    from PIL import Image

    link = 'https://oldmooresalmanac.com/wp-content/uploads/2017/11/cow-2896329_960_720-Copy-476x459.jpg'

    r = requests.get(link, timeout=7)
    im = Image.open(BytesIO(r.content))
    pic = np.array(im)
    

    You can view it with

    f, axarr = plt.subplots()
    axarr.imshow(pic)
    plt.show()
    

    Create your convolution layer (it initialises with random weights)

    conv_layer = nn.Conv2d(in_channels=3,
                           out_channels=3, kernel_size=3,
                           stride=1, bias=None)
    

    Convert the input image to float and add an empty batch dimension, because that is the input PyTorch expects

    pic_float = np.float32(pic)
    pic_float = np.expand_dims(pic_float,axis=0)
    

    Run the image through the convolution layer (permute reorders the dimensions from NHWC to NCHW, which is what PyTorch expects)

    out = conv_layer(torch.tensor(pic_float).permute(0,3,1,2))
    

    Remove the extra batch dim we added (not needed for visualization), detach from the computation graph, and convert to a NumPy ndarray

    out = out.permute(0,2,3,1).detach().numpy()[0, :, :, :]
    

    Visualise the output (with a cast to uint8 which is what we started with)

    f, axarr = plt.subplots()
    axarr.imshow(np.uint8(out))
    plt.show()
    

    You can then change the weights of the filters by accessing them. For example:

    kernel = torch.Tensor([[[[0.01, 0.02, 0.01],
                         [0.02, 0.04, 0.02],
                         [0.01, 0.02, 0.01]]]])
    
    kernel = kernel.repeat(3, 3, 1, 1)
    conv_layer.weight.data = kernel
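
    As a quick shape sanity check (pure NumPy, so you can verify it without running the layer): Conv2d stores its weights as (out_channels, in_channels, kH, kW), so repeating the 1x1x3x3 kernel as above should yield the 3x3x3x3 block the layer expects. np.tile below mimics kernel.repeat(3, 3, 1, 1) from the snippet:

    ```python
    import numpy as np

    # The 3x3 smoothing kernel from above, as a NumPy array
    k = np.array([[0.01, 0.02, 0.01],
                  [0.02, 0.04, 0.02],
                  [0.01, 0.02, 0.01]], dtype=np.float32)

    # Conv2d weight layout is (out_channels, in_channels, kH, kW);
    # tiling the kernel across both channel axes matches repeat(3, 3, 1, 1)
    weights = np.tile(k[None, None, :, :], (3, 3, 1, 1))
    print(weights.shape)  # should match conv_layer.weight.shape
    ```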
    