Keras Conv2D and input channels

前端未结


The Keras layer documentation specifies the input and output sizes for convolutional layers: https://keras.io/layers/convolutional/

Input shape: (samples, channels


                      
3 Answers

悲哀的现实, 2021-01-30 17:00

I was also wondering this, and found another answer here, where it is stated (emphasis mine):


  Maybe the most tangible example of a multi-channel input is when you have a color image which has 3 RGB channels. Let's get it to a convolution layer with 3 input channels and 1 output channel. (...) What it does is that it calculates the convolution of each filter with its corresponding input channel (...). The stride of all channels are the same, so they output matrices with the same size. Now, it sums up all matrices and output a single matrix which is the only channel at the output of the convolution layer.


Illustration:



Notice that the convolution kernel weights differ from channel to channel; they are adjusted iteratively during back-propagation by gradient-descent-based algorithms such as stochastic gradient descent (SGD).
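
To make the quoted description concrete, here is a minimal NumPy sketch of it: one 2D kernel per input channel, each applied to its own channel, with the resulting maps summed into a single output channel. The helper name `correlate2d_valid`, the 5×5 image size and the random values are my own assumptions for illustration, and the code uses cross-correlation (no kernel flip), which is what deep-learning frameworks usually call convolution.

```python
import numpy as np

def correlate2d_valid(channel, kernel):
    """Slide a 2D kernel over one 2D channel (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = channel.shape[0] - kh + 1
    out_w = channel.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(channel[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.normal(size=(5, 5, 3))    # height, width, 3 input channels (made-up data)
kernels = rng.normal(size=(3, 2, 2))  # a different 2x2 kernel for each channel

# One 2D map per input channel, all of the same size...
per_channel = [correlate2d_valid(image[:, :, c], kernels[c]) for c in range(3)]
# ...summed element-wise into the single output channel.
output = np.sum(per_channel, axis=0)

print(output.shape)  # (4, 4)
```

With more than one output channel, the same procedure would simply be repeated with a separate set of per-channel kernels for each output channel.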

Here is a more technical answer from the TensorFlow API documentation.
                                                                        
                                                        
            
            
              
                

不思量自难忘°, 2021-01-30 17:12

I also needed to convince myself so I ran a simple example with a 3×3 RGB image.

# red    # green        # blue
1 1 1    100 100 100    10000 10000 10000
1 1 1    100 100 100    10000 10000 10000    
1 1 1    100 100 100    10000 10000 10000


The filter is initialised to ones:

1 1
1 1




I have also set the convolution to have these properties:


no padding
strides = 1
relu activation function
bias initialised to 0


We would expect the (aggregated) output to be:

40404 40404
40404 40404
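
As a quick check of that expected value without Keras, here is a small NumPy sketch under the same settings (all-ones 2×2 kernel per channel, no padding, stride 1, ReLU, zero bias). Each 2×2 window covers four pixels in every channel, so each output entry should be 4×1 + 4×100 + 4×10000 = 40404. The variable names are only illustrative.

```python
import numpy as np

# Rebuild the 3x3 RGB image: shape (height, width, channels).
img = np.stack([np.full((3, 3), v) for v in (1, 100, 10000)], axis=-1)

# All-ones 2x2 kernel with one slice per input channel: shape (2, 2, 3).
kernel = np.ones((2, 2, 3))

# 'valid' padding with stride 1 gives a (3 - 2 + 1) x (3 - 2 + 1) = 2x2 output.
out = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        window = img[i:i + 2, j:j + 2, :]              # 2x2x3 patch
        out[i, j] = max(0.0, np.sum(window * kernel))  # ReLU with zero bias

print(out)  # every entry is 40404.0
```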


Also, from the picture above, the number of parameters is:

3 separate filters (one for each channel) × 4 weights + 1 (bias, not shown) = 13 parameters
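
The same count can be written as a general formula: kernel height × kernel width × input channels × number of filters, plus one bias per filter. Below is a tiny sketch of that arithmetic; `conv2d_param_count` is a hypothetical helper of mine, not a Keras function.

```python
def conv2d_param_count(kernel_h, kernel_w, in_channels, filters, use_bias=True):
    """Trainable parameters of a standard 2D convolution layer."""
    weights = kernel_h * kernel_w * in_channels * filters
    biases = filters if use_bias else 0
    return weights + biases

# 2x2 kernel, 3 input channels (RGB), 1 filter, with bias
print(conv2d_param_count(2, 2, 3, 1))  # 13
```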



Here's the code.

Import modules:

import numpy as np
from keras.layers import Input, Conv2D
from keras.models import Model


Create the red, green and blue channels:

red   = np.array([1]*9).reshape((3,3))
green = np.array([100]*9).reshape((3,3))
blue  = np.array([10000]*9).reshape((3,3))


Stack the channels to form an RGB image:

img = np.stack([red, green, blue], axis=-1)
img = np.expand_dims(img, axis=0)


Create a model that just does a Conv2D convolution:

inputs = Input((3,3,3))
conv = Conv2D(filters=1, 
              strides=1, 
              padding='valid', 
              activation='relu',
              kernel_size=2, 
              kernel_initializer='ones', 
              bias_initializer='zeros', )(inputs)
model = Model(inputs,conv)


Input the image in the model:

model.predict(img)
# array([[[[40404.],
#          [40404.]],

#         [[40404.],
#          [40404.]]]], dtype=float32)


Run a summary to get the number of params:

model.summary()



                                                                        
                                                        
            
            
              
                

孤城傲影, 2021-01-30 17:14

It might be confusing that it is called a Conv2D layer (it was to me, which is why I came looking for this answer) because, as Nilesh Birari commented:


  I guess you are missing it's 3D kernel [width, height, depth]. So the result is summation across channels.


Perhaps the 2D stems from the fact that the kernel only slides along two dimensions; the third dimension is fixed and determined by the number of input channels (the input depth).
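
Here is a rough NumPy sketch of that idea, assuming a kernel laid out as (kernel height, kernel width, input channels, filters). The two loops move the kernel along height and width only, while every patch spans the full input depth. The function name `conv2d_valid` and the all-ones data are my own choices for illustration.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """image: (H, W, C); kernel: (kh, kw, C, F). Returns (H-kh+1, W-kw+1, F)."""
    kh, kw, in_channels, filters = kernel.shape
    assert image.shape[2] == in_channels, "kernel depth must match input depth"
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w, filters))
    # The kernel moves along height and width only; it always spans the full depth.
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw, :]                # (kh, kw, C)
            out[i, j, :] = np.tensordot(patch, kernel, axes=3)  # one value per filter
    return out

image = np.ones((3, 3, 3))
kernel = np.ones((2, 2, 3, 4))  # four filters, each a 2x2x3 block
result = conv2d_valid(image, kernel)
print(result.shape)  # (2, 2, 4)
```

The depth axis never appears as a loop: it is contracted away inside each patch, which is why the output of each filter is a 2D map.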

For a more elaborate explanation, read https://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/

I plucked an illustrative image from there:


                                                                        
                                                        
            
            
              
                