Why input is scaled in tf.nn.dropout in tensorflow?

前端未结
关注
 4  1958
自闭症患者 2021-01-30 13:31
I can\'t understand why dropout works like this in tensorflow. The blog of CS231n says that, \"dropout is implemented by only keeping a neuron active with some probability

      
      
        
          4条回答        

        
                    
            
            
                         
                
              
              
                
                   一向
                                             
                
                
                (楼主)
            
              
              
                2021-01-30 13:51
              

            
            
                        
Here is a quick experiment to disperse any remaining confusion. 

Statistically the weights of a NN-layer follow a distribution that is usually close to normal (but not necessarily), but even in the case when trying to sample a perfect normal distribution in practice, there are always computational errors.

Then consider the following experiment:

DIM = 1_000_000                      # set our dims for weights and input
x = np.ones((DIM,1))                 # our input vector
#x = np.random.rand(DIM,1)*2-1.0     # or could also be a more realistic normalized input

probs = [1.0, 0.7, 0.5, 0.3]         # define dropout probs

W = np.random.normal(size=(DIM,1))   # sample normally distributed weights
print("W-mean = ", W.mean())         # note the mean is not perfect --> sampling error!

# DO THE DRILL
h = defaultdict(list)
for i in range(1000):
  for p in probs:
    M = np.random.rand(DIM,1)
    M = (M < p).astype(int)
    Wp = W * M
    a = np.dot(Wp.T, x)
    h[str(p)].append(a)

for k,v in h.items():
  print("For drop-out prob %r the average linear activation is %r (unscaled) and %r (scaled)" % (k, np.mean(v), np.mean(v)/float(k)))


Sample output:

x-mean =  1.0
W-mean =  -0.001003985674840264
For drop-out prob '1.0' the average linear activation is -1003.985674840258 (unscaled) and -1003.985674840258 (scaled)
For drop-out prob '0.7' the average linear activation is -700.6128015029908 (unscaled) and -1000.8754307185584 (scaled)
For drop-out prob '0.5' the average linear activation is -512.1602655283492 (unscaled) and -1024.3205310566984 (scaled)
For drop-out prob '0.3' the average linear activation is -303.21194422742315 (unscaled) and -1010.7064807580772 (scaled)


Notice that the unscaled activations diminish due to the statistically imperfect normal distribution.

Can you spot an obvious correlation between the W-mean and the average linear activation means?
    
             
                                                        
            
            
              
                
                0
              
                   
                
               讨论(0)
              
                                                  
              
              
                          
             
       
          
              
                                       
     查看其它4个回答


            
                         
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
                              			
        
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复