How does inverting the dropout compensate for the effect of dropout and keep expected values unchanged?

Submitted by 南笙酒味 on 2020-05-16 04:42:25

Question


I'm learning regularization in neural networks from the deeplearning.ai course. In the dropout regularization lecture, the professor says that if dropout is applied, the calculated activation values will be smaller than when dropout is not applied (i.e., at test time). So we need to scale the activations in order to keep the testing phase simpler.

I understood this fact, but I don't understand how the scaling is done. Here is a code sample used to implement inverted dropout.

import numpy as np

keep_prob = 0.8   # 0 <= keep_prob <= 1
l = 3             # this code is only for layer 3
# generated numbers less than keep_prob are kept: 80% of units stay, 20% are dropped
d3 = np.random.rand(a3.shape[0], a3.shape[1]) < keep_prob

a3 = np.multiply(a3, d3)  # zero out the dropped units, keep the rest

# scale a3 up so that the expected value of the output is unchanged
# (this is the "inverted" part of inverted dropout)
a3 = a3 / keep_prob

In the above code, why are the activations divided by 0.8, i.e., by the probability of keeping a node in a layer (keep_prob)? Any numerical example would help.


Answer 1:


I got the answer myself after spending some time understanding inverted dropout. Here is the intuition:

We preserve each neuron in a layer with probability keep_prob. Let's say keep_prob = 0.6. This means shutting down 40% of the neurons in the layer. If the original output of the layer before shutting down 40% of the neurons was x, then after applying 40% dropout the expected output is reduced by 0.4 * x, leaving x - 0.4x = 0.6x.

To restore the original output (expected value), we need to divide the output by keep_prob (0.6 here), since 0.6x / 0.6 = x.
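As a concrete numerical check (a minimal sketch; the all-ones activation matrix and keep_prob = 0.6 below are made up for illustration, not part of the course code), you can verify that the rescaling keeps the mean activation roughly unchanged:

import numpy as np

np.random.seed(0)
keep_prob = 0.6

# toy activation matrix standing in for a3 (values are arbitrary)
a = np.ones((1000, 1000))

# inverted dropout: keep each unit with probability keep_prob, then rescale
mask = np.random.rand(*a.shape) < keep_prob
a_dropped = (a * mask) / keep_prob

print(a.mean())          # 1.0   -> original mean activation
print(a_dropped.mean())  # ~1.0  -> mean preserved after dividing by keep_prob
# Without the division, the mean would be ~keep_prob * 1.0 = 0.6

Because the expected value is already corrected during training, nothing extra has to be done at test time, which is exactly why the testing phase stays simpler.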



Source: https://stackoverflow.com/questions/57193633/how-inverting-the-dropout-compensates-the-effect-of-dropout-and-keeps-expected-v
