Why is the softmax function necessary? Why not simple normalization?
Question: I am not familiar with deep learning, so this might be a beginner question. In my understanding, the softmax function in a Multi-Layer Perceptron is responsible for normalization and for assigning a probability to each class. If so, why don't we use simple normalization instead? Say we have a vector x = (10, 3, 2, 1). Applying softmax, the output is y = (0.9986, 0.0009, 0.0003, 0.0001). Applying simple normalization (dividing each element by the sum, 16), the output is y = (0.625, 0.1875, 0.125, 0.0625). It
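For reference, here is a minimal pure-Python sketch (standard library only, no external packages assumed) that reproduces both computations for the example vector in the question:

```python
import math

def softmax(x):
    # Subtract the max before exponentiating; this is the standard
    # numerical-stability trick and does not change the result.
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    s = sum(exps)
    return [e / s for e in exps]

def simple_normalize(x):
    # Divide each element by the sum of all elements.
    s = sum(x)
    return [v / s for v in x]

x = [10, 3, 2, 1]
print([round(v, 4) for v in softmax(x)])           # [0.9986, 0.0009, 0.0003, 0.0001]
print([round(v, 4) for v in simple_normalize(x)])  # [0.625, 0.1875, 0.125, 0.0625]
```

The exponentiation in softmax is what makes the output so much more peaked than plain sum-normalization: differences between scores are amplified multiplicatively rather than kept proportional.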