We all know that the vanishing gradient problem occurs when we use a deep neural network with sigmoid activations. Using ReLU mitigates this problem, but it can create dead neurons.
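To make both effects concrete, here is a minimal sketch in plain Python. It multiplies the activation derivatives along a single path through the layers and ignores the weights entirely, which is a simplifying assumption. The helper names (`chained_grad`, `sigmoid_grad`, `relu_grad`), the depth of 20, and the pre-activation values 0.5 and -2.0 are all made up for illustration.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    # Derivative of sigmoid: s * (1 - s), which never exceeds 0.25.
    s = sigmoid(x)
    return s * (1.0 - s)


def relu_grad(x: float) -> float:
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise.
    return 1.0 if x > 0 else 0.0


def chained_grad(grad_fn, pre_activations):
    # Product of activation derivatives along one path (weights ignored,
    # an assumption made only to isolate the effect of the activation).
    product = 1.0
    for z in pre_activations:
        product *= grad_fn(z)
    return product


depth = 20                      # hypothetical number of layers
active_path = [0.5] * depth     # hypothetical positive pre-activations

sigmoid_factor = chained_grad(sigmoid_grad, active_path)
relu_factor = chained_grad(relu_grad, active_path)

# A ReLU unit whose pre-activation is negative passes back no gradient.
# If that holds for every training input, its weights stop updating:
# this is the "dead neuron" case.
dead_factor = relu_grad(-2.0)

print(f"sigmoid factor over {depth} layers: {sigmoid_factor:.3e}")
print(f"ReLU factor over {depth} layers:    {relu_factor}")
print(f"ReLU gradient at z = -2.0:          {dead_factor}")
```

With sigmoid, each layer contributes a factor of at most 0.25, so the product shrinks quickly as depth grows. With ReLU the factor stays at 1 while the unit is active, but it is exactly 0 whenever the pre-activation is negative.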