I'm writing a neural-network classifier in TensorFlow/Python for the notMNIST dataset. I've implemented L2 regularization and dropout on the hidden layers. It works fine as
I had the same problem, and reducing both the batch size and the learning rate worked for me.
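To show where those two knobs enter a training loop, here is a minimal sketch in plain Python rather than TensorFlow. It is a toy, not the classifier from the question: `train_slope`, the data, and the specific values (batch sizes 10 and 2, learning rates 0.1 and 0.01) are all made up for illustration. It fits a single weight with mini-batch SGD, once with the larger settings and once with the reduced ones.

```python
def train_slope(xs, ys, batch_size, learning_rate, epochs=50):
    """Fit y = w * x with plain mini-batch SGD on mean squared error.

    Hypothetical helper for illustration only; not a TensorFlow API.
    """
    w = 0.0
    for _ in range(epochs):
        # Walk over the data in consecutive mini-batches (no shuffling,
        # so the run is deterministic).
        for start in range(0, len(xs), batch_size):
            bx = xs[start:start + batch_size]
            by = ys[start:start + batch_size]
            # Gradient of mean((w*x - y)^2) with respect to w.
            grad = sum(2.0 * (w * x - y) * x for x, y in zip(bx, by)) / len(bx)
            w -= learning_rate * grad
    return w


# Toy data with a true slope of 3 (assumed values, not notMNIST).
xs = [float(i) for i in range(10)]
ys = [3.0 * x for x in xs]

w_large = train_slope(xs, ys, batch_size=10, learning_rate=0.1)
w_small = train_slope(xs, ys, batch_size=2, learning_rate=0.01)
print("larger settings: ", w_large)
print("reduced settings:", w_small)
```

On this toy problem the run with the larger settings overshoots further on every step and the weight blows up, while the run with the reduced settings settles near the true slope. Whether the same change helps a real network depends on the model and data, so treat the values as starting points to experiment with.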