I tried to write a custom implementation of a basic neural network with two hidden layers on the MNIST dataset using TensorFlow 2.0 beta, but I'm not sure what went wrong here: my training loss and accuracy seem to be stuck at 1.5 and around 85%, respectively.
Where is the training part? TF 2.0 models are trained either with Keras' syntax or with eager execution and tf.GradientTape(). Can you paste the code with the conv and dense layers, and show how you trained it?
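For reference, a bare-bones eager training step with tf.GradientTape() looks roughly like this (a minimal sketch; model, the optimizer and the loss function are placeholders for whatever you defined in your own code):

import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

def train_step(model, x_batch, y_batch):
    with tf.GradientTape() as tape:
        predictions = model(x_batch, training=True)   # forward pass
        loss = loss_fn(y_batch, predictions)           # compute the loss
    grads = tape.gradient(loss, model.trainable_variables)             # backprop
    optimizer.apply_gradients(zip(grads, model.trainable_variables))   # weight update
    return loss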
Other questions:
1) How to add a Dropout layer in this custom implementation? i.e., making it work for both train and test time.
You can add a Dropout() layer with:
from tensorflow.keras.layers import Dropout
And then you insert it into a Sequential() model just with:
Dropout(dprob) # where dprob = dropout probability
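For example, a minimal sketch of a Sequential classifier with dropout (the layer sizes are just placeholders, adapt them to your network):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

dprob = 0.3  # dropout probability, a hyperparameter you should tune

model = Sequential([
    Dense(256, activation='relu', input_shape=(784,)),  # flattened MNIST images
    Dropout(dprob),   # active only during training
    Dense(128, activation='relu'),
    Dropout(dprob),
    Dense(10, activation='softmax')
])

Keras handles the train/test behaviour for you: dropout is applied only when training=True (e.g. inside model.fit()) and is automatically disabled at inference time. If you call the model manually in a custom loop, pass the flag yourself, e.g. model(x, training=True).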
2) How to add Batch Normalization in this code?
Same as before, with:
from tensorflow.keras.layers import BatchNormalization
The choice of where to put batch normalization in the model is up to you. There is no hard rule of thumb; I suggest you experiment. With ML it's always a trial-and-error process.
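For instance, one common placement is between a Dense layer and its activation (again just a sketch, the layer sizes are placeholders):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization, Activation

model = Sequential([
    Dense(256, input_shape=(784,)),
    BatchNormalization(),      # normalize the pre-activations
    Activation('relu'),
    Dense(128),
    BatchNormalization(),
    Activation('relu'),
    Dense(10, activation='softmax')
])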
3) How can I use callbacks in this code? i.e., making use of the EarlyStopping and ModelCheckpoint callbacks.
If you are training using Keras' syntax, you can simply use them. Please check this very thorough tutorial on how to use them; it just takes a few lines of code. If you are running a model in eager execution, you have to implement these techniques yourself, with your own code. It's more complex, but it also gives you more freedom in the implementation.
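With the Keras syntax it would look roughly like this (a sketch; the checkpoint path and the x_train/x_val arrays are placeholders for your own data):

from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks = [
    # stop when the validation loss hasn't improved for 5 epochs
    EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True),
    # save the best model seen so far ('best_model.h5' is just an example path)
    ModelCheckpoint('best_model.h5', monitor='val_loss', save_best_only=True)
]

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=100,
          callbacks=callbacks)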
4) Is there anything else in this code that I can optimize further? i.e., making use of the TensorFlow 2.x @tf.function decorator, etc.
It depends. If you are using Keras' syntax, I don't think you need to add more to it. If you are training the model in eager execution, then I'd suggest using the @tf.function decorator on some functions to speed things up a bit.
You can see a practical TF 2.0 example on how to use the decorator in this Notebook.
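As a rough idea, if you already have a train_step like the one sketched earlier, adding the decorator is enough to get it compiled into a graph (train_dataset and epochs are assumed to be defined in your code):

@tf.function  # traces the Python function into a TensorFlow graph on the first call
def train_step(model, x_batch, y_batch):
    with tf.GradientTape() as tape:
        predictions = model(x_batch, training=True)
        loss = loss_fn(y_batch, predictions)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

for epoch in range(epochs):
    for x_batch, y_batch in train_dataset:   # e.g. a tf.data.Dataset of (image, label) batches
        loss = train_step(model, x_batch, y_batch)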
Other than this, I suggest you play with weight initialization and with regularization techniques such as L1-L2 penalties, etc.
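For example, on a Dense layer you can set both directly (a sketch; the sizes and the regularization factor are made-up values to tune):

from tensorflow.keras.layers import Dense
from tensorflow.keras import regularizers, initializers

# He initialization + L2 weight penalty on a hidden layer
Dense(256, activation='relu',
      kernel_initializer=initializers.he_normal(),
      kernel_regularizer=regularizers.l2(1e-4))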
5) Also, I need a way to extract all my final weights for all layers after training so I can plot them and check their distributions, to check for issues like vanishing or exploding gradients.
Once the model is trained, you can extract its weights with:
weights = model.get_weights()
or:
weights = model.trainable_weights
if you want to keep only the trainable ones (note that this returns tf.Variable objects rather than NumPy arrays).
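To look at the distributions, you could do something like this (a minimal sketch, assuming matplotlib is available):

import matplotlib.pyplot as plt

weights = model.get_weights()   # list of NumPy arrays, one per layer variable

for i, w in enumerate(weights):
    plt.figure()
    plt.hist(w.flatten(), bins=50)   # histogram of the values in this tensor
    plt.title('Layer variable {} - shape {}'.format(i, w.shape))
    plt.show()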
6) I also want help writing this code in a more generalized way, so I can easily implement other networks (e.g. convolutional networks with Conv, MaxPool, etc.) based on it.
You can pack all your code into a function and then reuse it with different parameters. At the end of this Notebook I did something like this (it's for a feed-forward NN, which is much simpler, but it's a start and you can change the code according to your needs). A sketch of the idea follows below.
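As a rough sketch (the function name and its arguments are just an example of the pattern; swap the Dense layers for Conv2D/MaxPool2D layers to build a CNN instead):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

def build_model(hidden_units=(256, 128), dprob=0.3, n_classes=10, input_shape=(784,)):
    """Builds and compiles a feed-forward classifier from a tuple of layer sizes."""
    model = Sequential()
    for i, units in enumerate(hidden_units):
        if i == 0:
            model.add(Dense(units, activation='relu', input_shape=input_shape))
        else:
            model.add(Dense(units, activation='relu'))
        model.add(Dropout(dprob))
    model.add(Dense(n_classes, activation='softmax'))
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

model = build_model()   # reuse with different arguments for other architectures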
UPDATE:
Please check my TensorFlow 2.0 implementation of a CNN classifier. This might be a useful hint: it is trained on the Fashion MNIST dataset, which makes it very similar to your task.