I noticed that TensorFlow's model.fit trains the model faster than a custom training loop that uses tf.GradientTape. How do I get higher train