I am attempting to adapt TensorFlow's transformer tutorial to work on multiple GPUs using their distributed training tutorial.
Transformer: https://www.tensorflow.org/tu