I'm working with a transformer-based model, but I'm unable to implement this learning rate scheduler in PyTorch:
lrate = d_
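
The formula above is cut off, so the following is only a sketch under an assumption: that it is the warmup schedule from the "Attention Is All You Need" paper, `d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)`. The names `noam_factor` and `build_scheduler` and the defaults `d_model=512`, `warmup_steps=4000` are hypothetical, chosen for illustration. The idea is to compute the rate in a plain function and hand it to `torch.optim.lr_scheduler.LambdaLR`, which multiplies the optimizer's base learning rate by whatever the function returns.

```python
def noam_factor(step, d_model=512, warmup_steps=4000):
    """Learning rate for a given step under the ASSUMED schedule:
    d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)."""
    # LambdaLR evaluates the function at step 0 on construction;
    # clamp to 1 to avoid raising 0 to a negative power.
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


def build_scheduler(optimizer, d_model=512, warmup_steps=4000):
    """Wrap noam_factor in a LambdaLR scheduler (hypothetical helper).

    LambdaLR scales the optimizer's initial lr by the returned factor,
    so create the optimizer with lr=1.0 if the formula should give the
    learning rate directly.
    """
    # Imported here so noam_factor can be used without PyTorch installed.
    from torch.optim.lr_scheduler import LambdaLR

    return LambdaLR(
        optimizer,
        lr_lambda=lambda step: noam_factor(step, d_model, warmup_steps),
    )
```

If this matches the intended formula, usage would be roughly: create the optimizer with `lr=1.0`, call `scheduler = build_scheduler(optimizer)`, and then call `scheduler.step()` after each `optimizer.step()` so the schedule advances per training step rather than per epoch.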