I want to get masked-word predictions for a few bert-base models. I am converting the PyTorch models to the original BERT TensorFlow format using this by modifying the code to loa