I based my TensorFlow implementation of PPO on pytorch-a2c-ppo-acktr-gail and tf-a2c-ppo. A2C and PPO share the same model, which converges perfectly fine with A2C.
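For context, since the model is shared, the two algorithms should differ mainly in the policy loss. Below is a minimal, framework-free sketch of that difference, assuming the standard clipped surrogate objective for PPO. The function names and the `clip_eps` default are hypothetical and are not taken from either repository or from my code.

```python
import math


def a2c_policy_loss(log_probs, advantages):
    """Plain policy-gradient loss: -mean(log_prob * advantage)."""
    terms = [lp * adv for lp, adv in zip(log_probs, advantages)]
    return -sum(terms) / len(terms)


def ppo_clipped_policy_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate loss: -mean(min(r * A, clip(r, 1 - eps, 1 + eps) * A)).

    r is the probability ratio between the current policy and the policy
    that collected the data, computed from log-probabilities.
    """
    terms = []
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (smaller) of the unclipped and clipped objectives.
        terms.append(min(ratio * adv, clipped_ratio * adv))
    return -sum(terms) / len(terms)
```

On the first update after a rollout the new and old policies are identical, so every ratio is 1 and the PPO loss reduces to `-mean(advantages)`. That makes a quick sanity check when comparing the two loss paths.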