I have implemented an advantage actor-critic (A2C) algorithm to try to play a game I made, but I am not exactly sure how to add an entropy loss term to my cost function. I
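For reference, here is a minimal sketch of how an entropy term is commonly folded into an A2C loss for discrete actions. The total is the policy loss, plus a weighted value loss, minus a weighted entropy bonus. It uses plain NumPy only to show the arithmetic. In an autodiff framework you would build the same expression on tensors and backpropagate through it. The function name `a2c_loss`, the parameters `value_coef` and `entropy_coef`, and their values (0.5 and 0.01) are hypothetical choices for illustration and do not come from the code described above.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over the last axis (the action dimension).
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def a2c_loss(logits, values, actions, returns, value_coef=0.5, entropy_coef=0.01):
    """Hypothetical combined A2C loss for a batch of discrete-action steps.

    logits:  (batch, n_actions) unnormalized policy outputs from the actor
    values:  (batch,) state-value estimates V(s) from the critic
    actions: (batch,) integer indices of the actions actually taken
    returns: (batch,) discounted (or bootstrapped) return targets
    """
    probs = softmax(logits)
    log_probs = np.log(probs + 1e-8)  # small epsilon avoids log(0)

    # Advantage A(s, a) = R - V(s). In an autodiff framework this would be
    # detached in the policy term so the actor loss does not train the critic.
    advantages = returns - values

    # Policy (actor) loss: -log pi(a|s) * A(s, a), averaged over the batch.
    taken_log_probs = log_probs[np.arange(len(actions)), actions]
    policy_loss = -np.mean(taken_log_probs * advantages)

    # Value (critic) loss: mean squared error between V(s) and the return.
    value_loss = np.mean(advantages ** 2)

    # Policy entropy per state, H = -sum_a pi(a|s) log pi(a|s), batch-averaged.
    entropy = np.mean(-np.sum(probs * log_probs, axis=-1))

    # The entropy is SUBTRACTED: minimizing the total loss then maximizes
    # entropy, which discourages the policy from collapsing onto one action.
    total = policy_loss + value_coef * value_loss - entropy_coef * entropy
    return total, policy_loss, value_loss, entropy


# Usage: a uniform policy over 4 actions has entropy log(4), the maximum.
logits = np.zeros((2, 4))
values = np.array([0.5, 1.0])
actions = np.array([0, 3])
returns = np.array([1.0, 1.0])
total, policy_loss, value_loss, entropy = a2c_loss(logits, values, actions, returns)
print(total, entropy)
```

The sign is the part that is easiest to get wrong: adding the entropy instead of subtracting it would push the policy toward determinism. `entropy_coef` is usually kept small so the bonus encourages exploration without swamping the advantage signal.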