Converting to Python scalars

I am implementing a SARSA reinforcement learning function that chooses each action by following the current policy and then updates its Q-values.
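For context, here is a minimal sketch of the kind of SARSA step described above. It is not the code from this question: it assumes a tabular `Q` stored as a NumPy array of shape `(n_states, n_actions)` and an epsilon-greedy policy, and the names `epsilon_greedy`, `sarsa_update`, `alpha`, `gamma`, and `epsilon` are all hypothetical.

```python
import numpy as np


def epsilon_greedy(Q, state, epsilon, rng):
    """Return a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        # Cast to a plain Python int so the result is a scalar index.
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[state]))


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """Apply one on-policy update:
    Q(s, a) += alpha * (r + gamma * Q(s', a') - Q(s, a))
    """
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q


# Toy usage: 3 states, 2 actions, one hand-written transition.
rng = np.random.default_rng(0)
Q = np.zeros((3, 2))
a_next = epsilon_greedy(Q, 1, 0.0, rng)  # epsilon = 0, so purely greedy
Q = sarsa_update(Q, s=0, a=1, r=1.0, s_next=1, a_next=a_next,
                 alpha=0.5, gamma=0.9)
```

The point of the sketch is that the action used in the update target (`a_next`) comes from the same policy that selects actions during the episode, which is what makes SARSA on-policy.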

My implementation throws the following error: