In a reinforcement learning algorithm, I need to build and train a neural network (NN) with multiple outputs, where each output is a probability vector of dimension 7. I used categorical