Regarding the thread topic, could anyone advise what is wrong with the loss_function() computation logic (especially policy_output_discrete and
policy_output_discrete