As far as I know, the most common approach to train neural networks is to minimize the KL divergence between the data distribution and the output of the model distribution which