I read some papers about state-of-the-art semantic segmentation models and in all of them, authors use for comparison F1-score metric, but they did not write wh