I trained a model and now want to evaluate its performance on a test set. The test set is loaded as a tf.data.TFRecordDataset object (from multiple TFRecords with mul
tf.data.TFRecordDataset