Measuring how a new sample contributes to the diversity of a dataset

Question

I am working with a dataset of grayscale images. Is there a way to determine whether a new grayscale image would contribute to the diversity of the dataset? I would like to prevent the dataset from having too many similar samples.

Answer 1:

Well, what do you see when you look at it? If you have information about the images in this dataset, you yourself can probably assess whether this new sample is a repetition of some pattern that is already included in the dataset, or if it is something unique.

Another idea is to compare the images analytically. Depending on the case, you may want to compute the per-pixel averages of your training set (each between 0 and 255 for 8-bit grayscale) and compare them with the pixel values of the sample image. Other distance or similarity measures may work as well.
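The pixel-level comparison can be sketched with NumPy as follows. This is only an illustration, assuming all images share the same shape and are stored as 8-bit arrays; the function name `novelty_scores` and the choice of root-mean-square error are my own, not something prescribed by the question. Besides the distance to the dataset's mean image, it also reports the distance to the closest existing image, which is one example of the "other measures" mentioned.

```python
import numpy as np

def novelty_scores(dataset, sample):
    """Hypothetical helper: rough pixel-level novelty of `sample`.

    dataset: array of shape (N, H, W), 8-bit grayscale images
    sample:  array of shape (H, W), the candidate image
    Returns (distance to mean image, distance to nearest image),
    both as RMSE on pixel values scaled to [0, 1].
    """
    data = np.asarray(dataset, dtype=np.float64) / 255.0
    x = np.asarray(sample, dtype=np.float64) / 255.0

    # Per-pixel average over the whole dataset (the "mean image").
    mean_image = data.mean(axis=0)
    dist_to_mean = np.sqrt(np.mean((x - mean_image) ** 2))

    # RMSE between the sample and every image already in the dataset;
    # the minimum tells how close the nearest existing image is.
    per_image_rmse = np.sqrt(np.mean((data - x) ** 2, axis=(1, 2)))
    dist_to_nearest = per_image_rmse.min()

    return dist_to_mean, dist_to_nearest
```

A nearest-image distance close to zero suggests a near-duplicate. What counts as "close" depends on the data, so any threshold would have to be picked by inspecting a few known pairs.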

What I would do, if you have a model trained on your current dataset, is use the model to predict/classify the sample image and see how well it performs and with what confidence. This way you can perhaps assess whether your model (and the dataset you trained it with) has something to learn from the new sample image.
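One way to turn the model's confidence into a number is sketched below. It assumes the model outputs a vector of class probabilities for the sample; the function names and the 0.6 cutoff are hypothetical and would need tuning. The entropy is included because the question is tagged information-theory: a flat probability vector (high entropy) means the model is unsure about the sample.

```python
import numpy as np

def prediction_uncertainty(probs):
    """Hypothetical helper: summarize a vector of class probabilities.

    probs: 1-D sequence of class probabilities that sums to 1
    Returns (top-class confidence, Shannon entropy in bits).
    """
    p = np.asarray(probs, dtype=np.float64)
    confidence = p.max()
    # Skip zero entries, since 0 * log2(0) is taken as 0.
    nonzero = p[p > 0]
    entropy = -np.sum(nonzero * np.log2(nonzero))
    return confidence, entropy

def is_informative(probs, max_confidence=0.6):
    """Flag a sample as potentially useful when the model is unsure.

    max_confidence is an assumed cutoff, not a recommended value.
    """
    confidence, _ = prediction_uncertainty(probs)
    return confidence < max_confidence
```

For example, `is_informative([0.5, 0.5])` flags the sample, while `is_informative([0.9, 0.1])` does not. A confident prediction can still be wrong, so when a label is available it is worth checking correctness as well as confidence.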

Source: https://stackoverflow.com/questions/55365075/measuring-how-a-new-sample-contributes-to-the-diversity-of-a-dataset

Tags

machine-learning

information-theory
