Machine Learning Training & Test data split method

Posted by 故事扮演 on 2019-12-06 22:37:34

When the classes in the training set have unequal numbers of data points, the baseline (the accuracy of a trivial prediction) changes.

By "noisy data", I think you mean that class 1 has more training points than the other class. That is not really noise. It is actually bias (class imbalance).

For example: you have 10000 data points in the training set, 8000 of class 1 and 2000 of class 0. I can predict class 1 all the time and already get 80% accuracy. This induces a bias, and the baseline for 0-1 classification will not be 50%.
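As a rough illustration of that baseline, here is a minimal sketch that computes the accuracy of always predicting the most frequent class. The function name `majority_baseline_accuracy` and the synthetic label list are assumptions made for this example only; they are not from the original question.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Return (majority_class, accuracy) for a predictor that always
    outputs the most frequent class in `labels`."""
    counts = Counter(labels)
    majority_class, majority_count = counts.most_common(1)[0]
    return majority_class, majority_count / len(labels)


# Synthetic labels matching the example: 8000 of class 1, 2000 of class 0.
labels = [1] * 8000 + [0] * 2000
majority_class, accuracy = majority_baseline_accuracy(labels)
print(majority_class, accuracy)  # 1 0.8
```

Any classifier trained on this set should be compared against 80%, not 50%.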

To remove this bias, you can either intentionally balance the training set, as you did, or change the error function by giving each class a weight inversely proportional to its number of points in the training set.
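The weighting option can be sketched as follows. This assumes one common convention, weight = n_samples / (n_classes * class_count), which is one possible choice of "inversely proportional" and not necessarily the one the asker would use. The helper names `inverse_frequency_weights` and `weighted_error` are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Per-class weights inversely proportional to class frequency.
    Assumed convention: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def weighted_error(y_true, y_pred, class_weights):
    """Weighted 0-1 error: each mistake costs the weight of its true class,
    normalized by the total weight of all samples."""
    total = sum(class_weights[t] for t in y_true)
    wrong = sum(class_weights[t] for t, p in zip(y_true, y_pred) if t != p)
    return wrong / total


# Same synthetic split as in the example: 8000 of class 1, 2000 of class 0.
y_true = [1] * 8000 + [0] * 2000
weights = inverse_frequency_weights(y_true)
print(weights)  # {1: 0.625, 0: 2.5}

# A predictor that always outputs class 1 no longer looks good.
always_one = [1] * len(y_true)
print(weighted_error(y_true, always_one, weights))  # 0.5
```

With these weights, the always-predict-class-1 strategy has a weighted error of 50%, so the baseline is back where it would be for a balanced set.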
